| ## Dataset Processing | |
| ### Our Benchmark (processed OIE2016) | |
| First, download our benchmark, which is tailored for compact extractions, from [`here`](https://zenodo.org/record/7014032#.YwQQ0OzMJb8) and put it under [`data/OIE2016(processed)`](https://github.com/FarimaFatahi/CompactIE/tree/master/data/OIE2016(processed)). | |
| Second, split it into train, development, and test sets for the constituent extraction model by running: | |
| ```bash | |
| cd "OIE2016(processed)/constituent_model" | |
| python process_constituent_data.py | |
| ``` | |
| Finally, split it into train, development, and test sets for the constituent linking model by running: | |
| ```bash | |
| cd "OIE2016(processed)/relation_model" | |
| python process_linking_data.py | |
| ``` | |
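For readers unfamiliar with this kind of preprocessing, the snippet below is a minimal sketch of a reproducible train/development/test split. It is only an illustration: the function name `split_dataset`, the 80/10/10 proportions, and the fixed seed are assumptions, and `process_constituent_data.py` and `process_linking_data.py` may split the data differently.

```python
import random


def split_dataset(examples, dev_fraction=0.1, test_fraction=0.1, seed=0):
    """Shuffle a copy of `examples` and cut it into train/dev/test lists.

    Hypothetical helper for illustration; not taken from the repository.
    """
    if not 0 <= dev_fraction + test_fraction < 1:
        raise ValueError("dev_fraction + test_fraction must be in [0, 1)")
    shuffled = list(examples)
    random.Random(seed).shuffle(shuffled)  # fixed seed => same split every run
    n_dev = int(len(shuffled) * dev_fraction)
    n_test = int(len(shuffled) * test_fraction)
    dev = shuffled[:n_dev]
    test = shuffled[n_dev:n_dev + n_test]
    train = shuffled[n_dev + n_test:]
    return train, dev, test


if __name__ == "__main__":
    sentences = [f"sentence {i}" for i in range(100)]  # placeholder data
    train, dev, test = split_dataset(sentences)
    print(len(train), len(dev), len(test))
```

Seeding a local `random.Random` instance keeps the split stable across runs without touching the global random state.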
| Note that each model's training data folder is already set to the corresponding folder mentioned above. | |
| ### Evaluation Benchmarks | |
| Three evaluation benchmarks (**BenchIE**, **CaRB**, and **Wire57**) are used to evaluate CompactIE's performance. Since these datasets do not target compact triples, we exclude triples that have at least one clause within a constituent. | |
| To get the final data (in JSON format) for these benchmarks, run: | |
| ```bash | |
| ./process_test_data.sh | |
| ``` | |
| ### Other files | |
| Since the schema design of the table filling model does not support conjunctions inside constituents, we use the conjunction module developed by [`OpenIE6`](https://github.com/dair-iitd/openie6) to break sentences into smaller conjunction-free sentences before passing them to the system. | |
| Therefore, to process a new test file (`source_file.txt`), first produce its conjunction file (`conjunctions.txt`) and then run: | |
| ```bash | |
| python process.py --source_file source_file.txt --target_file output.json --conjunctions_file conjunctions.txt | |
| ``` | |
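When several test files need to be processed, the command above can be assembled programmatically. The following is a sketch under stated assumptions: the three flags are the ones shown above, but the helper names and the `<name>.txt` / `<name>.conj` / `<name>.json` naming scheme are hypothetical and not part of the repository.

```python
import subprocess
import sys
from pathlib import Path


def build_process_command(source_file, target_file, conjunctions_file):
    """Return the argument list for one `process.py` call."""
    return [
        sys.executable, "process.py",
        "--source_file", str(source_file),
        "--target_file", str(target_file),
        "--conjunctions_file", str(conjunctions_file),
    ]


def process_all(input_dir):
    """Run process.py on every `<name>.txt` that has a `<name>.conj` beside it."""
    for source in sorted(Path(input_dir).glob("*.txt")):
        conjunctions = source.with_suffix(".conj")  # hypothetical naming scheme
        if not conjunctions.exists():
            continue  # the conjunction file must be produced beforehand
        target = source.with_suffix(".json")
        subprocess.run(
            build_process_command(source, target, conjunctions), check=True
        )


if __name__ == "__main__":
    # Only print the command here; call process_all(...) to actually run it.
    print(" ".join(build_process_command(
        "source_file.txt", "output.json", "conjunctions.txt")))
```

Passing the arguments as a list avoids shell quoting problems with paths such as `OIE2016(processed)`.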
| ### Compactness measurement | |
| To measure the compactness metrics mentioned in the paper (AL, NCC, RPA), set the `INPUT_FILE` variable inside the following script to the test file path and run it: | |
| ```bash | |
| python compactness_measurements.py | |
| ``` | |
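As a rough illustration of a length-based compactness measure, the sketch below computes the average constituent length over a list of triples. It assumes that AL denotes the average constituent length counted in whitespace-separated tokens, and that each triple is a tuple of constituent strings; both are assumptions, the example triples are made up, and the paper and `compactness_measurements.py` remain the authoritative definitions.

```python
def average_constituent_length(triples):
    """Mean number of whitespace tokens per constituent over all triples.

    Hypothetical helper; assumes each triple is an iterable of strings.
    """
    lengths = [
        len(constituent.split())
        for triple in triples
        for constituent in triple
    ]
    return sum(lengths) / len(lengths) if lengths else 0.0


if __name__ == "__main__":
    # Made-up example triples (subject, relation, object).
    triples = [
        ("the committee", "approved", "the new budget"),
        ("she", "works for", "a small company"),
    ]
    print(average_constituent_length(triples))
```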