
About looking for the original sentences of the triples

Open Frankie123421 opened this issue 4 years ago • 2 comments

Hey, I have recently been studying your work. Is there a convenient way to obtain the original sentences for the triples in the dataset, or do I have to look them up in OPIEC? Thank you.

Frankie123421, Jan 18 '22 07:01

That is correct, you have to look at OPIEC. You can reuse the OLPBENCH pipeline (in particular this part: https://github.com/samuelbroscheit/open_knowledge_graph_embeddings/blob/main/preprocessing/process_avro.py ) and extend it so that it also creates an index that associates each triple with its source sentence.

samuelbroscheit, Jan 18 '22 08:01
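
For readers following along, the kind of index suggested above could look roughly like the sketch below. It is not taken from process_avro.py: the function name `build_sentence_index`, the record layout, and the field names `subject`, `relation`, `object` and `sentence` are all hypothetical and would have to be adapted to the fields the real pipeline reads from OPIEC. The example records are made up.

```python
from collections import defaultdict


def build_sentence_index(records):
    """Map each (subject, relation, object) triple to its source sentences.

    `records` is any iterable of dicts. The field names used here are
    hypothetical placeholders, not the actual OPIEC schema.
    """
    index = defaultdict(list)
    for rec in records:
        key = (rec["subject"], rec["relation"], rec["object"])
        # The same triple can be extracted from several sentences,
        # so every key maps to a list rather than a single string.
        index[key].append(rec["sentence"])
    return index


# Made-up toy records, for illustration only.
records = [
    {"subject": "Marie Curie", "relation": "was born in", "object": "Warsaw",
     "sentence": "Marie Curie was born in Warsaw."},
    {"subject": "Marie Curie", "relation": "was born in", "object": "Warsaw",
     "sentence": "In 1867, Marie Curie was born in Warsaw."},
    {"subject": "Warsaw", "relation": "is capital of", "object": "Poland",
     "sentence": "Warsaw is the capital of Poland."},
]

index = build_sentence_index(records)
sentences = index[("Marie Curie", "was born in", "Warsaw")]
```

Because the index is an ordinary dictionary, looking up the sentences for one triple afterwards is a single hash lookup instead of a scan over the corpus.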

Thanks for your kind reply, I got it. So I could also just concatenate the "word" fields in "tokens" to obtain the sentence directly, is that correct? If I only want the original sentence for each triple, I am not sure whether this would be more convenient than using the "sentence number" to look the sentence up in the article. By the way, I suspect it will also be difficult to match the triples in this dataset against OPIEC with simple loops, since OPIEC is quite large?

Frankie123421, Jan 18 '22 12:01
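
The two ideas in the last comment can be sketched as follows, as an illustration rather than a confirmed answer from the maintainer. The "word" and "tokens" names come from the question above; the `triple` field, the function names, and the toy data are hypothetical. Instead of looping over OPIEC once per triple, the sketch puts the wanted triples into a set and streams through the corpus a single time.

```python
def sentence_from_tokens(tokens):
    """Rebuild a sentence by joining the "word" field of each token.

    Joining with single spaces is an approximation: punctuation ends up
    separated from the preceding word (e.g. "Warsaw .").
    """
    return " ".join(tok["word"] for tok in tokens)


def match_triples(wanted_triples, corpus_records):
    """Collect source sentences for the wanted triples in one pass.

    `corpus_records` can be a generator, so the corpus never has to be
    held in memory. The "triple" field is a hypothetical placeholder for
    however the (subject, relation, object) surface forms are stored.
    """
    wanted = set(wanted_triples)  # set membership is O(1) on average
    found = {}
    for rec in corpus_records:
        key = rec["triple"]
        if key in wanted:
            found.setdefault(key, []).append(sentence_from_tokens(rec["tokens"]))
    return found


# Made-up toy corpus, for illustration only.
corpus = [
    {"triple": ("Marie Curie", "was born in", "Warsaw"),
     "tokens": [{"word": w} for w in
                ["Marie", "Curie", "was", "born", "in", "Warsaw", "."]]},
    {"triple": ("Warsaw", "is capital of", "Poland"),
     "tokens": [{"word": w} for w in
                ["Warsaw", "is", "the", "capital", "of", "Poland", "."]]},
]

found = match_triples([("Marie Curie", "was born in", "Warsaw")], iter(corpus))
```

With this layout the cost is one pass over the corpus plus one set lookup per record, so it grows with the size of OPIEC rather than with the size of OPIEC times the number of triples. Whether the reconstructed string matches the original article text exactly depends on how the tokens were produced.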