Question about the AGENDA dataset
What's the meaning of the last column in data/preprocessed.train.tsv, such as 6 1 4 5 8 7 -1 0 3 7 -1 2 7 -1? Can you explain more about how it was produced?
Please ignore this column
I have a similar question: what does the third column in data/preprocessed.train.tsv mean? Example value in the column: 17 5 16 ; 1 0 11 ; 2 0 11 ; 2 0 7 ; 1 0 6 ; 16 4 12 ; 17 5 12 ; 3 1 0 ; 14 1 2
Does this describe relations between the entities in an encoded form?
Thanks for your interest. This is the actual graph, encoded as <head, relation, tail> triples separated by ';'. The head and tail indexes reference the entities list (the second column in the dataset), and the relation indexes reference the relations listed in data/relations.vocab. This format was produced from the data in data/unprocessed.tar.gz by some auxiliary formatting tools which are not included in the repo.
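To make the encoding concrete, here is a minimal Python sketch of decoding one graph field. The function name `parse_graph`, the entity strings, and the relation names are all made up for illustration; in the real data the entities come from the entities column and the relations from data/relations.vocab, whose exact layout is not described in this thread.

```python
def parse_graph(graph_field, entities, relations):
    """Decode a ';'-separated string of 'head relation tail' index triples."""
    triples = []
    for chunk in graph_field.split(";"):
        chunk = chunk.strip()
        if not chunk:
            continue  # tolerate a trailing ';' or an empty graph
        head, rel, tail = (int(tok) for tok in chunk.split())
        # head and tail index into the entities list, rel into the relation list
        triples.append((entities[head], relations[rel], entities[tail]))
    return triples


# Hypothetical example values, not taken from the dataset.
entities = ["graph encoder", "text generation", "attention"]
relations = ["USED-FOR", "PART-OF"]  # assumed names; see data/relations.vocab
graph_field = "0 0 1 ; 2 1 0"

for triple in parse_graph(graph_field, entities, relations):
    print(triple)
```

Each decoded triple pairs two entity strings with a relation name, which is usually easier to inspect than the raw index form.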
Please let me know if you have any other questions!
The unprocessed dataset already contains the "relations" graph as <head, relation, tail> triples. Can you please elaborate on which algorithm was used to create these relations? Was it based on SciIE only, or on another technique?
@GreatlakesSreeharsha this was done with DyGIE: https://github.com/luanyi/DyGIE
@rikdz I should say that I'm impressed by your work. Thank you for sharing the code with us. I am trying to use my own knowledge graph, but I don't understand what the last column in preprocessed.train.tsv indicates, such as `6 1 4 5 8 7 -1 0 3 7 -1 2 7 -1`. You told @zhongluwang to ignore it, but isn't it an important column when we want to train a model, since it is used internally?
This is fascinating work! @rikdz I am unable to understand the third and the last columns. Would you please explain them for me?
@rikdz and if I want to give my custom inputs (titles and graphs), how would I be able to do that?
You can process your data to a tab-separated list with the following fields:
title, entities, entity type, graph, target, ordering
"entities" is a semi-colon delimited list of entities.
"entity type" is a space delimited list of one type token associated with each entity.
"graph" is a semi-colon delimited list of graph triples, where the head and tail are indexes of the entities in the "entities" list and the relation is the index of the relation in the data/relations.vocab file.
"target" is your target text with entities replaced by placeholders indicating the entity type and its index in the "entities" list.
"ordering" is not used, but you may need to put a placeholder there to get the code to work as is. Alternatively, you can modify the code to ignore this field.
Hope this helps.
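The field layout above can be sketched in Python as follows. This is only an illustration under stated assumptions: `make_row` is a hypothetical helper, and the type tokens (`<method>`, `<task>`), the placeholder spelling in the target (`<method_0>`), the spacing around ';', and the `"0"` ordering placeholder are guesses that should be checked against data/preprocessed.train.tsv before use.

```python
def make_row(title, entities, entity_types, triples, target, ordering="0"):
    """Join the six fields into one tab-separated line (hypothetical helper)."""
    if len(entities) != len(entity_types):
        raise ValueError("need exactly one type token per entity")
    for head, _rel, tail in triples:
        if not (0 <= head < len(entities) and 0 <= tail < len(entities)):
            raise ValueError("head/tail must index into the entities list")
    fields = [
        title,
        " ; ".join(entities),                              # semicolon-delimited entities
        " ".join(entity_types),                            # space-delimited type tokens
        " ; ".join(f"{h} {r} {t}" for h, r, t in triples),  # index triples
        target,                                            # text with entity placeholders
        ordering,                                          # unused; placeholder only
    ]
    return "\t".join(fields)


# Hypothetical example; none of these values come from the dataset.
row = make_row(
    title="a graph encoder for text generation",
    entities=["graph encoder", "text generation"],
    entity_types=["<method>", "<task>"],   # assumed token format
    triples=[(0, 0, 1)],                   # relation 0 = first relation in the vocab (assumed)
    target="we propose a <method_0> for <task_1> .",  # assumed placeholder format
)
print(row)
```

Writing one such line per example gives a file with the same six-field layout; the relation index still has to match the ordering in data/relations.vocab.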
Thanks a lot sir. This makes things a lot easier for me.
Hello. Can you tell us what the last column in the preprocessed data was supposed to mean? Will the model train well if I just ignore the last column or fill it with random data?
Yes, the last column could be filled with random data and it should not affect training.