Question about the AGENDA dataset

Open zhongluwang opened this issue 6 years ago • 12 comments

What's the meaning of the last column in data/preprocessed.train.tsv, such as 6 1 4 5 8 7 -1 0 3 7 -1 2 7 -1? Can you explain more about how it was produced?

zhongluwang avatar Sep 27 '19 03:09 zhongluwang

Please ignore this column

rikdz avatar Oct 18 '19 16:10 rikdz

I have a similar question: what does the third column in data/preprocessed.train.tsv mean? Example value in the column: 17 5 16 ; 1 0 11 ; 2 0 11 ; 2 0 7 ; 1 0 6 ; 16 4 12 ; 17 5 12 ; 3 1 0 ; 14 1 2

Does this describe relations between the entities in an encoded form?

agrawal-rohit avatar Apr 04 '20 10:04 agrawal-rohit

Thanks for your interest. This is the actual graph, encoded as <head, relation, tail> triples separated by ';'. The indexes for head and tail reference the entities list (the second column in the dataset), and the relations are listed in data/relations.vocab. This format was produced from the data in data/unprocessed.tar.gz by some auxiliary formatting tools which are not included in the repo.

Please let me know if you have any other questions!

rikdz avatar Apr 04 '20 14:04 rikdz
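
The encoding described above can be sketched in a few lines of Python. This is only an illustration, not code from the repo: the function name parse_graph, the entity strings, and the relation labels are all made up, and it assumes that indexes are 0-based and that a relation index is the position of the label in data/relations.vocab.

```python
# Sketch: decode a graph field into readable triples.
# All names and labels here are hypothetical; real relation labels
# would be read from data/relations.vocab.

def parse_graph(graph_field, entities, relations):
    """Turn 'h r t ; h r t' into (head, relation, tail) name triples."""
    triples = []
    for chunk in graph_field.split(";"):
        parts = chunk.split()
        if not parts:
            continue  # tolerate an empty chunk, e.g. after a trailing ';'
        if len(parts) != 3:
            raise ValueError(f"expected 3 indexes, got: {chunk!r}")
        head, rel, tail = (int(p) for p in parts)
        # Assumption: indexes are 0-based positions in each list.
        triples.append((entities[head], relations[rel], entities[tail]))
    return triples


entities = ["graph transformer", "text generation", "knowledge graph"]  # hypothetical
relations = ["USED-FOR", "PART-OF"]  # hypothetical stand-in for relations.vocab
decoded = parse_graph("0 0 1 ; 2 1 0", entities, relations)
for triple in decoded:
    print(triple)
```

If the indexes in your copy of the data turn out to be offset, only the two list lookups need to change.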

The unprocessed dataset already contains "relations": graph<head,relation,tail>. Can you please elaborate on what algorithm was used to create these relations? Was it based on SciIE only, or on some other technique?

GreatlakesSreeharsha avatar May 07 '20 08:05 GreatlakesSreeharsha

@GreatlakesSreeharsha this was done with DyGIE https://github.com/luanyi/DyGIE

rikdz avatar May 11 '20 17:05 rikdz

@rikdz I should say this: I'm impressed by your work, thank you for sharing the code with us. I am trying to use my own knowledge graph, but I don't understand what the last column in preprocessed.train.tsv indicates, such as 6 1 4 5 8 7 -1 0 3 7 -1 2 7 -1. You replied to @zhongluwang to ignore it, but isn't that an important column when we want to train a model, since that column is being used internally?

anandhperumal avatar Oct 28 '20 14:10 anandhperumal

This is fascinating work! @rikdz I am unable to understand the third and the last column. Would you please explain those for me?

pratik2358 avatar Jun 20 '21 18:06 pratik2358

@rikdz and if I want to give my custom inputs (titles and graphs), how would I be able to do that?

pratik2358 avatar Jun 20 '21 18:06 pratik2358

You can process your data to a tab-separated list with the following fields:

title, entities, entity type, graph, target, ordering

"entities" is a semi-colon delimited list of entities
"entity type" is a space delimited list of one type token associated with each entity
"graph" is a semi-colon delimited list of graph triples, where the head and tail are indexes of the entities in the "entities" list and the relation is the index of the relation in the data/relations.vocab file
"target" is your target text with entities replaced by placeholders indicating the entity type and its index in the "entities" list
"ordering" is not used, but you may need to put a placeholder there to get the code to work as is. Alternatively, you can modify the code to ignore this field

hope this helps

rikdz avatar Jun 22 '21 15:06 rikdz
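
For anyone building custom inputs, here is a minimal sketch of writing one row in the six-field layout described above. It is not taken from the repo. The helper name make_row, the example entities and types, the " ; " spacing, the <type_index> placeholder style in the target, and the "-1" ordering placeholder are all assumptions; check them against the existing rows in data/preprocessed.train.tsv before relying on them.

```python
# Sketch: build one tab-separated row with the six fields
# title, entities, entity type, graph, target, ordering.
# Everything below is illustrative; the helper and sample data are hypothetical.

def make_row(title, entities, types, triples, target, ordering="-1"):
    """Join the six fields with tabs after some basic sanity checks."""
    if len(entities) != len(types):
        raise ValueError("need exactly one type token per entity")
    for head, _rel, tail in triples:
        if not (0 <= head < len(entities) and 0 <= tail < len(entities)):
            raise ValueError(f"head/tail index out of range: {(head, tail)}")
    fields = [
        title,
        " ; ".join(entities),                            # semi-colon delimited
        " ".join(types),                                 # space delimited
        " ; ".join(f"{h} {r} {t}" for h, r, t in triples),
        target,
        ordering,                                        # unused placeholder
    ]
    if any("\t" in f or "\n" in f for f in fields):
        raise ValueError("fields must not contain tabs or newlines")
    return "\t".join(fields)


row = make_row(
    title="a toy title",
    entities=["graph encoder", "text generation"],   # hypothetical entities
    types=["<method>", "<task>"],                    # hypothetical type tokens
    triples=[(0, 0, 1)],                             # relation 0 = first vocab entry (assumed)
    target="we use <method_0> for <task_1> .",       # placeholder style is an assumption
)
print(row)
```

Writing one such row per example, one per line, gives a file in the same shape as the preprocessed TSVs.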

Thanks a lot sir. This makes things a lot easier for me.

pratik2358 avatar Jun 22 '21 16:06 pratik2358

Hello. Can you tell us what the last column in the preprocessed data was supposed to mean? Will the model train well if I just ignore the last column or fill it with random data?

wangobango avatar Aug 11 '22 16:08 wangobango

Yes, the last column could be filled with random data and it should not affect training.

rikdz avatar Aug 11 '22 16:08 rikdz