Where do test dirs props, null, and ne come from?
Hi!
I noticed in make-wsj-test.sh and make-brown-test.sh that we try to zcat a props, null, and ne file from test.wsj. However, in the extract_test_from_ptb.sh and extract_test_from_brown.sh scripts, none of these dirs/files are generated. Where are these supposed to come from?
Thanks!
Those dirs should be under the train directory in the conll05 data.
Thanks for the response!
I am probably missing something, but I thought the train directory only had data for WSJ sections 02-21, whereas the test set is section 23. To be specific, I am referring to e.g. this line: https://github.com/strubell/preprocess-conll05/blob/master/bin/basic/make-wsj-test.sh#L13 - whereas https://github.com/strubell/preprocess-conll05/blob/master/bin/basic/extract_test_from_ptb.sh only generates words/syntax for section 23.
Hello, I have the same problem. Have you found a solution? Thanks!
It sounds like you're describing the ptb training data, not the conll data - the directory I'm referring to is the $CONLL05 dir as defined in get_data.sh.
Yeah, I guess so. I am asking about the test data in particular, which appears to be section 23 of PTB.
Running ./bin/basic/extract_test_from_ptb.sh only extracts words and synt from section 23.
However, bin/basic/make-wsj-test.sh expects props, null, and ne as well. I think that for the train/dev data these dirs come from the conll05 release downloaded in get_data.sh; section 23 (the test data), however, does not seem to be included there.
But for the test data, where do these dirs come from? In bin/basic/make-wsj-test.sh:
zcat < $CONLL05/$FILE/words/$FILE.words.gz > /tmp/$$.words
zcat < $CONLL05/$FILE/props/$FILE.props.gz > /tmp/$$.props
zcat < $CONLL05/$FILE/synt/$FILE.$s.synt.gz > /tmp/$$.synt
# no senses, set to null
zcat < $CONLL05/$FILE/null/$FILE.null.gz > /tmp/$$.senses
zcat < $CONLL05/$FILE/ne/$FILE.ne.gz > /tmp/$$.ne
The script cannot find the props, null, or ne files, and then writes an empty archive.
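To make the failure easier to see, here is a minimal sketch of a check that lists which of the inputs read above are absent before anything is written. The function name `missing_inputs` is hypothetical and not part of the repo; the paths simply mirror the zcat lines quoted above, and the syntax suffix (the `$s` variable in the script) is passed in as an argument because its value is not shown here.

```shell
# Hypothetical helper, not part of preprocess-conll05.
# Prints the relative path of every input that make-wsj-test.sh reads
# (per the zcat lines above) but that is missing under <base>/<file>.
#   $1 = base directory (e.g. the value of $CONLL05)
#   $2 = dataset name   (e.g. test.wsj)
#   $3 = syntax suffix  (the value the script holds in $s; an assumption here)
missing_inputs() {
  local base="$1" file="$2" synt_suffix="$3"
  local path
  for path in \
    "words/$file.words.gz" \
    "props/$file.props.gz" \
    "synt/$file.$synt_suffix.synt.gz" \
    "null/$file.null.gz" \
    "ne/$file.ne.gz"
  do
    # Report only the files that do not exist.
    [ -f "$base/$file/$path" ] || echo "$path"
  done
}
```

Something like `missing_inputs "$CONLL05" test.wsj "$s"` would then print one line per absent file, and print nothing when all five inputs are present.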
Oh, that's so strange! I guess the senses/ne lines (and corresponding entries in the paste) should be removed, but I'm surprised this non-working version is in the repo. Unfortunately I no longer have access to the old server where I originally developed/ran these scripts, so I can't go back and see if there were uncommitted changes, etc.