cannot find the file "chosen_classes.txt"

Open yhifny opened this issue 2 years ago • 1 comments

I am trying to add additional entities without retraining. The script preprocess_all.py fails:

Ran out of 'useful' classes to select. So using number the 153 chosen classes. Note that this is not expected to happen. It likely indicates that the Wikidata dump or Wikipedia was dump was not downloaded and parsed correctly.
Traceback (most recent call last):
  File "/mnt/nlu/users/yasser_hifny/gkqa/refined/ReFinED/src/refined/offline_data_generation/preprocess_all.py", line 364, in <module>
    main()
  File "/mnt/nlu/users/yasser_hifny/gkqa/refined/ReFinED/src/refined/offline_data_generation/preprocess_all.py", line 244, in main
    select_classes(resources_dir=OUTPUT_PATH, is_test=debug)
  File "/mnt/nlu/users/yasser_hifny/gkqa/refined/ReFinED/src/refined/offline_data_generation/class_selection.py", line 152, in select_classes
    os.rename(os.path.join(resources_dir, 'chosen_classes.txt.part'),
FileNotFoundError: [Errno 2] No such file or directory: 'data/chosen_classes.txt.part' -> 'data/chosen_classes.txt'

I am not able to find the file "chosen_classes.txt" in the original data folder:

`additional_data:

datasets:

roberta-base: config.json merges.txt pytorch_model.bin vocab.json

wikipedia_data: class_to_idx.json descriptions_tns.pt nltk_sentence_splitter_english.pickle qcode_to_class_tns_6269457-138.np qcode_to_wiki.lmdb class_to_label.json human_qcodes.json pem.lmdb qcode_to_idx.lmdb subclasses.lmdb

wikipedia_model: config.json model.pt

wikipedia_model_with_numbers: config.json model.pt `

How can I find it? Thanks in advance.

Sep 07 '23 09:09 yhifny

At class_selection.py#L152, you can add a check for whether chosen_classes.txt.part exists. If it does not, you can create chosen_classes.txt directly and write the chosen classes to it.

part_path = os.path.join(resources_dir, 'chosen_classes.txt.part')
final_path = os.path.join(resources_dir, 'chosen_classes.txt')
if os.path.exists(part_path):
    # Normal case: promote the partial file to its final name.
    os.rename(part_path, final_path)
else:
    # The .part file was never written, so save the chosen classes directly.
    with open(final_path, 'w') as f_out:
        f_out.write('\n'.join(chosen_classes))
return
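
To try the fallback outside of preprocess_all.py, here is a minimal standalone sketch of the same idea. The helper name `finalize_chosen_classes` is hypothetical (it is not part of ReFinED), and it assumes the chosen classes are an iterable of strings written one per line, as in the snippet above. Sorting the classes before writing is an addition to make the output deterministic.

```python
import os


def finalize_chosen_classes(resources_dir, chosen_classes):
    """Hypothetical helper (not part of ReFinED): ensure chosen_classes.txt exists.

    If chosen_classes.txt.part is present, it is renamed to the final file.
    Otherwise the in-memory classes are written directly, one per line.
    """
    part_path = os.path.join(resources_dir, 'chosen_classes.txt.part')
    final_path = os.path.join(resources_dir, 'chosen_classes.txt')
    if os.path.exists(part_path):
        # Promote the partial file; os.replace overwrites an existing target.
        os.replace(part_path, final_path)
    else:
        # Assumption: classes are plain strings, written one per line.
        with open(final_path, 'w') as f_out:
            f_out.write('\n'.join(sorted(chosen_classes)))
    return final_path
```

Calling `finalize_chosen_classes('data', chosen_classes)` would then leave data/chosen_classes.txt in place whether or not the .part file was ever created.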

Another solution might be to comment out this early-exit condition and let the script run through all the pages. That would probably take a very long time, so I wouldn't recommend it.

Oct 17 '23 12:10 endrikacupaj