ChemProt dataset problem with run_re.py script
Hi, I would like to know how to run the run_re.py script on the ChemProt dataset. ChemProt is a multi-class classification dataset. I ran run_re.py the same way as for the "gad" task, but it failed with an error saying that no train.tsv file was found.
I attached the chemprot dataset file below. Could you please help me out? ChemProt_Corpus.zip
thanks.
The ChemProt dataset format differs from that of the other RE datasets. The usual RE datasets, such as EU-ADR and GAD, each have three tsv files: train.tsv, test.tsv and dev.tsv.
The ChemProt training set, however, consists of several files: chemprot_training_abstracts.tsv, chemprot_training_entities.tsv, chemprot_training_gold_standard.tsv and chemprot_training_relations.tsv. So how do I use run_re.py in BioBERT with the ChemProt dataset?
Thank you for asking that question @wangxinyi-gsafety,
I have tried many different ways of preprocessing ChemProt, combining its files to match the format used by the GAD dataset and the others. The script runs, but the results are quite poor, and I strongly suspect the preprocessing is the cause (I think it alters the nature of the original data too much).
How did you manage to preprocess the data so that enough of the information is preserved for the model to perform well on it?
Best regards, Arthur
Has anybody found a way to run ChemProt RE?
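One possible way to get from the ChemProt files to GAD-style rows is sketched below. It is not the BioBERT authors' preprocessing, and it rests on several assumptions that should be checked against the corpus README: that the abstracts file has the columns PMID, title, abstract; that the entities file has PMID, entity id, type, start offset, end offset, surface text; that the gold standard file has PMID, relation label, `Arg1:Tn`, `Arg2:Tm`; and that offsets index into the title and abstract joined by a single separator character. The placeholder tokens `@CHEMICAL$` and `@GENE$`, the negative label `false`, and the helper names `mask_pair` and `build_examples` are hypothetical choices made for illustration.

```python
from itertools import product


def mask_pair(text, chem_span, gene_span):
    """Replace both entity spans with placeholders, right-most span first
    so that the offsets of the left-most span stay valid."""
    spans = sorted(
        [(chem_span[0], chem_span[1], "@CHEMICAL$"),
         (gene_span[0], gene_span[1], "@GENE$")],
        reverse=True,
    )
    for start, end, token in spans:
        text = text[:start] + token + text[end:]
    return text


def build_examples(abstract_rows, entity_rows, gold_rows, negative_label="false"):
    """Combine ChemProt rows into (masked_text, label) pairs.

    Assumed column layouts (verify against the corpus README):
      abstract_rows: (pmid, title, abstract)
      entity_rows:   (pmid, entity_id, type, start, end, surface_text)
      gold_rows:     (pmid, label, "Arg1:Tn", "Arg2:Tm")
    """
    # Assumption: offsets refer to title + one separator character + abstract.
    texts = {pmid: title + " " + abstract for pmid, title, abstract in abstract_rows}

    entities = {}
    for pmid, ent_id, ent_type, start, end, _surface in entity_rows:
        entities.setdefault(pmid, {})[ent_id] = (ent_type, int(start), int(end))

    # Key gold relations by the unordered pair of entity ids.
    gold = {}
    for pmid, label, arg1, arg2 in gold_rows:
        pair = frozenset((arg1.split(":")[1], arg2.split(":")[1]))
        gold[(pmid, pair)] = label

    examples = []
    for pmid, ents in entities.items():
        chems = [(i, e) for i, e in ents.items() if e[0] == "CHEMICAL"]
        genes = [(i, e) for i, e in ents.items() if e[0].startswith("GENE")]
        for (cid, chem), (gid, gene) in product(chems, genes):
            # Skip overlapping spans, which cannot both be masked cleanly.
            if chem[1] < gene[2] and gene[1] < chem[2]:
                continue
            label = gold.get((pmid, frozenset((cid, gid))), negative_label)
            masked = mask_pair(texts[pmid], chem[1:], gene[1:])
            examples.append((masked, label))
    return examples
```

This version masks pairs at the level of the whole abstract to keep the sketch short. Restricting candidate pairs to a single sentence, as the GAD-style rows do, would need a sentence splitter on top of it and may matter for the poor results mentioned above. The resulting pairs could then be written one per line as `text<TAB>label` into train.tsv, dev.tsv and test.tsv, and the label list in run_re.py would presumably need extending to the ChemProt classes.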