Closes #242
Please name your PR after the issue it closes. You can use the following line: "Closes #ISSUE-NUMBER" where you replace the ISSUE-NUMBER with the one corresponding to your dataset.
If the following information is NOT present in the issue, please populate:
- Name: TAC 2017
- Description: This dataset is designed for extraction of ADRs from prescription drug labels.
- Paper: https://tac.nist.gov/publications/2017/additional.papers/TAC2017.ADR_overview.proceedings.pdf
- Data: https://bionlp.nlm.nih.gov/tac2017adversereactions/train_xml.tar.gz
Checkbox
- [x] Confirm that this PR is linked to the dataset issue.
- [x] Create the dataloader script `biodatasets/my_dataset/my_dataset.py` (please use only lowercase and underscore for dataset naming).
- [x] Provide values for the `_CITATION`, `_DATASETNAME`, `_DESCRIPTION`, `_HOMEPAGE`, `_LICENSE`, `_URLs`, `_SUPPORTED_TASKS`, `_SOURCE_VERSION`, and `_BIGBIO_VERSION` variables.
- [x] Implement `_info()`, `_split_generators()` and `_generate_examples()` in dataloader script.
- [x] Make sure that the `BUILDER_CONFIGS` class attribute is a list with at least one `BigBioConfig` for the source schema and one for a bigbio schema.
- [x] Confirm dataloader script works with `datasets.load_dataset` function.
- [ ] Confirm that your dataloader script passes the test suite run with `python -m tests.test_bigbio biodatasets/my_dataset/my_dataset.py`.
- [ ] If my dataset is local, I have provided an output of the unit-tests in the PR (please copy paste). This is OPTIONAL for public datasets, as we can test these without access to the data files.
@alisoncallahan let us know if you need help with this!
thanks @hakunanatasha, this is in progress. no blockers for now, except lack of time ;-) I'm working on it today and tomorrow.
@jason-fries @hakunanatasha committed what I have so far. KB schema tests are failing because there is still weirdness in reconciling offsets with source text.
hey @alisoncallahan, thanks for helping us with this dataset! I can confirm that both load_dataset and the unit tests on the source config are working. However, I hit the same failures when running the tests against the tac2017_bigbio_kb config. Additionally, you could run the command below:
python -m tests.test_bigbio_by_name biodatasets/tac2017/tac2017.py tac2017_bigbio_kb
This seems to log more details than the other test command. Hope it helps. Please don't hesitate to reach out if you have further questions!
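In case it helps with the offset reconciliation, here is a minimal sketch of a standalone check that flags entities whose offsets do not slice back to their own text. It is not the test suite's logic. The function name `find_offset_mismatches` and the entity layout (`id`, a list of `[start, end]` offsets, and a parallel list of `text` strings) are assumptions for illustration and may not match what the dataloader actually emits.

```python
from typing import Dict, List


def find_offset_mismatches(document_text: str, entities: List[Dict]) -> List[Dict]:
    """Return one record per entity span whose offsets do not reproduce its text.

    Assumed (hypothetical) entity layout:
        {"id": str, "offsets": [[start, end], ...], "text": [str, ...]}
    """
    mismatches = []
    for entity in entities:
        # An entity may carry several spans (e.g. a discontiguous mention),
        # so compare each span with its corresponding text fragment.
        for (start, end), expected in zip(entity["offsets"], entity["text"]):
            actual = document_text[start:end]
            if actual != expected:
                mismatches.append(
                    {
                        "id": entity["id"],
                        "offsets": (start, end),
                        "expected": expected,
                        "actual": actual,
                    }
                )
    return mismatches


if __name__ == "__main__":
    # Toy data, not taken from TAC 2017.
    text = "Nausea and severe headache were reported."
    entities = [
        {"id": "e1", "offsets": [[0, 6]], "text": ["Nausea"]},
        {"id": "e2", "offsets": [[11, 26]], "text": ["severe headache"]},
        {"id": "e3", "offsets": [[12, 27]], "text": ["severe headache"]},  # shifted by one
    ]
    for mismatch in find_offset_mismatches(text, entities):
        print(mismatch)
```

Printing the expected and actual strings side by side usually makes it obvious whether the drift is a constant shift (for example from whitespace or section joins) or something that grows through the document.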