
Closes #242

Open alisoncallahan opened this issue 3 years ago • 4 comments

Please name your PR after the issue it closes. You can use the following line: "Closes #ISSUE-NUMBER" where you replace the ISSUE-NUMBER with the one corresponding to your dataset.

If the following information is NOT present in the issue, please populate:

  • Name: TAC 2017
  • Description: This dataset is designed for extraction of ADRs from prescription drug labels.
  • Paper: https://tac.nist.gov/publications/2017/additional.papers/TAC2017.ADR_overview.proceedings.pdf
  • Data: https://bionlp.nlm.nih.gov/tac2017adversereactions/train_xml.tar.gz

Checkbox

  • [x] Confirm that this PR is linked to the dataset issue.
  • [x] Create the dataloader script biodatasets/my_dataset/my_dataset.py (please use only lowercase and underscore for dataset naming).
  • [x] Provide values for the _CITATION, _DATASETNAME, _DESCRIPTION, _HOMEPAGE, _LICENSE, _URLs, _SUPPORTED_TASKS, _SOURCE_VERSION, and _BIGBIO_VERSION variables.
  • [x] Implement _info(), _split_generators() and _generate_examples() in dataloader script.
  • [x] Make sure that the BUILDER_CONFIGS class attribute is a list with at least one BigBioConfig for the source schema and one for a bigbio schema.
  • [x] Confirm dataloader script works with datasets.load_dataset function.
  • [ ] Confirm that your dataloader script passes the test suite run with python -m tests.test_bigbio biodatasets/my_dataset/my_dataset.py.
  • [ ] If my dataset is local, I have provided an output of the unit-tests in the PR (please copy paste). This is OPTIONAL for public datasets, as we can test these without access to the data files.

alisoncallahan avatar Apr 20 '22 22:04 alisoncallahan

@alisoncallahan let us know if you need help with this!

hakunanatasha avatar Apr 27 '22 15:04 hakunanatasha

thanks @hakunanatasha, this is in progress. no blockers for now, except lack of time ;-) I'm working on it today and tomorrow.

alisoncallahan avatar Apr 28 '22 21:04 alisoncallahan

@jason-fries @hakunanatasha committed what I have so far. KB schema tests are failing because there is still weirdness in reconciling offsets with source text.

alisoncallahan avatar Apr 29 '22 22:04 alisoncallahan

Hey @alisoncallahan, thanks for helping us with this dataset! I can confirm that both load_dataset and the unit tests on the source config are working. However, I hit the same failures when running the tests against the tac2017_bigbio_kb config. Additionally, you could run the command below:

python -m tests.test_bigbio_by_name biodatasets/tac2017/tac2017.py tac2017_bigbio_kb

This seems to log more details than the other test command. Hope it helps. Please don't hesitate to reach out if you have further questions!
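
For the offset problem mentioned above, here is a minimal sketch of how mismatched spans could be located by hand. It assumes each KB-style entity carries an "id", a list of "text" strings, and a matching list of [start, end] "offsets" into the document text; the function name find_offset_mismatches and the sample data are hypothetical, not part of the test suite or the TAC 2017 data.

```python
from typing import Dict, List, Tuple


def find_offset_mismatches(
    doc_text: str, entities: List[Dict]
) -> List[Tuple[str, str, str]]:
    """Return (entity_id, expected_text, text_found_at_offsets) for each bad span.

    Assumption: offsets are character positions into doc_text, end-exclusive.
    """
    mismatches = []
    for entity in entities:
        # An entity may have several (possibly discontiguous) spans.
        for span_text, (start, end) in zip(entity["text"], entity["offsets"]):
            found = doc_text[start:end]
            if found != span_text:
                mismatches.append((entity["id"], span_text, found))
    return mismatches


# Made-up example: the second entity's offsets are shifted by one character.
doc_text = "Nausea and vomiting were reported."
entities = [
    {"id": "e1", "text": ["Nausea"], "offsets": [[0, 6]]},
    {"id": "e2", "text": ["vomiting"], "offsets": [[10, 18]]},
]

for entity_id, expected, found in find_offset_mismatches(doc_text, entities):
    print(f"{entity_id}: expected {expected!r}, found {found!r}")
```

Printing the expected and found strings side by side usually makes it clear whether the offsets are shifted by a constant (for example, whitespace stripped from the source text) or computed against a different section of the label.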

ruisi-su avatar Apr 30 '22 18:04 ruisi-su