psytar schema is not implemented correctly
In [3]: dsd = load_dataset('bigbio/biodatasets/psytar/psytar.py', name='psytar_bigbio_text', data_dir='/home/galtay/data/bigbio/psytar/PsyTAR_dataset.xlsx')
Using custom data configuration psytar_bigbio_text-7247dd615c830efa
Reusing dataset psy_tar_dataset (/home/galtay/.cache/huggingface/datasets/psy_tar_dataset/psytar_bigbio_text-7247dd615c830efa/1.0.0/149b2465b2445f8a388bc2f7af48f0d136d246f718f59743564f154ea3c2dfbf)
100%|████████████████████████████████████████████████████████████████████████████████████| 1/1 [00:00<00:00, 1193.94it/s]
In [4]: dsd['train']
Out[4]:
Dataset({
    features: ['id', 'document_id', 'text', 'labels'],
    num_rows: 6003
})
In [5]: dsd['train'][0]
Out[5]:
{'id': '0',
 'document_id': 'lexapro.1_1',
 'text': "['ADR']",
 'labels': ['s', 's', 'r', 'i']}
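text should not be a stringified list, and labels should not be a list of single letters. In the first example, text holds "['ADR']" (which looks like a label list converted to a string) and labels holds ['s', 's', 'r', 'i'] (which looks like a string split into characters).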
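To make the symptom easy to check for, here is a minimal sketch of a sanity check on a single record. The helper name looks_malformed and its two heuristics are hypothetical and are not part of the loader; the last two lines only show how values of this shape can arise by accident in plain Python, which may or may not be what happens in psytar.py.

```python
def looks_malformed(example):
    """Return a list of problems found in a record with 'text' and 'labels' keys.

    Hypothetical helper for illustration; the heuristics are assumptions.
    """
    problems = []
    text = example["text"]
    labels = example["labels"]
    # A text field that is the repr of a list, e.g. "['ADR']".
    if text.startswith("[") and text.endswith("]"):
        problems.append("text looks like a stringified list")
    # Labels that are all single characters suggest list(some_string).
    if labels and all(len(label) == 1 for label in labels):
        problems.append("labels look like a string split into characters")
    return problems


# The record reported in this issue.
record = {
    "id": "0",
    "document_id": "lexapro.1_1",
    "text": "['ADR']",
    "labels": ["s", "s", "r", "i"],
}
print(looks_malformed(record))

# Plain Python operations that produce values of the same shape:
print(str(["ADR"]))  # a list converted to a string
print(list("ssri"))  # a string iterated character by character
```

Running the same check over every row in dsd['train'] would show whether the problem affects the whole split or only some rows.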