_read_data function skipping the last sentence

Open ardakdemir opened this issue 6 years ago • 0 comments

Dear authors,

I was going over the code that prepares the input sentences for fine-tuning NER. I think the _read_data function has a small bug: the last sentence is never added to the lines array. Here is the output for the first two sentences of the BC2GM train split.

[['O O O O O B I I O O O O O O O O B I I O O O O O O O O O O O', 'Immunohistochemical staining was positive for S - 100 in all 9 cases stained , positive for HMB - 45 in 9 ( 90 % ) of 10 , and negative'], ['O B O O O O O O O O O O O O O O O O', 'for cytokeratin in all 9 cases in which myxoid melanoma remained in the block after previous sections .']]

Expected output after fixing the bug:

[['O O O O O B I I O O O O O O O O B I I O O O O O O O O O O O', 'Immunohistochemical staining was positive for S - 100 in all 9 cases stained , positive for HMB - 45 in 9 ( 90 % ) of 10 , and negative'], ['O B O O O O O O O O O O O O O O O O', 'for cytokeratin in all 9 cases in which myxoid melanoma remained in the block after previous sections .'], ['B I O O O O O B O O O O O O O B I O O O B I I O O O O O O O', 'Chloramphenicol acetyltransferase assays examining the ability of IE86 to repress activity from the HCMV major IE promoter or activate the HCMV early promoter for the 2 . 2 - kb'], ['O O O O O O O O O B I O', 'class of RNAs demonstrated the functional integrity of the IE86 protein .']]
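A minimal sketch of one possible fix, under stated assumptions: the input has one token and its label per line, sentences are separated by blank lines, and the last sentence is lost because the file does not end with a blank line. The function name `read_bio_sentences` and the sample data are hypothetical and only illustrate the idea; this is not the repository's actual `_read_data` code.

```python
import io


def read_bio_sentences(stream):
    """Hypothetical stand-in for _read_data: returns [labels, words] pairs."""
    lines = []
    words, labels = [], []
    for raw in stream:
        parts = raw.split()
        if not parts:
            # Blank line: the current sentence is complete, so store it.
            if words:
                lines.append([" ".join(labels), " ".join(words)])
                words, labels = [], []
            continue
        # Assumed layout: first column is the token, last column is the label.
        words.append(parts[0])
        labels.append(parts[-1])
    # Flush the final sentence. Without this step, a file that does not end
    # with a blank line silently loses its last sentence.
    if words:
        lines.append([" ".join(labels), " ".join(words)])
    return lines


# Illustrative input with no trailing blank line after the last sentence.
sample = "myxoid O\nmelanoma O\n\nIE86 B\nprotein I\n. O"
result = read_bio_sentences(io.StringIO(sample))
print(result)
```

With the flush after the loop, both sentences in the sample are returned even though the input does not end with a blank line.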

ardakdemir, Jan 26 '20 04:01