Gabriel Altay
thanks, this is great as a way of flattening the JSON metadata file. I've added a "one row per split" set of tabular files in the latest version of this...
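Something like this minimal sketch is what I mean by "one row per split"; the metadata layout and column names here are just illustrative, not the actual file format:

```python
import pandas as pd

# hypothetical nested metadata: config -> split -> stats
metadata = {
    "source": {"train": {"num_samples": 1000}, "test": {"num_samples": 200}},
    "bigbio_kb": {"train": {"num_samples": 1000}},
}

# flatten into one row per (config, split) pair
rows = [
    {"config": config, "split": split, **stats}
    for config, splits in metadata.items()
    for split, stats in splits.items()
]
df = pd.DataFrame(rows)
print(df)
```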
@jason-fries thanks for taking a look and the link! This seems like a nice, well-defined task we can ask a volunteer to handle, yeah? Just wondering if we should...
I get a few `globally_unique_id` test failures on my end too when running the tests now.
this is an example of a bigbio dataset loader that attempted to start from the existing huggingface datasets implementation and then modify it. there was a full discussion in the...
I think these are handled in other PRs. Are there changes you want to keep from this PR, @sg-wbi? If not, I think we can close this.
thanks for this @CallumMcMahon! good writeup of the problem, and indeed, I agree that we should have made `type` a list in the kb schema. the decision to make multiple...
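To make the limitation concrete (if I'm reading the thread right), here is a hedged sketch of the current workaround: because `type` is a single string in the kb schema, a span with two types has to be emitted as two entities that share text and offsets; all the values below are made up:

```python
# one span, two types -> two kb-schema entities (illustrative values)
span = {"text": ["p53"], "offsets": [[42, 45]], "normalized": []}

entities = [
    {"id": "doc1_entity_0", "type": "Gene", **span},
    {"id": "doc1_entity_1", "type": "Protein", **span},
]
```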
putting some discussion / comments here. this problem concerns the ability to set a default config when extra kwargs are passed to `load_dataset`. it seems we have 3 choices on...
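For concreteness, here is a minimal sketch of one of those choices: giving the loader a `DEFAULT_CONFIG_NAME` plus a custom config class, so that a call like `load_dataset(path, schema="bigbio_kb")` with no `name` still resolves to a config. The builder, `schema` kwarg, and config names below are illustrative, not the actual bigbio loader:

```python
import datasets


class MyConfig(datasets.BuilderConfig):
    def __init__(self, schema="source", **kwargs):
        super().__init__(**kwargs)
        self.schema = schema  # extra kwarg forwarded from load_dataset


class MyDataset(datasets.GeneratorBasedBuilder):
    BUILDER_CONFIG_CLASS = MyConfig
    BUILDER_CONFIGS = [
        MyConfig(name="source", schema="source"),
        MyConfig(name="bigbio_kb", schema="bigbio_kb"),
    ]
    # without this, passing only extra kwargs (no `name`) leaves the
    # builder with no way to pick among the predefined configs
    DEFAULT_CONFIG_NAME = "source"

    def _info(self):
        return datasets.DatasetInfo(
            features=datasets.Features({"text": datasets.Value("string")})
        )

    def _split_generators(self, dl_manager):
        return [datasets.SplitGenerator(name=datasets.Split.TRAIN)]

    def _generate_examples(self):
        yield 0, {"text": "example"}
```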
Thanks for creating this issue and the examples @phlobo. My initial reaction leans towards using a single entity with multiple normalizations in all cases. @jason-fries @leonweber @shamikbose @sg-wbi...
@phlobo to add a little more context: we didn't have a hard rule for "what does a list of normalizations mean"; we just had datasets that had 0 or more...
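For illustration, this is what a single kb-schema entity carrying multiple normalizations would look like; the field names follow the kb schema, but the entity and database IDs are made up:

```python
# one entity, two entries in its `normalized` list (illustrative IDs)
entity = {
    "id": "doc1_entity_0",
    "type": "Disease",
    "text": ["NSCLC"],
    "offsets": [[10, 15]],
    "normalized": [
        {"db_name": "MESH", "db_id": "D002289"},
        {"db_name": "UMLS", "db_id": "C0007131"},
    ],
}
```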
Repro: `model_max_length=8192` is silently ignored when loading the `google-bert/bert-base-uncased` checkpoint, but respected for the legacy `bert-base-uncased` name:

```
In [1]: import transformers  # imports inferred; the original session omitted In [1]-[2]

In [2]: from transformers import AutoTokenizer

In [7]: transformers.__version__
Out[7]: '4.39.0.dev0'

In [3]: nt = AutoTokenizer.from_pretrained("google-bert/bert-base-uncased", model_max_length=8192)

In [4]: ot = AutoTokenizer.from_pretrained('bert-base-uncased', model_max_length=8192)

In [5]: nt.model_max_length
Out[5]: 512

In [6]: ot.model_max_length
Out[6]: 8192
```