Gabriel Altay
thanks, this is great as a way of flattening the JSON metadata file. I've added a "one row per split" set of tabular files in the latest version of this...
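Something like this minimal sketch is what I mean by "one row per split"; the metadata layout and column names here are just illustrative, not the actual file format:

```python
import pandas as pd

# hypothetical nested metadata: config -> split -> stats
metadata = {
    "source": {"train": {"num_samples": 1000}, "test": {"num_samples": 200}},
    "bigbio_kb": {"train": {"num_samples": 1000}},
}

# flatten into one row per (config, split) pair
rows = [
    {"config": config, "split": split, **stats}
    for config, splits in metadata.items()
    for split, stats in splits.items()
]
df = pd.DataFrame(rows)
print(df)
```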
@jason-fries thanks for taking a look and the link! This seems like a nice, well-defined task we can ask a volunteer to handle, yeah? Just wondering if we should...
I get a few `globally_unique_id` test failures on my end too when running the tests now.
this is an example of a bigbio dataset loader that attempted to start from the existing huggingface datasets implementation and then modify it. there was a full discussion in the...
I think these are handled in other PRs. Are there changes you want to keep from this PR, @sg-wbi? If not, I think we can close this.
thanks for this @CallumMcMahon! good writeup of the problem, and indeed, I agree that we should have made `type` a list in the kb schema. the decision to make multiple...
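To make the limitation concrete (if I'm reading the thread right), here is a hedged sketch of the current workaround: because `type` is a single string in the kb schema, a span with two types has to be emitted as two entities that share text and offsets; all the values below are made up:

```python
# one span, two types -> two kb-schema entities (illustrative values)
span = {"text": ["p53"], "offsets": [[42, 45]], "normalized": []}

entities = [
    {"id": "doc1_entity_0", "type": "Gene", **span},
    {"id": "doc1_entity_1", "type": "Protein", **span},
]
```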
putting some discussion / comments here. this problem concerns the ability to set a default config when extra kwargs are passed to `load_dataset`. it seems we have 3 choices on...
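For concreteness, here is a minimal sketch of one of those choices: giving the loader a `DEFAULT_CONFIG_NAME` plus a custom config class, so that a call like `load_dataset(path, schema="bigbio_kb")` with no `name` still resolves to a config. The builder, `schema` kwarg, and config names below are illustrative, not the actual bigbio loader:

```python
import datasets


class MyConfig(datasets.BuilderConfig):
    def __init__(self, schema="source", **kwargs):
        super().__init__(**kwargs)
        self.schema = schema  # extra kwarg forwarded from load_dataset


class MyDataset(datasets.GeneratorBasedBuilder):
    BUILDER_CONFIG_CLASS = MyConfig
    BUILDER_CONFIGS = [
        MyConfig(name="source", schema="source"),
        MyConfig(name="bigbio_kb", schema="bigbio_kb"),
    ]
    # without this, passing only extra kwargs (no `name`) leaves the
    # builder with no way to pick among the predefined configs
    DEFAULT_CONFIG_NAME = "source"

    def _info(self):
        return datasets.DatasetInfo(
            features=datasets.Features({"text": datasets.Value("string")})
        )

    def _split_generators(self, dl_manager):
        return [datasets.SplitGenerator(name=datasets.Split.TRAIN)]

    def _generate_examples(self):
        yield 0, {"text": "example"}
```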
Thanks for creating this issue and the examples @phlobo. My initial reaction leans towards using a single entity with multiple normalizations in all cases. @jason-fries @leonweber @shamikbose @sg-wbi...
@phlobo to add a little more context: we didn't have a hard rule for "what does a list of normalizations mean"; we just had datasets that had 0 or more...
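For illustration, this is what a single kb-schema entity carrying multiple normalizations would look like; the field names follow the kb schema, but the entity and database IDs are made up:

```python
# one entity, two entries in its `normalized` list (illustrative IDs)
entity = {
    "id": "doc1_entity_0",
    "type": "Disease",
    "text": ["NSCLC"],
    "offsets": [[10, 15]],
    "normalized": [
        {"db_name": "MESH", "db_id": "D002289"},
        {"db_name": "UMLS", "db_id": "C0007131"},
    ],
}
```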
Repro: `model_max_length=8192` is silently ignored when loading the `google-bert/bert-base-uncased` checkpoint, but respected for the legacy `bert-base-uncased` name:

```
In [1]: import transformers  # imports inferred; the original session omitted In [1]-[2]

In [2]: from transformers import AutoTokenizer

In [7]: transformers.__version__
Out[7]: '4.39.0.dev0'

In [3]: nt = AutoTokenizer.from_pretrained("google-bert/bert-base-uncased", model_max_length=8192)

In [4]: ot = AutoTokenizer.from_pretrained('bert-base-uncased', model_max_length=8192)

In [5]: nt.model_max_length
Out[5]: 512

In [6]: ot.model_max_length
Out[6]: 8192
```