Lucile Saulnier
## Environment info
```
- `transformers` version: 4.17.0
- Platform: Linux-5.4.144+-x86_64-with-Ubuntu-18.04-bionic
- Python version: 3.7.12
- PyTorch version (GPU?): 1.10.0+cu111 (False)
- Tensorflow version (GPU?): 2.8.0 (False)
- Flax version...
```
## Describe the bug
I can't retrieve a cached dataset with offline mode enabled.

## Steps to reproduce the bug
To reproduce my issue, first, you'll need to run a...
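For context, a minimal sketch of the failing pattern, assuming a dataset such as `squad` that was cached during a previous online session (`HF_DATASETS_OFFLINE` is the documented switch for offline mode):

```python
import os

# Offline mode must be set before `datasets` is imported, since the
# library reads this variable at import time.
os.environ["HF_DATASETS_OFFLINE"] = "1"

from datasets import load_dataset

# Assumes "squad" was already downloaded and cached in an earlier,
# online session; with offline mode on, this should resolve from the
# local cache, but in the reported bug it fails to find it.
dataset = load_dataset("squad", split="train")
print(dataset[0])
```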
I think it could be useful to have a `Sequence` object for (post-)processors. Indeed, a tokenizer might need to combine the `ByteLevel` post-processor with a `TemplateProcessing` processor. Today...
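A sketch of what the proposed combination could look like, assuming a `Sequence` processor that applies its children in order; the `TemplateProcessing` arguments below are illustrative RoBERTa-style values, not taken from the issue:

```python
from tokenizers import processors

# Hypothetical usage of the proposed `Sequence`: apply ByteLevel offset
# handling first, then a template that adds <s>/</s> special tokens.
post_processor = processors.Sequence([
    processors.ByteLevel(trim_offsets=True),
    processors.TemplateProcessing(
        single="<s> $A </s>",
        pair="<s> $A </s> </s> $B </s>",
        special_tokens=[("<s>", 0), ("</s>", 2)],
    ),
])

# tokenizer.post_processor = post_processor
```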
Proposal to add the Zenodo DOI badge to the README. The DOI used here corresponds to the Concept DOI (versus the Version DOIs), which represents the concept of the software...
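For illustration, a Concept DOI badge in a README typically looks like the snippet below; the DOI here is a placeholder, not the project's actual identifier:

```markdown
[![DOI](https://zenodo.org/badge/DOI/10.5281/zenodo.1234567.svg)](https://doi.org/10.5281/zenodo.1234567)
```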
I seem to have seen this request more than once on `transformers`: many users would like to be able to continue training a tokenizer on a new dataset (see for...
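For reference, the closest thing `transformers` offers today trains a brand-new tokenizer while reusing the old one's pipeline configuration, rather than continuing from the existing vocabulary. A minimal sketch of that workaround (the corpus and vocabulary size are illustrative):

```python
from transformers import AutoTokenizer

old_tokenizer = AutoTokenizer.from_pretrained("gpt2")

# Illustrative corpus; in practice this would iterate over the text
# column of the new dataset.
corpus = iter(["first document of the new corpus", "second document"])

# Trains from scratch with the same normalizer/pre-tokenizer/model
# settings as `old_tokenizer` -- it does not extend the existing
# vocabulary, which is exactly the gap this request points at.
new_tokenizer = old_tokenizer.train_new_from_iterator(corpus, vocab_size=52000)
```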
# Question
When creating a tokenizer with `add_prefix_space=True` and `trim_offsets=True` (for example with `ByteLevel` or `RobertaProcessing`), the offsets returned on a text starting with a space are not what I...
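A minimal sketch of a reproduction, assuming `roberta-base` (whose fast tokenizer uses `ByteLevel` with these two options); the expected-versus-actual offsets are cut off in the excerpt above, so none are asserted here:

```python
from transformers import RobertaTokenizerFast

tokenizer = RobertaTokenizerFast.from_pretrained(
    "roberta-base", add_prefix_space=True, trim_offsets=True
)

# Input that already starts with a space -- the case in question.
encoding = tokenizer(" Hello world", return_offsets_mapping=True)
print(encoding.tokens())
print(encoding["offset_mapping"])
```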
It might be useful to add a new argument to load the checkpoint from a specific step. Currently, the only way to do it - to the...
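A sketch of the current workaround with the `Trainer` API, where checkpoints land in `checkpoint-<step>` folders under the output directory; `checkpoint_for_step` is a hypothetical helper, not an existing API:

```python
import os

def checkpoint_for_step(output_dir: str, step: int) -> str:
    # Hypothetical helper: resolve the folder the Trainer saved for a
    # given global step (folders are named "checkpoint-<step>").
    path = os.path.join(output_dir, f"checkpoint-{step}")
    if not os.path.isdir(path):
        raise FileNotFoundError(f"no checkpoint saved at step {step}: {path}")
    return path

# With an already-configured Trainer, resuming from a specific step is
# done today by passing the folder path explicitly:
# trainer.train(resume_from_checkpoint=checkpoint_for_step("output", 500))
```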
We could, for example, parallelise what can be parallelised.
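A sketch of one way to do that, assuming the work items are independent of each other; `process_item` and the choice of executor are illustrative, not the project's actual code:

```python
from concurrent.futures import ThreadPoolExecutor

def process_item(item):
    # Stand-in for per-element work that does not depend on the other
    # elements and is therefore safe to run concurrently.
    return item * 2

items = [1, 2, 3, 4]

# Map the independent work over a thread pool; results come back in
# the same order as `items`.
with ThreadPoolExecutor() as executor:
    results = list(executor.map(process_item, items))

print(results)  # [2, 4, 6, 8]
```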
`multiple_search_dict` calls take as argument the list of elements to search for in resources. Sometimes several elements share the same fhirpath, so there is no need to compute it twice!
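A sketch of the deduplication idea: memoize each distinct fhirpath so repeated ones are evaluated only once per resource. `evaluate_fhirpath` and the element shape are hypothetical stand-ins for the project's actual objects:

```python
def evaluate_fhirpath(path, resource):
    # Stand-in for the real (expensive) fhirpath evaluation.
    return resource.get(path)

def search_all(elements, resource):
    cache = {}
    results = []
    for element in elements:
        path = element["fhirpath"]
        if path not in cache:
            # First occurrence of this fhirpath: compute and memoize,
            # so later duplicates hit the cache instead.
            cache[path] = evaluate_fhirpath(path, resource)
        results.append(cache[path])
    return results

# Two elements sharing a fhirpath -> only one evaluation.
resource = {"Patient.name": "Alice"}
elements = [{"fhirpath": "Patient.name"}, {"fhirpath": "Patient.name"}]
print(search_all(elements, resource))
```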