L. Grobol

Results 24 issues of L. Grobol

There are currently two ways of dynamically batching tokenized sentences with padding: 1. Store them in `List[str]` form, which is not very satisfying because it requires encoding before batching (potential bottleneck...
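For illustration only, here is a minimal sketch of what "dynamic batching with padding" usually means: each batch is padded to the length of its own longest sequence rather than to a global maximum. It assumes the sentences are already encoded as lists of integer ids. The function name `pad_batch` and the `pad_id` default are hypothetical and not taken from the library discussed in the issue.

```python
from typing import List, Sequence, Tuple


def pad_batch(
    sequences: Sequence[Sequence[int]], pad_id: int = 0
) -> Tuple[List[List[int]], List[List[int]]]:
    """Pad already-encoded sequences to the longest length in this batch.

    Returns the padded ids and a mask with 1 for real tokens and 0 for padding.
    `pad_id` is an assumed placeholder value, not a library default.
    """
    # Longest sequence in this batch only; 0 if the batch is empty.
    max_len = max((len(seq) for seq in sequences), default=0)
    padded = [list(seq) + [pad_id] * (max_len - len(seq)) for seq in sequences]
    mask = [[1] * len(seq) + [0] * (max_len - len(seq)) for seq in sequences]
    return padded, mask


batch_ids, batch_mask = pad_batch([[5, 6, 7], [8], [9, 10]])
print(batch_ids)
print(batch_mask)
```

Because the inputs are ids rather than strings, no encoding step is needed at batching time, which is the bottleneck the issue mentions for the `List[str]` form.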

## Description `jupyter_server` version 2.11.0 broke `JupytextContentsManager`, making jupytext unusable. ## Reproduce 1. Install both `jupytext` and `jupyter_server==2.11.0` 2. Start JupyterLab with `jupyter lab` (the issue also occurs with notebook, though)...

As of now, the txt parser reads files in text mode as UTF-8 and fails with other encodings. This makes it return a bytes object, leaving the base `decode` to...

I also replaced a few `pip` calls with `python -m pip` so that, no matter how the environment is set up, the install always happens for the right Python. Fix...
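The same idea applies when pip is invoked from a script: running it through the current interpreter guarantees that packages land in that interpreter's environment. The sketch below is an assumption-laden illustration, and the helper name `pip_command` is hypothetical.

```python
import sys
from typing import List


def pip_command(*args: str) -> List[str]:
    """Build a pip invocation bound to the interpreter running this script.

    Using `sys.executable -m pip` avoids picking up a `pip` executable
    that belongs to a different Python found earlier on PATH.
    """
    return [sys.executable, "-m", "pip", *args]


# Hypothetical example: the command that would install jupytext here.
command = pip_command("install", "jupytext")
print(command)
```

The resulting list can be passed as-is to `subprocess.run`.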

The current build instructions for sentencepiece are not suitable if you don't have admin rights to install shared libraries. A more robust approach (as taken by https://github.com/google/sentencepiece/blob/master/python/build_bundled.sh) is to disable...

## Description Automatically force a post-run cache save if packages have been updated. ## Justification > Note: Restored cache will not be used if the requirements.txt file is not updated...

feature request

As of now, we only support training monolingual tokenizers, which don't work for mBART.

enhancement

It seems like a very nice implementation: https://github.com/explosion/curated-transformers

Should be straightforward with [PEFT](https://github.com/huggingface/peft)

Have a look at

test