L. Grobol issues

Results: 24 issues of L. Grobol

Support for `pad_encodings` in the Python API

There are currently two ways of dynamically batching tokenized sentences with padding: 1. Store them in `List[str]` form, which is not very satisfying because it requires encoding before batching (potential bottleneck...
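For illustration, the padding step this issue is about can be sketched in plain Python. The function name `pad_batch` and the `pad_id` default are hypothetical and not part of any library's API; the sketch only assumes that sentences have already been encoded to lists of integer ids.

```python
from typing import List, Sequence, Tuple


def pad_batch(
    sequences: Sequence[Sequence[int]], pad_id: int = 0
) -> Tuple[List[List[int]], List[List[int]]]:
    """Pad already-encoded id sequences to the longest length in the batch.

    Hypothetical helper: returns the padded ids and a matching attention
    mask (1 = real token, 0 = padding).
    """
    max_len = max((len(seq) for seq in sequences), default=0)
    padded: List[List[int]] = []
    mask: List[List[int]] = []
    for seq in sequences:
        n_pad = max_len - len(seq)
        padded.append(list(seq) + [pad_id] * n_pad)
        mask.append([1] * len(seq) + [0] * n_pad)
    return padded, mask


# Two sentences of different lengths, padded to the longer one.
ids, attention_mask = pad_batch([[5, 6, 7], [8]], pad_id=0)
```

Padding to the longest sequence in each batch, rather than to a fixed global length, is what makes the batching dynamic.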

Incompatibility with jupyter_server 2.11.0

## Description `jupyter_server` version 2.11.0 broke `JupytextContentsManager`, making jupytext unusable. ## Reproduce 1. Install both `jupytext` and `jupyter_server==2.11.0` 2. Start jupyterlab `jupyter lab` (the issue also occurs with notebook though)...

Enable encoding detection for the txt parser

As of now, the txt parser reads files in text mode as UTF-8 and fails with other encodings. This change makes it return a bytes object, leaving the base `decode` to...
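As a rough sketch of the general idea (reading raw bytes first and deciding on the encoding afterwards), the snippet below tries a list of candidate encodings in order. This is a simple fallback strategy, not true encoding detection and not the parser's actual implementation; the function name `decode_with_fallback` and the default encoding list are assumptions made for illustration.

```python
from typing import Sequence


def decode_with_fallback(
    raw: bytes, encodings: Sequence[str] = ("utf-8", "latin-1")
) -> str:
    """Decode raw bytes by trying each candidate encoding in order.

    Hypothetical helper. Note that latin-1 accepts any byte sequence, so
    with the default list it acts as a last-resort fallback.
    """
    for encoding in encodings:
        try:
            return raw.decode(encoding)
        except UnicodeDecodeError:
            # This candidate does not fit the bytes; try the next one.
            continue
    raise ValueError(f"could not decode bytes with any of: {list(encodings)}")


# The same word encoded two different ways decodes to the same text.
from_utf8 = decode_with_fallback("café".encode("utf-8"))
from_latin1 = decode_with_fallback("café".encode("latin-1"))
```

In a file-based parser, the bytes would typically come from opening the file in binary mode (for example `pathlib.Path(path).read_bytes()`) instead of text mode.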

Use non-root install for sentencepiece

I also put a few `python -m pip` instead of `pip` so no matter how the environment is set up, the install will always happen for the right python. Fix...

Disable shared sentencepiece libraries in installation instructions

The current build instructions for sentencepiece are not suitable if you don't have admin rights to install shared libraries. A more robust approach (as taken by https://github.com/google/sentencepiece/blob/master/python/build_bundled.sh) is to disable...

Save cache if packages have been updated

## Description Automatically force a post-run cache save if packages have been updated. ## Justification > Note: Restored cache will not be used if the requirements.txt file is not updated...

feature request

Add multilingual tokenizer training for mBART

As of now we only offer to train monolingual tokenizers, which don't work for mBART.

enhancement

Keep an eye on curated-transformers

It seems like a very nice implementation: https://github.com/explosion/curated-transformers

add LORA fine-tuning

Should be straightforward with [PEFT](https://github.com/huggingface/peft)

Add reproducibility testing

Have a look at

test