model_max_length arg has no effect when creating bert tokenizer
System Info
None of PyTorch, TensorFlow >= 2.0, or Flax have been found. Models won't be available and only tokenizers, configuration and file/data utilities can be used.
- `transformers` version: 4.37.2
- Platform: macOS-14.2.1-arm64-arm-64bit
- Python version: 3.10.13
- Huggingface_hub version: 0.20.3
- Safetensors version: 0.4.2
- Accelerate version: not installed
- Accelerate config: not found
- PyTorch version (GPU?): not installed (NA)
- Tensorflow version (GPU?): not installed (NA)
- Flax version (CPU?/GPU?/TPU?): not installed (NA)
- Jax version: not installed
- JaxLib version: not installed
- Using GPU in script?: no
- Using distributed or parallel set-up in script?: no
Who can help?
@ArthurZucker
Information
- [X] The official example scripts
- [ ] My own modified scripts
Tasks
- [ ] An officially supported task in the `examples` folder (such as GLUE/SQuAD, ...)
- [ ] My own task or dataset (give details below)
Reproduction
```python
from transformers import AutoTokenizer

new_tokenizer = AutoTokenizer.from_pretrained('google-bert/bert-base-uncased', model_max_length=8192)
print(new_tokenizer.model_max_length)
# 8192

old_tokenizer = AutoTokenizer.from_pretrained('bert-base-uncased', model_max_length=8192)
print(old_tokenizer.model_max_length)
# 512
```
Expected behavior
```python
print(old_tokenizer.model_max_length)
# 8192
```
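For context, the expected behavior is that keyword arguments passed to `from_pretrained` take precedence over values stored in the saved tokenizer configuration. A minimal sketch of that precedence in plain Python (this is not the actual `transformers` implementation; `merge_tokenizer_config` is a hypothetical helper for illustration):

```python
# Hypothetical sketch of the expected precedence: caller-supplied init
# kwargs should override values loaded from tokenizer_config.json.
def merge_tokenizer_config(saved_config, init_kwargs):
    merged = dict(saved_config)
    merged.update(init_kwargs)  # caller kwargs win over saved values
    return merged

saved = {"model_max_length": 512, "do_lower_case": True}
merged = merge_tokenizer_config(saved, {"model_max_length": 8192})
print(merged["model_max_length"])  # 8192
```

The bug is that, depending on which checkpoint path is used, the saved `model_max_length` of 512 is kept instead of the caller-supplied 8192.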
Hi @galtay, thanks for raising this issue!
It looks related to #29050.
cc @LysandreJik
```python
In [7]: transformers.__version__
Out[7]: '4.39.0.dev0'

In [3]: nt = AutoTokenizer.from_pretrained("google-bert/bert-base-uncased", model_max_length=8192)

In [4]: ot = AutoTokenizer.from_pretrained('bert-base-uncased', model_max_length=8192)

In [5]: nt.model_max_length
Out[5]: 512

In [6]: ot.model_max_length
Out[6]: 8192
```
Gentle ping @LysandreJik @ArthurZucker.
This is now fixed on main! It took a bit of time to go through the deprecation cycle, but it's live.
Thanks for the report @galtay!