
model_max_length arg has no effect when creating BERT tokenizer

Open · galtay opened this issue 1 year ago

System Info

None of PyTorch, TensorFlow >= 2.0, or Flax have been found. Models won't be available and only tokenizers, configuration and file/data utilities can be used.


  • transformers version: 4.37.2
  • Platform: macOS-14.2.1-arm64-arm-64bit
  • Python version: 3.10.13
  • Huggingface_hub version: 0.20.3
  • Safetensors version: 0.4.2
  • Accelerate version: not installed
  • Accelerate config: not found
  • PyTorch version (GPU?): not installed (NA)
  • Tensorflow version (GPU?): not installed (NA)
  • Flax version (CPU?/GPU?/TPU?): not installed (NA)
  • Jax version: not installed
  • JaxLib version: not installed
  • Using GPU in script?: no
  • Using distributed or parallel set-up in script?: no

Who can help?

@ArthurZucker

Information

  • [X] The official example scripts
  • [ ] My own modified scripts

Tasks

  • [ ] An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
  • [ ] My own task or dataset (give details below)

Reproduction

from transformers import AutoTokenizer
new_tokenizer = AutoTokenizer.from_pretrained('google-bert/bert-base-uncased', model_max_length=8192)
print(new_tokenizer.model_max_length)
# 8192
old_tokenizer = AutoTokenizer.from_pretrained('bert-base-uncased', model_max_length=8192)
print(old_tokenizer.model_max_length)
# 512

Expected behavior

print(old_tokenizer.model_max_length)
# 8192
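The symptom (a user-passed kwarg silently losing to a value from the hub's tokenizer config) is consistent with a kwarg-merge-order bug. The sketch below is a hypothetical illustration of that class of bug, not the actual transformers code path; `buggy_merge` and `fixed_merge` are invented names:

```python
# Hypothetical illustration: if defaults read from the hub config are
# merged AFTER the caller's kwargs, the config value silently wins.
def buggy_merge(user_kwargs, config_defaults):
    # wrong order: config defaults overwrite user kwargs
    return {**user_kwargs, **config_defaults}

def fixed_merge(user_kwargs, config_defaults):
    # right order: user kwargs take precedence over config defaults
    return {**config_defaults, **user_kwargs}

user = {"model_max_length": 8192}
config = {"model_max_length": 512}

print(buggy_merge(user, config)["model_max_length"])  # 512 (the reported bug)
print(fixed_merge(user, config)["model_max_length"])  # 8192 (expected)
```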

galtay avatar Feb 16 '24 06:02 galtay

Hi @galtay, thanks for raising this issue!

It looks related to #29050

cc @LysandreJik

amyeroberts avatar Feb 16 '24 12:02 amyeroberts

In [7]: transformers.__version__
Out[7]: '4.39.0.dev0'

In [3]: nt = AutoTokenizer.from_pretrained("google-bert/bert-base-uncased", model_max_length=8192)
In [4]: ot = AutoTokenizer.from_pretrained('bert-base-uncased', model_max_length=8192)

In [5]: nt.model_max_length
Out[5]: 512

In [6]: ot.model_max_length
Out[6]: 8192

galtay avatar Mar 17 '24 16:03 galtay

Gentle ping @LysandreJik @ArthurZucker

amyeroberts avatar Apr 10 '24 13:04 amyeroberts

This is now fixed on main! It took a bit of time to go through the deprecation cycle, but it's live.

Thanks for the report @galtay!

LysandreJik avatar May 09 '24 15:05 LysandreJik