Issue Loading 4-bit and 8-bit language models: ValueError: `.to` is not supported for `4-bit` or `8-bit` models. Please use the model as it is, since the model has already been set to the correct devices and casted to the correct `dtype`.
System Info
I'm running into an issue where I'm not able to load a 4-bit or 8-bit quantized version of the Falcon or LLaMA models. This was working a couple of weeks ago (around June 8th), running on Colab. Does anyone know of a fix, or why this no longer works?
- transformers version: 4.31.0.dev0
- Platform: Linux-5.15.107+-x86_64-with-glibc2.31
- Python version: 3.10.12
- Huggingface_hub version: 0.15.1
- Safetensors version: 0.3.1
- PyTorch version (GPU?): 2.0.1+cu118 (True)
- Tensorflow version (GPU?): 2.12.0 (True)
- Flax version (CPU?/GPU?/TPU?): 0.6.11 (gpu)
- Jax version: 0.4.10
- JaxLib version: 0.4.10
Who can help?
@ArthurZucker @younesbelkada @sgugger
Information
- [X] The official example scripts
- [X] My own modified scripts
Tasks
- [ ] An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
- [X] My own task or dataset (give details below)
Reproduction
Running in Colab Pro on an A100.
!pip install git+https://www.github.com/huggingface/transformers
!pip install git+https://github.com/huggingface/accelerate
!pip install bitsandbytes
!pip install einops
from transformers import AutoModelForCausalLM, AutoConfig, AutoTokenizer
import torch
model_path="tiiuae/falcon-40b-instruct"
config = AutoConfig.from_pretrained(model_path, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(model_path, trust_remote_code=True, load_in_4bit=True, device_map="auto")
tokenizer = AutoTokenizer.from_pretrained("tiiuae/falcon-40b-instruct")
input_text = "Describe the solar system."
input_ids = tokenizer(input_text, return_tensors="pt").input_ids.to("cuda")
outputs = model.generate(input_ids, max_length=100)
print(tokenizer.decode(outputs[0]))
Cell output:
Collecting git+https://www.github.com/huggingface/transformers
Cloning https://www.github.com/huggingface/transformers to /tmp/pip-req-build-6pyatvel
Running command git clone --filter=blob:none --quiet https://www.github.com/huggingface/transformers /tmp/pip-req-build-6pyatvel
warning: redirecting to https://github.com/huggingface/transformers.git/
Resolved https://www.github.com/huggingface/transformers to commit e84bf1f734f87aa2bedc41b9b9933d00fc6add98
Installing build dependencies ... done
Getting requirements to build wheel ... done
Preparing metadata (pyproject.toml) ... done
Requirement already satisfied: filelock in /usr/local/lib/python3.10/dist-packages (from transformers==4.31.0.dev0) (3.12.2)
Collecting huggingface-hub<1.0,>=0.14.1 (from transformers==4.31.0.dev0)
Downloading huggingface_hub-0.15.1-py3-none-any.whl (236 kB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 236.8/236.8 kB 11.6 MB/s eta 0:00:00
Requirement already satisfied: numpy>=1.17 in /usr/local/lib/python3.10/dist-packages (from transformers==4.31.0.dev0) (1.22.4)
Requirement already satisfied: packaging>=20.0 in /usr/local/lib/python3.10/dist-packages (from transformers==4.31.0.dev0) (23.1)
Requirement already satisfied: pyyaml>=5.1 in /usr/local/lib/python3.10/dist-packages (from transformers==4.31.0.dev0) (6.0)
Requirement already satisfied: regex!=2019.12.17 in /usr/local/lib/python3.10/dist-packages (from transformers==4.31.0.dev0) (2022.10.31)
Requirement already satisfied: requests in /usr/local/lib/python3.10/dist-packages (from transformers==4.31.0.dev0) (2.27.1)
Collecting tokenizers!=0.11.3,<0.14,>=0.11.1 (from transformers==4.31.0.dev0)
Downloading tokenizers-0.13.3-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (7.8 MB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 7.8/7.8 MB 114.2 MB/s eta 0:00:00
Collecting safetensors>=0.3.1 (from transformers==4.31.0.dev0)
Downloading safetensors-0.3.1-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (1.3 MB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 1.3/1.3 MB 79.9 MB/s eta 0:00:00
Requirement already satisfied: tqdm>=4.27 in /usr/local/lib/python3.10/dist-packages (from transformers==4.31.0.dev0) (4.65.0)
Requirement already satisfied: fsspec in /usr/local/lib/python3.10/dist-packages (from huggingface-hub<1.0,>=0.14.1->transformers==4.31.0.dev0) (2023.6.0)
Requirement already satisfied: typing-extensions>=3.7.4.3 in /usr/local/lib/python3.10/dist-packages (from huggingface-hub<1.0,>=0.14.1->transformers==4.31.0.dev0) (4.6.3)
Requirement already satisfied: urllib3<1.27,>=1.21.1 in /usr/local/lib/python3.10/dist-packages (from requests->transformers==4.31.0.dev0) (1.26.16)
Requirement already satisfied: certifi>=2017.4.17 in /usr/local/lib/python3.10/dist-packages (from requests->transformers==4.31.0.dev0) (2023.5.7)
Requirement already satisfied: charset-normalizer~=2.0.0 in /usr/local/lib/python3.10/dist-packages (from requests->transformers==4.31.0.dev0) (2.0.12)
Requirement already satisfied: idna<4,>=2.5 in /usr/local/lib/python3.10/dist-packages (from requests->transformers==4.31.0.dev0) (3.4)
Building wheels for collected packages: transformers
Building wheel for transformers (pyproject.toml) ... done
Created wheel for transformers: filename=transformers-4.31.0.dev0-py3-none-any.whl size=7228417 sha256=5867afa880111a40f7b630e51d9f1709ec1131236a31c2c7fb5f97179e3d1405
Stored in directory: /tmp/pip-ephem-wheel-cache-t06u3u6x/wheels/c1/ac/11/e69d454307e735e14f4f95e575c8be27fd99835ec36f504c13
Successfully built transformers
Installing collected packages: tokenizers, safetensors, huggingface-hub, transformers
Successfully installed huggingface-hub-0.15.1 safetensors-0.3.1 tokenizers-0.13.3 transformers-4.31.0.dev0
Collecting git+https://github.com/huggingface/accelerate
Cloning https://github.com/huggingface/accelerate to /tmp/pip-req-build-76ziff6x
Running command git clone --filter=blob:none --quiet https://github.com/huggingface/accelerate /tmp/pip-req-build-76ziff6x
Resolved https://github.com/huggingface/accelerate to commit d141b4ce794227450a105b7281611c7980e5b3d6
Installing build dependencies ... done
Getting requirements to build wheel ... done
Preparing metadata (pyproject.toml) ... done
Requirement already satisfied: numpy>=1.17 in /usr/local/lib/python3.10/dist-packages (from accelerate==0.21.0.dev0) (1.22.4)
Requirement already satisfied: packaging>=20.0 in /usr/local/lib/python3.10/dist-packages (from accelerate==0.21.0.dev0) (23.1)
Requirement already satisfied: psutil in /usr/local/lib/python3.10/dist-packages (from accelerate==0.21.0.dev0) (5.9.5)
Requirement already satisfied: pyyaml in /usr/local/lib/python3.10/dist-packages (from accelerate==0.21.0.dev0) (6.0)
Requirement already satisfied: torch>=1.6.0 in /usr/local/lib/python3.10/dist-packages (from accelerate==0.21.0.dev0) (2.0.1+cu118)
Requirement already satisfied: filelock in /usr/local/lib/python3.10/dist-packages (from torch>=1.6.0->accelerate==0.21.0.dev0) (3.12.2)
Requirement already satisfied: typing-extensions in /usr/local/lib/python3.10/dist-packages (from torch>=1.6.0->accelerate==0.21.0.dev0) (4.6.3)
Requirement already satisfied: sympy in /usr/local/lib/python3.10/dist-packages (from torch>=1.6.0->accelerate==0.21.0.dev0) (1.11.1)
Requirement already satisfied: networkx in /usr/local/lib/python3.10/dist-packages (from torch>=1.6.0->accelerate==0.21.0.dev0) (3.1)
Requirement already satisfied: jinja2 in /usr/local/lib/python3.10/dist-packages (from torch>=1.6.0->accelerate==0.21.0.dev0) (3.1.2)
Requirement already satisfied: triton==2.0.0 in /usr/local/lib/python3.10/dist-packages (from torch>=1.6.0->accelerate==0.21.0.dev0) (2.0.0)
Requirement already satisfied: cmake in /usr/local/lib/python3.10/dist-packages (from triton==2.0.0->torch>=1.6.0->accelerate==0.21.0.dev0) (3.25.2)
Requirement already satisfied: lit in /usr/local/lib/python3.10/dist-packages (from triton==2.0.0->torch>=1.6.0->accelerate==0.21.0.dev0) (16.0.6)
Requirement already satisfied: MarkupSafe>=2.0 in /usr/local/lib/python3.10/dist-packages (from jinja2->torch>=1.6.0->accelerate==0.21.0.dev0) (2.1.3)
Requirement already satisfied: mpmath>=0.19 in /usr/local/lib/python3.10/dist-packages (from sympy->torch>=1.6.0->accelerate==0.21.0.dev0) (1.3.0)
Building wheels for collected packages: accelerate
Building wheel for accelerate (pyproject.toml) ... done
Created wheel for accelerate: filename=accelerate-0.21.0.dev0-py3-none-any.whl size=234648 sha256=71b98a6d4b1111cc9ca22265f6699cd552325e5f71c83daebe696afd957497ee
Stored in directory: /tmp/pip-ephem-wheel-cache-atmtszgr/wheels/f6/c7/9d/1b8a5ca8353d9307733bc719107acb67acdc95063bba749f26
Successfully built accelerate
Installing collected packages: accelerate
Successfully installed accelerate-0.21.0.dev0
Collecting bitsandbytes
Downloading bitsandbytes-0.39.1-py3-none-any.whl (97.1 MB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 97.1/97.1 MB 18.8 MB/s eta 0:00:00
Installing collected packages: bitsandbytes
Successfully installed bitsandbytes-0.39.1
Collecting einops
Downloading einops-0.6.1-py3-none-any.whl (42 kB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 42.2/42.2 kB 3.8 MB/s eta 0:00:00
Installing collected packages: einops
Successfully installed einops-0.6.1
Downloading (…)lve/main/config.json: 100%
658/658 [00:00<00:00, 51.8kB/s]
Downloading (…)/configuration_RW.py: 100%
2.51k/2.51k [00:00<00:00, 227kB/s]
A new version of the following files was downloaded from https://huggingface.co/tiiuae/falcon-40b-instruct:
- configuration_RW.py
. Make sure to double-check they do not contain any added malicious code. To avoid downloading new versions of the code file, you can pin a revision.
Downloading (…)main/modelling_RW.py: 100%
47.1k/47.1k [00:00<00:00, 3.76MB/s]
A new version of the following files was downloaded from https://huggingface.co/tiiuae/falcon-40b-instruct:
- modelling_RW.py
. Make sure to double-check they do not contain any added malicious code. To avoid downloading new versions of the code file, you can pin a revision.
Downloading (…)model.bin.index.json: 100%
39.3k/39.3k [00:00<00:00, 3.46MB/s]
Downloading shards: 100%
9/9 [04:40<00:00, 29.33s/it]
Downloading (…)l-00001-of-00009.bin: 100%
9.50G/9.50G [00:37<00:00, 274MB/s]
Downloading (…)l-00002-of-00009.bin: 100%
9.51G/9.51G [00:33<00:00, 340MB/s]
Downloading (…)l-00003-of-00009.bin: 100%
9.51G/9.51G [00:28<00:00, 320MB/s]
Downloading (…)l-00004-of-00009.bin: 100%
9.51G/9.51G [00:33<00:00, 317MB/s]
Downloading (…)l-00005-of-00009.bin: 100%
9.51G/9.51G [00:27<00:00, 210MB/s]
Downloading (…)l-00006-of-00009.bin: 100%
9.51G/9.51G [00:34<00:00, 180MB/s]
Downloading (…)l-00007-of-00009.bin: 100%
9.51G/9.51G [00:27<00:00, 307MB/s]
Downloading (…)l-00008-of-00009.bin: 100%
9.51G/9.51G [00:27<00:00, 504MB/s]
Downloading (…)l-00009-of-00009.bin: 100%
7.58G/7.58G [00:27<00:00, 315MB/s]
===================================BUG REPORT===================================
Welcome to bitsandbytes. For bug reports, please run
python -m bitsandbytes
and submit this information together with your error trace to: https://github.com/TimDettmers/bitsandbytes/issues
================================================================================
bin /usr/local/lib/python3.10/dist-packages/bitsandbytes/libbitsandbytes_cuda118.so
CUDA_SETUP: WARNING! libcudart.so not found in any environmental path. Searching in backup paths...
CUDA SETUP: CUDA runtime path found: /usr/local/cuda/lib64/libcudart.so
CUDA SETUP: Highest compute capability among GPUs detected: 8.0
CUDA SETUP: Detected CUDA version 118
CUDA SETUP: Loading binary /usr/local/lib/python3.10/dist-packages/bitsandbytes/libbitsandbytes_cuda118.so...
/usr/local/lib/python3.10/dist-packages/bitsandbytes/cuda_setup/main.py:149: UserWarning: /usr/lib64-nvidia did not contain ['libcudart.so', 'libcudart.so.11.0', 'libcudart.so.12.0'] as expected! Searching further paths...
warn(msg)
/usr/local/lib/python3.10/dist-packages/bitsandbytes/cuda_setup/main.py:149: UserWarning: WARNING: The following directories listed in your path were found to be non-existent: {PosixPath('/sys/fs/cgroup/memory.events /var/colab/cgroup/jupyter-children/memory.events')}
warn(msg)
/usr/local/lib/python3.10/dist-packages/bitsandbytes/cuda_setup/main.py:149: UserWarning: WARNING: The following directories listed in your path were found to be non-existent: {PosixPath('//172.28.0.1'), PosixPath('8013'), PosixPath('http')}
warn(msg)
/usr/local/lib/python3.10/dist-packages/bitsandbytes/cuda_setup/main.py:149: UserWarning: WARNING: The following directories listed in your path were found to be non-existent: {PosixPath('//colab.research.google.com/tun/m/cc48301118ce562b961b3c22d803539adc1e0c19/gpu-a100-s-b20acq94qsrp --tunnel_background_save_delay=10s --tunnel_periodic_background_save_frequency=30m0s --enable_output_coalescing=true --output_coalescing_required=true'), PosixPath('--logtostderr --listen_host=172.28.0.12 --target_host=172.28.0.12 --tunnel_background_save_url=https')}
warn(msg)
/usr/local/lib/python3.10/dist-packages/bitsandbytes/cuda_setup/main.py:149: UserWarning: WARNING: The following directories listed in your path were found to be non-existent: {PosixPath('/env/python')}
warn(msg)
/usr/local/lib/python3.10/dist-packages/bitsandbytes/cuda_setup/main.py:149: UserWarning: WARNING: The following directories listed in your path were found to be non-existent: {PosixPath('//ipykernel.pylab.backend_inline'), PosixPath('module')}
warn(msg)
/usr/local/lib/python3.10/dist-packages/bitsandbytes/cuda_setup/main.py:149: UserWarning: Found duplicate ['libcudart.so', 'libcudart.so.11.0', 'libcudart.so.12.0'] files: {PosixPath('/usr/local/cuda/lib64/libcudart.so'), PosixPath('/usr/local/cuda/lib64/libcudart.so.11.0')}.. We'll flip a coin and try one of these, in order to fail forward.
Either way, this might cause trouble in the future:
If you get `CUDA error: invalid device function` errors, the above might be the cause and the solution is to make sure only one ['libcudart.so', 'libcudart.so.11.0', 'libcudart.so.12.0'] in the paths that we search based on your env.
warn(msg)
Loading checkpoint shards: 100%
9/9 [05:45<00:00, 35.83s/it]
Downloading (…)neration_config.json: 100%
111/111 [00:00<00:00, 10.3kB/s]
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-1-c89997e10ae9> in <cell line: 15>()
13
14 config = AutoConfig.from_pretrained(model_path, trust_remote_code=True)
---> 15 model = AutoModelForCausalLM.from_pretrained(model_path, trust_remote_code=True, load_in_4bit=True, device_map="auto")
16
17 tokenizer = AutoTokenizer.from_pretrained("tiiuae/falcon-40b-instruct")
3 frames
/usr/local/lib/python3.10/dist-packages/transformers/modeling_utils.py in to(self, *args, **kwargs)
1894 # Checks if the model has been loaded in 8-bit
1895 if getattr(self, "is_quantized", False):
-> 1896 raise ValueError(
1897 "`.to` is not supported for `4-bit` or `8-bit` models. Please use the model as it is, since the"
1898 " model has already been set to the correct devices and casted to the correct `dtype`."
ValueError: `.to` is not supported for `4-bit` or `8-bit` models. Please use the model as it is, since the model has already been set to the correct devices and casted to the correct `dtype`.
Expected behavior
Model should be loaded and able to run inference.
Hi @DJT777, thanks for the report. Are you using the main branch of accelerate + a single GPU? If that's the case, https://github.com/huggingface/accelerate/pull/1652 should solve the issue. I will try to reproduce later without that fix.
I wasn't able to test it using that commit. However, running everything with the versions from my June 8th run got the model loading again. I am using this to run the notebook:
!pip install git+https://www.github.com/huggingface/transformers@2e2088f24b60d8817c74c32a0ac6bb1c5d39544d
!pip install huggingface-hub==0.15.1
!pip install tokenizers==0.13.3
!pip install safetensors==0.3.1
!pip install git+https://github.com/huggingface/accelerate@040f178569fbfe7ab7113af709dc5a7fa09e95bd
!pip install bitsandbytes==0.39.0
!pip install einops==0.6.1
Thanks @DJT777
Can you try with pip install git+https://github.com/huggingface/accelerate.git@fix-to-int8 ?
@younesbelkada
I'll have an attempt at running things again with that.
Great thanks!
I went for
!pip install git+https://github.com/huggingface/transformers.git@6ce6d62b6f20040129ec9831e7c4f6576402ea42
!pip install git+https://github.com/huggingface/accelerate.git@5791d949ff93733c102461ba89c8310745a3fa79
!pip install git+https://github.com/huggingface/peft.git@e2b8e3260d3eeb736edf21a2424e89fe3ecf429d
!pip install transformers[deepspeed]
I had to include transformers[deepspeed] yesterday, and earlier today I had to cherry-pick commits to make things work.
Development is going so fast, hard to keep up with every change 😅
Hi @DJT777 I just ran the script below:
from transformers import AutoModelForCausalLM, AutoConfig, AutoTokenizer
import torch
model_path="tiiuae/falcon-40b-instruct"
config = AutoConfig.from_pretrained(model_path, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(model_path, trust_remote_code=True, load_in_4bit=True, device_map="auto")
tokenizer = AutoTokenizer.from_pretrained("tiiuae/falcon-40b-instruct")
input_text = "Describe the solar system."
input_ids = tokenizer(input_text, return_tensors="pt").input_ids.to("cuda")
outputs = model.generate(input_ids, max_length=10)
print(tokenizer.decode(outputs[0]))
with transformers' main branch and the fix-to-int8 branch of accelerate, and I can confirm the script works fine. I am running on 2x NVIDIA T4 16GB GPUs.
@younesbelkada
I'm not able to confirm if it is working in Colab.
I get the same error in Google Colab ("ValueError: .to is not supported for 4-bit or 8-bit models. Please use the model as it is, since the model has already been set to the correct devices and casted to the correct dtype."). Things were working perfectly well yesterday... Copy-pasting this code into a Colab notebook cell and running it should reproduce the error:
!pip install -q -U bitsandbytes
!pip install -q -U git+https://github.com/huggingface/transformers.git
!pip install -q -U git+https://github.com/huggingface/peft.git
!pip install -q -U git+https://github.com/huggingface/accelerate.git
!pip install -q datasets
!pip install -q einops
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig
model_id = "ybelkada/falcon-7b-sharded-bf16"
bnb_config = BitsAndBytesConfig(
load_in_4bit=True,
bnb_4bit_use_double_quant=True,
bnb_4bit_quant_type="nf4",
bnb_4bit_compute_dtype=torch.bfloat16
)
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, quantization_config=bnb_config, trust_remote_code=True, device_map={"":0})
Notebook runtime settings:
- Runtime type = Python 3
- GPU = T4
Hi @Maaalik, I can confirm the accelerate PR mentioned above fixes your issue on Google Colab. Can you try the following on a new runtime / fresh environment?
!pip install -q -U bitsandbytes
!pip install -q -U git+https://github.com/huggingface/transformers.git
!pip install -q -U git+https://github.com/huggingface/peft.git
!pip install -q -U git+https://github.com/huggingface/accelerate.git@fix-to-int8
!pip install -q datasets
!pip install -q einops
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig
model_id = "ybelkada/falcon-7b-sharded-bf16"
bnb_config = BitsAndBytesConfig(
load_in_4bit=True,
bnb_4bit_use_double_quant=True,
bnb_4bit_quant_type="nf4",
bnb_4bit_compute_dtype=torch.bfloat16
)
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, quantization_config=bnb_config, trust_remote_code=True, device_map={"":0})
I just tested it on GColab
Works like a charm! Thank you very much, @younesbelkada!
With https://github.com/huggingface/accelerate/pull/1652 merged, you can now install accelerate from source and it should work.
@younesbelkada All the test cases above use device_map="auto", which also works for me. BUT if I use device_map={'':torch.cuda.current_device()}, the error shows up again:
Traceback (most recent call last):
File "train1.py", line 124, in <module>
trainer = SFTTrainer(
File "/usr/local/lib/python3.8/dist-packages/trl/trainer/sft_trainer.py", line 212, in __init__
super().__init__(
File "/usr/local/lib/python3.8/dist-packages/transformers/trainer.py", line 499, in __init__
self._move_model_to_device(model, args.device)
File "/usr/local/lib/python3.8/dist-packages/transformers/trainer.py", line 741, in _move_model_to_device
model = model.to(device)
File "/usr/local/lib/python3.8/dist-packages/transformers/modeling_utils.py", line 1886, in to
raise ValueError(
ValueError: `.to` is not supported for `4-bit` or `8-bit` models. Please use the model as it is, since the model has already been set to the correct devices and casted to the correct `dtype`.
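For context, this is the same guard quoted in the traceback at the top of this issue: the Trainer calls model.to(args.device), and PreTrainedModel.to refuses to move a bitsandbytes-quantized model because from_pretrained has already dispatched its weights. A minimal, simplified sketch (not the exact library source) of how such a move can be guarded in user code:

def move_unless_quantized(model, device):
    # transformers marks bitsandbytes 4-/8-bit models with `is_quantized` and
    # raises a ValueError if .to() is called on them, since the weights were
    # already placed on the right devices (and cast) at load time.
    if getattr(model, "is_quantized", False):
        return model
    return model.to(device)

# usage: model = move_unless_quantized(model, "cuda:0")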
@younesbelkada Even when I set device_map="auto", if I only have 1 GPU, I still face the error:
Traceback (most recent call last):
File "train1.py", line 124, in <module>
trainer = SFTTrainer(
File "/usr/local/lib/python3.8/dist-packages/trl/trainer/sft_trainer.py", line 212, in __init__
super().__init__(
File "/usr/local/lib/python3.8/dist-packages/transformers/trainer.py", line 499, in __init__
self._move_model_to_device(model, args.device)
File "/usr/local/lib/python3.8/dist-packages/transformers/trainer.py", line 741, in _move_model_to_device
model = model.to(device)
File "/usr/local/lib/python3.8/dist-packages/transformers/modeling_utils.py", line 1886, in to
raise ValueError(
ValueError: `.to` is not supported for `4-bit` or `8-bit` models. Please use the model as it is, since the model has already been set to the correct devices and casted to the correct `dtype`
@sgugger Sorry another question here =) as above
I do not have the answer, no need to tag me.
hi @Andcircle
Do you face the same issue with the main branch of transformers?
pip install -U git+https://github.com/huggingface/transformers.git
hi @Andcircle Do you face the same issue with the main branch of transformers? pip install -U git+https://github.com/huggingface/transformers.git
Hi @younesbelkada,
Once I changed to 4.32.0.dev0, the error "ValueError: .to is not supported for 4-bit or 8-bit models." is gone, but I got a new error:
ValueError: weight is on the meta device, we need a `value` to put in on 0.
ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 1) local_rank: 0 (pid: 14627) of binary: /usr/bin/python3
I load the Llama 2 7B model like this, then want to use the SFT trainer:
bnb_config = BitsAndBytesConfig(
load_in_4bit=True,
# load_in_8bit=True,
bnb_4bit_quant_type="nf4",
bnb_4bit_compute_dtype=compute_dtype,
bnb_4bit_use_double_quant=True,
)
model = AutoModelForCausalLM.from_pretrained(
model_name, quantization_config=bnb_config, trust_remote_code=True,
low_cpu_mem_usage=False,
# device_map={'':torch.cuda.current_device()}
)
@younesbelkada
If I switch to pip install -U git+https://github.com/huggingface/transformers@de9255de27abfcae4a1f816b904915f0b1e23cd9, there's no "weight is on the meta device" issue, but the "ValueError: .to is not supported for 4-bit or 8-bit models" issue appears for full fine-tuning without LoRA.
Thanks @DJT777 Can you try with pip install git+https://github.com/huggingface/accelerate.git@fix-to-int8?

Using https://github.com/huggingface/accelerate@d1628ee didn't solve it, and installing the fix-to-int8 branch now fails:
WARNING: Did not find branch or tag 'fix-to-int8', assuming revision or ref.
Running command git checkout -q fix-to-int8
error: pathspec 'fix-to-int8' did not match any file(s) known to git
error: subprocess-exited-with-error

× git checkout -q fix-to-int8 did not run successfully.
│ exit code: 1
╰─> See above for output.

note: This error originates from a subprocess, and is likely not a problem with pip.
error: subprocess-exited-with-error

× git checkout -q fix-to-int8 did not run successfully.
│ exit code: 1
╰─> See above for output.

note: This error originates from a subprocess, and is likely not a problem with pip.
hi @MrKsiJ You can now use the main branch of accelerate:
pip install -U git+https://github.com/huggingface/accelerate.git
hi @MrKsiJ You can now use the main branch of accelerate:
pip install -U git+https://github.com/huggingface/accelerate.git
The problem is solved, so we are moving on, but now I have another question: how do I run PeftModel.from_pretrained locally without the Internet? If I disable the Internet, PeftModel.from_pretrained for some reason still breaks trying to reach the Hugging Face Hub, although everything was downloaded on the first launch.
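One way to run PeftModel.from_pretrained fully offline is to point both from_pretrained calls at local directories and put the Hub client into offline mode before importing the libraries. A minimal sketch, assuming the base model and adapter were already downloaded; the paths below are placeholders, not real locations:

import os
os.environ["HF_HUB_OFFLINE"] = "1"        # block any calls to the Hugging Face Hub
os.environ["TRANSFORMERS_OFFLINE"] = "1"  # use only local files / the local cache

from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base = AutoModelForCausalLM.from_pretrained("/path/to/local/base-model")
tokenizer = AutoTokenizer.from_pretrained("/path/to/local/base-model")
model = PeftModel.from_pretrained(base, "/path/to/local/adapter")  # directory with adapter_config.json + adapter weights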
Which versions of accelerate and transformers fix this issue? I am using transformers==4.36.2 and accelerate==0.26.1, and I am still having this error @younesbelkada. The issue still exists with transformers==4.38.0 and accelerate==0.27.2.
The stack trace is:
2024-02-23 11:26:16,461 ERROR tune_controller.py:1374 -- Trial task failed for trial TorchTrainer_22c6b_00000
Traceback (most recent call last):
File "/opt/conda/envs/domino-ray/lib/python3.10/site-packages/ray/air/execution/_internal/event_manager.py", line 110, in resolve_future
result = ray.get(future)
File "/opt/conda/envs/domino-ray/lib/python3.10/site-packages/ray/_private/auto_init_hook.py", line 22, in auto_init_wrapper
return fn(*args, **kwargs)
File "/opt/conda/envs/domino-ray/lib/python3.10/site-packages/ray/_private/client_mode_hook.py", line 103, in wrapper
return func(*args, **kwargs)
File "/opt/conda/envs/domino-ray/lib/python3.10/site-packages/ray/_private/worker.py", line 2624, in get
raise value.as_instanceof_cause()
ray.exceptions.RayTaskError(ValueError): ray::_Inner.train() (pid=3455, ip=10.68.12.214, actor_id=3ae8e14d20959e0bb1e7fd5c0c000000, repr=TorchTrainer)
File "/opt/conda/envs/domino-ray/lib/python3.10/site-packages/ray/tune/trainable/trainable.py", line 342, in train
raise skipped from exception_cause(skipped)
File "/opt/conda/envs/domino-ray/lib/python3.10/site-packages/ray/train/_internal/utils.py", line 43, in check_for_failure
ray.get(object_ref)
ray.exceptions.RayTaskError(ValueError): ray::_RayTrainWorker__execute.get_next() (pid=6302, ip=10.68.23.157, actor_id=77895994777a3819337ced8b0c000000, repr=<ray.train._internal.worker_group.RayTrainWorker object at 0x7f83c6ae1720>)
File "/opt/conda/envs/domino-ray/lib/python3.10/site-packages/ray/train/_internal/worker_group.py", line 33, in __execute
raise skipped from exception_cause(skipped)
File "/opt/conda/envs/domino-ray/lib/python3.10/site-packages/ray/train/_internal/utils.py", line 118, in discard_return_wrapper
train_func(*args, **kwargs)
File "/tmp/ray/session_2024-02-23_10-14-31_321538_1/runtime_resources/working_dir_files/_ray_pkg_0639d2a5677b0f10/llama_ray_2.9.py", line 138, in train_func
model, optimizer, _, lr_scheduler = deepspeed.initialize(
File "/opt/conda/envs/domino-ray/lib/python3.10/site-packages/deepspeed/__init__.py", line 176, in initialize
engine = DeepSpeedEngine(args=args,
File "/opt/conda/envs/domino-ray/lib/python3.10/site-packages/deepspeed/runtime/engine.py", line 262, in __init__
self._configure_distributed_model(model)
File "/opt/conda/envs/domino-ray/lib/python3.10/site-packages/deepspeed/runtime/engine.py", line 1113, in _configure_distributed_model
self.module.to(self.device)
File "/tmp/ray/session_2024-02-23_10-14-31_321538_1/runtime_resources/pip/7ca0a277e147900f193d749730594f67ff7cd52d/virtualenv/lib/python3.10/site-packages/accelerate/big_modeling.py", line 448, in wrapper
return fn(*args, **kwargs)
File "/tmp/ray/session_2024-02-23_10-14-31_321538_1/runtime_resources/pip/7ca0a277e147900f193d749730594f67ff7cd52d/virtualenv/lib/python3.10/site-packages/transformers/modeling_utils.py", line 2534, in to
raise ValueError(
ValueError: `.to` is not supported for `4-bit` or `8-bit` bitsandbytes models. Please use the model as it is, since the model has already been set to the correct devices and casted to the correct `dtype`.
any updates on this? I am also facing this issue
Same here. Why can't the user choose which device the model should be placed on? I want to do distillation and want to move the teacher model to cuda:1 :( @younesbelkada All HF packages have been updated to the newest versions, including transformers, peft, accelerate, and bitsandbytes.
The problem only occurs when using bitsandbytes.
Reopening as it appears the issue is still occurring cc @SunMarc
Same here. Why can't the user choose which device the model should be placed on? I want to do distillation and want to move the teacher model to cuda:1 :( @younesbelkada All HF packages have been updated to the newest versions, including transformers, peft, accelerate, and bitsandbytes.
Hey @rangehow, thanks for the feedback. In the past, it was not possible to move the quantized model due to some issues. But I think the issue is solved now with the latest bnb. We just need to update transformers + test a bit. Can you have a look @matthewdouglas ? Also, you can load the model to the desired device (e.g. "cuda:1") by setting device_map = {"":"cuda:1"} when you load the model.
Hey @rangehow, thanks for the feedback. In the past, it was not possible to move the quantized model due to some issues. But I think the issue is solved now with the latest bnb. We just need to update transformers + test a bit. Can you have a look @matthewdouglas ? Also, you can load the model to the desired device (e.g. "cuda:1") by setting device_map = {"":"cuda:1"} when you load the model.
Yes it would help :)
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
quantization_config = BitsAndBytesConfig(
load_in_4bit=True, bnb_4bit_compute_dtype=torch.bfloat16
)
teacher_model = AutoModelForCausalLM.from_pretrained(
teacher_model_dir,
low_cpu_mem_usage=True,
torch_dtype=torch.bfloat16,
device_map="cuda:2",
quantization_config=quantization_config,
)
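Following SunMarc's suggestion above, the dict form of device_map pins the whole quantized model to one GPU at load time, so it never needs a .to() call afterwards. A minimal sketch for the teacher model in a distillation setup (the model path is a placeholder):

import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

quantization_config = BitsAndBytesConfig(
    load_in_4bit=True, bnb_4bit_compute_dtype=torch.bfloat16
)
# "" means "the whole model": every module is placed on cuda:1 while loading,
# so the quantized teacher is never moved with .to() afterwards.
teacher_model = AutoModelForCausalLM.from_pretrained(
    "/path/to/teacher-model",
    device_map={"": "cuda:1"},
    quantization_config=quantization_config,
)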
any updates? it is not working for me...
@rangehow @axelblaze88, is the issue fixed? We merged a PR to allow moving quantized models. Make sure to install the latest transformers + bnb.
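For reference, with transformers and bitsandbytes versions that include that change, moving an already loaded 4-bit model should look like an ordinary .to() call. A minimal sketch, assuming recent releases (the model id is just the example used earlier in this thread); on older versions this still raises the ValueError above:

import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

bnb_config = BitsAndBytesConfig(load_in_4bit=True, bnb_4bit_compute_dtype=torch.bfloat16)
model = AutoModelForCausalLM.from_pretrained(
    "ybelkada/falcon-7b-sharded-bf16",
    quantization_config=bnb_config,
    device_map={"": 0},
)
# Only supported on versions that include the fix; older versions raise here.
model = model.to("cuda:1")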