
Issue Loading 4-bit and 8-bit language models: ValueError: `.to` is not supported for `4-bit` or `8-bit` models. Please use the model as it is, since the model has already been set to the correct devices and casted to the correct `dtype`.

Open DJT777 opened this issue 2 years ago • 29 comments

System Info

I'm running into an issue where I can't load a 4-bit or 8-bit quantized version of the Falcon or LLaMA models. This was working a couple of weeks ago. This is running on Colab. I'm wondering if anyone knows of a fix, or why this no longer works when it did 2-3 weeks ago, around June 8th.

  • transformers version: 4.31.0.dev0
  • Platform: Linux-5.15.107+-x86_64-with-glibc2.31
  • Python version: 3.10.12
  • Huggingface_hub version: 0.15.1
  • Safetensors version: 0.3.1
  • PyTorch version (GPU?): 2.0.1+cu118 (True)
  • Tensorflow version (GPU?): 2.12.0 (True)
  • Flax version (CPU?/GPU?/TPU?): 0.6.11 (gpu)
  • Jax version: 0.4.10
  • JaxLib version: 0.4.10

Who can help?

@ArthurZucker @younesbelkada @sgugger

Information

  • [X] The official example scripts
  • [X] My own modified scripts

Tasks

  • [ ] An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
  • [X] My own task or dataset (give details below)

Reproduction

Running in Colab on an A100 with Colab Pro.



!pip install git+https://www.github.com/huggingface/transformers

!pip install git+https://github.com/huggingface/accelerate

!pip install bitsandbytes

!pip install einops

from transformers import AutoModelForCausalLM, AutoConfig, AutoTokenizer
import torch

model_path="tiiuae/falcon-40b-instruct"

config = AutoConfig.from_pretrained(model_path, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(model_path, trust_remote_code=True, load_in_4bit=True, device_map="auto")

tokenizer = AutoTokenizer.from_pretrained("tiiuae/falcon-40b-instruct")

input_text = "Describe the solar system."
input_ids = tokenizer(input_text, return_tensors="pt").input_ids.to("cuda")

outputs = model.generate(input_ids, max_length=100)
print(tokenizer.decode(outputs[0]))

Cell output:

Collecting git+https://www.github.com/huggingface/transformers
  Cloning https://www.github.com/huggingface/transformers to /tmp/pip-req-build-6pyatvel
  Running command git clone --filter=blob:none --quiet https://www.github.com/huggingface/transformers /tmp/pip-req-build-6pyatvel
  warning: redirecting to https://github.com/huggingface/transformers.git/
  Resolved https://www.github.com/huggingface/transformers to commit e84bf1f734f87aa2bedc41b9b9933d00fc6add98
  Installing build dependencies ... done
  Getting requirements to build wheel ... done
  Preparing metadata (pyproject.toml) ... done
Requirement already satisfied: filelock in /usr/local/lib/python3.10/dist-packages (from transformers==4.31.0.dev0) (3.12.2)
Collecting huggingface-hub<1.0,>=0.14.1 (from transformers==4.31.0.dev0)
  Downloading huggingface_hub-0.15.1-py3-none-any.whl (236 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 236.8/236.8 kB 11.6 MB/s eta 0:00:00
Requirement already satisfied: numpy>=1.17 in /usr/local/lib/python3.10/dist-packages (from transformers==4.31.0.dev0) (1.22.4)
Requirement already satisfied: packaging>=20.0 in /usr/local/lib/python3.10/dist-packages (from transformers==4.31.0.dev0) (23.1)
Requirement already satisfied: pyyaml>=5.1 in /usr/local/lib/python3.10/dist-packages (from transformers==4.31.0.dev0) (6.0)
Requirement already satisfied: regex!=2019.12.17 in /usr/local/lib/python3.10/dist-packages (from transformers==4.31.0.dev0) (2022.10.31)
Requirement already satisfied: requests in /usr/local/lib/python3.10/dist-packages (from transformers==4.31.0.dev0) (2.27.1)
Collecting tokenizers!=0.11.3,<0.14,>=0.11.1 (from transformers==4.31.0.dev0)
  Downloading tokenizers-0.13.3-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (7.8 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 7.8/7.8 MB 114.2 MB/s eta 0:00:00
Collecting safetensors>=0.3.1 (from transformers==4.31.0.dev0)
  Downloading safetensors-0.3.1-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (1.3 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 1.3/1.3 MB 79.9 MB/s eta 0:00:00
Requirement already satisfied: tqdm>=4.27 in /usr/local/lib/python3.10/dist-packages (from transformers==4.31.0.dev0) (4.65.0)
Requirement already satisfied: fsspec in /usr/local/lib/python3.10/dist-packages (from huggingface-hub<1.0,>=0.14.1->transformers==4.31.0.dev0) (2023.6.0)
Requirement already satisfied: typing-extensions>=3.7.4.3 in /usr/local/lib/python3.10/dist-packages (from huggingface-hub<1.0,>=0.14.1->transformers==4.31.0.dev0) (4.6.3)
Requirement already satisfied: urllib3<1.27,>=1.21.1 in /usr/local/lib/python3.10/dist-packages (from requests->transformers==4.31.0.dev0) (1.26.16)
Requirement already satisfied: certifi>=2017.4.17 in /usr/local/lib/python3.10/dist-packages (from requests->transformers==4.31.0.dev0) (2023.5.7)
Requirement already satisfied: charset-normalizer~=2.0.0 in /usr/local/lib/python3.10/dist-packages (from requests->transformers==4.31.0.dev0) (2.0.12)
Requirement already satisfied: idna<4,>=2.5 in /usr/local/lib/python3.10/dist-packages (from requests->transformers==4.31.0.dev0) (3.4)
Building wheels for collected packages: transformers
  Building wheel for transformers (pyproject.toml) ... done
  Created wheel for transformers: filename=transformers-4.31.0.dev0-py3-none-any.whl size=7228417 sha256=5867afa880111a40f7b630e51d9f1709ec1131236a31c2c7fb5f97179e3d1405
  Stored in directory: /tmp/pip-ephem-wheel-cache-t06u3u6x/wheels/c1/ac/11/e69d454307e735e14f4f95e575c8be27fd99835ec36f504c13
Successfully built transformers
Installing collected packages: tokenizers, safetensors, huggingface-hub, transformers
Successfully installed huggingface-hub-0.15.1 safetensors-0.3.1 tokenizers-0.13.3 transformers-4.31.0.dev0
Collecting git+https://github.com/huggingface/accelerate
  Cloning https://github.com/huggingface/accelerate to /tmp/pip-req-build-76ziff6x
  Running command git clone --filter=blob:none --quiet https://github.com/huggingface/accelerate /tmp/pip-req-build-76ziff6x
  Resolved https://github.com/huggingface/accelerate to commit d141b4ce794227450a105b7281611c7980e5b3d6
  Installing build dependencies ... done
  Getting requirements to build wheel ... done
  Preparing metadata (pyproject.toml) ... done
Requirement already satisfied: numpy>=1.17 in /usr/local/lib/python3.10/dist-packages (from accelerate==0.21.0.dev0) (1.22.4)
Requirement already satisfied: packaging>=20.0 in /usr/local/lib/python3.10/dist-packages (from accelerate==0.21.0.dev0) (23.1)
Requirement already satisfied: psutil in /usr/local/lib/python3.10/dist-packages (from accelerate==0.21.0.dev0) (5.9.5)
Requirement already satisfied: pyyaml in /usr/local/lib/python3.10/dist-packages (from accelerate==0.21.0.dev0) (6.0)
Requirement already satisfied: torch>=1.6.0 in /usr/local/lib/python3.10/dist-packages (from accelerate==0.21.0.dev0) (2.0.1+cu118)
Requirement already satisfied: filelock in /usr/local/lib/python3.10/dist-packages (from torch>=1.6.0->accelerate==0.21.0.dev0) (3.12.2)
Requirement already satisfied: typing-extensions in /usr/local/lib/python3.10/dist-packages (from torch>=1.6.0->accelerate==0.21.0.dev0) (4.6.3)
Requirement already satisfied: sympy in /usr/local/lib/python3.10/dist-packages (from torch>=1.6.0->accelerate==0.21.0.dev0) (1.11.1)
Requirement already satisfied: networkx in /usr/local/lib/python3.10/dist-packages (from torch>=1.6.0->accelerate==0.21.0.dev0) (3.1)
Requirement already satisfied: jinja2 in /usr/local/lib/python3.10/dist-packages (from torch>=1.6.0->accelerate==0.21.0.dev0) (3.1.2)
Requirement already satisfied: triton==2.0.0 in /usr/local/lib/python3.10/dist-packages (from torch>=1.6.0->accelerate==0.21.0.dev0) (2.0.0)
Requirement already satisfied: cmake in /usr/local/lib/python3.10/dist-packages (from triton==2.0.0->torch>=1.6.0->accelerate==0.21.0.dev0) (3.25.2)
Requirement already satisfied: lit in /usr/local/lib/python3.10/dist-packages (from triton==2.0.0->torch>=1.6.0->accelerate==0.21.0.dev0) (16.0.6)
Requirement already satisfied: MarkupSafe>=2.0 in /usr/local/lib/python3.10/dist-packages (from jinja2->torch>=1.6.0->accelerate==0.21.0.dev0) (2.1.3)
Requirement already satisfied: mpmath>=0.19 in /usr/local/lib/python3.10/dist-packages (from sympy->torch>=1.6.0->accelerate==0.21.0.dev0) (1.3.0)
Building wheels for collected packages: accelerate
  Building wheel for accelerate (pyproject.toml) ... done
  Created wheel for accelerate: filename=accelerate-0.21.0.dev0-py3-none-any.whl size=234648 sha256=71b98a6d4b1111cc9ca22265f6699cd552325e5f71c83daebe696afd957497ee
  Stored in directory: /tmp/pip-ephem-wheel-cache-atmtszgr/wheels/f6/c7/9d/1b8a5ca8353d9307733bc719107acb67acdc95063bba749f26
Successfully built accelerate
Installing collected packages: accelerate
Successfully installed accelerate-0.21.0.dev0
Collecting bitsandbytes
  Downloading bitsandbytes-0.39.1-py3-none-any.whl (97.1 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 97.1/97.1 MB 18.8 MB/s eta 0:00:00
Installing collected packages: bitsandbytes
Successfully installed bitsandbytes-0.39.1
Collecting einops
  Downloading einops-0.6.1-py3-none-any.whl (42 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 42.2/42.2 kB 3.8 MB/s eta 0:00:00
Installing collected packages: einops
Successfully installed einops-0.6.1
Downloading (…)lve/main/config.json: 100%
658/658 [00:00<00:00, 51.8kB/s]
Downloading (…)/configuration_RW.py: 100%
2.51k/2.51k [00:00<00:00, 227kB/s]
A new version of the following files was downloaded from https://huggingface.co/tiiuae/falcon-40b-instruct:
- configuration_RW.py
. Make sure to double-check they do not contain any added malicious code. To avoid downloading new versions of the code file, you can pin a revision.
Downloading (…)main/modelling_RW.py: 100%
47.1k/47.1k [00:00<00:00, 3.76MB/s]
A new version of the following files was downloaded from https://huggingface.co/tiiuae/falcon-40b-instruct:
- modelling_RW.py
. Make sure to double-check they do not contain any added malicious code. To avoid downloading new versions of the code file, you can pin a revision.
Downloading (…)model.bin.index.json: 100%
39.3k/39.3k [00:00<00:00, 3.46MB/s]
Downloading shards: 100%
9/9 [04:40<00:00, 29.33s/it]
Downloading (…)l-00001-of-00009.bin: 100%
9.50G/9.50G [00:37<00:00, 274MB/s]
Downloading (…)l-00002-of-00009.bin: 100%
9.51G/9.51G [00:33<00:00, 340MB/s]
Downloading (…)l-00003-of-00009.bin: 100%
9.51G/9.51G [00:28<00:00, 320MB/s]
Downloading (…)l-00004-of-00009.bin: 100%
9.51G/9.51G [00:33<00:00, 317MB/s]
Downloading (…)l-00005-of-00009.bin: 100%
9.51G/9.51G [00:27<00:00, 210MB/s]
Downloading (…)l-00006-of-00009.bin: 100%
9.51G/9.51G [00:34<00:00, 180MB/s]
Downloading (…)l-00007-of-00009.bin: 100%
9.51G/9.51G [00:27<00:00, 307MB/s]
Downloading (…)l-00008-of-00009.bin: 100%
9.51G/9.51G [00:27<00:00, 504MB/s]
Downloading (…)l-00009-of-00009.bin: 100%
7.58G/7.58G [00:27<00:00, 315MB/s]

===================================BUG REPORT===================================
Welcome to bitsandbytes. For bug reports, please run

python -m bitsandbytes

 and submit this information together with your error trace to: https://github.com/TimDettmers/bitsandbytes/issues
================================================================================
bin /usr/local/lib/python3.10/dist-packages/bitsandbytes/libbitsandbytes_cuda118.so
CUDA_SETUP: WARNING! libcudart.so not found in any environmental path. Searching in backup paths...
CUDA SETUP: CUDA runtime path found: /usr/local/cuda/lib64/libcudart.so
CUDA SETUP: Highest compute capability among GPUs detected: 8.0
CUDA SETUP: Detected CUDA version 118
CUDA SETUP: Loading binary /usr/local/lib/python3.10/dist-packages/bitsandbytes/libbitsandbytes_cuda118.so...
/usr/local/lib/python3.10/dist-packages/bitsandbytes/cuda_setup/main.py:149: UserWarning: /usr/lib64-nvidia did not contain ['libcudart.so', 'libcudart.so.11.0', 'libcudart.so.12.0'] as expected! Searching further paths...
  warn(msg)
/usr/local/lib/python3.10/dist-packages/bitsandbytes/cuda_setup/main.py:149: UserWarning: WARNING: The following directories listed in your path were found to be non-existent: {PosixPath('/sys/fs/cgroup/memory.events /var/colab/cgroup/jupyter-children/memory.events')}
  warn(msg)
/usr/local/lib/python3.10/dist-packages/bitsandbytes/cuda_setup/main.py:149: UserWarning: WARNING: The following directories listed in your path were found to be non-existent: {PosixPath('//172.28.0.1'), PosixPath('8013'), PosixPath('http')}
  warn(msg)
/usr/local/lib/python3.10/dist-packages/bitsandbytes/cuda_setup/main.py:149: UserWarning: WARNING: The following directories listed in your path were found to be non-existent: {PosixPath('//colab.research.google.com/tun/m/cc48301118ce562b961b3c22d803539adc1e0c19/gpu-a100-s-b20acq94qsrp --tunnel_background_save_delay=10s --tunnel_periodic_background_save_frequency=30m0s --enable_output_coalescing=true --output_coalescing_required=true'), PosixPath('--logtostderr --listen_host=172.28.0.12 --target_host=172.28.0.12 --tunnel_background_save_url=https')}
  warn(msg)
/usr/local/lib/python3.10/dist-packages/bitsandbytes/cuda_setup/main.py:149: UserWarning: WARNING: The following directories listed in your path were found to be non-existent: {PosixPath('/env/python')}
  warn(msg)
/usr/local/lib/python3.10/dist-packages/bitsandbytes/cuda_setup/main.py:149: UserWarning: WARNING: The following directories listed in your path were found to be non-existent: {PosixPath('//ipykernel.pylab.backend_inline'), PosixPath('module')}
  warn(msg)
/usr/local/lib/python3.10/dist-packages/bitsandbytes/cuda_setup/main.py:149: UserWarning: Found duplicate ['libcudart.so', 'libcudart.so.11.0', 'libcudart.so.12.0'] files: {PosixPath('/usr/local/cuda/lib64/libcudart.so'), PosixPath('/usr/local/cuda/lib64/libcudart.so.11.0')}.. We'll flip a coin and try one of these, in order to fail forward.
Either way, this might cause trouble in the future:
If you get `CUDA error: invalid device function` errors, the above might be the cause and the solution is to make sure only one ['libcudart.so', 'libcudart.so.11.0', 'libcudart.so.12.0'] in the paths that we search based on your env.
  warn(msg)
Loading checkpoint shards: 100%
9/9 [05:45<00:00, 35.83s/it]
Downloading (…)neration_config.json: 100%
111/111 [00:00<00:00, 10.3kB/s]
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-1-c89997e10ae9> in <cell line: 15>()
     13 
     14 config = AutoConfig.from_pretrained(model_path, trust_remote_code=True)
---> 15 model = AutoModelForCausalLM.from_pretrained(model_path, trust_remote_code=True, load_in_4bit=True, device_map="auto")
     16 
     17 tokenizer = AutoTokenizer.from_pretrained("tiiuae/falcon-40b-instruct")

3 frames
/usr/local/lib/python3.10/dist-packages/transformers/modeling_utils.py in to(self, *args, **kwargs)
   1894         # Checks if the model has been loaded in 8-bit
   1895         if getattr(self, "is_quantized", False):
-> 1896             raise ValueError(
   1897                 "`.to` is not supported for `4-bit` or `8-bit` models. Please use the model as it is, since the"
   1898                 " model has already been set to the correct devices and casted to the correct `dtype`."

ValueError: `.to` is not supported for `4-bit` or `8-bit` models. Please use the model as it is, since the model has already been set to the correct devices and casted to the correct `dtype`.

Expected behavior

The model should load and be able to run inference.
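
For context: the guard lives in transformers/modeling_utils.py and fires on any .to() call once a model is quantized, including calls made internally by accelerate's dispatch or by the Trainer. Below is a minimal sketch that trips the same guard deliberately; it assumes a CUDA runtime with bitsandbytes installed, and facebook/opt-125m is just a small stand-in model.

# Sketch: any .to()/.cuda() on a quantized model raises this ValueError,
# whether the call comes from user code or from accelerate internals.
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

model = AutoModelForCausalLM.from_pretrained(
    "facebook/opt-125m",  # small stand-in model, just to trip the guard
    quantization_config=BitsAndBytesConfig(load_in_8bit=True),
    device_map="auto",
)
model.to("cuda:0")  # ValueError: `.to` is not supported for `4-bit` or `8-bit` models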

DJT777 avatar Jun 28 '23 06:06 DJT777

Hi @DJT777, thanks for the report. Are you using the main branch of accelerate + a single GPU? If that's the case, https://github.com/huggingface/accelerate/pull/1652 should solve the issue. Will try to reproduce later without that fix.

younesbelkada avatar Jun 28 '23 06:06 younesbelkada

I wasn't able to test it using that commit. However, running everything with the versions from my June 8th run got the model loading again. I am using this to run the notebook:

!pip install git+https://www.github.com/huggingface/transformers@2e2088f24b60d8817c74c32a0ac6bb1c5d39544d
!pip install huggingface-hub==0.15.1
!pip install tokenizers==0.13.3
!pip install safetensors==0.3.1
!pip install git+https://github.com/huggingface/accelerate@040f178569fbfe7ab7113af709dc5a7fa09e95bd
!pip install bitsandbytes==0.39.0
!pip install einops==0.6.1

DJT777 avatar Jun 28 '23 07:06 DJT777

Thanks @DJT777! Can you try with pip install git+https://github.com/huggingface/accelerate.git@fix-to-int8?

younesbelkada avatar Jun 28 '23 07:06 younesbelkada

@younesbelkada

I'll have an attempt at running things again with that.

DJT777 avatar Jun 28 '23 07:06 DJT777

Great thanks!

younesbelkada avatar Jun 28 '23 07:06 younesbelkada

I went for

!pip install git+https://github.com/huggingface/transformers.git@6ce6d62b6f20040129ec9831e7c4f6576402ea42
!pip install git+https://github.com/huggingface/accelerate.git@5791d949ff93733c102461ba89c8310745a3fa79
!pip install git+https://github.com/huggingface/peft.git@e2b8e3260d3eeb736edf21a2424e89fe3ecf429d
!pip install transformers[deepspeed]

I had to include transformers[deepspeed] yesterday, and earlier today I had to cherry-pick commits to make things work.

Development is going so fast that it's hard to keep up with every change 😅

buzzCraft avatar Jun 28 '23 07:06 buzzCraft

Hi @DJT777, I just ran the script below:

from transformers import AutoModelForCausalLM, AutoConfig, AutoTokenizer
import torch

model_path="tiiuae/falcon-40b-instruct"

config = AutoConfig.from_pretrained(model_path, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(model_path, trust_remote_code=True, load_in_4bit=True, device_map="auto")

tokenizer = AutoTokenizer.from_pretrained("tiiuae/falcon-40b-instruct")

input_text = "Describe the solar system."
input_ids = tokenizer(input_text, return_tensors="pt").input_ids.to("cuda")

outputs = model.generate(input_ids, max_length=10)
print(tokenizer.decode(outputs[0]))

with transformers' main branch and the fix-to-int8 branch of accelerate, and I can confirm the script worked fine. I am running on 2x NVIDIA T4 16GB.

younesbelkada avatar Jun 28 '23 08:06 younesbelkada

@younesbelkada

I'm not able to confirm if it is working in Colab.

DJT777 avatar Jun 28 '23 08:06 DJT777

I get the same error in Google Colab ("ValueError: .to is not supported for 4-bit or 8-bit models. Please use the model as it is, since the model has already been set to the correct devices and casted to the correct dtype."), and things were working perfectly well yesterday... Copy-pasting this code into a Colab notebook cell and running it should reproduce the error:

!pip install -q -U bitsandbytes
!pip install -q -U git+https://github.com/huggingface/transformers.git
!pip install -q -U git+https://github.com/huggingface/peft.git
!pip install -q -U git+https://github.com/huggingface/accelerate.git
!pip install -q datasets
!pip install -q einops

import torch
from transformers import AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig

model_id = "ybelkada/falcon-7b-sharded-bf16"
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_use_double_quant=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16
)

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, quantization_config=bnb_config, trust_remote_code=True, device_map={"":0})

Notebook runtime settings:

  • Runtime type = Python 3
  • GPU = T4

Maaalik avatar Jun 28 '23 09:06 Maaalik

Hi @Maaalik, I can confirm the accelerate PR mentioned above fixes your issue on Google Colab. Can you try on a new runtime / fresh environment:

!pip install -q -U bitsandbytes
!pip install -q -U git+https://github.com/huggingface/transformers.git
!pip install -q -U git+https://github.com/huggingface/peft.git
!pip install -q -U git+https://github.com/huggingface/accelerate.git@fix-to-int8
!pip install -q datasets
!pip install -q einops

import torch
from transformers import AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig

model_id = "ybelkada/falcon-7b-sharded-bf16"
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_use_double_quant=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16
)

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, quantization_config=bnb_config, trust_remote_code=True, device_map={"":0})

I just tested it on Google Colab.

younesbelkada avatar Jun 28 '23 09:06 younesbelkada

Works like a charm! Thank you very much, @younesbelkada!

Maaalik avatar Jun 28 '23 10:06 Maaalik

https://github.com/huggingface/accelerate/pull/1652 has been merged, so you can now install accelerate from source and it should work:
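
pip install -U git+https://github.com/huggingface/accelerate.git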

younesbelkada avatar Jun 28 '23 14:06 younesbelkada

@younesbelkada All the test cases above use device_map="auto", which also works for me. BUT: if I use device_map={'':torch.cuda.current_device()}, the error shows up again:

Traceback (most recent call last):
  File "train1.py", line 124, in <module>
    trainer = SFTTrainer(
  File "/usr/local/lib/python3.8/dist-packages/trl/trainer/sft_trainer.py", line 212, in __init__
    super().__init__(
  File "/usr/local/lib/python3.8/dist-packages/transformers/trainer.py", line 499, in __init__
    self._move_model_to_device(model, args.device)
  File "/usr/local/lib/python3.8/dist-packages/transformers/trainer.py", line 741, in _move_model_to_device
    model = model.to(device)
  File "/usr/local/lib/python3.8/dist-packages/transformers/modeling_utils.py", line 1886, in to
    raise ValueError(
ValueError: `.to` is not supported for `4-bit` or `8-bit` models. Please use the model as it is, since the model has already been set to the correct devices and casted to the correct `dtype`.

Andcircle avatar Aug 08 '23 06:08 Andcircle

@younesbelkada Even with device_map="auto", if I only have 1 GPU I still face the error:

Traceback (most recent call last):
  File "train1.py", line 124, in <module>
    trainer = SFTTrainer(
  File "/usr/local/lib/python3.8/dist-packages/trl/trainer/sft_trainer.py", line 212, in __init__
    super().__init__(
  File "/usr/local/lib/python3.8/dist-packages/transformers/trainer.py", line 499, in __init__
    self._move_model_to_device(model, args.device)
  File "/usr/local/lib/python3.8/dist-packages/transformers/trainer.py", line 741, in _move_model_to_device
    model = model.to(device)
  File "/usr/local/lib/python3.8/dist-packages/transformers/modeling_utils.py", line 1886, in to
    raise ValueError(
ValueError: `.to` is not supported for `4-bit` or `8-bit` models. Please use the model as it is, since the model has already been set to the correct devices and casted to the correct `dtype`

Andcircle avatar Aug 08 '23 20:08 Andcircle

@sgugger Sorry, another question here =) as above.

Andcircle avatar Aug 08 '23 20:08 Andcircle

I do not have the answer, no need to tag me.

sgugger avatar Aug 09 '23 06:08 sgugger

Hi @Andcircle, do you face the same issue with the main branch of transformers?

pip install -U git+https://github.com/huggingface/transformers.git

younesbelkada avatar Aug 17 '23 09:08 younesbelkada

Hi @Andcircle, do you face the same issue with the main branch of transformers?

pip install -U git+https://github.com/huggingface/transformers.git

Hi @younesbelkada,

Once I changed to 4.32.0.dev0, the error "ValueError: .to is not supported for 4-bit or 8-bit models." was gone, but I got a new error:

ValueError: weight is on the meta device, we need a `value` to put in on 0. ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 1) local_rank: 0 (pid: 14627) of binary: /usr/bin/python3

I load the Llama 2 7B model like this, then want to use the SFT trainer:

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    # load_in_8bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=compute_dtype,
    bnb_4bit_use_double_quant=True,
)

model = AutoModelForCausalLM.from_pretrained(
    model_name, quantization_config=bnb_config, trust_remote_code=True,
    low_cpu_mem_usage=False,
    # device_map={'':torch.cuda.current_device()}
)

@younesbelkada If I switch to pip install -U git+https://github.com/huggingface/transformers@de9255de27abfcae4a1f816b904915f0b1e23cd9, there's no "weight is on the meta device" issue, but I get the "ValueError: .to is not supported for 4-bit or 8-bit models" error for full fine-tuning without LoRA.
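
(A common workaround for the full fine-tuning case, sketched here under the assumption that peft is installed and that the target module names match the architecture: purely quantized weights can't be trained, so attach a LoRA adapter and train only the float adapter parameters.)

# Sketch: keep the 4-bit base weights frozen and train a float LoRA adapter,
# so the Trainer never needs to move or update the quantized base model.
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

model = prepare_model_for_kbit_training(model)  # casts norms, enables input grads
peft_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],  # assumption: adjust to the model's layers
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, peft_config)
model.print_trainable_parameters()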

Andcircle avatar Aug 17 '23 18:08 Andcircle

Thanks @DJT777! Can you try with pip install git+https://github.com/huggingface/accelerate.git@fix-to-int8?

Using https://github.com/huggingface/accelerate@d1628ee didn't solve it.

WARNING: Did not find branch or tag 'fix-to-int8', assuming revision or ref.
Running command git checkout -q fix-to-int8
error: pathspec 'fix-to-int8' did not match any file(s) known to git
error: subprocess-exited-with-error

× git checkout -q fix-to-int8 did not run successfully.
│ exit code: 1
╰─> See above for output.

note: This error originates from a subprocess, and is likely not a problem with pip.
error: subprocess-exited-with-error

× git checkout -q fix-to-int8 did not run successfully.
│ exit code: 1
╰─> See above for output.

note: This error originates from a subprocess, and is likely not a problem with pip.

MrKsiJ avatar Oct 20 '23 10:10 MrKsiJ

Hi @MrKsiJ, you can now use the accelerate main branch:

pip install -U git+https://github.com/huggingface/accelerate.git

younesbelkada avatar Oct 23 '23 10:10 younesbelkada

Hi @MrKsiJ, you can now use the accelerate main branch:

pip install -U git+https://github.com/huggingface/accelerate.git

The problem is solved, so we're moving on. But now I have another question: how do I run PeftModel.from_pretrained locally without the Internet? If I disable the Internet, PeftModel.from_pretrained for some reason still tries to reach the Hugging Face Hub, even though everything was downloaded at the first launch.
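
(For reference, one way to force fully offline loading, sketched under the assumption that the base model and the adapter were already downloaded into the local cache or saved with save_pretrained; the adapter path below is hypothetical.)

# Sketch: set the offline flags before loading so neither huggingface_hub nor
# transformers attempts any network call.
import os
os.environ["HF_HUB_OFFLINE"] = "1"
os.environ["TRANSFORMERS_OFFLINE"] = "1"

from transformers import AutoModelForCausalLM
from peft import PeftModel

base = AutoModelForCausalLM.from_pretrained("tiiuae/falcon-7b", local_files_only=True)
model = PeftModel.from_pretrained(base, "/path/to/local/adapter")  # hypothetical path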

MrKsiJ avatar Oct 23 '23 11:10 MrKsiJ

Which versions of accelerate and transformers fix this issue? I am using transformers==4.36.2 and accelerate==0.26.1, and I am still getting this error @younesbelkada. The issue still exists if I use transformers==4.38.0 and accelerate==0.27.2.

The stacktrace is

2024-02-23 11:26:16,461 ERROR tune_controller.py:1374 -- Trial task failed for trial TorchTrainer_22c6b_00000
Traceback (most recent call last):
  File "/opt/conda/envs/domino-ray/lib/python3.10/site-packages/ray/air/execution/_internal/event_manager.py", line 110, in resolve_future
    result = ray.get(future)
  File "/opt/conda/envs/domino-ray/lib/python3.10/site-packages/ray/_private/auto_init_hook.py", line 22, in auto_init_wrapper
    return fn(*args, **kwargs)
  File "/opt/conda/envs/domino-ray/lib/python3.10/site-packages/ray/_private/client_mode_hook.py", line 103, in wrapper
    return func(*args, **kwargs)
  File "/opt/conda/envs/domino-ray/lib/python3.10/site-packages/ray/_private/worker.py", line 2624, in get
    raise value.as_instanceof_cause()
ray.exceptions.RayTaskError(ValueError): ray::_Inner.train() (pid=3455, ip=10.68.12.214, actor_id=3ae8e14d20959e0bb1e7fd5c0c000000, repr=TorchTrainer)
  File "/opt/conda/envs/domino-ray/lib/python3.10/site-packages/ray/tune/trainable/trainable.py", line 342, in train
    raise skipped from exception_cause(skipped)
  File "/opt/conda/envs/domino-ray/lib/python3.10/site-packages/ray/train/_internal/utils.py", line 43, in check_for_failure
    ray.get(object_ref)
ray.exceptions.RayTaskError(ValueError): ray::_RayTrainWorker__execute.get_next() (pid=6302, ip=10.68.23.157, actor_id=77895994777a3819337ced8b0c000000, repr=<ray.train._internal.worker_group.RayTrainWorker object at 0x7f83c6ae1720>)
  File "/opt/conda/envs/domino-ray/lib/python3.10/site-packages/ray/train/_internal/worker_group.py", line 33, in __execute
    raise skipped from exception_cause(skipped)
  File "/opt/conda/envs/domino-ray/lib/python3.10/site-packages/ray/train/_internal/utils.py", line 118, in discard_return_wrapper
    train_func(*args, **kwargs)
  File "/tmp/ray/session_2024-02-23_10-14-31_321538_1/runtime_resources/working_dir_files/_ray_pkg_0639d2a5677b0f10/llama_ray_2.9.py", line 138, in train_func
    model, optimizer, _, lr_scheduler = deepspeed.initialize(
  File "/opt/conda/envs/domino-ray/lib/python3.10/site-packages/deepspeed/__init__.py", line 176, in initialize
    engine = DeepSpeedEngine(args=args,
  File "/opt/conda/envs/domino-ray/lib/python3.10/site-packages/deepspeed/runtime/engine.py", line 262, in __init__
    self._configure_distributed_model(model)
  File "/opt/conda/envs/domino-ray/lib/python3.10/site-packages/deepspeed/runtime/engine.py", line 1113, in _configure_distributed_model
    self.module.to(self.device)
  File "/tmp/ray/session_2024-02-23_10-14-31_321538_1/runtime_resources/pip/7ca0a277e147900f193d749730594f67ff7cd52d/virtualenv/lib/python3.10/site-packages/accelerate/big_modeling.py", line 448, in wrapper
    return fn(*args, **kwargs)
  File "/tmp/ray/session_2024-02-23_10-14-31_321538_1/runtime_resources/pip/7ca0a277e147900f193d749730594f67ff7cd52d/virtualenv/lib/python3.10/site-packages/transformers/modeling_utils.py", line 2534, in to
    raise ValueError(
ValueError: `.to` is not supported for `4-bit` or `8-bit` bitsandbytes models. Please use the model as it is, since the model has already been set to the correct devices and casted to the correct `dtype`.
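
(Note: as this trace shows, DeepSpeedEngine unconditionally calls module.to(device) in _configure_distributed_model, so handing a bitsandbytes-quantized model to deepspeed.initialize will always trip this guard. A sketch of the usual separation, assuming deepspeed is installed; trainable_model and ds_config are hypothetical placeholders.)

# Sketch: only pass non-quantized, trainable modules to DeepSpeed, since the
# engine moves the module itself and that move is forbidden for bnb models.
import deepspeed

engine, optimizer, _, lr_scheduler = deepspeed.initialize(
    model=trainable_model,  # hypothetical: must not be a 4-bit/8-bit bnb model
    config=ds_config,       # hypothetical DeepSpeed config dict
)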

robinsonmhj avatar Feb 23 '24 16:02 robinsonmhj

Any updates on this? I am also facing this issue.

adibMosharrof avatar Aug 08 '24 19:08 adibMosharrof

Same here. Why can't the user choose which device the model goes on? I want to do distillation and move the teacher model to cuda:1 :( @younesbelkada All HF packages have been updated to the newest versions, including transformers, peft, accelerate, and bitsandbytes.

rangehow avatar Aug 10 '24 07:08 rangehow

[screenshot] This problem occurs when using only bitsandbytes.

rangehow avatar Aug 10 '24 07:08 rangehow

Reopening as it appears the issue is still occurring cc @SunMarc

amyeroberts avatar Aug 12 '24 18:08 amyeroberts

Same here. Why can't the user choose which device the model goes on? I want to do distillation and move the teacher model to cuda:1 :( @younesbelkada All HF packages have been updated to the newest versions, including transformers, peft, accelerate, and bitsandbytes.

Hey @rangehow, thanks for the feedback. In the past it was not possible to move a quantized model due to some issues, but I think this is solved now with the latest bnb. We just need to update transformers and test a bit. Can you have a look, @matthewdouglas? Also, you can load the model onto the desired device (e.g. "cuda:1") by setting device_map = {"": "cuda:1"} when you load the model.

SunMarc avatar Aug 13 '24 13:08 SunMarc

Hey @rangehow, thanks for the feedback. In the past it was not possible to move a quantized model due to some issues, but I think this is solved now with the latest bnb. We just need to update transformers and test a bit. Can you have a look, @matthewdouglas? Also, you can load the model onto the desired device (e.g. "cuda:1") by setting device_map = {"": "cuda:1"} when you load the model.

Yes it would help :)

from transformers import BitsAndBytesConfig

quantization_config = BitsAndBytesConfig(
    load_in_4bit=True, bnb_4bit_compute_dtype=torch.bfloat16
)
teacher_model = AutoModelForCausalLM.from_pretrained(
    teacher_model_dir,
    low_cpu_mem_usage=True,
    torch_dtype=torch.bfloat16,
    device_map="cuda:2",
    quantization_config=quantization_config,
)

rangehow avatar Aug 13 '24 13:08 rangehow

Any updates? It is not working for me...

axelblaze88 avatar Aug 21 '24 22:08 axelblaze88

@rangehow @axelblaze88, is the issue fixed? We merged a PR to allow moving quantized models. Make sure to install the latest transformers + bnb.
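
(If the fix works as described, something along these lines should run on versions that include it; a sketch, not verified here, with facebook/opt-125m as a small stand-in model and a second GPU assumed.)

# Sketch: on a transformers/bitsandbytes pair recent enough to include the fix,
# moving a 4-bit model between devices should no longer raise.
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

model = AutoModelForCausalLM.from_pretrained(
    "facebook/opt-125m",
    quantization_config=BitsAndBytesConfig(load_in_4bit=True),
)
model.to("cuda:1")  # previously raised ValueError on quantized models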

SunMarc avatar Sep 16 '24 22:09 SunMarc