
AssertionError: Torch not compiled with CUDA enabled

Open Naugustogi opened this issue 3 years ago • 8 comments

Unless there's something special I'm missing, the normal quickstart install doesn't work.

Naugustogi avatar Nov 16 '22 00:11 Naugustogi

I had the same issue. The quickstart seems to install the CPU-only version of PyTorch by default, but you need the CUDA-enabled version. Use pip/conda to uninstall the version of PyTorch you have, then install the CUDA version using the instructions here.

Before downloading, double-check which version of CUDA you have installed so you pick the right torch build. You can do this by running nvcc --version from the command line.

Good luck!
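In practice the fix looks something like this (a sketch only: the cu117 index URL matches the CUDA 11.7 reported later in this thread; substitute the index URL for your own CUDA version from the PyTorch install selector):

```shell
# See which CUDA toolkit is installed (if any)
nvcc --version

# Check whether the currently installed torch build can see CUDA at all
python -c "import torch; print(torch.__version__, torch.cuda.is_available())"

# Remove the CPU-only build, then install a CUDA-enabled one
# (cu117 shown as an example; pick the index URL matching your CUDA version)
pip uninstall torch
pip install torch --index-url https://download.pytorch.org/whl/cu117
```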

ZQ-Dev8 avatar Nov 16 '22 17:11 ZQ-Dev8

I have the same issue on a MacBook Pro with an AMD graphics card. I don't think installing a CUDA-enabled version of PyTorch is an option in my case.

ftencaten avatar Nov 16 '22 19:11 ftencaten

> I had the same issue. The quickstart seems to install the CPU-only version of PyTorch by default, but you need the CUDA-enabled version. Use pip/conda to uninstall the version of PyTorch you have, then install the CUDA version using the instructions here.
>
> Before downloading, double-check which version of CUDA you have installed so you pick the right torch build. You can do this by running nvcc --version from the command line.
>
> Good luck!

CUDA 11.7 with a GPU is already installed. I could use an Anaconda environment, but I don't have much experience with that. It still doesn't work.

Naugustogi avatar Nov 17 '22 17:11 Naugustogi

Has anyone found a workaround to this?

dionator avatar Nov 17 '22 17:11 dionator

> I had the same issue. The quickstart seems to install the CPU-only version of PyTorch by default, but you need the CUDA-enabled version. Use pip/conda to uninstall the version of PyTorch you have, then install the CUDA version using the instructions here. Before downloading, double-check which version of CUDA you have installed so you pick the right torch build. You can do this by running nvcc --version from the command line. Good luck!

> CUDA 11.7 with a GPU is already installed. I could use an Anaconda environment, but I don't have much experience with that. It still doesn't work.

Is the CPU-only version also installed? If so, try uninstalling it. Otherwise, it sounds like an environment issue and I would make a new conda/venv environment. Both are relatively easy to set up; here's a good place to start.
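A sketch of the fresh-environment route (assuming CUDA 11.7 as reported above; the environment name galai-env is arbitrary):

```shell
# Create and activate an isolated environment so a CPU-only torch
# wheel can't shadow the CUDA-enabled one
python -m venv galai-env
source galai-env/bin/activate    # on Windows: galai-env\Scripts\activate

# Install a CUDA-enabled torch first, then galai on top of it
pip install torch --index-url https://download.pytorch.org/whl/cu117
pip install galai
```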

ZQ-Dev8 avatar Nov 17 '22 19:11 ZQ-Dev8

> Has anyone found a workaround to this?

Another attempt using the Hugging Face transformers library worked. It was maybe a bit complicated, and I also had to use a CPU version of one package.

Naugustogi avatar Nov 24 '22 15:11 Naugustogi

Hi @Naugustogi, can you check if you still experience the issues with galai version 1.1.0? You should be able to use the model on CPU with load_model(..., num_gpus=0).

mkardas avatar Dec 09 '22 10:12 mkardas

> num_gpus=0)

Doesn't work either:

AssertionError: Torch not compiled with CUDA enabled

Naugustogi avatar Dec 10 '22 14:12 Naugustogi

@Naugustogi any chance you can provide the full stack trace?

mkardas avatar Jan 03 '23 13:01 mkardas

> @Naugustogi any chance you can provide the full stack trace?

It happened after I started the program normally with inference:


```python
import galai as gal
model = gal.load_model(name='mini', num_gpus=0)
model.generate("Scaled dot product attention:\n\n\\[")
```

I just use the CPU version.


┌─────────────────────────────── Traceback (most recent call last) ────────────────────────────────┐
│ F:\galai-1.0.0\start.py:2 in <module> │
│                                                                                                  │
│   1 import galai as gal                                                                          │
│ > 2 model = gal.load_model(name = 'mini',num_gpus=0)                                             │
│   3 model.generate("Scaled dot product attention:\n\n\\[")                                       │
│                                                                                                  │
│ F:\galai-1.0.0\galai\__init__.py:40   │
│ in load_model                                                                                    │
│                                                                                                  │
│   37 │   model = Model(name=name, dtype=dtype, num_gpus=num_gpus)                                │
│   38 │   model._set_tokenizer(tokenizer_path=get_tokenizer_path())                               │
│   39 │   if name in ['mini', 'base']:                                                            │
│ > 40 │   │   model._load_checkpoint(checkpoint_path=get_checkpoint_path(name))                   │
│   41 │   else:                                                                                   │
│   42 │   │   model._load_checkpoint(checkpoint_path=get_checkpoint_path(name))                   │
│   43                                                                                             │
│                                                                                                  │
│ F:\galai-1.0.0\galai\model.py:63 in   │
│ _load_checkpoint                                                                                 │
│                                                                                                  │
│    60 │   │   if 'mini' in checkpoint_path or 'base' in checkpoint_path:                         │
│    61 │   │   │   checkpoint_path = checkpoint_path + '/pytorch_model.bin'                       │
│    62 │   │                                                                                      │
│ >  63 │   │   load_checkpoint_and_dispatch(                                                      │
│    64 │   │   │   self.model.model,                                                              │
│    65 │   │   │   checkpoint_path,                                                               │
│    66 │   │   │   device_map=device_map,                                                         │
│                                                                                                  │
│ C:\Users\User\AppData\Local\Programs\Python\Python310\lib\site-packages\accelerate\big_modeling. │
│ py:366 in load_checkpoint_and_dispatch                                                           │
│                                                                                                  │
│   363 │   │   )                                                                                  │
│   364 │   if offload_state_dict is None and "disk" in device_map.values():                       │
│   365 │   │   offload_state_dict = True                                                          │
│ > 366 │   load_checkpoint_in_model(                                                              │
│   367 │   │   model,                                                                             │
│   368 │   │   checkpoint,                                                                        │
│   369 │   │   device_map=device_map,                                                             │
│                                                                                                  │
│ C:\Users\User\AppData\Local\Programs\Python\Python310\lib\site-packages\accelerate\utils\modelin │
│ g.py:701 in load_checkpoint_in_model                                                             │
│                                                                                                  │
│   698 │   │   │   │   │   set_module_tensor_to_device(model, param_name, "meta")                 │
│   699 │   │   │   │   │   offload_weight(param, param_name, state_dict_folder, index=state_dic   │
│   700 │   │   │   │   else:                                                                      │
│ > 701 │   │   │   │   │   set_module_tensor_to_device(model, param_name, param_device, value=p   │
│   702 │   │                                                                                      │
│   703 │   │   # Force Python to clean up.                                                        │
│   704 │   │   del checkpoint                                                                     │
│                                                                                                  │
│ C:\Users\User\AppData\Local\Programs\Python\Python310\lib\site-packages\accelerate\utils\modelin │
│ g.py:124 in set_module_tensor_to_device                                                          │
│                                                                                                  │
│   121 │   │   if value is None:                                                                  │
│   122 │   │   │   new_value = old_value.to(device)                                               │
│   123 │   │   elif isinstance(value, torch.Tensor):                                              │
│ > 124 │   │   │   new_value = value.to(device)                                                   │
│   125 │   │   else:                                                                              │
│   126 │   │   │   new_value = torch.tensor(value, device=device)                                 │
│   127                                                                                            │
│                                                                                                  │
│ C:\Users\User\AppData\Local\Programs\Python\Python310\lib\site-packages\torch\cuda\__init__.py:2 │
│ 21 in _lazy_init                                                                                 │
│                                                                                                  │
│   218 │   │   │   │   "Cannot re-initialize CUDA in forked subprocess. To use CUDA with "        │
│   219 │   │   │   │   "multiprocessing, you must use the 'spawn' start method")                  │
│   220 │   │   if not hasattr(torch._C, '_cuda_getDeviceCount'):                                  │
│ > 221 │   │   │   raise AssertionError("Torch not compiled with CUDA enabled")                   │
│   222 │   │   if _cudart is None:                                                                │
│   223 │   │   │   raise AssertionError(                                                          │
│   224 │   │   │   │   "libcudart functions unavailable. It looks like you have a broken build?   │
└──────────────────────────────────────────────────────────────────────────────────────────────────┘
AssertionError: Torch not compiled with CUDA enabled

Naugustogi avatar Jan 03 '23 21:01 Naugustogi

Thanks @Naugustogi. The traceback shows galai 1.0.0. Can you try with 1.1.2?

mkardas avatar Jan 04 '23 11:01 mkardas

> Thanks @Naugustogi. The traceback shows galai 1.0.0. Can you try with 1.1.2?

I'm not sure where to get that; in this repo it's just version 1.0.0 (3 weeks ago).

Naugustogi avatar Jan 05 '23 14:01 Naugustogi

@Naugustogi You can install it with pip or clone the main git branch (currently at 1.1.2, you can verify by inspecting the setup.py file in your installation).
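For reference, both install routes look roughly like this (a sketch assuming the paperswithcode/galai repository; pinning the version explicitly avoids picking up a stale release):

```shell
# Install the released package from PyPI ...
pip install galai==1.1.2

# ... or install straight from the main branch
pip install git+https://github.com/paperswithcode/galai.git

# Verify which version you ended up with
pip show galai
```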

mkardas avatar Jan 05 '23 14:01 mkardas

@Naugustogi

Alright, 1.1.2 doesn't work either. It won't even show me any error; after starting, it just returns to the main folder.

Naugustogi avatar Jan 05 '23 14:01 Naugustogi

> it returns the main folder

What do you mean? If you are running it as a script, you need to wrap the last line in print().
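The distinction is easy to miss: a REPL echoes the value of a bare expression, but a script silently discards it. A minimal illustration, where generate is a hypothetical stand-in for model.generate (the real call also just returns a string):

```python
def generate(prompt):
    # Stand-in for model.generate(...); the real call also returns a string
    return prompt + " QK^T / sqrt(d_k)"

# As a script, this line produces no output: the return value is discarded
generate("Scaled dot product attention:\n\n\\[")

# Wrapping the call in print() makes the generated text visible in both cases
print(generate("Scaled dot product attention:\n\n\\["))
```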

mkardas avatar Jan 05 '23 15:01 mkardas

> What do you mean? If you are running it as a script, you need to wrap the last line in print().

OK, it worked!

Naugustogi avatar Jan 05 '23 15:01 Naugustogi