OpenLLM
bug: Error while serializing: IoError(Os { code: 28, kind: StorageFull, message: "No space left on device" })
Describe the bug
Error while serializing: IoError(Os { code: 28, kind: StorageFull, message: "No space left on device" })
To reproduce
Run:
openllm start ./models--01-ai--Yi-34B-Chat/snapshots/a99ec35331cbfc9da596af7d4538fe2efecff03c --adapter-id ./yi-34b-chat/rank-0:default --backend vllm --workers-per-resource 0.98
Logs
Given model is a local model, OpenLLM will load model into memory for serialisation.
Loading checkpoint shards: 100%|████████████████████████████████████████████████████████████████| 15/15 [01:52<00:00, 7.50s/it]
Traceback (most recent call last):
File "/opt/conda/bin/openllm", line 8, in <module>
sys.exit(cli())
File "/opt/conda/lib/python3.10/site-packages/click/core.py", line 1157, in __call__
return self.main(*args, **kwargs)
File "/opt/conda/lib/python3.10/site-packages/click/core.py", line 1078, in main
rv = self.invoke(ctx)
File "/opt/conda/lib/python3.10/site-packages/click/core.py", line 1688, in invoke
return _process_result(sub_ctx.command.invoke(sub_ctx))
File "/opt/conda/lib/python3.10/site-packages/click/core.py", line 1434, in invoke
return ctx.invoke(self.callback, **ctx.params)
File "/opt/conda/lib/python3.10/site-packages/click/core.py", line 783, in invoke
return __callback(*args, **kwargs)
File "/opt/conda/lib/python3.10/site-packages/openllm_cli/entrypoint.py", line 204, in wrapper
return_value = func(*args, **attrs)
File "/opt/conda/lib/python3.10/site-packages/click/decorators.py", line 33, in new_func
return f(get_current_context(), *args, **kwargs)
File "/opt/conda/lib/python3.10/site-packages/openllm_cli/entrypoint.py", line 183, in wrapper
return f(*args, **attrs)
File "/opt/conda/lib/python3.10/site-packages/openllm_cli/entrypoint.py", line 415, in start_command
llm = openllm.LLM[t.Any, t.Any](
File "/opt/conda/lib/python3.10/typing.py", line 957, in __call__
result = self.__origin__(*args, **kwargs)
File "/opt/conda/lib/python3.10/site-packages/openllm/_llm.py", line 200, in __init__
model = openllm.serialisation.import_model(self, trust_remote_code=self.trust_remote_code)
File "/opt/conda/lib/python3.10/site-packages/openllm/serialisation/__init__.py", line 59, in caller
return getattr(importlib.import_module(f'.{serde}', 'openllm.serialisation'), fn)(llm, *args, **kwargs)
File "/opt/conda/lib/python3.10/site-packages/openllm/serialisation/transformers/__init__.py", line 53, in import_model
model.save_pretrained(bentomodel.path, max_shard_size='2GB', safe_serialization=safe_serialisation)
File "/opt/conda/lib/python3.10/site-packages/transformers/modeling_utils.py", line 2187, in save_pretrained
safe_save_file(shard, os.path.join(save_directory, shard_file), metadata={"format": "pt"})
File "/opt/conda/lib/python3.10/site-packages/safetensors/torch.py", line 232, in save_file
serialize_file(_flatten(tensors), filename, metadata=metadata)
safetensors_rust.SafetensorError: Error while serializing: IoError(Os { code: 28, kind: StorageFull, message: "No space left on device" })
Environment
- openllm: 0.4.34
- nvidia-smi
Tue Dec 5 15:14:40 2023
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 450.80.02 Driver Version: 450.80.02 CUDA Version: 12.2 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|===============================+======================+======================|
| 0 A100-SXM4-40GB On | 00000000:CB:00.0 Off | 0 |
| N/A 34C P0 56W / 400W | 0MiB / 40537MiB | 0% Default |
| | | Disabled |
+-------------------------------+----------------------+----------------------+
| 1 A100-SXM4-40GB On | 00000000:D0:00.0 Off | 0* |
| N/A 33C P0 54W / 400W | 0MiB / 40537MiB | 0% Default |
| | | Disabled |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=============================================================================|
| No running processes found |
System information (Optional)
No response
Hi there, adapters are not yet supported with the vLLM backend.
I was facing the same error today. It turned out my machine was actually out of disk space; freeing up space made it work.
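For anyone hitting this: the `StorageFull` (errno 28) error happens while OpenLLM re-serializes the checkpoint into the BentoML model store, so you need enough free space for a second full copy of the model (a 34B model in fp16 is very roughly 65-70 GB). A quick way to check is something like the sketch below. The `~/bentoml` path is an assumption based on BentoML's default store location; adjust it if you have set `BENTOML_HOME`.

```shell
# Free space on the filesystem holding the BentoML model store
# (default location is assumed here; override with BENTOML_HOME).
df -h "${BENTOML_HOME:-$HOME/bentoml}" 2>/dev/null || df -h "$HOME"

# Largest entries in the model store, to decide what can be deleted
# (path is an assumption -- verify with `bentoml models list`).
du -sh "${BENTOML_HOME:-$HOME/bentoml}/models"/* 2>/dev/null | sort -rh | head
```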