OpenLLM
bug: Error while serializing: IoError(Os { code: 28, kind: StorageFull, message: "No space left on device" })
Describe the bug
Error while serializing: IoError(Os { code: 28, kind: StorageFull, message: "No space left on device" })
To reproduce
Run:
openllm start ./models--01-ai--Yi-34B-Chat/snapshots/a99ec35331cbfc9da596af7d4538fe2efecff03c --adapter-id ./yi-34b-chat/rank-0:default --backend vllm --workers-per-resource 0.98
Logs
Given model is a local model, OpenLLM will load model into memory for serialisation.
Loading checkpoint shards: 100%|████████████████████████████████████████████████████████████████| 15/15 [01:52<00:00, 7.50s/it]
Traceback (most recent call last):
File "/opt/conda/bin/openllm", line 8, in <module>
sys.exit(cli())
File "/opt/conda/lib/python3.10/site-packages/click/core.py", line 1157, in __call__
return self.main(*args, **kwargs)
File "/opt/conda/lib/python3.10/site-packages/click/core.py", line 1078, in main
rv = self.invoke(ctx)
File "/opt/conda/lib/python3.10/site-packages/click/core.py", line 1688, in invoke
return _process_result(sub_ctx.command.invoke(sub_ctx))
File "/opt/conda/lib/python3.10/site-packages/click/core.py", line 1434, in invoke
return ctx.invoke(self.callback, **ctx.params)
File "/opt/conda/lib/python3.10/site-packages/click/core.py", line 783, in invoke
return __callback(*args, **kwargs)
File "/opt/conda/lib/python3.10/site-packages/openllm_cli/entrypoint.py", line 204, in wrapper
return_value = func(*args, **attrs)
File "/opt/conda/lib/python3.10/site-packages/click/decorators.py", line 33, in new_func
return f(get_current_context(), *args, **kwargs)
File "/opt/conda/lib/python3.10/site-packages/openllm_cli/entrypoint.py", line 183, in wrapper
return f(*args, **attrs)
File "/opt/conda/lib/python3.10/site-packages/openllm_cli/entrypoint.py", line 415, in start_command
llm = openllm.LLM[t.Any, t.Any](
File "/opt/conda/lib/python3.10/typing.py", line 957, in __call__
result = self.__origin__(*args, **kwargs)
File "/opt/conda/lib/python3.10/site-packages/openllm/_llm.py", line 200, in __init__
model = openllm.serialisation.import_model(self, trust_remote_code=self.trust_remote_code)
File "/opt/conda/lib/python3.10/site-packages/openllm/serialisation/__init__.py", line 59, in caller
return getattr(importlib.import_module(f'.{serde}', 'openllm.serialisation'), fn)(llm, *args, **kwargs)
File "/opt/conda/lib/python3.10/site-packages/openllm/serialisation/transformers/__init__.py", line 53, in import_model
model.save_pretrained(bentomodel.path, max_shard_size='2GB', safe_serialization=safe_serialisation)
File "/opt/conda/lib/python3.10/site-packages/transformers/modeling_utils.py", line 2187, in save_pretrained
safe_save_file(shard, os.path.join(save_directory, shard_file), metadata={"format": "pt"})
File "/opt/conda/lib/python3.10/site-packages/safetensors/torch.py", line 232, in save_file
serialize_file(_flatten(tensors), filename, metadata=metadata)
safetensors_rust.SafetensorError: Error while serializing: IoError(Os { code: 28, kind: StorageFull, message: "No space left on device" })
Environment
- openllm: 0.4.34
- nvidia-smi
Tue Dec 5 15:14:40 2023
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 450.80.02 Driver Version: 450.80.02 CUDA Version: 12.2 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|===============================+======================+======================|
| 0 A100-SXM4-40GB On | 00000000:CB:00.0 Off | 0 |
| N/A 34C P0 56W / 400W | 0MiB / 40537MiB | 0% Default |
| | | Disabled |
+-------------------------------+----------------------+----------------------+
| 1 A100-SXM4-40GB On | 00000000:D0:00.0 Off | 0* |
| N/A 33C P0 54W / 400W | 0MiB / 40537MiB | 0% Default |
| | | Disabled |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=============================================================================|
| No running processes found |
System information (Optional)
No response
Hi there, adapters are not yet supported with the vLLM backend.
I was facing the same error today. It turned out my machine was actually out of disk space; freeing up space made it work.
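For anyone hitting this: the `StorageFull` (errno 28) error happens while OpenLLM re-serializes the checkpoint into the BentoML model store, so you need enough free space for a second full copy of the model (a 34B model in fp16 is very roughly 65-70 GB). A quick way to check is something like the sketch below. The `~/bentoml` path is an assumption based on BentoML's default store location; adjust it if you have set `BENTOML_HOME`.

```shell
# Free space on the filesystem holding the BentoML model store
# (default location is assumed here; override with BENTOML_HOME).
df -h "${BENTOML_HOME:-$HOME/bentoml}" 2>/dev/null || df -h "$HOME"

# Largest entries in the model store, to decide what can be deleted
# (path is an assumption -- verify with `bentoml models list`).
du -sh "${BENTOML_HOME:-$HOME/bentoml}/models"/* 2>/dev/null | sort -rh | head
```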