bug: Linux Mint: the service throws errors when handling model requests
### Describe the bug
After following the instructions from the README on Linux Mint, I get an error whenever I send a request to the web API.
### To reproduce
Commands executed:

```bash
pip install openllm
openllm -h
TRUST_REMOTE_CODE=True openllm start microsoft/phi-2
```

When executing the `/v1/generate` example from the WebUI:
```bash
curl -X 'POST' \
'http://0.0.0.0:3000/v1/generate' \
-H 'accept: application/json' \
-H 'Content-Type: application/json' \
-d '{
"prompt": "What is the meaning of life?",
"stop": [
"\n"
],
"llm_config": {
"max_new_tokens": 128,
"min_length": 0,
"early_stopping": false,
"num_beams": 1,
"num_beam_groups": 1,
"use_cache": true,
"temperature": 0.75,
"top_k": 15,
"top_p": 0.78,
"typical_p": 1,
"epsilon_cutoff": 0,
"eta_cutoff": 0,
"diversity_penalty": 0,
"repetition_penalty": 1,
"encoder_repetition_penalty": 1,
"length_penalty": 1,
"no_repeat_ngram_size": 0,
"renormalize_logits": false,
"remove_invalid_values": false,
"num_return_sequences": 1,
"output_attentions": false,
"output_hidden_states": false,
"output_scores": false,
"encoder_no_repeat_ngram_size": 0,
"n": 1,
"best_of": null,
"presence_penalty": 0,
"frequency_penalty": 0,
"use_beam_search": false,
"ignore_eos": false,
"skip_special_tokens": true
},
"adapter_name": null
}'
```
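For completeness, the same request can be sent from Python. This is only a convenience sketch (it uses the `requests` package and a trimmed-down `llm_config`, it is not an official client), and it fails with the same 500 response:

```python
# Hypothetical Python equivalent of the curl call above; the payload mirrors the
# WebUI example with only a few llm_config fields kept for brevity.
import requests

payload = {
    "prompt": "What is the meaning of life?",
    "stop": ["\n"],
    "llm_config": {"max_new_tokens": 128, "temperature": 0.75, "top_k": 15, "top_p": 0.78},
    "adapter_name": None,
}
resp = requests.post("http://0.0.0.0:3000/v1/generate", json=payload, timeout=120)
print(resp.status_code, resp.text)  # prints 500 and the BentoML error message shown below
```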
I got the following error:
Response body
"An error has occurred in BentoML user code when handling this request, find the error details in server logs"
Response headers
content-length: 110
content-type: application/json
date: Fri, 29 Dec 2023 10:55:15 GMT
server: uvicorn
x-bentoml-request-id: 15036822334355354513
Logs:

```
It is recommended to specify the backend explicitly. Cascading backend might lead to unexpected behaviour.
vLLM is not available. Note that PyTorch backend is not as performant as vLLM and you should always consider using vLLM for production.
Tip: run 'openllm build microsoft/phi-2 --backend pt --serialization safetensors' to create a BentoLLM for 'microsoft/phi-2'
2023-12-29T12:56:55+0100 [INFO] [cli] Prometheus metrics for HTTP BentoServer from "_service:svc" can be accessed at http://localhost:3000/metrics.
2023-12-29T12:56:56+0100 [INFO] [cli] Starting production HTTP BentoServer from "_service:svc" listening on http://0.0.0.0:3000 (Press CTRL+C to quit)
Loading checkpoint shards:   0%|          | 0/2 [00:00<?, ?it/s]
Loading checkpoint shards:  50%|█████     | 1/2 [00:00<00:00, 8.84it/s]
Loading checkpoint shards: 100%|██████████| 2/2 [00:00<00:00, 9.49it/s]
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
2023-12-29T12:57:03+0100 [ERROR] [runner:llm-phi-runner:1] Exception in ASGI application
+ Exception Group Traceback (most recent call last):
| File "/home/daniel/miniconda3/lib/python3.11/site-packages/bentoml/_internal/server/http/traffic.py", line 26, in __call__
| await self.app(scope, receive, send)
| File "/home/daniel/miniconda3/lib/python3.11/site-packages/opentelemetry/instrumentation/asgi/__init__.py", line 596, in __call__
| await self.app(scope, otel_receive, otel_send)
| File "/home/daniel/miniconda3/lib/python3.11/site-packages/bentoml/_internal/server/http/instruments.py", line 252, in __call__
| await self.app(scope, receive, wrapped_send)
| File "/home/daniel/miniconda3/lib/python3.11/site-packages/bentoml/_internal/server/http/access.py", line 126, in __call__
| await self.app(scope, receive, wrapped_send)
| File "/home/daniel/miniconda3/lib/python3.11/site-packages/starlette/middleware/exceptions.py", line 62, in __call__
| await wrap_app_handling_exceptions(self.app, conn)(scope, receive, send)
| File "/home/daniel/miniconda3/lib/python3.11/site-packages/starlette/_exception_handler.py", line 64, in wrapped_app
| raise exc
| File "/home/daniel/miniconda3/lib/python3.11/site-packages/starlette/_exception_handler.py", line 53, in wrapped_app
| await app(scope, receive, sender)
| File "/home/daniel/miniconda3/lib/python3.11/site-packages/starlette/routing.py", line 754, in __call__
| await self.middleware_stack(scope, receive, send)
| File "/home/daniel/miniconda3/lib/python3.11/site-packages/starlette/routing.py", line 774, in app
| await route.handle(scope, receive, send)
| File "/home/daniel/miniconda3/lib/python3.11/site-packages/starlette/routing.py", line 296, in handle
| await self.app(scope, receive, send)
| File "/home/daniel/miniconda3/lib/python3.11/site-packages/starlette/routing.py", line 75, in app
| await wrap_app_handling_exceptions(app, request)(scope, receive, send)
| File "/home/daniel/miniconda3/lib/python3.11/site-packages/starlette/_exception_handler.py", line 64, in wrapped_app
| raise exc
| File "/home/daniel/miniconda3/lib/python3.11/site-packages/starlette/_exception_handler.py", line 53, in wrapped_app
| await app(scope, receive, sender)
| File "/home/daniel/miniconda3/lib/python3.11/site-packages/starlette/routing.py", line 73, in app
| await response(scope, receive, send)
| File "/home/daniel/miniconda3/lib/python3.11/site-packages/starlette/responses.py", line 254, in __call__
| async with anyio.create_task_group() as task_group:
| File "/home/daniel/miniconda3/lib/python3.11/site-packages/anyio/_backends/_asyncio.py", line 678, in __aexit__
| raise BaseExceptionGroup(
| ExceptionGroup: unhandled errors in a TaskGroup (1 sub-exception)
+-+---------------- 1 ----------------
| Traceback (most recent call last):
| File "/home/daniel/miniconda3/lib/python3.11/site-packages/starlette/responses.py", line 257, in wrap
| await func()
| File "/home/daniel/miniconda3/lib/python3.11/site-packages/starlette/responses.py", line 246, in stream_response
| async for chunk in self.body_iterator:
| File "/home/daniel/miniconda3/lib/python3.11/site-packages/bentoml/_internal/server/runner_app.py", line 373, in stream_encoder
| async for p in payload:
| File "/home/daniel/miniconda3/lib/python3.11/site-packages/bentoml/_internal/server/runner_app.py", line 214, in inner
| async for data in ret:
| File "/home/daniel/miniconda3/lib/python3.11/site-packages/openllm/_runners.py", line 220, in generate_iterator
| out = self.model(input_ids=start_ids, use_cache=True)
| ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
| File "/home/daniel/miniconda3/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
| return self._call_impl(*args, **kwargs)
| ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
| File "/home/daniel/miniconda3/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
| return forward_call(*args, **kwargs)
| ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
| File "/home/daniel/.cache/huggingface/modules/transformers_modules/d3186761bf5c4409f7679359284066c25ab668ee/modeling_phi.py", line 953, in forward
| hidden_states = self.transformer(input_ids, past_key_values=past_key_values, attention_mask=attention_mask)
| ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
| File "/home/daniel/miniconda3/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
| return self._call_impl(*args, **kwargs)
| ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
| File "/home/daniel/miniconda3/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
| return forward_call(*args, **kwargs)
| ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
| File "/home/daniel/.cache/huggingface/modules/transformers_modules/d3186761bf5c4409f7679359284066c25ab668ee/modeling_phi.py", line 915, in forward
| hidden_states = layer(
| ^^^^^^
| File "/home/daniel/miniconda3/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
| return self._call_impl(*args, **kwargs)
| ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
| File "/home/daniel/miniconda3/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
| return forward_call(*args, **kwargs)
| ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
| File "/home/daniel/.cache/huggingface/modules/transformers_modules/d3186761bf5c4409f7679359284066c25ab668ee/modeling_phi.py", line 768, in forward
| hidden_states = self.ln(hidden_states)
| ^^^^^^^^^^^^^^^^^^^^^^
| File "/home/daniel/miniconda3/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
| return self._call_impl(*args, **kwargs)
| ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
| File "/home/daniel/miniconda3/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
| return forward_call(*args, **kwargs)
| ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
| File "/home/daniel/miniconda3/lib/python3.11/site-packages/torch/nn/modules/normalization.py", line 196, in forward
| return F.layer_norm(
| ^^^^^^^^^^^^^
| File "/home/daniel/miniconda3/lib/python3.11/site-packages/torch/nn/functional.py", line 2543, in layer_norm
| return torch.layer_norm(input, normalized_shape, weight, bias, eps, torch.backends.cudnn.enabled)
| ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
| RuntimeError: "LayerNormKernelImpl" not implemented for 'Half'
+------------------------------------
During handling of the above exception, another exception occurred:
+ Exception Group Traceback (most recent call last):
| File "/home/daniel/miniconda3/lib/python3.11/site-packages/uvicorn/protocols/http/h11_impl.py", line 408, in run_asgi
| result = await app( # type: ignore[func-returns-value]
| ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
| File "/home/daniel/miniconda3/lib/python3.11/site-packages/uvicorn/middleware/proxy_headers.py", line 84, in __call__
| return await self.app(scope, receive, send)
| ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
| File "/home/daniel/miniconda3/lib/python3.11/site-packages/starlette/applications.py", line 116, in __call__
| await self.middleware_stack(scope, receive, send)
| File "/home/daniel/miniconda3/lib/python3.11/site-packages/starlette/middleware/errors.py", line 186, in __call__
| raise exc
| File "/home/daniel/miniconda3/lib/python3.11/site-packages/starlette/middleware/errors.py", line 164, in __call__
| await self.app(scope, receive, _send)
| File "/home/daniel/miniconda3/lib/python3.11/site-packages/bentoml/_internal/server/http/traffic.py", line 23, in __call__
| async with anyio.create_task_group():
| File "/home/daniel/miniconda3/lib/python3.11/site-packages/anyio/_backends/_asyncio.py", line 678, in __aexit__
| raise BaseExceptionGroup(
| ExceptionGroup: unhandled errors in a TaskGroup (1 sub-exception)
+-+---------------- 1 ----------------
| Traceback (most recent call last):
| File "/home/daniel/miniconda3/lib/python3.11/site-packages/starlette/responses.py", line 261, in __call__
| await wrap(partial(self.listen_for_disconnect, receive))
| File "/home/daniel/miniconda3/lib/python3.11/site-packages/starlette/responses.py", line 257, in wrap
| await func()
| File "/home/daniel/miniconda3/lib/python3.11/site-packages/starlette/responses.py", line 234, in listen_for_disconnect
| message = await receive()
| ^^^^^^^^^^^^^^^
| File "/home/daniel/miniconda3/lib/python3.11/site-packages/opentelemetry/instrumentation/asgi/__init__.py", line 634, in otel_receive
| message = await receive()
| ^^^^^^^^^^^^^^^
| File "/home/daniel/miniconda3/lib/python3.11/site-packages/uvicorn/protocols/http/h11_impl.py", line 538, in receive
| await self.message_event.wait()
| File "/home/daniel/miniconda3/lib/python3.11/asyncio/locks.py", line 213, in wait
| await fut
| asyncio.exceptions.CancelledError: Cancelled by cancel scope 7f6c3f596ad0
|
| During handling of the above exception, another exception occurred:
|
| Exception Group Traceback (most recent call last):
| File "/home/daniel/miniconda3/lib/python3.11/site-packages/bentoml/_internal/server/http/traffic.py", line 26, in __call__
| await self.app(scope, receive, send)
| File "/home/daniel/miniconda3/lib/python3.11/site-packages/opentelemetry/instrumentation/asgi/__init__.py", line 596, in __call__
| await self.app(scope, otel_receive, otel_send)
| File "/home/daniel/miniconda3/lib/python3.11/site-packages/bentoml/_internal/server/http/instruments.py", line 252, in __call__
| await self.app(scope, receive, wrapped_send)
| File "/home/daniel/miniconda3/lib/python3.11/site-packages/bentoml/_internal/server/http/access.py", line 126, in __call__
| await self.app(scope, receive, wrapped_send)
| File "/home/daniel/miniconda3/lib/python3.11/site-packages/starlette/middleware/exceptions.py", line 62, in __call__
| await wrap_app_handling_exceptions(self.app, conn)(scope, receive, send)
| File "/home/daniel/miniconda3/lib/python3.11/site-packages/starlette/_exception_handler.py", line 64, in wrapped_app
| raise exc
| File "/home/daniel/miniconda3/lib/python3.11/site-packages/starlette/_exception_handler.py", line 53, in wrapped_app
| await app(scope, receive, sender)
| File "/home/daniel/miniconda3/lib/python3.11/site-packages/starlette/routing.py", line 754, in __call__
| await self.middleware_stack(scope, receive, send)
| File "/home/daniel/miniconda3/lib/python3.11/site-packages/starlette/routing.py", line 774, in app
| await route.handle(scope, receive, send)
| File "/home/daniel/miniconda3/lib/python3.11/site-packages/starlette/routing.py", line 296, in handle
| await self.app(scope, receive, send)
| File "/home/daniel/miniconda3/lib/python3.11/site-packages/starlette/routing.py", line 75, in app
| await wrap_app_handling_exceptions(app, request)(scope, receive, send)
| File "/home/daniel/miniconda3/lib/python3.11/site-packages/starlette/_exception_handler.py", line 64, in wrapped_app
| raise exc
| File "/home/daniel/miniconda3/lib/python3.11/site-packages/starlette/_exception_handler.py", line 53, in wrapped_app
| await app(scope, receive, sender)
| File "/home/daniel/miniconda3/lib/python3.11/site-packages/starlette/routing.py", line 73, in app
| await response(scope, receive, send)
| File "/home/daniel/miniconda3/lib/python3.11/site-packages/starlette/responses.py", line 254, in __call__
| async with anyio.create_task_group() as task_group:
| File "/home/daniel/miniconda3/lib/python3.11/site-packages/anyio/_backends/_asyncio.py", line 678, in __aexit__
| raise BaseExceptionGroup(
| ExceptionGroup: unhandled errors in a TaskGroup (1 sub-exception)
+-+---------------- 1 ----------------
| Traceback (most recent call last):
| File "/home/daniel/miniconda3/lib/python3.11/site-packages/starlette/responses.py", line 257, in wrap
| await func()
| File "/home/daniel/miniconda3/lib/python3.11/site-packages/starlette/responses.py", line 246, in stream_response
| async for chunk in self.body_iterator:
| File "/home/daniel/miniconda3/lib/python3.11/site-packages/bentoml/_internal/server/runner_app.py", line 373, in stream_encoder
| async for p in payload:
| File "/home/daniel/miniconda3/lib/python3.11/site-packages/bentoml/_internal/server/runner_app.py", line 214, in inner
| async for data in ret:
| File "/home/daniel/miniconda3/lib/python3.11/site-packages/openllm/_runners.py", line 220, in generate_iterator
| out = self.model(input_ids=start_ids, use_cache=True)
| ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
| File "/home/daniel/miniconda3/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
| return self._call_impl(*args, **kwargs)
| ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
| File "/home/daniel/miniconda3/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
| return forward_call(*args, **kwargs)
| ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
| File "/home/daniel/.cache/huggingface/modules/transformers_modules/d3186761bf5c4409f7679359284066c25ab668ee/modeling_phi.py", line 953, in forward
| hidden_states = self.transformer(input_ids, past_key_values=past_key_values, attention_mask=attention_mask)
| ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
| File "/home/daniel/miniconda3/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
| return self._call_impl(*args, **kwargs)
| ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
| File "/home/daniel/miniconda3/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
| return forward_call(*args, **kwargs)
| ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
| File "/home/daniel/.cache/huggingface/modules/transformers_modules/d3186761bf5c4409f7679359284066c25ab668ee/modeling_phi.py", line 915, in forward
| hidden_states = layer(
| ^^^^^^
| File "/home/daniel/miniconda3/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
| return self._call_impl(*args, **kwargs)
| ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
| File "/home/daniel/miniconda3/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
| return forward_call(*args, **kwargs)
| ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
| File "/home/daniel/.cache/huggingface/modules/transformers_modules/d3186761bf5c4409f7679359284066c25ab668ee/modeling_phi.py", line 768, in forward
| hidden_states = self.ln(hidden_states)
| ^^^^^^^^^^^^^^^^^^^^^^
| File "/home/daniel/miniconda3/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
| return self._call_impl(*args, **kwargs)
| ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
| File "/home/daniel/miniconda3/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
| return forward_call(*args, **kwargs)
| ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
| File "/home/daniel/miniconda3/lib/python3.11/site-packages/torch/nn/modules/normalization.py", line 196, in forward
| return F.layer_norm(
| ^^^^^^^^^^^^^
| File "/home/daniel/miniconda3/lib/python3.11/site-packages/torch/nn/functional.py", line 2543, in layer_norm
| return torch.layer_norm(input, normalized_shape, weight, bias, eps, torch.backends.cudnn.enabled)
| ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
| RuntimeError: "LayerNormKernelImpl" not implemented for 'Half'
+------------------------------------
2023-12-29T12:57:03+0100 [ERROR] [api_server:llm-phi-service:9] Exception on /v1/generate [POST] (trace=903b0a0cb59317f28a48881b3f3b02f3,span=9a955f3aba7f3f0f,sampled=1,service.name=llm-phi-service)
Traceback (most recent call last):
File "/home/daniel/miniconda3/lib/python3.11/site-packages/openllm/_llm.py", line 115, in generate_iterator
async for out in generator:
File "/home/daniel/miniconda3/lib/python3.11/site-packages/bentoml/_internal/runner/runner_handle/remote.py", line 329, in async_stream_method
async for b in resp.content.iter_any():
File "/home/daniel/miniconda3/lib/python3.11/site-packages/aiohttp/streams.py", line 44, in __anext__
rv = await self.read_func()
^^^^^^^^^^^^^^^^^^^^^^
File "/home/daniel/miniconda3/lib/python3.11/site-packages/aiohttp/streams.py", line 395, in readany
await self._wait("readany")
File "/home/daniel/miniconda3/lib/python3.11/site-packages/aiohttp/streams.py", line 302, in _wait
await waiter
aiohttp.client_exceptions.ClientPayloadError: Response payload is not completed
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/home/daniel/miniconda3/lib/python3.11/site-packages/bentoml/_internal/server/http_app.py", line 341, in api_func
output = await api.func(*args)
^^^^^^^^^^^^^^^^^^^^^
File "/home/daniel/miniconda3/lib/python3.11/site-packages/openllm/_service.py", line 23, in generate_v1
return (await llm.generate(**llm_model_class(**input_dict).model_dump())).model_dump()
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/daniel/miniconda3/lib/python3.11/site-packages/openllm/_llm.py", line 55, in generate
async for result in self.generate_iterator(
File "/home/daniel/miniconda3/lib/python3.11/site-packages/openllm/_llm.py", line 125, in generate_iterator
raise RuntimeError(f'Exception caught during generation: {err}') from err
RuntimeError: Exception caught during generation: Response payload is not completed
2023-12-29T12:57:03+0100 [INFO] [api_server:llm-phi-service:9] 127.0.0.1:33200 (scheme=http,method=POST,path=/v1/generate,type=application/json,length=951) (status=500,type=application/json,length=110) 170.354ms (trace=903b0a0cb59317f28a48881b3f3b02f3,span=9a955f3aba7f3f0f,sampled=1,service.name=llm-phi-service)
```
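My reading of the traceback (an assumption, not confirmed by the maintainers): vLLM is unavailable, so the PyTorch backend runs phi-2 on the CPU while the weights stay in float16, and this PyTorch build has no half-precision LayerNorm kernel on the CPU. The runner's streaming generator then dies mid-response, which the API server surfaces as `Response payload is not completed` and converts into the 500. A minimal sketch of the underlying failure:

```python
# Minimal sketch of the root error, assuming a CPU-only run with torch==2.1.2
# (as in the environment below): half-precision LayerNorm has no CPU kernel,
# which is exactly the RuntimeError in the runner traceback above.
import torch
import torch.nn.functional as F

x = torch.randn(1, 8, dtype=torch.float16)  # fp16 tensor on the CPU

try:
    F.layer_norm(x, normalized_shape=(8,))
except RuntimeError as err:
    print(err)  # "LayerNormKernelImpl" not implemented for 'Half'

print(F.layer_norm(x.float(), normalized_shape=(8,)))  # casting to float32 works on the CPU
```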
### Environment
Output of `bentoml env`:
#### Environment variable
```bash
BENTOML_DEBUG=''
BENTOML_QUIET=''
BENTOML_BUNDLE_LOCAL_BUILD=''
BENTOML_DO_NOT_TRACK=''
BENTOML_CONFIG=''
BENTOML_CONFIG_OPTIONS=''
BENTOML_PORT=''
BENTOML_HOST=''
BENTOML_API_WORKERS=''
```

#### System information
bentoml: 1.1.11
python: 3.11.5
platform: Linux-5.15.0-91-generic-x86_64-with-glibc2.35
uid_gid: 1000:1000
conda: 23.10.0
in_conda_env: True
#### conda_packages

```yaml
name: base
channels:
- defaults
dependencies:
- _libgcc_mutex=0.1=main
- _openmp_mutex=5.1=1_gnu
- archspec=0.2.1=pyhd3eb1b0_0
- boltons=23.0.0=py311h06a4308_0
- brotli-python=1.0.9=py311h6a678d5_7
- bzip2=1.0.8=h7b6447c_0
- c-ares=1.19.1=h5eee18b_0
- ca-certificates=2023.08.22=h06a4308_0
- certifi=2023.7.22=py311h06a4308_0
- cffi=1.15.1=py311h5eee18b_3
- charset-normalizer=2.0.4=pyhd3eb1b0_0
- conda=23.10.0=py311h06a4308_0
- conda-content-trust=0.2.0=py311h06a4308_0
- conda-libmamba-solver=23.11.1=py311h06a4308_0
- conda-package-handling=2.2.0=py311h06a4308_0
- conda-package-streaming=0.9.0=py311h06a4308_0
- cryptography=41.0.3=py311hdda0065_0
- fmt=9.1.0=hdb19cb5_0
- icu=73.1=h6a678d5_0
- idna=3.4=py311h06a4308_0
- jsonpatch=1.32=pyhd3eb1b0_0
- jsonpointer=2.1=pyhd3eb1b0_0
- krb5=1.20.1=h143b758_1
- ld_impl_linux-64=2.38=h1181459_1
- libarchive=3.6.2=h6ac8c49_2
- libcurl=8.4.0=h251f7ec_0
- libedit=3.1.20221030=h5eee18b_0
- libev=4.33=h7f8727e_1
- libffi=3.4.4=h6a678d5_0
- libgcc-ng=11.2.0=h1234567_1
- libgomp=11.2.0=h1234567_1
- libmamba=1.5.3=haf1ee3a_0
- libmambapy=1.5.3=py311h2dafd23_0
- libnghttp2=1.57.0=h2d74bed_0
- libsolv=0.7.24=he621ea3_0
- libssh2=1.10.0=hdbd6064_2
- libstdcxx-ng=11.2.0=h1234567_1
- libuuid=1.41.5=h5eee18b_0
- libxml2=2.10.4=hf1b16e4_1
- lz4-c=1.9.4=h6a678d5_0
- ncurses=6.4=h6a678d5_0
- openssl=3.0.12=h7f8727e_0
- packaging=23.1=py311h06a4308_0
- pcre2=10.42=hebb0a14_0
- pip=23.3=py311h06a4308_0
- pluggy=1.0.0=py311h06a4308_1
- pybind11-abi=4=hd3eb1b0_1
- pycosat=0.6.6=py311h5eee18b_0
- pycparser=2.21=pyhd3eb1b0_0
- pyopenssl=23.2.0=py311h06a4308_0
- pysocks=1.7.1=py311h06a4308_0
- python=3.11.5=h955ad1f_0
- readline=8.2=h5eee18b_0
- reproc=14.2.4=h295c915_1
- reproc-cpp=14.2.4=h295c915_1
- requests=2.31.0=py311h06a4308_0
- ruamel.yaml=0.17.21=py311h5eee18b_0
- setuptools=68.0.0=py311h06a4308_0
- sqlite=3.41.2=h5eee18b_0
- tk=8.6.12=h1ccaba5_0
- tqdm=4.65.0=py311h92b7b1e_0
- truststore=0.8.0=py311h06a4308_0
- urllib3=1.26.18=py311h06a4308_0
- wheel=0.41.2=py311h06a4308_0
- xz=5.4.2=h5eee18b_0
- yaml-cpp=0.8.0=h6a678d5_0
- zlib=1.2.13=h5eee18b_0
- zstandard=0.19.0=py311h5eee18b_0
- zstd=1.5.5=hc292b87_0
- pip:
- accelerate==0.25.0
- aiohttp==3.9.1
- aiosignal==1.3.1
- anyio==4.2.0
- appdirs==1.4.4
- asgiref==3.7.2
- attrs==23.1.0
- bentoml==1.1.11
- bitsandbytes==0.41.3.post2
- build==0.10.0
- cattrs==23.1.2
- circus==0.18.0
- click==8.1.7
- click-option-group==0.5.6
- cloudpickle==3.0.0
- coloredlogs==15.0.1
- contextlib2==21.6.0
- cuda-python==12.3.0
- datasets==2.16.0
- deepmerge==1.1.1
- deprecated==1.2.14
- dill==0.3.7
- distlib==0.3.8
- distro==1.9.0
- einops==0.7.0
- fastcore==1.5.29
- filelock==3.13.1
- filetype==1.2.0
- frozenlist==1.4.1
- fs==2.4.16
- fsspec==2023.10.0
- ghapi==1.0.4
- h11==0.14.0
- httpcore==1.0.2
- httpx==0.26.0
- huggingface-hub==0.20.1
- humanfriendly==10.0
- importlib-metadata==6.11.0
- inflection==0.5.1
- jinja2==3.1.2
- markdown-it-py==3.0.0
- markupsafe==2.1.3
- mdurl==0.1.2
- mpmath==1.3.0
- multidict==6.0.4
- multiprocess==0.70.15
- mypy-extensions==1.0.0
- networkx==3.2.1
- numpy==1.26.2
- nvidia-cublas-cu12==12.1.3.1
- nvidia-cuda-cupti-cu12==12.1.105
- nvidia-cuda-nvrtc-cu12==12.1.105
- nvidia-cuda-runtime-cu12==12.1.105
- nvidia-cudnn-cu12==8.9.2.26
- nvidia-cufft-cu12==11.0.2.54
- nvidia-curand-cu12==10.3.2.106
- nvidia-cusolver-cu12==11.4.5.107
- nvidia-cusparse-cu12==12.1.0.106
- nvidia-ml-py==11.525.150
- nvidia-nccl-cu12==2.18.1
- nvidia-nvjitlink-cu12==12.3.101
- nvidia-nvtx-cu12==12.1.105
- openllm==0.4.41
- openllm-client==0.4.41
- openllm-core==0.4.41
- opentelemetry-api==1.20.0
- opentelemetry-instrumentation==0.41b0
- opentelemetry-instrumentation-aiohttp-client==0.41b0
- opentelemetry-instrumentation-asgi==0.41b0
- opentelemetry-sdk==1.20.0
- opentelemetry-semantic-conventions==0.41b0
- opentelemetry-util-http==0.41b0
- optimum==1.16.1
- orjson==3.9.10
- pandas==2.1.4
- pathspec==0.12.1
- pillow==10.1.0
- pip-requirements-parser==32.0.1
- pip-tools==7.3.0
- platformdirs==4.1.0
- prometheus-client==0.19.0
- protobuf==4.25.1
- psutil==5.9.7
- pyarrow==14.0.2
- pyarrow-hotfix==0.6
- pydantic==1.10.13
- pygments==2.17.2
- pyparsing==3.1.1
- pyproject-hooks==1.0.0
- python-dateutil==2.8.2
- python-json-logger==2.0.7
- python-multipart==0.0.6
- pytz==2023.3.post1
- pyyaml==6.0.1
- pyzmq==25.1.2
- regex==2023.12.25
- rich==13.7.0
- safetensors==0.4.1
- schema==0.7.5
- scipy==1.11.4
- sentencepiece==0.1.99
- simple-di==0.1.5
- six==1.16.0
- sniffio==1.3.0
- starlette==0.34.0
- sympy==1.12
- tokenizers==0.15.0
- torch==2.1.2
- tornado==6.4
- transformers==4.36.2
- triton==2.1.0
- typing-extensions==4.9.0
- tzdata==2023.3
- uvicorn==0.25.0
- virtualenv==20.25.0
- watchfiles==0.21.0
- wrapt==1.16.0
- xxhash==3.4.1
- yarl==1.9.4
- zipp==3.17.0
prefix: /home/daniel/miniconda3
```
#### pip_packages

```
accelerate==0.25.0
aiohttp==3.9.1
aiosignal==1.3.1
anyio==4.2.0
appdirs==1.4.4
archspec @ file:///croot/archspec_1697725767277/work
asgiref==3.7.2
attrs==23.1.0
bentoml==1.1.11
bitsandbytes==0.41.3.post2
boltons @ file:///work/ci_py311/boltons_1677685195580/work
Brotli @ file:///work/ci_py311/brotli-split_1676830125088/work
build==0.10.0
cattrs==23.1.2
certifi @ file:///croot/certifi_1690232220950/work/certifi
cffi @ file:///work/ci_py311/cffi_1676822533496/work
charset-normalizer @ file:///tmp/build/80754af9/charset-normalizer_1630003229654/work
circus==0.18.0
click==8.1.7
click-option-group==0.5.6
cloudpickle==3.0.0
coloredlogs==15.0.1
conda @ file:///croot/conda_1699303130622/work
conda-content-trust @ file:///croot/conda-content-trust_1693490622020/work
conda-libmamba-solver @ file:///croot/conda-libmamba-solver_1700150901600/work/src
conda-package-handling @ file:///croot/conda-package-handling_1690999929514/work
conda_package_streaming @ file:///croot/conda-package-streaming_1690987966409/work
contextlib2==21.6.0
cryptography @ file:///croot/cryptography_1694444244250/work
cuda-python==12.3.0
datasets==2.16.0
deepmerge==1.1.1
Deprecated==1.2.14
dill==0.3.7
distlib==0.3.8
distro==1.9.0
einops==0.7.0
fastcore==1.5.29
filelock==3.13.1
filetype==1.2.0
frozenlist==1.4.1
fs==2.4.16
fsspec==2023.10.0
ghapi==1.0.4
h11==0.14.0
httpcore==1.0.2
httpx==0.26.0
huggingface-hub==0.20.1
humanfriendly==10.0
idna @ file:///work/ci_py311/idna_1676822698822/work
importlib-metadata==6.11.0
inflection==0.5.1
Jinja2==3.1.2
jsonpatch @ file:///tmp/build/80754af9/jsonpatch_1615747632069/work
jsonpointer==2.1
libmambapy @ file:///croot/mamba-split_1698782620632/work/libmambapy
markdown-it-py==3.0.0
MarkupSafe==2.1.3
mdurl==0.1.2
mpmath==1.3.0
multidict==6.0.4
multiprocess==0.70.15
mypy-extensions==1.0.0
networkx==3.2.1
numpy==1.26.2
nvidia-cublas-cu12==12.1.3.1
nvidia-cuda-cupti-cu12==12.1.105
nvidia-cuda-nvrtc-cu12==12.1.105
nvidia-cuda-runtime-cu12==12.1.105
nvidia-cudnn-cu12==8.9.2.26
nvidia-cufft-cu12==11.0.2.54
nvidia-curand-cu12==10.3.2.106
nvidia-cusolver-cu12==11.4.5.107
nvidia-cusparse-cu12==12.1.0.106
nvidia-ml-py==11.525.150
nvidia-nccl-cu12==2.18.1
nvidia-nvjitlink-cu12==12.3.101
nvidia-nvtx-cu12==12.1.105
openllm==0.4.41
openllm-client==0.4.41
openllm-core==0.4.41
opentelemetry-api==1.20.0
opentelemetry-instrumentation==0.41b0
opentelemetry-instrumentation-aiohttp-client==0.41b0
opentelemetry-instrumentation-asgi==0.41b0
opentelemetry-sdk==1.20.0
opentelemetry-semantic-conventions==0.41b0
opentelemetry-util-http==0.41b0
optimum==1.16.1
orjson==3.9.10
packaging @ file:///croot/packaging_1693575174725/work
pandas==2.1.4
pathspec==0.12.1
Pillow==10.1.0
pip-requirements-parser==32.0.1
pip-tools==7.3.0
platformdirs==4.1.0
pluggy @ file:///work/ci_py311/pluggy_1676822818071/work
prometheus-client==0.19.0
protobuf==4.25.1
psutil==5.9.7
pyarrow==14.0.2
pyarrow-hotfix==0.6
pycosat @ file:///croot/pycosat_1696536503704/work
pycparser @ file:///tmp/build/80754af9/pycparser_1636541352034/work
pydantic==1.10.13
Pygments==2.17.2
pyOpenSSL @ file:///croot/pyopenssl_1690223430423/work
pyparsing==3.1.1
pyproject_hooks==1.0.0
PySocks @ file:///work/ci_py311/pysocks_1676822712504/work
python-dateutil==2.8.2
python-json-logger==2.0.7
python-multipart==0.0.6
pytz==2023.3.post1
PyYAML==6.0.1
pyzmq==25.1.2
regex==2023.12.25
requests @ file:///croot/requests_1690400202158/work
rich==13.7.0
ruamel.yaml @ file:///work/ci_py311/ruamel.yaml_1676838772170/work
safetensors==0.4.1
schema==0.7.5
scipy==1.11.4
sentencepiece==0.1.99
simple-di==0.1.5
six==1.16.0
sniffio==1.3.0
starlette==0.34.0
sympy==1.12
tokenizers==0.15.0
torch==2.1.2
tornado==6.4
tqdm @ file:///croot/tqdm_1679561862951/work
transformers==4.36.2
triton==2.1.0
truststore @ file:///croot/truststore_1695244293384/work
typing_extensions==4.9.0
tzdata==2023.3
urllib3 @ file:///croot/urllib3_1698257533958/work
uvicorn==0.25.0
virtualenv==20.25.0
watchfiles==0.21.0
wrapt==1.16.0
xxhash==3.4.1
yarl==1.9.4
zipp==3.17.0
zstandard @ file:///work/ci_py311_2/zstandard_1679339489613/work
```
/home/daniel/miniconda3/lib/python3.11/site-packages/torch/cuda/__init__.py:611: UserWarning: Can't initialize NVML
  warnings.warn("Can't initialize NVML")
- transformers version: 4.36.2
- Platform: Linux-5.15.0-91-generic-x86_64-with-glibc2.35
- Python version: 3.11.5
- Huggingface_hub version: 0.20.1
- Safetensors version: 0.4.1
- Accelerate version: 0.25.0
- Accelerate config:
  - compute_environment: LOCAL_MACHINE
  - distributed_type: NO
  - mixed_precision: fp16
  - use_cpu: True
  - debug: False
  - num_processes: 1
  - machine_rank: 0
  - num_machines: 1
  - rdzv_backend: static
  - same_network: False
  - main_training_function: main
  - downcast_bf16: False
  - tpu_use_cluster: False
  - tpu_use_sudo: False
- PyTorch version (GPU?): 2.1.2+cu121 (False)
- Tensorflow version (GPU?): not installed (NA)
- Flax version (CPU?/GPU?/TPU?): not installed (NA)
- Jax version: not installed
- JaxLib version: not installed
- Using GPU in script?:
- Using distributed or parallel set-up in script?:
### System information (Optional)

- OS: Linux Mint
- CPU: Ryzen 5 2600
- RAM: DDR4 64GB
- GPU: RTX 3060 12GB
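The Accelerate config above reports `use_cpu: True` together with `mixed_precision: fp16`, so the model ends up on the CPU in half precision. As a sanity check outside of OpenLLM (a sketch of my own, not a documented workaround; it assumes enough free RAM for the roughly 11 GB of float32 weights), loading phi-2 in float32 with plain transformers generates without the error:

```python
# Hedged sanity check outside OpenLLM: load phi-2 in float32 on the CPU and
# generate a few tokens. Assumes ~11 GB of free RAM for the full-precision weights.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "microsoft/phi-2"
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.float32, trust_remote_code=True
)

inputs = tokenizer("What is the meaning of life?", return_tensors="pt")
output = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```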
I have the same problem, but with Mistral models.

After checking the logs:

```
2024-02-01T10:32:24+0000 [INFO] [api_server:12] 172.17.0.1:33106 (scheme=http,method=POST,path=/v1/generate,type=application/json,length=46) (status=500,type=application/json,length=110) 4624.572ms (trace=98f17d6566d01280e521c85644a9c515,span=33c1901ac96f7b85,sampled=1,service.name=llm-mistral-service)
2024-02-01T10:32:32+0000 [INFO] [runner:llm-mistral-runner:1] _ (scheme=http,method=POST,path=/generate_iterator,type=application/octet-stream,length=1774) (status=200,type=application/vnd.bentoml.stream_outputs,length=) 12304.698ms (trace=98f17d6566d01280e521c85644a9c515,span=ac50ed178f5ec9ac,sampled=1,service.name=llm-mistral-runner)
```

The runner returns the correct response with status 200, but the api_server returns a 500 error to the client.
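I think that combination is generic streaming behaviour rather than something OpenLLM-specific (my assumption): the runner sends the 200 status line and the first chunks before its generator fails, so its access log records 200, while the API server reading the stream sees a truncated body and turns it into a 500 for the client. A small illustration with Starlette, unrelated to the OpenLLM code itself:

```python
# Generic illustration (not OpenLLM code) of why a streaming endpoint can be logged
# as 200 while the client never receives a complete body: the status line is sent
# before the body generator raises, cutting the chunked response short.
import uvicorn
from starlette.applications import Starlette
from starlette.responses import StreamingResponse
from starlette.routing import Route

async def broken_stream():
    yield b"first chunk\n"      # status 200 and this chunk go out on the wire
    raise RuntimeError("boom")  # then the stream breaks -> client sees an incomplete payload

async def generate(request):
    return StreamingResponse(broken_stream(), media_type="text/plain")

app = Starlette(routes=[Route("/generate", generate, methods=["POST"])])

if __name__ == "__main__":
    uvicorn.run(app, host="127.0.0.1", port=8000)
```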
Closing for OpenLLM 0.6.