bug: Getting 500 when running dolly-v2
Describe the bug
Thank you for creating this great repo! I am running a simple "openllm start dolly-v2" and getting a 500 Internal Server Error related to "model_kwargs". I am not sure how to proceed. See the reproduction steps and errors below.
To reproduce
Created a Docker container from the Dockerfile below and ran it on a T4 instance:

FROM nvidia/cuda:11.0.3-runtime-ubuntu20.04

ENV BENTOML_HOME="/model_store/"
ENV CUDA_VISIBLE_DEVICES=0

# Update apt-get and install pip
RUN apt-get update && apt-get install -y python3-pip
RUN pip3 install openllm

EXPOSE 3000

ENTRYPOINT [ "openllm", "start" ]
CMD [ "dolly-v2", "--model-id", "databricks/dolly-v2-3b", "--device", "0", "-p", "3000", "--verbose" ]
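These are roughly the commands I use to build and run it (the image name is arbitrary; --gpus all assumes the NVIDIA Container Toolkit is installed on the host):

# build the image from the Dockerfile above, then run it with GPU access and port 3000 exposed
docker build -t dolly-v2-openllm .
docker run --gpus all -p 3000:3000 dolly-v2-openllm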
When I go to localhost:3000 and POST to the inference endpoint, I get a 500 internal server error.
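For reference, the request looks roughly like this (the prompt is just an example, and the payload shape is from memory, so the exact schema may differ):

# POST a prompt to the generate endpoint of the running server
curl -X POST http://localhost:3000/v1/generate \
  -H "Content-Type: application/json" \
  -d '{"prompt": "What is the capital of France?", "llm_config": {"max_new_tokens": 128}}'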
Logs
2023-06-27T04:29:13+0000 [ERROR] [runner:llm-dolly-v2-runner:1] Exception on runner 'llm-dolly-v2-runner' method 'generate' (trace=666892bd204b5873d08d4ffe96808e94,span=4d1d367c3ded45cb,sampled=1,service.name=llm-dolly-v2-runner)
Traceback (most recent call last):
File "/usr/local/lib/python3.8/dist-packages/bentoml/_internal/server/runner_app.py", line 352, in _run
ret = await runner_method.async_run(*params.args, **params.kwargs)
File "/usr/local/lib/python3.8/dist-packages/bentoml/_internal/runner/runner.py", line 55, in async_run
return await self.runner._runner_handle.async_run_method(self, *args, **kwargs)
File "/usr/local/lib/python3.8/dist-packages/bentoml/_internal/runner/runner_handle/local.py", line 59, in async_run_method
return await anyio.to_thread.run_sync(
File "/usr/local/lib/python3.8/dist-packages/anyio/to_thread.py", line 33, in run_sync
return await get_asynclib().run_sync_in_worker_thread(
File "/usr/local/lib/python3.8/dist-packages/anyio/_backends/_asyncio.py", line 877, in run_sync_in_worker_thread
return await future
File "/usr/local/lib/python3.8/dist-packages/anyio/_backends/_asyncio.py", line 807, in run
result = context.run(func, *args)
File "/usr/local/lib/python3.8/dist-packages/bentoml/_internal/runner/runnable.py", line 140, in method
return self.func(obj, *args, **kwargs)
File "/usr/local/lib/python3.8/dist-packages/openllm/_llm.py", line 1241, in generate
return self.generate(prompt, **attrs)
File "/usr/local/lib/python3.8/dist-packages/openllm/models/dolly_v2/modeling_dolly_v2.py", line 273, in generate
return self.model(
File "/usr/local/lib/python3.8/dist-packages/transformers/pipelines/base.py", line 1120, in __call__
return self.run_single(inputs, preprocess_params, forward_params, postprocess_params)
File "/usr/local/lib/python3.8/dist-packages/transformers/pipelines/base.py", line 1127, in run_single
model_outputs = self.forward(model_inputs, **forward_params)
File "/usr/local/lib/python3.8/dist-packages/transformers/pipelines/base.py", line 1026, in forward
model_outputs = self._forward(model_inputs, **forward_params)
File "/usr/local/lib/python3.8/dist-packages/openllm/models/dolly_v2/modeling_dolly_v2.py", line 114, in _forward
generated_sequence = self.model.generate(
File "/usr/local/lib/python3.8/dist-packages/torch/utils/_contextlib.py", line 115, in decorate_context
return func(*args, **kwargs)
File "/usr/local/lib/python3.8/dist-packages/transformers/generation/utils.py", line 1271, in generate
self._validate_model_kwargs(model_kwargs.copy())
File "/usr/local/lib/python3.8/dist-packages/transformers/generation/utils.py", line 1144, in _validate_model_kwargs
raise ValueError(
ValueError: The following `model_kwargs` are not used by the model: ['accelerator'] (note: typos in the generate arguments will also show up in this list)
2023-06-27T04:29:13+0000 [INFO] [runner:llm-dolly-v2-runner:1] - "POST /generate HTTP/1.1" 500 (trace=666892bd204b5873d08d4ffe96808e94,span=13c7890ba80845b0,sampled=1,service.name=llm-dolly-v2-runner)
2023-06-27T04:29:13+0000 [INFO] [runner:llm-dolly-v2-runner:1] _ (scheme=http,method=POST,path=/generate,type=application/octet-stream,length=888) (status=500,type=text/plain,length=0) 4.769ms (trace=666892bd204b5873d08d4ffe96808e94,span=4d1d367c3ded45cb,sampled=1,service.name=llm-dolly-v2-runner)
2023-06-27T04:29:13+0000 [ERROR] [api_server:4] Exception on /v1/generate [POST] (trace=666892bd204b5873d08d4ffe96808e94,span=2aa0dd91ca45f3c4,sampled=1,service.name=llm-dolly-v2-service)
Traceback (most recent call last):
File "/usr/local/lib/python3.8/dist-packages/bentoml/_internal/server/http_app.py", line 341, in api_func
output = await api.func(*args)
File "/usr/local/lib/python3.8/dist-packages/openllm/_service.py", line 86, in generate_v1
responses = await runner.generate.async_run(qa_inputs.prompt, **config)
File "/usr/local/lib/python3.8/dist-packages/bentoml/_internal/runner/runner.py", line 55, in async_run
return await self.runner._runner_handle.async_run_method(self, *args, **kwargs)
File "/usr/local/lib/python3.8/dist-packages/bentoml/_internal/runner/runner_handle/remote.py", line 246, in async_run_method
raise RemoteException(
bentoml.exceptions.RemoteException: An unexpected exception occurred in remote runner llm-dolly-v2-runner: [500]
Environment
openllm, version 0.1.14
System information (Optional)
Dockerfile (executing on a T4 with 4 GPUs):

FROM nvidia/cuda:11.0.3-runtime-ubuntu20.04

ENV BENTOML_HOME="/model_store/"
ENV CUDA_VISIBLE_DEVICES=0

# Update apt-get and install pip
RUN apt-get update && apt-get install -y python3-pip
RUN pip3 install openllm

EXPOSE 3000

ENTRYPOINT [ "openllm", "start" ]
CMD [ "dolly-v2", "--model-id", "databricks/dolly-v2-3b", "--device", "0", "-p", "3000", "--verbose" ]
Oh yes, I have fixed this in 0.1.16.
Also, it is recommended to use openllm build and let BentoML handle the packaging for you instead of writing a custom Dockerfile like this.
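Roughly, the flow would look like this (the Bento tag in the second command is a placeholder; the real tag is printed at the end of the build step):

# build a Bento for dolly-v2, then containerize it with BentoML
openllm build dolly-v2 --model-id databricks/dolly-v2-3b
bentoml containerize <bento_tag_printed_by_build>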
Thank you! I will try that. For these LLMs, are there any recommended base images for GPUs, or any recommended settings to pass to "bentoml containerize" for these models in particular?
bentoml containerize should just work. The purpose of containerize is to create the container image for the Bento.
In terms of how to run the container, see https://github.com/NVIDIA/nvidia-docker for leveraging GPUs in OCI-compatible containers (the Bento container is OCI-compatible).
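With the NVIDIA Container Toolkit installed, running the image with GPU access would look something like this (the image name is a placeholder for whatever bentoml containerize produced):

# expose all host GPUs to the container and map the serving port
docker run --gpus all -p 3000:3000 <containerized-bento-image>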
Can you try out with 0.1.17?
It worked with 0.1.15, and I am trying 0.1.17 now.
dolly-v2 worked with 0.1.17, but I am running into an issue downloading falcon (referenced in another issue):
To learn more, consult https://cloud.google.com/blog/products/containers-kubernetes/kubectl-auth-changes-in-gke
Enabling debug mode for current BentoML session
Make sure to have the following dependencies available: ['einops', 'xformers']
Downloading (…)lve/main/config.json: 100%|██████████| 950/950 [00:00<00:00, 113kB/s]
Downloading (…)/configuration_RW.py: 100%|██████████| 2.61k/2.61k [00:00<00:00, 368kB/s]
A new version of the following files was downloaded from https://huggingface.co/tiiuae/falcon-7b:
- configuration_RW.py
. Make sure to double-check they do not contain any added malicious code. To avoid downloading new versions of the code file, you can pin a revision.
Downloading (…)okenizer_config.json: 100%|██████████| 220/220 [00:00<00:00, 138kB/s]
Downloading (…)/main/tokenizer.json: 100%|██████████| 2.73M/2.73M [00:00<00:00, 36.4MB/s]
Downloading (…)cial_tokens_map.json: 100%|██████████| 281/281 [00:00<00:00, 193kB/s]
Downloading (…)main/modelling_RW.py: 100%|██████████| 47.6k/47.6k [00:00<00:00, 26.2MB/s]
Traceback (most recent call last):
File "/usr/local/lib/python3.8/dist-packages/openllm/cli.py", line 1557, in download_models
_ref = bentoml.transformers.get(model.tag)
File "/usr/local/lib/python3.8/dist-packages/bentoml/_internal/frameworks/transformers.py", line 292, in get
model = bentoml.models.get(tag_like)
File "/usr/local/lib/python3.8/dist-packages/simple_di/__init__.py", line 139, in _
return func(*_inject_args(bind.args), **_inject_kwargs(bind.kwargs))
File "/usr/local/lib/python3.8/dist-packages/bentoml/models.py", line 42, in get
return _model_store.get(tag)
File "/usr/local/lib/python3.8/dist-packages/bentoml/_internal/store.py", line 146, in get
raise NotFound(
bentoml.exceptions.NotFound: Model 'pt-tiiuae-falcon-7b:2f5c3cd4eace6be6c0f12981f377fb35e5bf6ee5' is not found in BentoML store <osfs '/model_store/models'>
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/usr/lib/python3.8/runpy.py", line 194, in _run_module_as_main
return _run_code(code, main_globals, None,
File "/usr/lib/python3.8/runpy.py", line 87, in _run_code
exec(code, run_globals)
File "/usr/local/lib/python3.8/dist-packages/openllm/__main__.py", line 26, in <module>
cli()
File "/usr/local/lib/python3.8/dist-packages/click/core.py", line 1130, in __call__
return self.main(*args, **kwargs)
File "/usr/local/lib/python3.8/dist-packages/click/core.py", line 1055, in main
rv = self.invoke(ctx)
File "/usr/local/lib/python3.8/dist-packages/click/core.py", line 1657, in invoke
return _process_result(sub_ctx.command.invoke(sub_ctx))
File "/usr/local/lib/python3.8/dist-packages/click/core.py", line 1404, in invoke
return ctx.invoke(self.callback, **ctx.params)
File "/usr/local/lib/python3.8/dist-packages/click/core.py", line 760, in invoke
return __callback(*args, **kwargs)
File "/usr/local/lib/python3.8/dist-packages/openllm/cli.py", line 380, in wrapper
return func(*args, **attrs)
File "/usr/local/lib/python3.8/dist-packages/openllm/cli.py", line 353, in wrapper
return_value = func(*args, **attrs)
File "/usr/local/lib/python3.8/dist-packages/openllm/cli.py", line 328, in wrapper
return f(*args, **attrs)
File "/usr/local/lib/python3.8/dist-packages/openllm/cli.py", line 1584, in download_models
_ref = model.import_model(
File "/usr/local/lib/python3.8/dist-packages/openllm/models/falcon/modeling_falcon.py", line 64, in import_model
model = transformers.AutoModelForCausalLM.from_pretrained(
File "/usr/local/lib/python3.8/dist-packages/transformers/models/auto/auto_factory.py", line 475, in from_pretrained
model_class = get_class_from_dynamic_module(
File "/usr/local/lib/python3.8/dist-packages/transformers/dynamic_module_utils.py", line 431, in get_class_from_dynamic_module
final_module = get_cached_module_file(
File "/usr/local/lib/python3.8/dist-packages/transformers/dynamic_module_utils.py", line 268, in get_cached_module_file
modules_needed = check_imports(resolved_module_file)
File "/usr/local/lib/python3.8/dist-packages/transformers/dynamic_module_utils.py", line 151, in check_imports
raise ImportError(
ImportError: This modeling file requires the following packages that were not found in your environment: einops. Run `pip install einops`
Traceback (most recent call last):
File "/usr/local/bin/openllm", line 8, in <module>
sys.exit(cli())
File "/usr/local/lib/python3.8/dist-packages/click/core.py", line 1130, in __call__
return self.main(*args, **kwargs)
File "/usr/local/lib/python3.8/dist-packages/click/core.py", line 1055, in main
rv = self.invoke(ctx)
File "/usr/local/lib/python3.8/dist-packages/click/core.py", line 1657, in invoke
return _process_result(sub_ctx.command.invoke(sub_ctx))
File "/usr/local/lib/python3.8/dist-packages/click/core.py", line 1657, in invoke
return _process_result(sub_ctx.command.invoke(sub_ctx))
File "/usr/local/lib/python3.8/dist-packages/click/core.py", line 1404, in invoke
return ctx.invoke(self.callback, **ctx.params)
File "/usr/local/lib/python3.8/dist-packages/click/core.py", line 760, in invoke
return __callback(*args, **kwargs)
File "/usr/local/lib/python3.8/dist-packages/openllm/cli.py", line 380, in wrapper
return func(*args, **attrs)
File "/usr/local/lib/python3.8/dist-packages/openllm/cli.py", line 353, in wrapper
return_value = func(*args, **attrs)
File "/usr/local/lib/python3.8/dist-packages/openllm/cli.py", line 328, in wrapper
return f(*args, **attrs)
File "/usr/local/lib/python3.8/dist-packages/click/decorators.py", line 26, in new_func
return f(get_current_context(), *args, **kwargs)
File "/usr/local/lib/python3.8/dist-packages/openllm/cli.py", line 935, in start_cmd
llm = t.cast(
File "/usr/local/lib/python3.8/dist-packages/openllm/models/auto/factory.py", line 135, in for_model
llm.ensure_model_id_exists()
File "/usr/local/lib/python3.8/dist-packages/openllm/_llm.py", line 943, in ensure_model_id_exists
output = subprocess.check_output(
File "/usr/lib/python3.8/subprocess.py", line 415, in check_output
return run(*popenargs, stdout=PIPE, timeout=timeout, check=True,
File "/usr/lib/python3.8/subprocess.py", line 516, in run
raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command '['/usr/bin/python3', '-m', 'openllm', 'download', 'falcon', '--model-id', 'tiiuae/falcon-7b', '--machine', '--implementation', 'pt']' returned non-zero exit status 1.
As the exception says, you need to install einops.
You need to do pip install "openllm[falcon]".
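If you are building from the Dockerfile above, that means changing the install line to something like this (the falcon extra should pull in einops and the other falcon-specific dependencies):

pip3 install "openllm[falcon]"  # replaces the plain `pip3 install openllm` in the Dockerfile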
Feel free to open a different issue if you are still running into this.