[Bug] TEI Gaudi 2 image is failing to launch
Priority
Undecided
OS type
Ubuntu
Hardware type
Gaudi2
Installation method
- [X] Pull docker images from hub.docker.com
- [ ] Build docker images from source
Deploy method
- [X] Docker compose
- [ ] Docker
- [ ] Kubernetes
- [ ] Helm
Running nodes
Single Node
What's the version?
latest
Description
The TEI Gaudi image is not launching due to errors.
Reproduce steps
After running `docker compose up`, the `opea/tei-gaudi:latest` container starts but fails shortly after launch (exact commands sketched below).
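For reference, a minimal reproduction sketch, assuming the GenAIExamples repo is already cloned and the environment variables from the README are exported:

```bash
# Bring up the ChatQnA Gaudi deployment
cd GenAIExamples/ChatQnA/docker_compose/intel/hpu/gaudi
docker compose up -d

# The tei-gaudi container starts, then fails; inspect its logs
docker ps -a | grep tei-gaudi
docker logs <container-id>
```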
Raw log
/ChatQnA/docker_compose/intel/hpu/gaudi$ docker logs 349bc3685e97
2024-09-15T19:24:20.318465Z INFO text_embeddings_router: router/src/main.rs:175: Args { model_id: "BAA*/***-****-**-v1.5", revision: None, tokenization_workers: None, dtype: None, pooling: None, max_concurrent_requests: 512, max_batch_tokens: 16384, max_batch_requests: None, max_client_batch_size: 32, auto_truncate: true, default_prompt_name: None, default_prompt: None, hf_api_token: None, hostname: "349bc3685e97", port: 80, uds_path: "/tmp/text-embeddings-inference-server", huggingface_hub_cache: Some("/data"), payload_limit: 2000000, api_key: None, json_output: false, otlp_endpoint: None, otlp_service_name: "text-embeddings-inference.server", cors_allow_origin: None }
2024-09-15T19:24:20.318939Z INFO hf_hub: /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/hf-hub-0.3.2/src/lib.rs:55: Token file not found "/root/.cache/huggingface/token"
2024-09-15T19:24:20.445084Z INFO download_pool_config: text_embeddings_core::download: core/src/download.rs:45: Downloading `1_Pooling/config.json`
2024-09-15T19:24:21.565035Z INFO download_new_st_config: text_embeddings_core::download: core/src/download.rs:108: Downloading `config_sentence_transformers.json`
2024-09-15T19:24:21.693734Z INFO download_artifacts: text_embeddings_core::download: core/src/download.rs:20: Starting download
2024-09-15T19:24:21.693774Z INFO download_artifacts: text_embeddings_core::download: core/src/download.rs:22: Downloading `config.json`
2024-09-15T19:24:21.823411Z INFO download_artifacts: text_embeddings_core::download: core/src/download.rs:25: Downloading `tokenizer.json`
2024-09-15T19:24:22.137401Z INFO download_artifacts: text_embeddings_core::download: core/src/download.rs:52: Downloading `model.safetensors`
2024-09-15T19:24:43.732689Z INFO download_artifacts: text_embeddings_core::download: core/src/download.rs:39: Model artifacts downloaded in 22.038954482s
2024-09-15T19:24:44.030320Z INFO text_embeddings_router: router/src/lib.rs:199: Maximum number of tokens per request: 512
2024-09-15T19:24:44.049828Z INFO text_embeddings_core::tokenization: core/src/tokenization.rs:26: Starting 152 tokenization workers
2024-09-15T19:24:44.374782Z INFO text_embeddings_router: router/src/lib.rs:250: Starting model backend
2024-09-15T19:24:44.375230Z INFO text_embeddings_backend_python::management: backends/python/src/management.rs:58: Starting Python backend
2024-09-15T19:24:48.899061Z WARN python-backend: text_embeddings_backend_python::logging: backends/python/src/logging.rs:39: Could not import Flash Attention enabled models: No module named 'dropout_layer_norm'
2024-09-15T19:24:50.018977Z ERROR python-backend: text_embeddings_backend_python::logging: backends/python/src/logging.rs:40: Error when initializing model
Traceback (most recent call last):
File "/usr/local/bin/python-text-embeddings-server", line 8, in <module>
sys.exit(app())
File "/usr/local/lib/python3.10/dist-packages/typer/main.py", line 311, in __call__
return get_command(self)(*args, **kwargs)
File "/usr/local/lib/python3.10/dist-packages/click/core.py", line 1157, in __call__
return self.main(*args, **kwargs)
File "/usr/local/lib/python3.10/dist-packages/typer/core.py", line 716, in main
return _main(
File "/usr/local/lib/python3.10/dist-packages/typer/core.py", line 216, in _main
rv = self.invoke(ctx)
File "/usr/local/lib/python3.10/dist-packages/click/core.py", line 1434, in invoke
return ctx.invoke(self.callback, **ctx.params)
File "/usr/local/lib/python3.10/dist-packages/click/core.py", line 783, in invoke
return __callback(*args, **kwargs)
File "/usr/local/lib/python3.10/dist-packages/typer/main.py", line 683, in wrapper
return callback(**use_params) # type: ignore
File "/usr/src/backends/python/server/text_embeddings_server/cli.py", line 51, in serve
server.serve(model_path, dtype, uds_path)
File "/usr/src/backends/python/server/text_embeddings_server/server.py", line 88, in serve
asyncio.run(serve_inner(model_path, dtype))
File "/usr/lib/python3.10/asyncio/runners.py", line 44, in run
return loop.run_until_complete(main)
File "/usr/lib/python3.10/asyncio/base_events.py", line 636, in run_until_complete
self.run_forever()
File "/usr/lib/python3.10/asyncio/base_events.py", line 603, in run_forever
self._run_once()
File "/usr/lib/python3.10/asyncio/base_events.py", line 1909, in _run_once
handle._run()
File "/usr/lib/python3.10/asyncio/events.py", line 80, in _run
self._context.run(self._callback, *self._args)
> File "/usr/src/backends/python/server/text_embeddings_server/server.py", line 57, in serve_inner
model = get_model(model_path, dtype)
File "/usr/src/backends/python/server/text_embeddings_server/models/__init__.py", line 56, in get_model
raise ValueError("CPU device only supports float32 dtype")
ValueError: CPU device only supports float32 dtype
Error: Could not create backend
Caused by:
Could not start backend: Python backend failed to start
@ezelanza,
Please share the command you used to launch the service. I just verified the opea/tei-gaudi:latest image; it works well on my Gaudi2 server:
```bash
docker run -p 9780:80 -v $volume:/data \
  -e http_proxy=$http_proxy -e https_proxy=$https_proxy \
  --runtime=habana -e HABANA_VISIBLE_DEVICES=all \
  -e OMPI_MCA_btl_vader_single_copy_mechanism=none \
  -e MAX_WARMUP_SEQUENCE_LENGTH=512 \
  --cap-add=sys_nice --ipc=host \
  opea/tei-gaudi:latest --model-id $model --pooling cls
```
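Once it is up, a quick sanity check (a sketch; port 9780 matches the mapping above, and the payload follows the standard TEI /embed API):

```bash
# Request an embedding from the standalone TEI container
curl http://localhost:9780/embed \
  -X POST \
  -H 'Content-Type: application/json' \
  -d '{"inputs": "What is deep learning?"}'
```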
It works in my Gaudi 2 environment when launched standalone like that, but I still get the error when running the compose file: `cd GenAIExamples/ChatQnA/docker_compose/intel/hpu/gaudi/ && docker compose up -d`.
@ezelanza Could you try following the steps in https://github.com/opea-project/GenAIExamples/blob/v1.0/ChatQnA/docker_compose/intel/hpu/gaudi/README.md?
`docker compose up -d` automatically pulls the necessary Docker images from Docker Hub.
Our support team has verified that it works.
Those are the steps I followed, and I have the same feedback from multiple devs who have tried to run this example. I even tried a fresh instance on IDC and got the same errors.
FYI, this is the workaround we had to apply to get the example running: we ran TGI and TEI separately, per GenAIComps (a command sketch follows this list).
- For TGI, the compose file has ghcr.io/huggingface/tgi-gaudi:2.0.5, and we instead ran ghcr.io/huggingface/text-generation-inference:1.4 (from GenAIComps: https://github.com/opea-project/GenAIComps/tree/main/comps/embeddings/tei/langchain).
- For TEI, the compose file has ghcr.io/huggingface/tei-gaudi:latest, and we instead ran ghcr.io/huggingface/text-embeddings-inference:cpu-1.5 (from https://github.com/opea-project/GenAIExamples/issues/new?assignees=&labels=&projects=&template=1_bug_template.yml&title=%5BBug%5D).
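A rough sketch of the TEI substitution (the port is an example; $model and $volume as in the docker run command earlier in the thread):

```bash
# Workaround: run the CPU TEI image standalone in place of tei-gaudi
docker run -p 8090:80 -v $volume:/data \
  ghcr.io/huggingface/text-embeddings-inference:cpu-1.5 \
  --model-id $model
```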
ValueError: CPU device only supports float32 dtype
=> The Python backend fell back to CPU, which suggests the container does not see the HPU. Check that the invoked container actually includes (writable) Habana devices; see the checks below.
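A few hedged checks (device paths can vary with the Habana driver version; recent drivers expose the cards under /dev/accel):

```bash
# Was the failing container started with the Habana runtime?
docker inspect --format '{{.HostConfig.Runtime}}' <container-id>   # expect: habana

# If the container is still running: are accelerator devices visible inside it?
docker exec <container-id> ls -l /dev/accel/

# On the host: confirm the Gaudi cards and driver are healthy
hl-smi
```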
@eero-t That is what I meant: the compose file has to be updated/modified.
Hi @eero-t, @ezelanza,
Thank you for raising the problem!
The compose file (https://github.com/opea-project/GenAIExamples/blob/main/ChatQnA/docker_compose/intel/hpu/gaudi/compose.yaml) has been updated in the meantime, along with the corresponding doc:
https://github.com/opea-project/GenAIExamples/tree/main/ChatQnA
We now put tei-rerank and tgi-llm on Gaudi (when two Gaudi cards are available) and tei-embedding on CPU, since it is a light workload.
It now runs like this:

```
1e6064aa38a9   ghcr.io/huggingface/text-embeddings-inference:cpu-1.5   "text-embeddings-rou…"   6 minutes ago   Up 6 minutes   0.0.0.0:8090->80/tcp, :::8090->80/tcp   tei-embedding-gaudi-server
```
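As a sanity check of the split (a sketch; names and ports come from the output above), you can confirm that the embedding container runs the CPU image while rerank and the LLM land on Gaudi:

```bash
# List the TEI/TGI containers and the images they run
docker ps --format '{{.Names}}\t{{.Image}}' | grep -E 'tei|tgi'

# hl-smi on the host should show memory in use by the rerank/LLM workloads
hl-smi
```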