
[Bug] TEI Gaudi 2 image is failing to launch

[Open] ezelanza opened this issue 1 year ago · 6 comments

Priority

Undecided

OS type

Ubuntu

Hardware type

Gaudi2

Installation method

  • [X] Pull docker images from hub.docker.com
  • [ ] Build docker images from source

Deploy method

  • [X] Docker compose
  • [ ] Docker
  • [ ] Kubernetes
  • [ ] Helm

Running nodes

Single Node

What's the version?

latest

Description

The TEI Gaudi image is not launching due to errors.

Reproduce steps

After running docker compose, the container from the "opea/tei-gaudi:latest" image starts but fails shortly after launching.
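The steps were essentially the following (inferred from the log below; the container ID will differ on your machine):

cd GenAIExamples/ChatQnA/docker_compose/intel/hpu/gaudi
docker compose up -d
docker logs <tei-gaudi-container-id>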

Raw log

/ChatQnA/docker_compose/intel/hpu/gaudi$ docker logs 349bc3685e97
2024-09-15T19:24:20.318465Z  INFO text_embeddings_router: router/src/main.rs:175: Args { model_id: "BAA*/***-****-**-v1.5", revision: None, tokenization_workers: None, dtype: None, pooling: None, max_concurrent_requests: 512, max_batch_tokens: 16384, max_batch_requests: None, max_client_batch_size: 32, auto_truncate: true, default_prompt_name: None, default_prompt: None, hf_api_token: None, hostname: "349bc3685e97", port: 80, uds_path: "/tmp/text-embeddings-inference-server", huggingface_hub_cache: Some("/data"), payload_limit: 2000000, api_key: None, json_output: false, otlp_endpoint: None, otlp_service_name: "text-embeddings-inference.server", cors_allow_origin: None }
2024-09-15T19:24:20.318939Z  INFO hf_hub: /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/hf-hub-0.3.2/src/lib.rs:55: Token file not found "/root/.cache/huggingface/token"    
2024-09-15T19:24:20.445084Z  INFO download_pool_config: text_embeddings_core::download: core/src/download.rs:45: Downloading `1_Pooling/config.json`
2024-09-15T19:24:21.565035Z  INFO download_new_st_config: text_embeddings_core::download: core/src/download.rs:108: Downloading `config_sentence_transformers.json`
2024-09-15T19:24:21.693734Z  INFO download_artifacts: text_embeddings_core::download: core/src/download.rs:20: Starting download
2024-09-15T19:24:21.693774Z  INFO download_artifacts: text_embeddings_core::download: core/src/download.rs:22: Downloading `config.json`
2024-09-15T19:24:21.823411Z  INFO download_artifacts: text_embeddings_core::download: core/src/download.rs:25: Downloading `tokenizer.json`
2024-09-15T19:24:22.137401Z  INFO download_artifacts: text_embeddings_core::download: core/src/download.rs:52: Downloading `model.safetensors`
2024-09-15T19:24:43.732689Z  INFO download_artifacts: text_embeddings_core::download: core/src/download.rs:39: Model artifacts downloaded in 22.038954482s
2024-09-15T19:24:44.030320Z  INFO text_embeddings_router: router/src/lib.rs:199: Maximum number of tokens per request: 512
2024-09-15T19:24:44.049828Z  INFO text_embeddings_core::tokenization: core/src/tokenization.rs:26: Starting 152 tokenization workers
2024-09-15T19:24:44.374782Z  INFO text_embeddings_router: router/src/lib.rs:250: Starting model backend
2024-09-15T19:24:44.375230Z  INFO text_embeddings_backend_python::management: backends/python/src/management.rs:58: Starting Python backend
2024-09-15T19:24:48.899061Z  WARN python-backend: text_embeddings_backend_python::logging: backends/python/src/logging.rs:39: Could not import Flash Attention enabled models: No module named 'dropout_layer_norm'

2024-09-15T19:24:50.018977Z ERROR python-backend: text_embeddings_backend_python::logging: backends/python/src/logging.rs:40: Error when initializing model
Traceback (most recent call last):
  File "/usr/local/bin/python-text-embeddings-server", line 8, in <module>
    sys.exit(app())
  File "/usr/local/lib/python3.10/dist-packages/typer/main.py", line 311, in __call__
    return get_command(self)(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/click/core.py", line 1157, in __call__
    return self.main(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/typer/core.py", line 716, in main
    return _main(
  File "/usr/local/lib/python3.10/dist-packages/typer/core.py", line 216, in _main
    rv = self.invoke(ctx)
  File "/usr/local/lib/python3.10/dist-packages/click/core.py", line 1434, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/usr/local/lib/python3.10/dist-packages/click/core.py", line 783, in invoke
    return __callback(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/typer/main.py", line 683, in wrapper
    return callback(**use_params)  # type: ignore
  File "/usr/src/backends/python/server/text_embeddings_server/cli.py", line 51, in serve
    server.serve(model_path, dtype, uds_path)
  File "/usr/src/backends/python/server/text_embeddings_server/server.py", line 88, in serve
    asyncio.run(serve_inner(model_path, dtype))
  File "/usr/lib/python3.10/asyncio/runners.py", line 44, in run
    return loop.run_until_complete(main)
  File "/usr/lib/python3.10/asyncio/base_events.py", line 636, in run_until_complete
    self.run_forever()
  File "/usr/lib/python3.10/asyncio/base_events.py", line 603, in run_forever
    self._run_once()
  File "/usr/lib/python3.10/asyncio/base_events.py", line 1909, in _run_once
    handle._run()
  File "/usr/lib/python3.10/asyncio/events.py", line 80, in _run
    self._context.run(self._callback, *self._args)
> File "/usr/src/backends/python/server/text_embeddings_server/server.py", line 57, in serve_inner
    model = get_model(model_path, dtype)
  File "/usr/src/backends/python/server/text_embeddings_server/models/__init__.py", line 56, in get_model
    raise ValueError("CPU device only supports float32 dtype")
ValueError: CPU device only supports float32 dtype

Error: Could not create backend

Caused by:
Could not start backend: Python backend failed to start
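The check in the traceback fires when the Python backend only detects a CPU device while the configured dtype is not float32; on a Gaudi machine this usually means the HPU devices were not visible inside the container (see the discussion below). For CPU-only debugging, forcing float32 should bypass this particular check, though it defeats the purpose of the Gaudi image (a sketch; the model ID is illustrative):

docker run -p 9780:80 -v ./data:/data opea/tei-gaudi:latest \
    --model-id BAAI/bge-base-en-v1.5 --dtype float32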

ezelanza avatar Sep 15 '24 19:09 ezelanza

@ezelanza, please share the command you used to launch the service. I just verified the opea/tei-gaudi:latest image; it works well on my Gaudi2 server.

docker run -p 9780:80 -v $volume:/data \
    -e http_proxy=$http_proxy -e https_proxy=$https_proxy \
    --runtime=habana -e HABANA_VISIBLE_DEVICES=all \
    -e OMPI_MCA_btl_vader_single_copy_mechanism=none \
    -e MAX_WARMUP_SEQUENCE_LENGTH=512 \
    --cap-add=sys_nice --ipc=host \
    opea/tei-gaudi:latest --model-id $model --pooling cls
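For reference, a compose-style equivalent of that command would look roughly like this (a sketch; the service name and host volume are illustrative):

services:
  tei-gaudi:
    image: opea/tei-gaudi:latest
    ports:
      - "9780:80"
    volumes:
      - ${volume}:/data
    runtime: habana
    cap_add:
      - sys_nice
    ipc: host
    environment:
      - http_proxy=${http_proxy}
      - https_proxy=${https_proxy}
      - HABANA_VISIBLE_DEVICES=all
      - OMPI_MCA_btl_vader_single_copy_mechanism=none
      - MAX_WARMUP_SEQUENCE_LENGTH=512
    command: --model-id ${model} --pooling cls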


lvliang-intel avatar Sep 16 '24 04:09 lvliang-intel

That command works in my Gaudi 2 environment on its own, but I'm still getting the error when I run the compose file:

cd GenAIExamples/ChatQnA/docker_compose/intel/hpu/gaudi/
docker compose up -d
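One way to narrow this down would be to dump the resolved compose configuration and check whether the TEI service actually receives the Habana runtime and device settings that the standalone docker run command above passes explicitly (a sketch; the grep pattern is illustrative and the service name in compose.yaml may differ):

cd GenAIExamples/ChatQnA/docker_compose/intel/hpu/gaudi
docker compose config | grep -i -A 10 tei    # look for runtime: habana and HABANA_VISIBLE_DEVICES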

ezelanza avatar Sep 16 '24 07:09 ezelanza

@ezelanza Could you try following the steps in https://github.com/opea-project/GenAIExamples/blob/v1.0/ChatQnA/docker_compose/intel/hpu/gaudi/README.md? docker compose up -d automatically pulls the necessary docker images from Docker Hub. Our support team has tested it and confirmed it works.

ctao456 avatar Oct 02 '24 00:10 ctao456

These are the steps I followed, and I've had the same feedback from multiple devs who tried to run this example. I even tried a new, fresh instance on IDC, and we got the same errors.

FYI, this is the workaround we had to apply to get the example running: we ran TGI and TEI separately, following GenAIComps.

For TGI, the compose file has ghcr.io/huggingface/tgi-gaudi:2.0.5, and instead we ran ghcr.io/huggingface/text-generation-inference:1.4, following GenAIComps (https://github.com/opea-project/GenAIComps/tree/main/comps/embeddings/tei/langchain).

For TEI, the compose file has ghcr.io/huggingface/tei-gaudi:latest, and instead we ran ghcr.io/huggingface/text-embeddings-inference:cpu-1.5 (from https://github.com/opea-project/GenAIExamples/issues/new?assignees=&labels=&projects=&template=1_bug_template.yml&title=%5BBug%5D).
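The separate launches were along these lines (a sketch; ports, volume paths, and model IDs are illustrative rather than the exact commands we used):

model=BAAI/bge-base-en-v1.5
docker run -d -p 6006:80 -v ./data:/data \
    ghcr.io/huggingface/text-embeddings-inference:cpu-1.5 \
    --model-id $model

llm=Intel/neural-chat-7b-v3-3
docker run -d -p 8008:80 -v ./data:/data --shm-size 1g \
    ghcr.io/huggingface/text-generation-inference:1.4 \
    --model-id $llm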

ezelanza avatar Oct 02 '24 13:10 ezelanza

ValueError: CPU device only supports float32 dtype

=> Check that the invoked container actually includes (writable) Habana devices.
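For example (the container ID is taken from the log above; Gaudi device node names vary by driver version, so treat this as a sketch):

docker exec 349bc3685e97 sh -c 'ls -l /dev/accel* /dev/hl* 2>/dev/null'
hl-smi    # on the host: Habana's rough equivalent of nvidia-smi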

eero-t avatar Oct 02 '24 14:10 eero-t

@eero-t That's what I meant: the compose file has to be updated/modified.

ezelanza avatar Oct 02 '24 14:10 ezelanza

Hi @eero-t, @ezelanza, thank you for raising the problem! The compose file (https://github.com/opea-project/GenAIExamples/blob/main/ChatQnA/docker_compose/intel/hpu/gaudi/compose.yaml) and the corresponding doc (https://github.com/opea-project/GenAIExamples/tree/main/ChatQnA) have since been updated: we now put tei-rerank and tgi-llm on Gaudi (if you have two Gaudi cards) and run tei-embedding on CPU, as it is a light workload. It works:

1e6064aa38a9 ghcr.io/huggingface/text-embeddings-inference:cpu-1.5 "text-embeddings-rou…" 6 minutes ago Up 6 minutes 0.0.0.0:8090->80/tcp, :::8090->80/tcp tei-embedding-gaudi-server
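The matching service entry in the updated compose.yaml looks roughly like this (an abbreviated sketch based on the container listing above; the environment variable name is an assumption):

services:
  tei-embedding-service:
    image: ghcr.io/huggingface/text-embeddings-inference:cpu-1.5
    container_name: tei-embedding-gaudi-server
    ports:
      - "8090:80"
    volumes:
      - ./data:/data
    command: --model-id ${EMBEDDING_MODEL_ID} --auto-truncate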

yinghu5 avatar Dec 31 '24 06:12 yinghu5