
[Bug] CodeGen sample giving ClientConnectorError in Docker logs

Open arun-gupta opened this issue 1 year ago • 9 comments

Priority

Undecided

OS type

Ubuntu

Hardware type

Xeon-SPR

Installation method

  • [X] Pull docker images from hub.docker.com
  • [ ] Build docker images from source

Deploy method

  • [X] Docker compose
  • [ ] Docker
  • [ ] Kubernetes
  • [ ] Helm

Running nodes

Single Node

What's the version?

latest

Description

Trying to run the CodeGen example following the instructions at https://github.com/opea-project/GenAIExamples/tree/main/CodeGen. This is on an Ubuntu VM running on AWS, using Docker Compose.

Reproduce steps

https://gist.github.com/arun-gupta/5f02b5a57030ba8f975a4c328178ffe8

Raw log

Here is the log of containers starting:


[+] Running 5/5
 ✔ Network ubuntu_default                 Created                                                                                                                         0.1s 
 ✔ Container tgi-service                  Started                                                                                                                         0.3s 
 ✔ Container llm-tgi-server               Started                                                                                                                         0.4s 
 ✔ Container codegen-xeon-backend-server  Started                                                                                                                         0.5s 
 ✔ Container codegen-xeon-ui-server       Started                                                                                                                         0.7s 
ubuntu@ip-172-31-50-223:~$ sudo docker container ls
CONTAINER ID   IMAGE                    COMMAND                  CREATED          STATUS          PORTS                                       NAMES
cf6b5b1e3093   opea/codegen-ui:latest   "docker-entrypoint.s…"   11 seconds ago   Up 10 seconds   0.0.0.0:5173->5173/tcp, :::5173->5173/tcp   codegen-xeon-ui-server
f254ea57134f   opea/codegen:latest      "python codegen.py"      11 seconds ago   Up 10 seconds   0.0.0.0:7778->7778/tcp, :::7778->7778/tcp   codegen-xeon-backend-server
214a9a1db4b1   opea/llm-tgi:latest      "bash entrypoint.sh"     11 seconds ago   Up 11 seconds   0.0.0.0:9000->9000/tcp, :::9000->9000/tcp   llm-tgi-server

sudo docker logs llm-tgi-server gives the following:

        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/user/.local/lib/python3.11/site-packages/aiohttp/connector.py", line 564, in connect
    proto = await self._create_connection(req, traces, timeout)
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/user/.local/lib/python3.11/site-packages/aiohttp/connector.py", line 975, in _create_connection
    _, proto = await self._create_direct_connection(req, traces, timeout)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/user/.local/lib/python3.11/site-packages/aiohttp/connector.py", line 1350, in _create_direct_connection
    raise last_exc
  File "/home/user/.local/lib/python3.11/site-packages/aiohttp/connector.py", line 1319, in _create_direct_connection
    transp, proto = await self._wrap_create_connection(
                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/user/.local/lib/python3.11/site-packages/aiohttp/connector.py", line 1088, in _wrap_create_connection
    raise client_error(req.connection_key, exc) from exc
aiohttp.client_exceptions.ClientConnectorError: Cannot connect to host 172.31.50.223:8028 ssl:default [Connect call failed ('172.31.50.223', 8028)]

arun-gupta avatar Sep 18 '24 04:09 arun-gupta

Hi @arun-gupta, the issue is caused by missing Hugging Face authorization. Please check the Docker log of tgi-service and you will see the log below. Visit the model page on Hugging Face and click the "Expand to review and access" button to request access. After that, the meta-llama/CodeLlama-7b-hf model will download successfully.

The README of CodeGen will be updated to explain the token access issue.

[screenshot: tgi-service log showing the gated-model access error]
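
For reference, here is a minimal sketch of the token setup, assuming the compose environment reads HUGGINGFACEHUB_API_TOKEN (the variable name commonly used in the OPEA compose files; adjust if yours differs):

# Assumption: the compose file forwards HUGGINGFACEHUB_API_TOKEN to tgi-service.
# 1. Request access to https://huggingface.co/meta-llama/CodeLlama-7b-hf
#    ("Expand to review and access"), then create a read token at
#    https://huggingface.co/settings/tokens.
export HUGGINGFACEHUB_API_TOKEN="<your_hf_token>"

# 2. Restart the stack and confirm the model server can pull the model:
docker compose up -d
docker logs tgi-service 2>&1 | grep -iE "gated|access|401|403"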

letonghan avatar Sep 19 '24 06:09 letonghan

I used a different model:

export LLM_MODEL_ID="deepseek-ai/deepseek-coder-6.7b-instruct"

Is there still an access issue with that?

arun-gupta avatar Sep 20 '24 00:09 arun-gupta

The deepseek-ai/deepseek-coder-6.7b-instruct model does not have the access issue, but it seems it is not supported by TGI. Checking TGI's list of supported models, it only supports DeepSeek V2, not deepseek-coder.

Use the command docker logs tgi-service to check the container's log again. The tgi-service container should have an error log like the one below, and the container is not ready (Exited).
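
As a generic Docker check (standard CLI flags, not specific to this repo), you can list exited containers and tail the log:

docker ps -a --filter "name=tgi-service"    # -a also shows Exited containers
docker logs tgi-service --tail 50           # the last lines usually contain the model-load error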

It's recommended to request access and use the official meta-llama/CodeLlama-7b-hf model. This model is well validated.

[screenshot: tgi-service error log]

letonghan avatar Sep 20 '24 02:09 letonghan

I followed the instructions explained at https://github.com/opea-project/GenAIExamples/tree/main/CodeGen.

[screenshot: model setup instructions from the CodeGen README]

Should they be updated accordingly?

arun-gupta avatar Sep 20 '24 03:09 arun-gupta

Yes, @yao531441 will help follow up on this issue and check these models again.

letonghan avatar Sep 20 '24 08:09 letonghan

> I followed the instructions explained at https://github.com/opea-project/GenAIExamples/tree/main/CodeGen.
>
> [screenshot: model setup instructions from the CodeGen README] Should they be updated accordingly?

Hi @arun-gupta, when you use sudo to execute docker compose up, the environment variables may not be passed through, which causes the tgi service to fail to start. Please try removing sudo and running it again.
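
For illustration, two common workarounds; this is standard sudo/Docker behavior, not specific to this repo:

# sudo starts with a fresh environment, so variables exported in your shell
# (host_ip, LLM_MODEL_ID, HUGGINGFACEHUB_API_TOKEN, ...) are invisible to
# docker compose. Either preserve the caller's environment:
sudo -E docker compose up -d

# ...or drop sudo by adding your user to the docker group (re-login afterwards):
sudo usermod -aG docker "$USER"
docker compose up -d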

yao531441 avatar Sep 20 '24 09:09 yao531441

> Hi @arun-gupta, when you use sudo to execute docker compose up, the environment variables may not be passed through, which causes the tgi service to fail to start. Please try removing sudo and running it again.

Have you tried running the sample and reproducing the error? All other samples seem to be working and this is the one that is causing the problem.

arun-gupta avatar Sep 20 '24 17:09 arun-gupta

>> Hi @arun-gupta, when you use sudo to execute docker compose up, the environment variables may not be passed through, which causes the tgi service to fail to start. Please try removing sudo and running it again.
>
> Have you tried running the sample and reproducing the error? All other samples seem to be working and this is the one that is causing the problem.

Of course, I followed your steps and reproduced it. You can see that the tgi-service container is gone after you start the stack. I removed the -d option when running docker compose up and can see the logs below.

[screenshot: tgi-service startup failure log]

This is because sudo caused inconsistent environment variables and the service failed to start. I am also curious why your other examples did not report any errors.
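
Running in the foreground is a convenient way to catch startup failures (standard Compose usage):

docker compose up                      # without -d, all container logs stream to the terminal
docker compose logs -f tgi-service     # or follow a single service after a detached start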

yao531441 avatar Sep 23 '24 00:09 yao531441

@arun-gupta Has this issue been resolved?

yao531441 avatar Oct 09 '24 03:10 yao531441

The server log seems to be fine:

ubuntu@ip-172-31-52-197:~$ sudo docker logs llm-tgi-server
Defaulting to user installation because normal site-packages is not writeable
Collecting langserve==0.3.0 (from -r requirements-runtime.txt (line 1))
  Downloading langserve-0.3.0-py3-none-any.whl.metadata (39 kB)
Requirement already satisfied: httpx<1.0,>=0.23.0 in /home/user/.local/lib/python3.11/site-packages (from langserve==0.3.0->-r requirements-runtime.txt (line 1)) (0.27.2)
Collecting langchain-core<0.4,>=0.3 (from langserve==0.3.0->-r requirements-runtime.txt (line 1))
  Downloading langchain_core-0.3.11-py3-none-any.whl.metadata (6.3 kB)
Requirement already satisfied: orjson<4,>=2 in /home/user/.local/lib/python3.11/site-packages (from langserve==0.3.0->-r requirements-runtime.txt (line 1)) (3.10.7)
Requirement already satisfied: pydantic<3.0,>=2.7 in /home/user/.local/lib/python3.11/site-packages (from langserve==0.3.0->-r requirements-runtime.txt (line 1)) (2.9.2)
Requirement already satisfied: anyio in /home/user/.local/lib/python3.11/site-packages (from httpx<1.0,>=0.23.0->langserve==0.3.0->-r requirements-runtime.txt (line 1)) (4.5.0)
Requirement already satisfied: certifi in /home/user/.local/lib/python3.11/site-packages (from httpx<1.0,>=0.23.0->langserve==0.3.0->-r requirements-runtime.txt (line 1)) (2024.8.30)
Requirement already satisfied: httpcore==1.* in /home/user/.local/lib/python3.11/site-packages (from httpx<1.0,>=0.23.0->langserve==0.3.0->-r requirements-runtime.txt (line 1)) (1.0.5)
Requirement already satisfied: idna in /home/user/.local/lib/python3.11/site-packages (from httpx<1.0,>=0.23.0->langserve==0.3.0->-r requirements-runtime.txt (line 1)) (3.10)
Requirement already satisfied: sniffio in /home/user/.local/lib/python3.11/site-packages (from httpx<1.0,>=0.23.0->langserve==0.3.0->-r requirements-runtime.txt (line 1)) (1.3.1)
Requirement already satisfied: h11<0.15,>=0.13 in /home/user/.local/lib/python3.11/site-packages (from httpcore==1.*->httpx<1.0,>=0.23.0->langserve==0.3.0->-r requirements-runtime.txt (line 1)) (0.14.0)
Requirement already satisfied: PyYAML>=5.3 in /home/user/.local/lib/python3.11/site-packages (from langchain-core<0.4,>=0.3->langserve==0.3.0->-r requirements-runtime.txt (line 1)) (6.0.2)
Collecting jsonpatch<2.0,>=1.33 (from langchain-core<0.4,>=0.3->langserve==0.3.0->-r requirements-runtime.txt (line 1))
  Downloading jsonpatch-1.33-py2.py3-none-any.whl.metadata (3.0 kB)
Collecting langsmith<0.2.0,>=0.1.125 (from langchain-core<0.4,>=0.3->langserve==0.3.0->-r requirements-runtime.txt (line 1))
  Downloading langsmith-0.1.135-py3-none-any.whl.metadata (13 kB)
Requirement already satisfied: packaging<25,>=23.2 in /home/user/.local/lib/python3.11/site-packages (from langchain-core<0.4,>=0.3->langserve==0.3.0->-r requirements-runtime.txt (line 1)) (24.1)
Collecting tenacity!=8.4.0,<10.0.0,>=8.1.0 (from langchain-core<0.4,>=0.3->langserve==0.3.0->-r requirements-runtime.txt (line 1))
  Downloading tenacity-9.0.0-py3-none-any.whl.metadata (1.2 kB)
Requirement already satisfied: typing-extensions>=4.7 in /home/user/.local/lib/python3.11/site-packages (from langchain-core<0.4,>=0.3->langserve==0.3.0->-r requirements-runtime.txt (line 1)) (4.12.2)
Requirement already satisfied: annotated-types>=0.6.0 in /home/user/.local/lib/python3.11/site-packages (from pydantic<3.0,>=2.7->langserve==0.3.0->-r requirements-runtime.txt (line 1)) (0.7.0)
Requirement already satisfied: pydantic-core==2.23.4 in /home/user/.local/lib/python3.11/site-packages (from pydantic<3.0,>=2.7->langserve==0.3.0->-r requirements-runtime.txt (line 1)) (2.23.4)
Collecting jsonpointer>=1.9 (from jsonpatch<2.0,>=1.33->langchain-core<0.4,>=0.3->langserve==0.3.0->-r requirements-runtime.txt (line 1))
  Downloading jsonpointer-3.0.0-py2.py3-none-any.whl.metadata (2.3 kB)
Requirement already satisfied: requests<3,>=2 in /home/user/.local/lib/python3.11/site-packages (from langsmith<0.2.0,>=0.1.125->langchain-core<0.4,>=0.3->langserve==0.3.0->-r requirements-runtime.txt (line 1)) (2.32.3)
Collecting requests-toolbelt<2.0.0,>=1.0.0 (from langsmith<0.2.0,>=0.1.125->langchain-core<0.4,>=0.3->langserve==0.3.0->-r requirements-runtime.txt (line 1))
  Downloading requests_toolbelt-1.0.0-py2.py3-none-any.whl.metadata (14 kB)
Requirement already satisfied: charset-normalizer<4,>=2 in /home/user/.local/lib/python3.11/site-packages (from requests<3,>=2->langsmith<0.2.0,>=0.1.125->langchain-core<0.4,>=0.3->langserve==0.3.0->-r requirements-runtime.txt (line 1)) (3.3.2)
Requirement already satisfied: urllib3<3,>=1.21.1 in /home/user/.local/lib/python3.11/site-packages (from requests<3,>=2->langsmith<0.2.0,>=0.1.125->langchain-core<0.4,>=0.3->langserve==0.3.0->-r requirements-runtime.txt (line 1)) (2.2.3)
Downloading langserve-0.3.0-py3-none-any.whl (1.2 MB)
   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 1.2/1.2 MB 14.7 MB/s eta 0:00:00
Downloading langchain_core-0.3.11-py3-none-any.whl (407 kB)
Downloading jsonpatch-1.33-py2.py3-none-any.whl (12 kB)
Downloading langsmith-0.1.135-py3-none-any.whl (295 kB)
Downloading tenacity-9.0.0-py3-none-any.whl (28 kB)
Downloading jsonpointer-3.0.0-py2.py3-none-any.whl (7.6 kB)
Downloading requests_toolbelt-1.0.0-py2.py3-none-any.whl (54 kB)
Installing collected packages: tenacity, jsonpointer, requests-toolbelt, jsonpatch, langsmith, langchain-core, langserve
  WARNING: The script langsmith is installed in '/home/user/.local/bin' which is not on PATH.
  Consider adding this directory to PATH or, if you prefer to suppress this warning, use --no-warn-script-location.
Successfully installed jsonpatch-1.33 jsonpointer-3.0.0 langchain-core-0.3.11 langserve-0.3.0 langsmith-0.1.135 requests-toolbelt-1.0.0 tenacity-9.0.0
/home/user/.local/lib/python3.11/site-packages/pydantic/_internal/_fields.py:132: UserWarning: Field "model_name_or_path" in Audio2TextDoc has conflict with protected namespace "model_".

You may be able to resolve this warning by setting `model_config['protected_namespaces'] = ()`.
  warnings.warn(
[2024-10-17 01:30:07,955] [    INFO] - Base service - CORS is enabled.
[2024-10-17 01:30:07,956] [    INFO] - Base service - Setting up HTTP server
[2024-10-17 01:30:07,957] [    INFO] - Base service - Uvicorn server setup on port 9000
INFO:     Waiting for application startup.
INFO:     Application startup complete.
INFO:     Uvicorn running on http://0.0.0.0:9000 (Press CTRL+C to quit)
[2024-10-17 01:30:07,959] [    INFO] - Base service - HTTP server setup successful

But now accessing the CodeGen API gives an error:

ubuntu@ip-172-31-52-197:~$ curl http://${host_ip}:7778/v1/codegen -H "Content-Type: application/json" -d '{"messages": "Implement a high-level API for a TODO list application. The API takes as input an operation request and updates the TODO list in place. If the request is invalid, raise an exception."}'
Internal Server Error

arun-gupta avatar Oct 17 '24 01:10 arun-gupta

@arun-gupta What I mentioned earlier is tgi-service, not llm-tgi-server. Could you please double-check the status of the TGI service and post the error message?
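
One way to isolate the failing hop (the /generate endpoint is the standard TGI API; port 8028 is taken from the traceback earlier in this issue):

docker ps -a --filter "name=tgi-service"    # is the model server actually running?
curl http://${host_ip}:8028/generate -X POST -H 'Content-Type: application/json' \
  -d '{"inputs":"def fib(n):","parameters":{"max_new_tokens":32}}'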

yao531441 avatar Oct 18 '24 01:10 yao531441

Closing this due to no updates in the past 56 days. Please feel free to reopen if you continue to experience this issue.

joshuayao avatar Dec 13 '24 02:12 joshuayao