[Bug] CodeGen sample gives ClientConnectorError in Docker logs
Priority
Undecided
OS type
Ubuntu
Hardware type
Xeon-SPR
Installation method
- [X] Pull docker images from hub.docker.com
- [ ] Build docker images from source
Deploy method
- [X] Docker compose
- [ ] Docker
- [ ] Kubernetes
- [ ] Helm
Running nodes
Single Node
What's the version?
latest
Description
Trying to run the CodeGen example following the instructions at https://github.com/opea-project/GenAIExamples/tree/main/CodeGen. This is on an Ubuntu VM running on AWS, deployed with Docker Compose.
Reproduce steps
https://gist.github.com/arun-gupta/5f02b5a57030ba8f975a4c328178ffe8
Raw log
Here is the log of containers starting:
[+] Running 5/5
✔ Network ubuntu_default Created 0.1s
✔ Container tgi-service Started 0.3s
✔ Container llm-tgi-server Started 0.4s
✔ Container codegen-xeon-backend-server Started 0.5s
✔ Container codegen-xeon-ui-server Started 0.7s
ubuntu@ip-172-31-50-223:~$ sudo docker container ls
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
cf6b5b1e3093 opea/codegen-ui:latest "docker-entrypoint.s…" 11 seconds ago Up 10 seconds 0.0.0.0:5173->5173/tcp, :::5173->5173/tcp codegen-xeon-ui-server
f254ea57134f opea/codegen:latest "python codegen.py" 11 seconds ago Up 10 seconds 0.0.0.0:7778->7778/tcp, :::7778->7778/tcp codegen-xeon-backend-server
214a9a1db4b1 opea/llm-tgi:latest "bash entrypoint.sh" 11 seconds ago Up 11 seconds 0.0.0.0:9000->9000/tcp, :::9000->9000/tcp llm-tgi-server
sudo docker logs llm-tgi-server gives the following:
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/user/.local/lib/python3.11/site-packages/aiohttp/connector.py", line 564, in connect
proto = await self._create_connection(req, traces, timeout)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/user/.local/lib/python3.11/site-packages/aiohttp/connector.py", line 975, in _create_connection
_, proto = await self._create_direct_connection(req, traces, timeout)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/user/.local/lib/python3.11/site-packages/aiohttp/connector.py", line 1350, in _create_direct_connection
raise last_exc
File "/home/user/.local/lib/python3.11/site-packages/aiohttp/connector.py", line 1319, in _create_direct_connection
transp, proto = await self._wrap_create_connection(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/user/.local/lib/python3.11/site-packages/aiohttp/connector.py", line 1088, in _wrap_create_connection
raise client_error(req.connection_key, exc) from exc
aiohttp.client_exceptions.ClientConnectorError: Cannot connect to host 172.31.50.223:8028 ssl:default [Connect call failed ('172.31.50.223', 8028)]
Hi @arun-gupta, the issue is caused by missing Hugging Face access authorization. Please check the Docker log of tgi-service and you will see the log below.
You should visit the model page on Hugging Face and click the "Expand to review and access" button to request access. The meta-llama/CodeLlama-7b-hf model will then download successfully.
The CodeGen README will be updated to explain the token access requirement.
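Until the README is updated, a pre-flight check along these lines may help. This is a minimal sketch, assuming the compose file reads the token from the HUGGINGFACEHUB_API_TOKEN environment variable (the variable name is an assumption based on common OPEA compose setups):

```shell
# Assumption: HUGGINGFACEHUB_API_TOKEN is the variable name the compose file expects.
export HUGGINGFACEHUB_API_TOKEN="hf_xxx"   # replace with your real Hugging Face token

# Fail fast with a clear message if the token is missing, before bringing the stack up.
: "${HUGGINGFACEHUB_API_TOKEN:?HUGGINGFACEHUB_API_TOKEN must be set before docker compose up}"
echo "token is set"
```

Running this before docker compose up surfaces a missing token immediately, instead of a ClientConnectorError minutes later when the LLM microservice cannot reach a TGI container that never started.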
I used a different model:
export LLM_MODEL_ID="deepseek-ai/deepseek-coder-6.7b-instruct"
Is there still an access issue with that?
The deepseek-ai/deepseek-coder-6.7b-instruct model does not have the access issue, but it seems it is not supported by TGI. Checking TGI's supported models, it only supports DeepSeek V2, not deepseek-coder.
Use the command docker logs tgi-service to check the container's log again. The tgi-service container should show an error log like the one below, and the container will not be ready (Exited).
It's recommended to request access and use the official meta-llama/CodeLlama-7b-hf model, which is well validated.
I followed the instructions explained at https://github.com/opea-project/GenAIExamples/tree/main/CodeGen.
Should they be updated accordingly?
yes, @yao531441 will help follow this issue and check these models again.
Hi @arun-gupta, when you use sudo to execute docker compose up, the environment variables may not be passed through, which causes the tgi-service to fail to start. Please try again without sudo.
Have you tried running the sample and reproducing the error? All the other samples seem to work; this is the only one causing a problem.
Of course, I followed your steps and reproduced it. You can see that the tgi-service container is gone after you start the stack. I removed the -d option when running docker compose up and saw the logs below.
This happens because sudo strips the environment variables, so the service fails to start. I am also curious why your other examples did not report any errors.
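The env-stripping effect can be demonstrated without Docker. A minimal sketch: env -i clears the environment the way a default sudo (without -E) can, so a compose file expanding ${LLM_MODEL_ID} would see it empty (the exact sudo behavior depends on the system's env_keep policy):

```shell
# Demonstrates the env-stripping effect; LLM_MODEL_ID is the variable from this thread.
export LLM_MODEL_ID="deepseek-ai/deepseek-coder-6.7b-instruct"

# A normal child shell inherits the variable:
sh -c 'echo "inherited: ${LLM_MODEL_ID:-<unset>}"'

# env -i starts the child with an empty environment, like a default sudo:
env -i sh -c 'echo "stripped:  ${LLM_MODEL_ID:-<unset>}"'
```

If root privileges are unavoidable, sudo -E asks sudo to preserve the caller's environment (still subject to the sudoers env policy), which may be an alternative to dropping sudo entirely.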
@arun-gupta Has this issue been resolved?
The server log seems to be fine:
ubuntu@ip-172-31-52-197:~$ sudo docker logs llm-tgi-server
Defaulting to user installation because normal site-packages is not writeable
Collecting langserve==0.3.0 (from -r requirements-runtime.txt (line 1))
Downloading langserve-0.3.0-py3-none-any.whl.metadata (39 kB)
Requirement already satisfied: httpx<1.0,>=0.23.0 in /home/user/.local/lib/python3.11/site-packages (from langserve==0.3.0->-r requirements-runtime.txt (line 1)) (0.27.2)
Collecting langchain-core<0.4,>=0.3 (from langserve==0.3.0->-r requirements-runtime.txt (line 1))
Downloading langchain_core-0.3.11-py3-none-any.whl.metadata (6.3 kB)
Requirement already satisfied: orjson<4,>=2 in /home/user/.local/lib/python3.11/site-packages (from langserve==0.3.0->-r requirements-runtime.txt (line 1)) (3.10.7)
Requirement already satisfied: pydantic<3.0,>=2.7 in /home/user/.local/lib/python3.11/site-packages (from langserve==0.3.0->-r requirements-runtime.txt (line 1)) (2.9.2)
Requirement already satisfied: anyio in /home/user/.local/lib/python3.11/site-packages (from httpx<1.0,>=0.23.0->langserve==0.3.0->-r requirements-runtime.txt (line 1)) (4.5.0)
Requirement already satisfied: certifi in /home/user/.local/lib/python3.11/site-packages (from httpx<1.0,>=0.23.0->langserve==0.3.0->-r requirements-runtime.txt (line 1)) (2024.8.30)
Requirement already satisfied: httpcore==1.* in /home/user/.local/lib/python3.11/site-packages (from httpx<1.0,>=0.23.0->langserve==0.3.0->-r requirements-runtime.txt (line 1)) (1.0.5)
Requirement already satisfied: idna in /home/user/.local/lib/python3.11/site-packages (from httpx<1.0,>=0.23.0->langserve==0.3.0->-r requirements-runtime.txt (line 1)) (3.10)
Requirement already satisfied: sniffio in /home/user/.local/lib/python3.11/site-packages (from httpx<1.0,>=0.23.0->langserve==0.3.0->-r requirements-runtime.txt (line 1)) (1.3.1)
Requirement already satisfied: h11<0.15,>=0.13 in /home/user/.local/lib/python3.11/site-packages (from httpcore==1.*->httpx<1.0,>=0.23.0->langserve==0.3.0->-r requirements-runtime.txt (line 1)) (0.14.0)
Requirement already satisfied: PyYAML>=5.3 in /home/user/.local/lib/python3.11/site-packages (from langchain-core<0.4,>=0.3->langserve==0.3.0->-r requirements-runtime.txt (line 1)) (6.0.2)
Collecting jsonpatch<2.0,>=1.33 (from langchain-core<0.4,>=0.3->langserve==0.3.0->-r requirements-runtime.txt (line 1))
Downloading jsonpatch-1.33-py2.py3-none-any.whl.metadata (3.0 kB)
Collecting langsmith<0.2.0,>=0.1.125 (from langchain-core<0.4,>=0.3->langserve==0.3.0->-r requirements-runtime.txt (line 1))
Downloading langsmith-0.1.135-py3-none-any.whl.metadata (13 kB)
Requirement already satisfied: packaging<25,>=23.2 in /home/user/.local/lib/python3.11/site-packages (from langchain-core<0.4,>=0.3->langserve==0.3.0->-r requirements-runtime.txt (line 1)) (24.1)
Collecting tenacity!=8.4.0,<10.0.0,>=8.1.0 (from langchain-core<0.4,>=0.3->langserve==0.3.0->-r requirements-runtime.txt (line 1))
Downloading tenacity-9.0.0-py3-none-any.whl.metadata (1.2 kB)
Requirement already satisfied: typing-extensions>=4.7 in /home/user/.local/lib/python3.11/site-packages (from langchain-core<0.4,>=0.3->langserve==0.3.0->-r requirements-runtime.txt (line 1)) (4.12.2)
Requirement already satisfied: annotated-types>=0.6.0 in /home/user/.local/lib/python3.11/site-packages (from pydantic<3.0,>=2.7->langserve==0.3.0->-r requirements-runtime.txt (line 1)) (0.7.0)
Requirement already satisfied: pydantic-core==2.23.4 in /home/user/.local/lib/python3.11/site-packages (from pydantic<3.0,>=2.7->langserve==0.3.0->-r requirements-runtime.txt (line 1)) (2.23.4)
Collecting jsonpointer>=1.9 (from jsonpatch<2.0,>=1.33->langchain-core<0.4,>=0.3->langserve==0.3.0->-r requirements-runtime.txt (line 1))
Downloading jsonpointer-3.0.0-py2.py3-none-any.whl.metadata (2.3 kB)
Requirement already satisfied: requests<3,>=2 in /home/user/.local/lib/python3.11/site-packages (from langsmith<0.2.0,>=0.1.125->langchain-core<0.4,>=0.3->langserve==0.3.0->-r requirements-runtime.txt (line 1)) (2.32.3)
Collecting requests-toolbelt<2.0.0,>=1.0.0 (from langsmith<0.2.0,>=0.1.125->langchain-core<0.4,>=0.3->langserve==0.3.0->-r requirements-runtime.txt (line 1))
Downloading requests_toolbelt-1.0.0-py2.py3-none-any.whl.metadata (14 kB)
Requirement already satisfied: charset-normalizer<4,>=2 in /home/user/.local/lib/python3.11/site-packages (from requests<3,>=2->langsmith<0.2.0,>=0.1.125->langchain-core<0.4,>=0.3->langserve==0.3.0->-r requirements-runtime.txt (line 1)) (3.3.2)
Requirement already satisfied: urllib3<3,>=1.21.1 in /home/user/.local/lib/python3.11/site-packages (from requests<3,>=2->langsmith<0.2.0,>=0.1.125->langchain-core<0.4,>=0.3->langserve==0.3.0->-r requirements-runtime.txt (line 1)) (2.2.3)
Downloading langserve-0.3.0-py3-none-any.whl (1.2 MB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 1.2/1.2 MB 14.7 MB/s eta 0:00:00
Downloading langchain_core-0.3.11-py3-none-any.whl (407 kB)
Downloading jsonpatch-1.33-py2.py3-none-any.whl (12 kB)
Downloading langsmith-0.1.135-py3-none-any.whl (295 kB)
Downloading tenacity-9.0.0-py3-none-any.whl (28 kB)
Downloading jsonpointer-3.0.0-py2.py3-none-any.whl (7.6 kB)
Downloading requests_toolbelt-1.0.0-py2.py3-none-any.whl (54 kB)
Installing collected packages: tenacity, jsonpointer, requests-toolbelt, jsonpatch, langsmith, langchain-core, langserve
WARNING: The script langsmith is installed in '/home/user/.local/bin' which is not on PATH.
Consider adding this directory to PATH or, if you prefer to suppress this warning, use --no-warn-script-location.
Successfully installed jsonpatch-1.33 jsonpointer-3.0.0 langchain-core-0.3.11 langserve-0.3.0 langsmith-0.1.135 requests-toolbelt-1.0.0 tenacity-9.0.0
/home/user/.local/lib/python3.11/site-packages/pydantic/_internal/_fields.py:132: UserWarning: Field "model_name_or_path" in Audio2TextDoc has conflict with protected namespace "model_".
You may be able to resolve this warning by setting `model_config['protected_namespaces'] = ()`.
warnings.warn(
[2024-10-17 01:30:07,955] [ INFO] - Base service - CORS is enabled.
[2024-10-17 01:30:07,956] [ INFO] - Base service - Setting up HTTP server
[2024-10-17 01:30:07,957] [ INFO] - Base service - Uvicorn server setup on port 9000
INFO: Waiting for application startup.
INFO: Application startup complete.
INFO: Uvicorn running on http://0.0.0.0:9000 (Press CTRL+C to quit)
[2024-10-17 01:30:07,959] [ INFO] - Base service - HTTP server setup successful
But now accessing the CodeGen API gives an error:
ubuntu@ip-172-31-52-197:~$ curl http://${host_ip}:7778/v1/codegen -H "Content-Type: application/json" -d '{"messages": "Implement a high-level API for a TODO list application. The API takes as input an operation request and updates the TODO list in place. If the request is invalid, raise an exception."}'
Internal Server Error
@arun-gupta What I mentioned earlier is tgi-service, not llm-tgi-server. Could you please double-check the status of the TGI service and post its error message?
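Something like the following gathers that information. This is a sketch only: the container name tgi-service and port 8028 are taken from the logs earlier in this thread, and the /generate request body follows TGI's documented API, so adjust to your compose file if they differ:

```shell
# Container name and port assumed from the logs in this thread.
docker ps -a --filter name=tgi-service            # STATUS should be Up, not Exited
docker logs tgi-service 2>&1 | tail -n 30         # look for model download / access errors

# If the container is Up, hit TGI directly to rule out the downstream services:
curl -s "http://${host_ip}:8028/generate" \
  -H 'Content-Type: application/json' \
  -d '{"inputs": "def hello():", "parameters": {"max_new_tokens": 16}}'
```

If the direct TGI call succeeds but the /v1/codegen endpoint still returns Internal Server Error, the problem is in the llm-tgi-server or backend service rather than TGI itself.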
Closing this due to no updates in the past 56 days. Please feel free to reopen if you continue to experience this issue.