[Bug] OPEA ChatQnA RAG example (compose_vllm.yaml) does not work with vllm
Priority
P2-High
OS type
Ubuntu
Hardware type
Xeon-SPR
Installation method
- [X] Pull docker images from hub.docker.com
- [ ] Build docker images from source
Deploy method
- [X] Docker compose
- [X] Docker
- [X] Kubernetes
- [X] Helm
Running nodes
Single Node
What's the version?
https://hub.docker.com/r/opea/llm-vllm/tags
Description
The OPEA ChatQnA RAG example does not work with vLLM: ChatQnA with RAG does not take the retrieved content into account.
Root cause: the vLLM microservice connector (llm.py) does not pass the retrieved content to the model; instead, it calls the vLLM model serving endpoint with the raw input query only.
Reference: https://github.com/opea-project/GenAIComps/blob/main/comps/llms/text-generation/vllm-ray/llm.py
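For illustration, here is a minimal sketch of the expected behavior: the retrieved chunks should be folded into the prompt before the request is sent to vLLM, whereas the connector currently sends only the query. The function names, prompt template, endpoint URL, and model name below are assumptions for illustration and do not mirror the actual llm.py implementation.

```python
# Hypothetical sketch: fold retrieved context into the prompt before calling
# the vLLM serving endpoint. Names, URL, and model are illustrative only.
import requests

VLLM_ENDPOINT = "http://vllm-service:8008/v1/completions"  # assumed OpenAI-compatible route


def build_rag_prompt(query: str, retrieved_docs: list[str]) -> str:
    """Combine retrieved chunks with the user query into a single prompt."""
    context = "\n".join(retrieved_docs)
    return (
        "### Use the following context to answer the question.\n"
        f"### Context:\n{context}\n"
        f"### Question: {query}\n"
        "### Answer:"
    )


def generate(query: str, retrieved_docs: list[str],
             model: str = "Intel/neural-chat-7b-v3-3") -> str:
    # The reported bug: the connector sends `query` alone at this point, so the
    # retrieved documents never reach the model. The fix is to send the RAG
    # prompt built above instead.
    prompt = build_rag_prompt(query, retrieved_docs)
    resp = requests.post(
        VLLM_ENDPOINT,
        json={"model": model, "prompt": prompt, "max_tokens": 512, "temperature": 0.0},
        timeout=120,
    )
    resp.raise_for_status()
    return resp.json()["choices"][0]["text"]
```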
Reproduce steps
- Run the compose file: `docker compose -f compose_vllm.yaml up`. The application comes up.
- Upload a file.
- Ask queries specific to the uploaded document from the interface (a hedged verification sketch follows this list).
- Observe that the results do not contain the relevant content from the uploaded documents.
- The issue is reproducible in Helm, Docker, and Kubernetes deployments.
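To make the failure easier to confirm without the UI, the deployed ChatQnA megaservice can be queried directly. The host, port, route, and payload shape below are assumptions based on a default docker compose setup and may differ in your deployment.

```python
# Hypothetical check: query the ChatQnA megaservice after uploading a document.
# Host, port, route, and payload shape are assumptions; adjust to your deployment.
import requests

CHATQNA_URL = "http://localhost:8888/v1/chatqna"  # assumed default megaservice endpoint


def ask(question: str) -> str:
    resp = requests.post(CHATQNA_URL, json={"messages": question}, timeout=120)
    resp.raise_for_status()
    # With the bug present, the answer ignores the uploaded document's content.
    return resp.text


if __name__ == "__main__":
    print(ask("What does section 3 of the uploaded document say?"))
```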
Raw log
No response
The llm.py link should be this one, not the one in the vllm-ray directory: https://github.com/opea-project/GenAIComps/blob/main/comps/llms/text-generation/vllm/langchain/llm.py
This looks like a Docker image issue; reassigning to the owner [email protected]
Fixed by PR https://github.com/opea-project/GenAIComps/pull/687