Doesn't work with m-a-p/OpenCodeInterpreter-DS-33B
Describe the bug
I'm using OI with local models as follows:
interpreter \
-y \
--api_base http://localhost:8000 \
--model openai/gpt-3.5 \
--context_window 4096 \
--api_key=not_needed \
--local
through a litellm proxy, which converts OpenAI format (which OI understands) into something the Hugging Face TGI endpoint understands, and back:
litellm --model huggingface/m-a-p/OpenCodeInterpreter-DS-33B --api_base http://localhost:8005 --port 8000
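A quick way to check the proxy end of this chain is to hand-build a single OpenAI-format chat request against it. This is a sketch, not a definitive test: the prompt and max_tokens are arbitrary, and it only constructs the request so you can inspect it; uncomment the last step to actually send it to the running proxy.

```python
import json
import urllib.request

# OpenAI-format chat completion payload; the model name matches the
# --model flag passed to interpreter above.
payload = {
    "model": "openai/gpt-3.5",
    "messages": [{"role": "user", "content": "Say hi"}],
    "max_tokens": 32,
}
req = urllib.request.Request(
    "http://localhost:8000/chat/completions",
    data=json.dumps(payload).encode(),
    headers={"Content-Type": "application/json",
             "Authorization": "Bearer not_needed"},
)
print(req.full_url)
# To send for real: urllib.request.urlopen(req) and inspect the JSON response.
```

If this request hangs or returns an error while the same prompt works against TGI directly, the problem sits in the litellm layer rather than in OI.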
This has worked fine with every Hugging Face model I've tried (llama, mistral, mixtral), but not with the newest m-a-p/OpenCodeInterpreter-DS-33B. OI doesn't generate anything at all, and there are no errors. Any idea if this is a bug?
The TGI server is run as follows:
export model=huggingface/m-a-p/OpenCodeInterpreter-DS-33B
singularity run \
--nv \
--bind $volume:/data docker://ghcr.io/huggingface/text-generation-inference:1.3 \
--model-id $model \
--port 8005 \
--hostname 0.0.0.0 \
--max-total-tokens 16384 \
--max-input-length 16383 \
--max-batch-prefill-tokens 16383 \
--sharded true \
--num-shard 4
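When OI stays silent with no errors, it helps to isolate the layers by querying TGI's native /generate route directly, bypassing litellm. A minimal sketch (the prompt and token count are arbitrary; actually sending the request requires the server above to be running):

```python
import json
import urllib.request

# TGI's native generation route takes {"inputs": ..., "parameters": {...}}.
payload = {"inputs": "def fib(n):", "parameters": {"max_new_tokens": 32}}
req = urllib.request.Request(
    "http://localhost:8005/generate",
    data=json.dumps(payload).encode(),
    headers={"Content-Type": "application/json"},
)
print(req.full_url)
# To send for real: urllib.request.urlopen(req); the response JSON carries
# the completion under "generated_text".
```

If TGI answers here but nothing comes back through the proxy, the model and server are fine and the fault is in the translation layer.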
Reproduce
see above
Expected behavior
OI should generate some output, but nothing happens when I enter a prompt.
Screenshots
No response
Open Interpreter version
0.2.0
Python version
3.11
Operating System name and version
CentOS
Additional context
No response
I used the model successfully via LM Studio using the command interpreter --api_base http://localhost:1234/v1. Try using fewer flags.
@MikeBirdTech AFAIK, LM Studio is not designed to handle simultaneous requests from many clients. I have a 4xA100 box on which I run (sharded) models with tensor parallelism (for speed). That's why I'm using TGI instead of LM Studio, which (correct me if I'm wrong) is geared more towards running LLMs on consumer-grade hardware (like a MacBook) than towards serving an LLM to many clients concurrently.
@RomanKoshkin Sounds like a good setup! Jealous of the speed haha
My comment was more to suggest that the model has worked with OI, so the issue is most likely with networking/routing rather than the OI codebase. I'm not familiar with TGI, so unfortunately I won't be able to provide much assistance. Since you were able to get llama to work, this should follow a similar approach.
I'll take a look when I'm home in 4 hours. You might need to use the latest version from git.
pip install git+https://....
Version 0.2.0 prepends openai to the model name in front of huggingface. The latest version from GitHub uses a custom LLM provider instead.
Closing this stale issue. If this is still a problem with the latest version of Open Interpreter, please alert me to re-open the issue. Thanks!