Doesn't work with m-a-p/OpenCodeInterpreter-DS-33B
Describe the bug
I'm using OI with local models as follows:
interpreter \
-y \
--api_base http://localhost:8000 \
--model openai/gpt-3.5 \
--context_window 4096 \
--api_key=not_needed \
--local
through a litellm proxy, which converts OpenAI format (which OI understands) into something the Hugging Face TGI endpoint understands, and back:
litellm --model huggingface/m-a-p/OpenCodeInterpreter-DS-33B --api_base http://localhost:8005 --port 8000
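A quick way to check the proxy end of this chain is to hand-build a single OpenAI-format chat request against it. This is a sketch, not a definitive test: the prompt and max_tokens are arbitrary, and it only constructs the request so you can inspect it; uncomment the last step to actually send it to the running proxy.

```python
import json
import urllib.request

# OpenAI-format chat completion payload; the model name matches the
# --model flag passed to interpreter above.
payload = {
    "model": "openai/gpt-3.5",
    "messages": [{"role": "user", "content": "Say hi"}],
    "max_tokens": 32,
}
req = urllib.request.Request(
    "http://localhost:8000/chat/completions",
    data=json.dumps(payload).encode(),
    headers={"Content-Type": "application/json",
             "Authorization": "Bearer not_needed"},
)
print(req.full_url)
# To send for real: urllib.request.urlopen(req) and inspect the JSON response.
```

If this request hangs or returns an error while the same prompt works against TGI directly, the problem sits in the litellm layer rather than in OI.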
This has worked fine with every Hugging Face model I've tried (llama, mistral, mixtral), but not with the newest m-a-p/OpenCodeInterpreter-DS-33B. OI doesn't generate anything at all, and there are no errors. Any idea if this is a bug?
The TGI server is run as follows:
export model=huggingface/m-a-p/OpenCodeInterpreter-DS-33B
singularity run \
--nv \
--bind $volume:/data docker://ghcr.io/huggingface/text-generation-inference:1.3 \
--model-id $model \
--port 8005 \
--hostname 0.0.0.0 \
--max-total-tokens 16384 \
--max-input-length 16383 \
--max-batch-prefill-tokens 16383 \
--sharded true \
--num-shard 4
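When OI stays silent with no errors, it helps to isolate the layers by querying TGI's native /generate route directly, bypassing litellm. A minimal sketch (the prompt and token count are arbitrary; actually sending the request requires the server above to be running):

```python
import json
import urllib.request

# TGI's native generation route takes {"inputs": ..., "parameters": {...}}.
payload = {"inputs": "def fib(n):", "parameters": {"max_new_tokens": 32}}
req = urllib.request.Request(
    "http://localhost:8005/generate",
    data=json.dumps(payload).encode(),
    headers={"Content-Type": "application/json"},
)
print(req.full_url)
# To send for real: urllib.request.urlopen(req); the response JSON carries
# the completion under "generated_text".
```

If TGI answers here but nothing comes back through the proxy, the model and server are fine and the fault is in the translation layer.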
Reproduce
see above
Expected behavior
OI should generate some output, but nothing happens when I enter a prompt.
Screenshots
No response
Open Interpreter version
0.2.0
Python version
3.11
Operating System name and version
CentOS
Additional context
No response
I used the model successfully via LM Studio using the command interpreter --api_base http://localhost:1234/v1. Try using fewer flags.
@MikeBirdTech AFAIK, LM Studio is not designed to handle simultaneous requests from many clients. I have a 4xA100 box on which I run (sharded) models with tensor parallelism (for speed). That's why I'm using TGI instead of LM Studio, which (correct me if I'm wrong) is geared more towards running LLMs on consumer-grade hardware (like a MacBook) than towards serving an LLM to many clients concurrently.
@RomanKoshkin Sounds like a good setup! Jealous of the speed haha
My comment was more to suggest that the model has worked with OI, so the issue is most likely with networking/routing rather than the OI codebase. I'm not familiar with TGI, so unfortunately I won't be able to provide much assistance. Since you were able to get llama to work, this should follow a similar approach.
I'll take a look when I'm home in 4 hours. You might need to use the latest version from git.
pip install git+https://....
Version 0.2.0 prepends openai to the model name in front of huggingface. The latest version from GitHub uses a custom LLM provider instead.
Closing this stale issue. If this is still a problem with the latest version of Open Interpreter, please alert me to re-open the issue. Thanks!