Budget Manager for LLMs (GPT, Vertex AI, Falcon, LLaMA) and others

Open haseeb-heaven opened this issue 2 years ago • 19 comments

Is your feature request related to a problem? Please describe.

I am addressing a key issue here. Open-interpreter currently lacks transparency about the costs of using various large language models (LLMs). These models have pricing structures that are typically based on the number of tokens processed, usually billed per 1,000 tokens. Users are left in the dark about how much they are spending, which I think is a big issue.

Describe the solution you'd like

I want code-interpreter to reach out and grab the pricing details for each LLM from a designated source, like a pricing API or a simple configuration file we keep. Then, every time someone uses an LLM during a session or chat, our app will do some quick math based on how many words or tokens were processed. The result? Users get an instant, user-friendly breakdown of what they're spending, session by session and chat by chat.
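
For illustration, here is a minimal sketch of the per-session bookkeeping described above. The PRICING table, the track_usage helper, and the prices shown are hypothetical examples, not part of open-interpreter:

# Hypothetical sketch; names and prices are illustrative, not open-interpreter APIs.
PRICING = {
    # USD per 1,000 tokens: (prompt, completion) - example values only
    "gpt-4": (0.03, 0.06),
    "gpt-3.5-turbo": (0.0015, 0.002),
}

session_cost = 0.0

def track_usage(model: str, prompt_tokens: int, completion_tokens: int) -> float:
    """Add one call's cost to the running session total and return that call's cost."""
    global session_cost
    prompt_price, completion_price = PRICING[model]
    cost = (prompt_tokens / 1000) * prompt_price + (completion_tokens / 1000) * completion_price
    session_cost += cost
    return cost

track_usage("gpt-4", prompt_tokens=1200, completion_tokens=300)
print(f"Session total so far: ${session_cost:.4f}")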

Describe alternatives you've considered

One idea was to have someone update the prices manually within code-interpreter every time there's a change. But let's face it, that's asking for trouble—mistakes can happen, and costs might get all mixed up.

Additional context

OpenAI Pricing
Vertex AI Pricing

LangChain - Keep track of cost using this LangChain Cost Tracking

We can use LangChain to get the cost per token for a model, as done here in GPT Engineer - Issue#706
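
For reference, a minimal sketch of the LangChain cost-tracking callback linked above, as it worked in the 2023-era langchain versions referenced in this thread (OpenAI models only; assumes OPENAI_API_KEY is set in the environment):

from langchain.callbacks import get_openai_callback
from langchain.chat_models import ChatOpenAI

llm = ChatOpenAI(model_name="gpt-3.5-turbo")
with get_openai_callback() as cb:
    llm.predict("Say hello in one word.")
    # cb accumulates token counts and an estimated dollar cost for the calls above
    print(cb.total_tokens, f"${cb.total_cost:.6f}")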

haseeb-heaven avatar Sep 16 '23 20:09 haseeb-heaven

I believe a similar thing is being worked on currently by @krrishdholakia and would be available soon, thank you for your suggestion!

TanmayDoesAI avatar Sep 16 '23 20:09 TanmayDoesAI

That's much needed, because my GPT-4 usage is too high and I cannot track it without this feature.

haseeb-heaven avatar Sep 16 '23 20:09 haseeb-heaven

Hey @haseeb-heaven - I think your idea is awesome and more advanced than my PR (it simply lets you set a max budget).

We just exposed a model pricing API (https://docs.litellm.ai/docs/max_tokens_cost) that makes it easy to access the LiteLLM community-maintained model token + cost mappings:

https://github.com/BerriAI/litellm/blob/main/model_prices_and_context_window.json

Please let me know if this is helpful in making your PR. If there are any features you feel are missing, let me know here or on Discord (https://discord.com/invite/wuPM9dRgDw)
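
For example, the community-maintained mapping can be read directly from litellm; the key names below are taken from model_prices_and_context_window.json and should be treated as an assumption about that file's current schema:

import litellm

# litellm.model_cost is loaded from model_prices_and_context_window.json
pricing = litellm.model_cost.get("gpt-3.5-turbo", {})
print(pricing.get("input_cost_per_token"), pricing.get("output_cost_per_token"))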

== Re: langchain,

looks like it's only for OpenAI models - https://github.com/langchain-ai/langchain/blob/116cc7998cac563894bdf0037259db767882b55c/libs/langchain/langchain/callbacks/openai_info.py#L7C1-L7C25?

and not actively maintained (last commit >1mo. ago).

Let me know if i'm missing anything.

krrishdholakia avatar Sep 16 '23 23:09 krrishdholakia

Yes, LiteLLM solves this with its budget manager, but I have one issue: these prices are hardcoded into the API itself, and we need a tracking mechanism which tracks if, in the future, OpenAI or Vertex AI or any other LLM provider changes their prices; otherwise there would be a problem.

And while LiteLLM solves the calculation, we also need to add an option in open-interpreter so the user can see their API usage directly - a kind of budget manager inside the app itself.

So in short there are two issues: 1. Hard-coded API pricing. 2. A budget manager option inside the code-interpreter app itself.

And if there is already a PR for these, please let me know.
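
As a rough sketch of what such a tracking mechanism could look like (the raw-file URL is derived from the repository link above, and the key names are assumptions about that JSON's schema):

import requests

PRICES_URL = ("https://raw.githubusercontent.com/BerriAI/litellm/main/"
              "model_prices_and_context_window.json")

def fetch_prices() -> dict:
    """Download the latest community-maintained pricing snapshot."""
    return requests.get(PRICES_URL, timeout=10).json()

def changed_models(old: dict, new: dict) -> list:
    """Return the models whose per-token prices differ between two snapshots."""
    changed = []
    for model, info in new.items():
        before = old.get(model, {})
        if (info.get("input_cost_per_token") != before.get("input_cost_per_token")
                or info.get("output_cost_per_token") != before.get("output_cost_per_token")):
            changed.append(model)
    return changed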

haseeb-heaven avatar Sep 16 '23 23:09 haseeb-heaven

PR: https://github.com/KillianLucas/open-interpreter/pull/316

haseeb-heaven avatar Sep 17 '23 00:09 haseeb-heaven

what does this mean?

a tracking mechanism which tracks if, in the future, OpenAI or Vertex AI or any other LLM provider changes their prices

If you're trying to build this - that's awesome.

krrishdholakia avatar Sep 17 '23 00:09 krrishdholakia

Yes, that would be great. I will look into all the possible solutions to do that.

haseeb-heaven avatar Sep 17 '23 00:09 haseeb-heaven

I hope I'm not out of place making this suggestion, but since we're talking about predictive models, what would be even nicer, imo, is a predictive cost feature: you write the request, then it goes and gets the current price per token of the model(s) being targeted and predicts the cost before you agree to run it.

Some requests, for example, might look small in terms of number of tokens but have a big impact in terms of the cost of resolution. For example, someone could put in "please show me the complete decimal sequence of Pi." That's a handful of tokens that would, if run to completion, be quite costly.

What would be great is for OI to do some quick math and say "Excuse me sir, but are you sure you want to run that? The estimated cost will be $100,000,000,000,000,000^23 and take 12.9 quintillion years to calculate." To which you could neatly type "n". Something like that possible? It would be super if it could be done. Might save some folks a bit of heartache.

vbwyrde avatar Sep 17 '23 03:09 vbwyrde

You can easily set a budget for your API token on OpenAI's side, so I don't think we need this in the application itself, since it can be managed from their side. Still, checking for prompts that look like they will cost more could be added to the budget manager feature.

haseeb-heaven avatar Sep 17 '23 03:09 haseeb-heaven

@vbwyrde do you mean this? Code: https://github.com/BerriAI/litellm/blob/66a3c59ebe8a1b1fd0799f7dbbeafe18601d1903/litellm/utils.py#L738

Docs: https://docs.litellm.ai/docs/token_usage#3-completion_cost

from litellm import completion, completion_cost

# example chat messages; any messages list works here
messages = [{"role": "user", "content": "Hey, how's it going?"}]

response = completion(
    model="together_ai/togethercomputer/llama-2-70b-chat",
    messages=messages,
    request_timeout=200,
)

# pass your response from completion to completion_cost
cost = completion_cost(completion_response=response)
formatted_string = f"${float(cost):.10f}"
print(formatted_string)

krrishdholakia avatar Sep 17 '23 03:09 krrishdholakia

Thanks @krrishdholakia ... If I read that correctly, what it does is tell you how much it already cost to run a given prompt after the fact (so it tells you how much you spent). Is that right?

What I was suggesting is something more predictive: it predicts how much it WILL cost IF you run a given prompt, by looking at it and estimating how much completing the inference will cost. Again, this may be crazy talk, but that's what I was thinking would be a good feature to have handy. So if you have a prompt that you suspect WILL be expensive to complete, or worse, don't realize will be expensive, you get a notification that says "Estimated Cost to run this prompt: $50", etc. That way you can elect not to run expensive prompts if you can't afford them.

I think with single prompts, given token limits, there would be an upper bound on cost per prompt, but with Agents that can chunk work up and repeatedly prompt on their own to get results, the ability to predict the Total Estimated Cost might be useful. Again, forgive me if I'm not understanding something. I'm kind of new to all this and may be missing a key piece of information or concept. Thanks.
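
For what it's worth, a minimal sketch of such a pre-run estimate is possible by counting prompt tokens locally and assuming a worst-case completion length; the helper name and the per-1K prices below are illustrative, not an existing API:

import tiktoken

def estimate_cost(prompt: str, model: str = "gpt-4",
                  max_completion_tokens: int = 1000,
                  prompt_price_per_1k: float = 0.03,
                  completion_price_per_1k: float = 0.06) -> float:
    """Worst-case estimate: actual prompt tokens plus the full completion budget."""
    enc = tiktoken.encoding_for_model(model)
    prompt_tokens = len(enc.encode(prompt))
    return ((prompt_tokens / 1000) * prompt_price_per_1k
            + (max_completion_tokens / 1000) * completion_price_per_1k)

print(f"Estimated worst-case cost: ${estimate_cost('please show me the complete decimal sequence of Pi'):.4f}")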

vbwyrde avatar Sep 17 '23 12:09 vbwyrde

Yes, I get your concept of adding future predictions of a request's cost. That is very hard to determine because it depends on lots of factors. First we need to add the budget manager feature into the interpreter; only then can we build on it with your requested feature.

And people mostly use this for small tasks like: 1. Can you compress this image? 2. Create a folder with some data in it.

For small tasks like these I don't think we need to add predictions right now, but we will definitely add this in the future.

haseeb-heaven avatar Sep 17 '23 13:09 haseeb-heaven

Ah, ok, I get ya. Thanks. And yes, I was kind of looking further ahead with my suggestion, for when full-blown agents come into play. In those cases, being able to predict costs in advance of requests will be very useful. Anyway, thanks. Glad if I can provide any useful ideas for the project.

vbwyrde avatar Sep 17 '23 17:09 vbwyrde

Thanks for your suggestions - keep them coming. It's always good to have more ideas to improve the product.

haseeb-heaven avatar Sep 17 '23 17:09 haseeb-heaven

It’s certainly not perfect, and it uses the LiteLLM methods under the hood, but you might want to look at the new %tokens magic command.

It will let you see how many tokens have already been used and their estimated cost, as well as estimating the tokens and cost of a prompt you send to the magic command.

This combined with the max budget functionality gives you some more insight into and control over how you’re using tokens.

ericrallen avatar Oct 24 '23 23:10 ericrallen

@ericrallen any suggestions for how we can improve the underlying functionality?

krrishdholakia avatar Oct 24 '23 23:10 krrishdholakia

@krrishdholakia I think we’re all limited by the fact that there aren’t any available API endpoints from OpenAI (or other providers) that expose the current billing information.

I think where you ended up is about the best we can do for the time being.

ericrallen avatar Oct 24 '23 23:10 ericrallen

@ericrallen I didn't see any mention of the OpenAI billing thing in this issue (might've missed it).

What's the ideal solution here?

krrishdholakia avatar Oct 25 '23 00:10 krrishdholakia

I just meant that hardcoding the cost per token like LiteLLM is doing and then updating it if/when OpenAI announces price changes is the best we can do without OpenAI providing the model pricing in some sort of API we can hit.

ericrallen avatar Oct 25 '23 03:10 ericrallen

max_budget can be set to resolve this. Thanks!

MikeBirdTech avatar Mar 18 '24 19:03 MikeBirdTech