Budget Manager for LLMs (GPT, Vertex AI, Falcon, LLaMA) and others.
Is your feature request related to a problem? Please describe.
Open Interpreter currently lacks transparency around the costs of using the various LLMs it supports. These models are typically priced by the number of tokens processed, usually billed per 1,000 tokens, and users are left in the dark about how much they are spending. This is a big issue.
Describe the solution you'd like
I would like Open Interpreter to fetch the pricing details for each LLM from a designated source, such as a pricing API or a simple configuration file we maintain. Then, every time someone uses an LLM during a session or chat, the app would do some quick math based on how many tokens were processed. The result: users get an instant, user-friendly breakdown of what they're spending, session by session and chat by chat.
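A minimal sketch of what I have in mind, in Python (the pricing values and the SessionCostTracker name are illustrative placeholders, not existing open-interpreter code):

PRICING_PER_1K_TOKENS = {
    # model: (input cost, output cost) in USD per 1,000 tokens (example values only)
    "gpt-4": (0.03, 0.06),
    "gpt-3.5-turbo": (0.0015, 0.002),
}

class SessionCostTracker:
    def __init__(self):
        self.total_cost = 0.0

    def record(self, model, prompt_tokens, completion_tokens):
        # look up the per-1K-token rates and accumulate the cost of this call
        input_rate, output_rate = PRICING_PER_1K_TOKENS[model]
        cost = (prompt_tokens / 1000) * input_rate + (completion_tokens / 1000) * output_rate
        self.total_cost += cost
        return cost

tracker = SessionCostTracker()
tracker.record("gpt-4", prompt_tokens=1200, completion_tokens=300)
print(f"Session spend so far: ${tracker.total_cost:.4f}")

The pricing dictionary is a stand-in for whatever pricing API or configuration file the interpreter ends up reading from.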
Describe alternatives you've considered
One idea was to have someone update the prices manually within Open Interpreter every time there's a change. But let's face it, that's asking for trouble: mistakes can happen, and costs might get all mixed up.
Additional context
OpenAI Pricing
Vertex AI Pricing
LangChain - costs can be tracked using LangChain Cost Tracking
We can use LangChain to get the cost per token for a model, as done in GPT Engineer - Issue #706
I believe something similar is currently being worked on by @krrishdholakia and should be available soon. Thank you for your suggestion!
That's much needed, because my GPT-4 usage is too high and I cannot track it without this feature.
Hey @haseeb-heaven - I think your idea is awesome and more advanced than my PR (which simply lets you set a max budget).
We just exposed a model pricing API: https://docs.litellm.ai/docs/max_tokens_cost
It makes it easy to access the LiteLLM community-maintained model token + cost mappings:
https://github.com/BerriAI/litellm/blob/main/model_prices_and_context_window.json
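For example, a quick sketch of reading that mapping in Python (this assumes the raw file stays at that path and that entries keep per-token keys like input_cost_per_token, output_cost_per_token, and max_tokens; treat the key names as an assumption):

import requests

# Pull the community-maintained pricing map and look up one model's entry.
PRICES_URL = (
    "https://raw.githubusercontent.com/BerriAI/litellm/main/"
    "model_prices_and_context_window.json"
)

prices = requests.get(PRICES_URL, timeout=10).json()
entry = prices.get("gpt-4", {})
print("input cost per token:", entry.get("input_cost_per_token"))
print("output cost per token:", entry.get("output_cost_per_token"))
print("max tokens:", entry.get("max_tokens"))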
Please let me know if this is helpful in making your PR. If there are any features you feel are missing, let me know here or on Discord (https://discord.com/invite/wuPM9dRgDw).
== Re: langchain,
looks like it's only for OpenAI models - https://github.com/langchain-ai/langchain/blob/116cc7998cac563894bdf0037259db767882b55c/libs/langchain/langchain/callbacks/openai_info.py#L7C1-L7C25?
and it doesn't look actively maintained (last commit >1 mo. ago).
Let me know if I'm missing anything.
Yes, LiteLLM solves this with its budget manager, but I have one concern: the prices are hardcoded into the API itself. We need a tracking mechanism so that if OpenAI, Vertex AI, or any other LLM provider changes their prices in the future, things don't break.
And while LiteLLM solves the calculation, we also need an option inside open-interpreter itself so the user can see their API usage directly, a kind of budget manager built into the app.
So, in short, there are two issues:
1. Hard-coded API pricing.
2. Budget manager option inside the open-interpreter app itself (see the sketch below).
And if there is already a PR for these, please let me know.
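To make issue 2 concrete, here's a rough sketch of what an in-app budget manager could look like, reusing LiteLLM's completion_cost for the math (the BudgetTracker name and structure are hypothetical, not existing open-interpreter code):

from litellm import completion, completion_cost

# Hypothetical in-app budget manager: refuse new calls once the cap is reached.
class BudgetTracker:
    def __init__(self, max_budget_usd):
        self.max_budget_usd = max_budget_usd
        self.spent_usd = 0.0

    def can_spend(self):
        return self.spent_usd < self.max_budget_usd

    def track(self, response):
        # let LiteLLM compute the cost of this completion and accumulate it
        self.spent_usd += completion_cost(completion_response=response)

tracker = BudgetTracker(max_budget_usd=5.00)
if tracker.can_spend():
    response = completion(
        model="gpt-3.5-turbo",
        messages=[{"role": "user", "content": "Create a folder with some data in it."}],
    )
    tracker.track(response)
    print(f"Spent so far: ${tracker.spent_usd:.4f} of ${tracker.max_budget_usd:.2f}")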
PR: https://github.com/KillianLucas/open-interpreter/pull/316
What does this mean?
"have a tracking mechanism which tracks if in the future OpenAI or Vertex AI or any other LLM provider changes their prices"
If you're trying to build this - that's awesome.
Yes, that would be great. I will look into all the possible ways to do that.
I hope I'm not out of place making this suggestion, but since we're talking about predictive models, what would be even nicer, imo, is a predictive cost feature: you write the request, then it fetches the current price per token of the model(s) being targeted and predicts the cost before you agree to run it. Some requests, for example, might look small in terms of number of tokens but have a big impact in terms of the cost of resolution. For example, someone could put in "please show me the complete decimal sequence of Pi." That's a handful of tokens that would, if run to completion, be quite costly. What would be great is for OI to do some quick math and say "Excuse me sir, but are you sure you want to run that? The estimated cost will be $100,000,000,000,000,000^23 and it will take 12.9 quintillion years to calculate." To which you could neatly type "n". Is something like that possible? It would be super if it could be done. Might save some folks a bit of heartache.
You can easily set a budget for your API token on OpenAI's side, so I guess an in-app limit isn't strictly needed since it can be managed from their end. Still, flagging prompts that look unusually expensive could be added to the budget manager feature.
@vbwyrde do you mean this? Code: https://github.com/BerriAI/litellm/blob/66a3c59ebe8a1b1fd0799f7dbbeafe18601d1903/litellm/utils.py#L738
Docs: https://docs.litellm.ai/docs/token_usage#3-completion_cost
from litellm import completion, completion_cost

# example conversation to send to the model
messages = [{"role": "user", "content": "Hey, how's it going?"}]

response = completion(
    model="together_ai/togethercomputer/llama-2-70b-chat",
    messages=messages,
    request_timeout=200,
)

# pass your response from completion to completion_cost
cost = completion_cost(completion_response=response)
formatted_string = f"${float(cost):.10f}"
print(formatted_string)
Thanks @krrishdholakia ... If I read that correctly, what it does is tell you how much it already cost to run a given prompt after the fact (so it tells you how much you spent). Is that right? What I was suggesting is something more predictive... it predicts how much it WILL cost IF you run a given prompt by looking at it and estimating how much completing the inference will cost. Again, this may be crazy talk, but that's what I was thinking would be a good feature to have handy. So if you have a prompt that you suspect WILL be expensive to complete, or worse, don't realize that it will be expensive, you get a notification that says "Estimated cost to run this prompt: $50", etc. That way you can elect not to run expensive prompts if you can't afford them. I think with single prompts, given token limits, there would be an upper bound on cost per prompt, but with Agents that can chunk work up and repeatedly prompt on their own to get results, the ability to predict the Total Estimated Cost might be useful. Again, forgive me if I'm not understanding something. I'm kind of new to all this and may be missing a key piece of information or concept. Thanks.
Yes, I get your concept of predicting the cost of a request in advance. That is very hard to determine because it depends on a lot of different factors; first we need to add the budget manager feature to the interpreter, and only then can we build your requested feature on top of it.
And people mostly use this for small tasks like: 1. Can you compress this image? 2. Create a folder with some data in it.
For small tasks like these I don't think we need to add predictions right now, but we will definitely add this in the future.
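(For what it's worth, a rough prompt-side estimate already seems possible with LiteLLM's token_counter and cost_per_token helpers. The sketch below assumes those helpers behave as described in the LiteLLM token-usage docs linked above, and it takes a caller-supplied guess for the completion length, since the real output size can't be known before running the prompt.)

from litellm import token_counter, cost_per_token

# Rough pre-run estimate: count the prompt tokens, then price them together
# with an assumed completion length supplied by the caller.
def estimate_cost(model, messages, assumed_completion_tokens=500):
    prompt_tokens = token_counter(model=model, messages=messages)
    prompt_cost, completion_cost = cost_per_token(
        model=model,
        prompt_tokens=prompt_tokens,
        completion_tokens=assumed_completion_tokens,
    )
    return prompt_cost + completion_cost

messages = [{"role": "user", "content": "Please show me the first 10,000 digits of Pi."}]
print(f"Estimated cost: ${estimate_cost('gpt-4', messages):.4f}")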
Ah, ok, I get ya. Thanks. And yes, I was kind of looking further ahead with my suggestion, for when full-blown agents come into play. In those cases, being able to predict costs in advance of requests will be very useful. Anyway, thanks. Glad if I can provide any useful ideas for the project.
Thanks for your suggestions, keep them coming. It's always good to have more ideas to improve the product.
It’s certainly not perfect, and it uses the LiteLLM methods under the hood, but you might want to look at the new %tokens magic command.
It will let you see how many tokens have already been used and their estimated cost, as well as estimating the tokens and cost of a prompt you send to the magic command.
This combined with the max budget functionality gives you some more insight into and control over how you’re using tokens.
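For example, inside an interactive chat you can run the magic command on its own or with a prompt (exact syntax and output formatting may differ between versions; treat this as an assumed usage pattern based on the description above):

%tokens
%tokens please compress this image

The first form reports the tokens used so far and their estimated cost; the second also estimates the tokens and cost of the prompt you pass to it.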
@ericrallen any suggestions for how we can improve the underlying functionality?
@krrishdholakia I think we’re all limited by the fact that there aren’t any available API endpoints from OpenAI (or other providers) that expose the current billing information.
I think where you ended up is about the best we can do for the time being.
@ericrallen I didn't see any mention of the OpenAI billing thing on this issue (might've missed it).
What's the ideal solution here?
I just meant that hardcoding the cost per token like LiteLLM is doing and then updating it if/when OpenAI announces price changes is the best we can do without OpenAI providing the model pricing in some sort of API we can hit.
max_budget can be set to resolve this. Thanks!