More advanced rate limiting for OpenAI completion API
Hello!
Is there a way to dynamically rate-limit a task using a value set by each task call?
I am trying to implement a task for OpenAI's chat completion API, which has, on top of a simple Requests Per Minute (RPM) limit, a Tokens Per Minute (TPM) limit.
Here's an example from their cookbooks: https://github.com/openai/openai-cookbook/blob/main/examples/api_request_parallel_processor.py
Basically the rate limiter has to track two variables for each model (the model being the rate limiter's key).
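For context, the core of that cookbook script boils down to something like this sketch (the class name and structure are mine, not the cookbook's): two continuously refilled capacity buckets per model, both of which must have room before a call goes out.

```python
import time

class DualBucket:
    """Leaky-bucket capacity tracking for one model: requests and tokens."""

    def __init__(self, rpm: float, tpm: float):
        self.rpm, self.tpm = rpm, tpm
        self.requests = rpm   # available request capacity
        self.tokens = tpm     # available token capacity
        self.last = time.monotonic()

    def try_acquire(self, token_cost: int) -> bool:
        now = time.monotonic()
        elapsed = now - self.last
        self.last = now
        # Refill both buckets in proportion to elapsed time, capped at the max.
        self.requests = min(self.rpm, self.requests + self.rpm * elapsed / 60)
        self.tokens = min(self.tpm, self.tokens + self.tpm * elapsed / 60)
        if self.requests >= 1 and self.tokens >= token_cost:
            self.requests -= 1
            self.tokens -= token_cost
            return True
        return False
```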
An added complication is that the task needs to estimate how many tokens the API call will consume (they count output tokens too), but I believe this can't and shouldn't be handled by this library.
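For reference, a rough estimate can be done in application code with tiktoken; something like this (the model name and output budget are just examples):

```python
import tiktoken

def estimate_tokens(messages, model="gpt-3.5-turbo", max_output_tokens=256):
    """Rough per-call token budget: exact input count plus a worst-case
    output budget, since the output length isn't known before the call.
    (Ignores the few extra tokens of per-message overhead.)"""
    enc = tiktoken.encoding_for_model(model)
    input_tokens = sum(len(enc.encode(m["content"])) for m in messages)
    return input_tokens + max_output_tokens
```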
I guess this can be split into two features:
- multiple rate limiters on same task (e.g. Requests per Minute AND Requests per Day)
- generalization of the "counter" (for RPM the counter increases by 1; a generalized version would increase it by the result of a lambda; see the sketch after this list)
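Put together, a hypothetical API for the two features could look like the sketch below. To be clear, neither the list of limits nor the `weight` argument exists in celery-heimdall today; this is only the shape of what I have in mind:

```python
from celery import shared_task
from celery_heimdall import HeimdallTask, RateLimit

@shared_task(
    base=HeimdallTask,
    heimdall={
        # Feature 1: several limits on the same task.
        'rate_limit': [
            # Plain RPM: each call counts as 1.
            RateLimit((3500, 60)),
            # Feature 2: generalized counter, each call counts as its
            # token estimate. The `weight` argument does NOT exist today.
            RateLimit((90000, 60),
                      weight=lambda args, kwargs: kwargs['estimated_tokens']),
        ],
    },
)
def chat_completion(messages, estimated_tokens=0):
    ...
```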
Do you think these features would be useful, or are they outside the scope of this library?
https://github.com/TkTech/celery-heimdall/blob/b6709f636b77079e396bd704317f925f9b407381/celery_heimdall/task.py#L121-L131
This looks like the place where we could implement the generalization of the counter.
https://github.com/TkTech/celery-heimdall/blob/b6709f636b77079e396bd704317f925f9b407381/celery_heimdall/task.py#L149-L151
I don't know how the new delay would be computed, though.
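One way I could imagine it working, assuming a simple fixed-window counter in Redis (just a guess at the approach, not how the library actually does it): if the weighted increment would exceed the limit, the delay is however long is left in the current window, i.e. the key's remaining TTL.

```python
import redis

def acquire(r: redis.Redis, key: str, weight: int, limit: int, per: int) -> float:
    """Try to consume `weight` units; return 0 on success, otherwise the
    number of seconds to wait before retrying (simplified, not race-proof)."""
    total = r.incrby(key, weight)
    if total == weight:
        # First hit in this window: start the window clock.
        r.expire(key, per)
    if total <= limit:
        return 0.0
    # Over the limit: undo the increment and retry when the window expires.
    r.decrby(key, weight)
    ttl = r.ttl(key)
    return float(ttl) if ttl > 0 else float(per)
```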
The RateLimit object can accept a callable instead of a tuple; the callable can be given the task, the key, the args, and the kwargs, all of them optional. You should be able to use this to calculate a dynamic rate limit for the task on each call. Keep in mind it's new and not thoroughly tested yet.
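Something along these lines, for example (untested sketch; the parameter names assume only the arguments the callable asks for are passed to it, and the return value is assumed to be the same (count, per_seconds) tuple as the static form):

```python
from celery import shared_task
from celery_heimdall import HeimdallTask, RateLimit

def dynamic_limit(args, kwargs):
    # Values here are illustrative; return a (count, per_seconds) tuple.
    if kwargs.get('model') == 'gpt-4':
        return (200, 60)
    return (3500, 60)

@shared_task(base=HeimdallTask, heimdall={'rate_limit': RateLimit(dynamic_limit)})
def chat_completion(messages, model='gpt-3.5-turbo'):
    ...
```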