anterart
I would also like to have this feature 🙂
I also experienced an issue with the 4.2.1 Translator. Inference with the 4.2.1 Translator produced poor results. I didn't inspect the output itself; I just looked at my metrics which...
@ishaan-jaff It's possible that the first deployment that was added to cooldown is still in cooldown state. I think that the Router should wait until one of the deployments is...
The router can check how much time is left until one of the deployments leaves the cooldown state, and then use that deployment as soon as that happens. This...
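To make the suggestion concrete, here is a minimal sketch of the idea, assuming the cooldown state is available as a mapping from deployment name to the timestamp at which its cooldown ends. The names `next_available`, `wait_for_deployment`, and `cooldown_until` are hypothetical and are not part of the Router's API; this only illustrates the waiting logic.

```python
import time
from typing import Callable, Dict, Optional, Tuple


def next_available(
    cooldown_until: Dict[str, float], now: float
) -> Optional[Tuple[str, float]]:
    """Pick the deployment whose cooldown ends first.

    Returns (deployment_name, seconds_to_wait), or None when no
    deployments are known. `cooldown_until` is a hypothetical mapping
    of deployment name -> timestamp at which its cooldown ends.
    """
    if not cooldown_until:
        return None
    name = min(cooldown_until, key=lambda d: cooldown_until[d])
    # A deployment whose cooldown already ended needs no waiting.
    return name, max(0.0, cooldown_until[name] - now)


def wait_for_deployment(
    cooldown_until: Dict[str, float],
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Sleep until the earliest deployment leaves cooldown, then return it."""
    choice = next_available(cooldown_until, clock())
    if choice is None:
        raise ValueError("no deployments configured")
    name, wait = choice
    if wait > 0:
        sleep(wait)
    return name
```

The clock and sleep functions are passed in so that the waiting logic can be exercised without actually sleeping; the timestamps in `cooldown_until` must come from the same clock.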
@krrishdholakia I'm working with deployments of the gpt4-turbo model that I have in Azure OpenAI. There is an 80 kTPM rate limit on it for me; if I raise the `max_parallel_requests`...
@krrishdholakia @ishaan-jaff In the meantime, could you suggest something? Let's say all the deployments of a model are in cooldown. In this case, I want to know how much...
I would love to get something like that, @krrishdholakia. Currently, when I get a `RateLimitError` or a `ValueError` with the text `No deployments available for selected...
@ishaan-jaff This exception returns the `cooldown_time` value that I pass when creating the `Router` instance. It's not the updated cooldown time: if, say, `x` seconds have passed, it will not show...
@ishaan-jaff I checked again, and you're correct, it does return the actual time left until it goes out of cooldown.