twin
twin
> The user making this has disappeared from the face of the world. So now somebody else has to do it. I'll check it out. See my pull request [here](https://github.com/TerminalWitchcraft/actix-ratelimit/pull/21)...
Yeah, I'm looking into it, but there's a lot of stuff to look through, so no promises.
Ah gotcha, I'll try the websocket option. Would a PR adding a slightly more robust, dedicated API separate from Gradio have a chance of being accepted?
Not realizing LMDeploy didn't already support CodeLlama quants, I ended up AWQ-quantizing Phind's CodeLlama fine-tune; maybe it can be useful for testing: [poisson-fish/Phind-CodeLlama-34B-v2-AWQ](https://huggingface.co/poisson-fish/Phind-CodeLlama-34B-v2-AWQ) The quantization itself completed successfully with...