hau
The problem is that the repo itself isn't a valid Next.js app? I have to use the command
Sweet, will give it a shot.
Awesome! Where does the model get held in memory? I have a modern GPU, but inference is still not real-time for me.
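On the memory question above: a big part of whether inference is real-time is simply whether the weights fit in VRAM, so the GPU isn't stalling on transfers. Here's a back-of-envelope sketch, assuming weight memory dominates (it ignores the KV cache and activations) and using llama.cpp's published per-format byte costs:

```python
# Rough VRAM estimate for holding model weights on the GPU.
# Assumption (illustrative, not from this thread): weight memory dominates;
# KV cache and activation memory are ignored.

BYTES_PER_PARAM = {
    "f16": 2.0,      # half-precision weights
    "q8_0": 1.0625,  # llama.cpp Q8_0: 34 bytes per block of 32 weights
    "q4_0": 0.5625,  # llama.cpp Q4_0: 18 bytes per block of 32 weights
}

def weight_gib(n_params_billion: float, fmt: str) -> float:
    """Approximate weight footprint in GiB for the given quantization format."""
    return n_params_billion * 1e9 * BYTES_PER_PARAM[fmt] / 2**30

# A 7B model is ~13 GiB at f16 but ~3.7 GiB at 4-bit quantization,
# which is why quantized weights fit on consumer GPUs.
print(round(weight_gib(7, "f16"), 1))   # ~13.0
print(round(weight_gib(7, "q4_0"), 1))  # ~3.7
```

If the quantized weights don't fit entirely on the GPU, layers spill to system RAM and each token pays a PCIe transfer cost, which would explain non-real-time speeds even on a modern card.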
this is going to break the fucking internet
This part of the code in particular needs some work, I think.
These errors are just the result of
Any updates here?
Thanks for linking! I'm excited. The main concern I have with speculative decoding is that the latency improvements are bounded by the size of the model. Since exllama only seems to...
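The "bounded by the size of the model" concern above can be made concrete with the standard speedup analysis for speculative decoding (Leviathan et al., 2023). A minimal sketch, assuming the draft model costs a fraction `c` of a target forward pass and each draft token is accepted independently with probability `alpha` (both numbers here are illustrative, not from the thread):

```python
# Back-of-envelope speedup model for speculative decoding.
# alpha: probability the target model accepts a drafted token
# gamma: number of tokens drafted per verification step
# c:     cost of one draft forward pass, in units of a target forward pass

def expected_speedup(alpha: float, gamma: int, c: float) -> float:
    # Expected target tokens produced per verification step:
    # a geometric series 1 + alpha + ... + alpha**gamma.
    tokens_per_step = (1 - alpha ** (gamma + 1)) / (1 - alpha)
    # Cost of one step: gamma draft passes plus one target pass.
    cost_per_step = gamma * c + 1
    return tokens_per_step / cost_per_step

# A cheap draft model (c = 0.05) with high agreement (alpha = 0.8) gives a
# real speedup; a draft model costing half the target (c = 0.5) mostly
# erases it, and with no accepted tokens you pay pure overhead.
print(round(expected_speedup(0.8, 4, 0.05), 2))  # ~2.8x
print(round(expected_speedup(0.8, 4, 0.5), 2))   # ~1.12x
```

The takeaway matches the concern: the ceiling on the speedup is set by the draft/target cost ratio and the acceptance rate, so a draft model that isn't much smaller than the target can't buy much latency back.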