hau
The problem is that the repo itself isn't a valid Next.js app? I have to use the command
Sweet, will give it a shot.
Awesome! Where does the model get held in memory? I have a modern GPU, but inference is still not real-time for me.
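On the memory question above: a big part of whether inference is real-time is simply whether the weights fit in VRAM, so the GPU isn't stalling on transfers. Here's a back-of-envelope sketch, assuming weight memory dominates (it ignores the KV cache and activations) and using llama.cpp's published per-format byte costs:

```python
# Rough VRAM estimate for holding model weights on the GPU.
# Assumption (illustrative, not from this thread): weight memory dominates;
# KV cache and activation memory are ignored.

BYTES_PER_PARAM = {
    "f16": 2.0,      # half-precision weights
    "q8_0": 1.0625,  # llama.cpp Q8_0: 34 bytes per block of 32 weights
    "q4_0": 0.5625,  # llama.cpp Q4_0: 18 bytes per block of 32 weights
}

def weight_gib(n_params_billion: float, fmt: str) -> float:
    """Approximate weight footprint in GiB for the given quantization format."""
    return n_params_billion * 1e9 * BYTES_PER_PARAM[fmt] / 2**30

# A 7B model is ~13 GiB at f16 but ~3.7 GiB at 4-bit quantization,
# which is why quantized weights fit on consumer GPUs.
print(round(weight_gib(7, "f16"), 1))   # ~13.0
print(round(weight_gib(7, "q4_0"), 1))  # ~3.7
```

If the quantized weights don't fit entirely on the GPU, layers spill to system RAM and each token pays a PCIe transfer cost, which would explain non-real-time speeds even on a modern card.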
this is going to break the fucking internet
This part of the code in particular needs some work, I think.
These errors are just the result of
Any updates here?
Thanks for linking! I'm excited. The main concern I have with speculative decoding is that the latency improvements are bounded by the size of the model. Since exllama only seems to...
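The "bounded by the size of the model" concern above can be made concrete with the standard speedup analysis for speculative decoding (Leviathan et al., 2023). A minimal sketch, assuming the draft model costs a fraction `c` of a target forward pass and each draft token is accepted independently with probability `alpha` (both numbers here are illustrative, not from the thread):

```python
# Back-of-envelope speedup model for speculative decoding.
# alpha: probability the target model accepts a drafted token
# gamma: number of tokens drafted per verification step
# c:     cost of one draft forward pass, in units of a target forward pass

def expected_speedup(alpha: float, gamma: int, c: float) -> float:
    # Expected target tokens produced per verification step:
    # a geometric series 1 + alpha + ... + alpha**gamma.
    tokens_per_step = (1 - alpha ** (gamma + 1)) / (1 - alpha)
    # Cost of one step: gamma draft passes plus one target pass.
    cost_per_step = gamma * c + 1
    return tokens_per_step / cost_per_step

# A cheap draft model (c = 0.05) with high agreement (alpha = 0.8) gives a
# real speedup; a draft model costing half the target (c = 0.5) mostly
# erases it, and with no accepted tokens you pay pure overhead.
print(round(expected_speedup(0.8, 4, 0.05), 2))  # ~2.8x
print(round(expected_speedup(0.8, 4, 0.5), 2))   # ~1.12x
```

The takeaway matches the concern: the ceiling on the speedup is set by the draft/target cost ratio and the acceptance rate, so a draft model that isn't much smaller than the target can't buy much latency back.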