disarmyouwitha
> In my case it was from bitsandbytes.
>
> When I used the bitsandbytes==0.37.2 version, there was no problem.
>
> See issues below.
>
> [TimDettmers/bitsandbytes#324](https://github.com/TimDettmers/bitsandbytes/issues/324)

Easy...
> See error report @ [TimDettmers/bitsandbytes#324](https://github.com/TimDettmers/bitsandbytes/issues/324)
>
> Users previously reported that `pip install bitsandbytes==0.37.2` avoids the OOM issue, albeit it's a pain to install on Windows

This helped me,...
@KKcorps I see, thank you... Since the adapter files weren't written properly at checkpoints, I'm guessing that would require retraining after the fix? =x
@bkutasi I have a (very) basic "stateless" API wrapper for exllama that might point you in the right direction:

https://github.com/disarmyouwitha/exllama/blob/master/fast_api.py
https://github.com/disarmyouwitha/exllama/blob/master/fastapi_chat.html
https://github.com/disarmyouwitha/exllama/blob/master/fastapi_request.py

**fast_api.py** is just a FastAPI wrapper around the...
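If it helps, here is a minimal sketch of the shape such a wrapper can take. It assumes the class and method names from ExLlama's own example scripts (`ExLlama`, `ExLlamaCache`, `ExLlamaConfig`, `ExLlamaTokenizer`, `ExLlamaGenerator`, `generate_simple()`); the model paths and the `/generate` endpoint are placeholders, not a description of what fast_api.py actually does:

```python
# Minimal sketch of a stateless FastAPI wrapper around ExLlama.
# Class/method names follow ExLlama's example scripts; paths are placeholders.
from fastapi import FastAPI
from pydantic import BaseModel

from model import ExLlama, ExLlamaCache, ExLlamaConfig
from tokenizer import ExLlamaTokenizer
from generator import ExLlamaGenerator

# Load the model once at startup.
config = ExLlamaConfig("/path/to/model/config.json")  # placeholder path
config.model_path = "/path/to/model.safetensors"      # placeholder path

model = ExLlama(config)
cache = ExLlamaCache(model)
tokenizer = ExLlamaTokenizer("/path/to/tokenizer.model")  # placeholder path
generator = ExLlamaGenerator(model, tokenizer, cache)

app = FastAPI()

class GenRequest(BaseModel):
    prompt: str
    max_new_tokens: int = 200

@app.post("/generate")
def generate(req: GenRequest):
    # "Stateless": every request sends the full prompt, and the
    # whole thing is re-processed on each call.
    text = generator.generate_simple(req.prompt, max_new_tokens=req.max_new_tokens)
    return {"text": text}
```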
@bkutasi Oh hm, I never noticed you had to enable issues - I have opened up the Issues tab in my repo; if you continue to have problems we can...
I am getting +10 tokens/sec for 7b and 13b models on a 4090 and an A6000 (Ampere), and about the same speed as before for 33b/65b.
**I have a fork that is really just a set of drop-in scripts for exllama:**

https://github.com/disarmyouwitha/exllama/blob/master/fast_api.py
https://github.com/disarmyouwitha/exllama/blob/master/fastapi_chat.html
https://github.com/disarmyouwitha/exllama/blob/master/fastapi_request.py

**fast_api.py** is just a FastAPI wrapper around the model and generate_simple functions (currently)...
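As a usage sketch, calling a wrapper like the one above with the `requests` library could look like this (the `/generate` endpoint and port 8000 are assumptions carried over from the sketch, not what fastapi_request.py necessarily uses):

```python
# Hypothetical client for the wrapper sketched earlier; endpoint
# name and port are assumptions, not a description of fastapi_request.py.
import requests

resp = requests.post(
    "http://localhost:8000/generate",
    json={"prompt": "Hello, how are you?", "max_new_tokens": 128},
)
print(resp.json()["text"])
```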
`gen_begin_reuse()`: this is great^^ Replacing `gen_begin()` with it inside `generate_simple()` reuses the cache during a 1-on-1 conversation, while still letting it reset if a different conversation is...
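For context, the swap is a one-liner. The sketch below is a stripped-down version of `generate_simple()` with the change applied; the real method also handles batching and clamps `max_new_tokens` to the model's max sequence length, which this omits:

```python
# Stripped-down sketch of ExLlamaGenerator.generate_simple() with the swap.
def generate_simple(self, prompt, max_new_tokens=128):
    ids = self.tokenizer.encode(prompt)

    # Was: self.gen_begin(ids) -- which always rebuilds the cache.
    # gen_begin_reuse() keeps the cached keys/values for the longest
    # prefix shared with the previous call, so a growing 1-on-1 chat
    # only processes the newly appended tokens; a different conversation
    # shares no prefix and effectively falls back to a full rebuild.
    self.gen_begin_reuse(ids)

    for _ in range(max_new_tokens):
        token = self.gen_single_token()
        if token[0, 0].item() == self.tokenizer.eos_token_id:
            break

    return self.tokenizer.decode(self.sequence[0])
```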