disarmyouwitha
> In my case it was from bitsandbytes.
>
> When I used the bitsandbytes==0.37.2 version, there was no problem.
>
> See issues below.
>
> [TimDettmers/bitsandbytes#324](https://github.com/TimDettmers/bitsandbytes/issues/324)

Easy...
> See error report @ [TimDettmers/bitsandbytes#324](https://github.com/TimDettmers/bitsandbytes/issues/324)
>
> Users previously reported that `pip install bitsandbytes==0.37.2` avoids the OOM issue, albeit it's a pain to install on Windows

This helped me,...
@KKcorps I see, thank you... Since the adapter files weren't written properly at checkpoints, I'm guessing that would require retraining after the fix? =x
@bkutasi I have a (very) basic "stateless" API wrapper for exllama that might point you in the right direction:

https://github.com/disarmyouwitha/exllama/blob/master/fast_api.py
https://github.com/disarmyouwitha/exllama/blob/master/fastapi_chat.html
https://github.com/disarmyouwitha/exllama/blob/master/fastapi_request.py

**fast_api.py** is just a FastAPI wrapper around the...
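If it helps, here is a minimal sketch of the shape such a wrapper can take. It assumes the class and method names from ExLlama's own example scripts (`ExLlama`, `ExLlamaCache`, `ExLlamaConfig`, `ExLlamaTokenizer`, `ExLlamaGenerator`, `generate_simple()`); the model paths and the `/generate` endpoint are placeholders, not a description of what fast_api.py actually does:

```python
# Minimal sketch of a stateless FastAPI wrapper around ExLlama.
# Class/method names follow ExLlama's example scripts; paths are placeholders.
from fastapi import FastAPI
from pydantic import BaseModel

from model import ExLlama, ExLlamaCache, ExLlamaConfig
from tokenizer import ExLlamaTokenizer
from generator import ExLlamaGenerator

# Load the model once at startup.
config = ExLlamaConfig("/path/to/model/config.json")  # placeholder path
config.model_path = "/path/to/model.safetensors"      # placeholder path

model = ExLlama(config)
cache = ExLlamaCache(model)
tokenizer = ExLlamaTokenizer("/path/to/tokenizer.model")  # placeholder path
generator = ExLlamaGenerator(model, tokenizer, cache)

app = FastAPI()

class GenRequest(BaseModel):
    prompt: str
    max_new_tokens: int = 200

@app.post("/generate")
def generate(req: GenRequest):
    # "Stateless": every request sends the full prompt, and the
    # whole thing is re-processed on each call.
    text = generator.generate_simple(req.prompt, max_new_tokens=req.max_new_tokens)
    return {"text": text}
```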
@bkutasi Oh hm, I never noticed you had to enable issues - I have opened up the Issues tab in my repo; if you continue to have problems we can...
I am getting +10 tokens/sec for 7b and 13b models on a 4090 and an A6000 (Ampere), and about the same speed as before for 33b/65b.
**I have a fork that is really just a set of drop-in scripts for exllama:**

https://github.com/disarmyouwitha/exllama/blob/master/fast_api.py
https://github.com/disarmyouwitha/exllama/blob/master/fastapi_chat.html
https://github.com/disarmyouwitha/exllama/blob/master/fastapi_request.py

**fast_api.py** is just a FastAPI wrapper around the model and generate_simple functions (currently)...
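As a usage sketch, calling a wrapper like the one above with the `requests` library could look like this (the `/generate` endpoint and port 8000 are assumptions carried over from the sketch, not what fastapi_request.py necessarily uses):

```python
# Hypothetical client for the wrapper sketched earlier; endpoint
# name and port are assumptions, not a description of fastapi_request.py.
import requests

resp = requests.post(
    "http://localhost:8000/generate",
    json={"prompt": "Hello, how are you?", "max_new_tokens": 128},
)
print(resp.json()["text"])
```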
`gen_begin_reuse()`: this is great^^ Replacing `gen_begin()` with it inside `generate_simple()` reuses the cache during a 1-on-1 conversation, while still letting it reset if a different conversation is...
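For context, the swap is a one-liner. The sketch below is a stripped-down version of `generate_simple()` with the change applied; the real method also handles batching and clamps `max_new_tokens` to the model's max sequence length, which this omits:

```python
# Stripped-down sketch of ExLlamaGenerator.generate_simple() with the swap.
def generate_simple(self, prompt, max_new_tokens=128):
    ids = self.tokenizer.encode(prompt)

    # Was: self.gen_begin(ids) -- which always rebuilds the cache.
    # gen_begin_reuse() keeps the cached keys/values for the longest
    # prefix shared with the previous call, so a growing 1-on-1 chat
    # only processes the newly appended tokens; a different conversation
    # shares no prefix and effectively falls back to a full rebuild.
    self.gen_begin_reuse(ids)

    for _ in range(max_new_tokens):
        token = self.gen_single_token()
        if token[0, 0].item() == self.tokenizer.eos_token_id:
            break

    return self.tokenizer.decode(self.sequence[0])
```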