Alexey Parfenov
It seems like `bitsandbytes` can't be used if the model is shared between GPU and CPU. I could not find any info saying that the entire model must be loaded...
When I try to test this low-VRAM method locally, I only get some garbage output. I use the standard "unicorns copypasta" for input. But the output looks like this: `The...
**Describe the bug** Somewhere between d4011d29c623e739e91b842a87fce62a38c6e538 and b4eda619d0674e9ef009702cbd538836c0861a56 the VRAM usage increased dramatically. See below. Tomorrow, I will try to find the specific commit that caused this, but now I'm...
### What happened? Sometimes the part of the initial prompt that should be considered for the penalties is ignored. Only the newly generated tokens are used for calculating the penalty. For...
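
To make the penalty report above concrete, here is a minimal sketch of the difference between building the penalty window from prompt plus generated tokens and building it from generated tokens only. Everything in it is an assumption for illustration: the names `penalty_window` and `apply_repetition_penalty`, the `last_n` parameter, and the divide-or-multiply penalty rule are hypothetical and are not taken from the project's code.

```python
from typing import List, Sequence


def penalty_window(prompt_tokens: Sequence[int],
                   generated_tokens: Sequence[int],
                   last_n: int) -> List[int]:
    """Return the last `last_n` tokens of prompt + generated history.

    Hypothetical helper: the window is meant to reach back into the
    prompt whenever fewer than `last_n` tokens have been generated.
    """
    history = list(prompt_tokens) + list(generated_tokens)
    return history[-last_n:] if last_n > 0 else []


def apply_repetition_penalty(logits: Sequence[float],
                             window: Sequence[int],
                             penalty: float) -> List[float]:
    """Return a copy of `logits` with each token in `window` penalized once.

    Assumed rule: positive logits are divided by `penalty`,
    non-positive logits are multiplied by it.
    """
    out = list(logits)
    for tok in set(window):
        if out[tok] > 0:
            out[tok] /= penalty
        else:
            out[tok] *= penalty
    return out


logits = [1.0, 2.0, -1.0, 0.5, 3.0]
prompt, generated = [1, 2, 3], [4]

# Expected behaviour: the window covers the tail of the prompt as well.
expected = apply_repetition_penalty(
    logits, penalty_window(prompt, generated, last_n=3), penalty=2.0)

# Behaviour described in the report: the prompt part is ignored.
reported = apply_repetition_penalty(
    logits, penalty_window([], generated, last_n=3), penalty=2.0)

print(expected)  # [1.0, 2.0, -2.0, 0.25, 1.5]
print(reported)  # [1.0, 2.0, -1.0, 0.5, 1.5]
```

With `last_n=3` and one generated token, tokens 2 and 3 from the prompt are penalized only in the first case, which is the discrepancy the report describes.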