Alberto Mario Ceballos-Arroyo
Hi Peter! Sorry for being late with this; the situation in my country (Colombia) has been delicate over the last few weeks and I wasn't able to do the MR. In...
I'd like to work on this issue. Is there any documentation on adding new models that I should follow?
Hi @kumar-devesh, I'm working on it (made some progress toward getting a working version of the Discrete VAE in Torch), but @osanseviero told me that it would be better...
Hi all! Are we (translators) supposed to just put our info here as a comment? Thx!
I'm having the same issue on Python 3.10, CUDA 12.1 and Torch 2.3.1. If I train without ZeRO 2/3, the issue goes away, but this limits me to training only...
Having the same issue when using a non-quantized base model and trying to finetune with QLoRA int4, getting the following memory consumption (Ministral 8B, seq length 4096, bsz 1),...
Thanks for the prompt reply, @matthewdouglas. If that's the case, I suppose it might be a matter of "savings at scale" since I'm using a single GPU and bsz=1....