Results: 2 issues of kawlil
1. What's the rationale behind making the default batch size 64 for the pre-training, continued pre-training, and fine-tuning loops? Others have mentioned that they had to reduce the batch size...
Hi! Firstly, thanks for the awesome work! I want to use KTO with a quantized Mistral model but am getting pickle errors from the multiprocessing thread, probably since that changes...