Results: 2 issues of kawlil
1. What's the rationale behind making the default batch size 64 for the pre-training, continued pre-training, and fine-tuning loops? Others have mentioned that they had to reduce the batch size...
Hi! Firstly, thanks for the awesome work! I want to use KTO with a quantized Mistral model but am getting pickle errors from the multiprocessing thread, probably since that changes...