Alex Peng

2 issues by Alex Peng

During training, do you load all the data at once before starting, or do you load the data in batches as needed? From the log file, there are tons of...

There seems to be an infinite wait with Flash-attn 2.5.8. ![Image](https://github.com/user-attachments/assets/3f2cc46c-1f52-4c84-9b83-7b79c34eacd1) But it works with 2.7.4. ![Image](https://github.com/user-attachments/assets/d6cf8625-6a34-45de-9e1d-48debde8d80c)