Bird

Results: 4 comments by Bird

> If you don't need to modify those files, you can enable streaming (with `stream_large_files=true`). Yes, I have enabled this option. But the download speed is pretty slow. If possible,...

> Can confirm. Not a ban. Thanks guys. That's great. I will do more tests and delete my comments in this issue to avoid misleading others. In theory, I believe...

> Need A100/H100 with train_mem.py, but training with LoRA (see train_lora.py) is fine. Thanks. I also tried LoRA and got another error. It seems that I used it in...

> I think the reason the training sample in flash attention runs correctly is that it's using a fused softmax. We can verify this by adding a print statement to...