Bird
> If you don't need to modify those files, you can enable streaming (with `stream_large_files=true`). Yes, I have enabled this option. But the download speed is pretty slow. If possible,...
> Can confirm. Not a ban. Thanks guys. That's great. I will do more tests and delete my comments in this issue to avoid misleading others. In theory, I believe...
> need A100/H100 with train_mem.py, but train with lora (ref to the train_lora.py) is fine. Thanks. I also tried lora, and got another error. It seems that I used it in...
> I think the reason the training sample in flash attention runs correctly is that it's using a fused softmax. We can verify this when adding a print statement to...