Farhang Dehzad
Just wanted to acknowledge that I have the same issue when using Flash Attention 2 (FA2) with phi-2: the training loss hardly decreases with FA2 turned on, and works pretty well with...