Farhang Dehzad

Results: 1 comment by Farhang Dehzad

Just wanted to acknowledge that I have the same issue when using Flash Attention 2 with phi-2: the training loss hardly decreases with FA2 turned on, and works pretty well with...