Ashwin Ramesh
Results: 11 comments by Ashwin Ramesh
@tridao 1. Does "Backprop on softmax_lse is not supported" mean that backprop wouldn't work correctly even if I use softmax_lse only, and immediately, to merge attention with another kv-set's...
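For context on the merge the question refers to, here is a minimal sketch of combining attention over two KV sets using each part's log-sum-exp. It is plain Python for a single query, not the flash-attn API; the function names `attend` and `merge` are hypothetical and chosen only for illustration. It says nothing about what the library's backward pass supports; it only shows that the merged output depends on both the partial outputs and their log-sum-exp values, which is why gradients through softmax_lse matter here.

```python
import math

def attend(q, keys, values):
    """Softmax attention for one query vector.

    Returns (output vector, log-sum-exp of the scaled scores).
    Hypothetical helper for illustration only.
    """
    scale = 1.0 / math.sqrt(len(q))
    scores = [scale * sum(qi * ki for qi, ki in zip(q, k)) for k in keys]
    m = max(scores)  # subtract the max for numerical stability
    lse = m + math.log(sum(math.exp(s - m) for s in scores))
    weights = [math.exp(s - lse) for s in scores]
    dim = len(values[0])
    out = [sum(w * v[d] for w, v in zip(weights, values)) for d in range(dim)]
    return out, lse

def merge(out_a, lse_a, out_b, lse_b):
    """Combine attention results computed over two disjoint KV sets."""
    m = max(lse_a, lse_b)
    lse = m + math.log(math.exp(lse_a - m) + math.exp(lse_b - m))
    w_a = math.exp(lse_a - lse)  # share of the softmax mass in set A
    w_b = math.exp(lse_b - lse)  # share of the softmax mass in set B
    out = [w_a * a + w_b * b for a, b in zip(out_a, out_b)]
    return out, lse
```

Merging the results for two halves of a KV set this way should match attention over the full set up to floating-point error, since each partial output is rescaled by its share of the total softmax normalizer.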