Ashwin Ramesh

11 comments by Ashwin Ramesh

@tridao 1. Does "Backprop on softmax_lse is not supported" mean that backprop wouldn't work correctly even if I use softmax_lse only and immediately to merge attention with another kv-set's...