tstarshak

Results: 2 comments by tstarshak

I think I agree with @lyrgwlr here. Why can't the model learn to ignore the information and just output the correct answer depending on the position of the task?

I've had the same issue. Reducing the learning rate did help, but I'm now down to 1e-5 with the default parameters, and even 1e-6 with madgrad still gave NaN loss values. Curious if...
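For anyone trying to pin down where the NaNs start, here is a minimal sketch of stopping at the first non-finite loss. It is not the training code from this thread: it assumes a toy quadratic loss and plain gradient descent, and the function name `first_nonfinite_step` is made up for illustration. The same `math.isfinite` check could probably be dropped into a real training loop before the optimizer step.

```python
import math


def first_nonfinite_step(lr, steps=5000, w0=1.0):
    """Run plain gradient descent on the toy loss(w) = w * w.

    Returns the index of the first step whose loss is NaN or inf,
    or None if every loss stayed finite. The loss and the update
    rule are stand-ins (assumptions), not the model discussed above.
    """
    w = w0
    for step in range(steps):
        loss = w * w
        # Check before updating, so a bad value never reaches the weights.
        if not math.isfinite(loss):
            return step
        grad = 2.0 * w  # derivative of w * w
        w = w - lr * grad
    return None


if __name__ == "__main__":
    # A deliberately oversized learning rate diverges on this toy problem;
    # a small one does not.
    for lr in (1.5, 1e-5):
        print(lr, first_nonfinite_step(lr))
```

On this toy problem only the oversized learning rate diverges. If NaNs still show up at 1e-6 in a real run, a check like this at least tells you which step and batch to inspect.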