tstarshak comments

tstarshak

Could you please elaborate on the following line? Why not use the y coming from the batch?

I think I agree with @lyrgwlr here. Why can't the model learn to ignore the information and just output the correct answer depending on the position of the task?

RuntimeError: hit nan for variance_normalized

I've had the same issue. Reducing the learning rate did help, but I'm at 1e-5 with default parameters, and 1e-6 with MADGRAD still gave NaN loss values. Curious if...
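For anyone hitting the same NaN, one common workaround besides lowering the learning rate is to clip the gradient norm and skip any step where the loss or gradients are non-finite. Below is a minimal, framework-agnostic sketch of that idea in plain Python. The names `clip_by_global_norm` and `guarded_sgd_step` are hypothetical, not taken from the repository in question, and the update is plain SGD for illustration rather than MADGRAD.

```python
import math


def clip_by_global_norm(grads, max_norm):
    """Scale a flat list of gradient values so their L2 norm is at most max_norm.

    Returns None if the gradients contain NaN or inf (hypothetical convention
    used here to signal "skip this step").
    """
    total = math.sqrt(sum(g * g for g in grads))
    if not math.isfinite(total):
        return None
    if total <= max_norm:
        return list(grads)
    scale = max_norm / total
    return [g * scale for g in grads]


def guarded_sgd_step(params, grads, loss, lr=1e-5, max_norm=1.0):
    """Apply one plain SGD update with gradient clipping.

    Returns (new_params, applied). If the loss or the gradients are
    non-finite, the parameters are returned unchanged and applied is False.
    """
    if not math.isfinite(loss):
        return list(params), False
    clipped = clip_by_global_norm(grads, max_norm)
    if clipped is None:
        return list(params), False
    return [p - lr * g for p, g in zip(params, clipped)], True


if __name__ == "__main__":
    # A normal step is applied, with the gradient rescaled to norm 1.0.
    print(guarded_sgd_step([1.0, 2.0], [3.0, 4.0], loss=0.5, lr=0.1))
    # A NaN loss leaves the parameters untouched.
    print(guarded_sgd_step([1.0, 2.0], [3.0, 4.0], loss=float("nan")))
```

Skipping bad steps only hides the symptom, so it is still worth checking where the first non-finite value appears; in a real training loop the framework's own clipping and anomaly-detection utilities would replace these helpers.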