Pavel Filimonov


Loss becomes NaN after training for ~20 steps - the loss decreases steadily and then becomes NaN with the Adam or AdamW optimizers. With plain SGD, training works fine.
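This pattern (Adam/AdamW diverging where SGD is stable) is often caused by tiny second-moment estimates early in training or by gradient spikes. The issue does not name the framework; below is a minimal sketch assuming PyTorch, where `model` and the training data are hypothetical placeholders. It shows the common mitigations: a larger `eps` and smaller learning rate for AdamW, gradient clipping, and an early check that stops before a non-finite loss corrupts the weights.

```python
# Minimal sketch, assuming PyTorch; `model` and the data are placeholders,
# not taken from the original issue.
import torch
import torch.nn as nn

model = nn.Linear(10, 1)   # hypothetical stand-in for the real model
criterion = nn.MSELoss()

# A larger eps and a smaller lr often tame Adam's early instability,
# which stems from dividing by near-zero second-moment estimates.
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4, eps=1e-6)

# Stand-in for a real data loader.
loader = [(torch.randn(8, 10), torch.randn(8, 1)) for _ in range(20)]

for x, y in loader:
    optimizer.zero_grad()
    loss = criterion(model(x), y)
    if not torch.isfinite(loss):
        # Fail fast before NaN propagates into the parameters.
        raise RuntimeError(f"non-finite loss: {loss.item()}")
    loss.backward()
    # Clip gradient norm to guard against the spikes that push Adam to NaN.
    torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)
    optimizer.step()
```

If the NaN persists even with clipping and a larger `eps`, the cause is more likely in the data or the loss itself (e.g. a `log` of zero), which would also eventually affect SGD.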