Pavel Filimonov
Loss becomes NaN after training for ~20 steps: the loss steadily decreases and then turns NaN with the Adam or AdamW optimizers. With plain SGD, training works fine.
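A minimal sketch of the kind of setup described, assuming PyTorch (Adam, AdamW, and SGD match `torch.optim` names); the model, data, and hyperparameters are hypothetical placeholders, not from the report. The loop stops at the first non-finite loss so the failing step can be inspected, and the comments note common mitigations for Adam-family NaN divergence.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
model = nn.Sequential(nn.Linear(16, 64), nn.ReLU(), nn.Linear(64, 1))
criterion = nn.MSELoss()

# Swap the optimizer here to compare behaviors, as in the report:
# optimizer = torch.optim.SGD(model.parameters(), lr=1e-3)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3)

for step in range(100):
    # Placeholder random batch; substitute the real dataloader.
    x = torch.randn(32, 16)
    y = torch.randn(32, 1)

    optimizer.zero_grad()
    loss = criterion(model(x), y)
    loss.backward()

    # Common mitigations when Adam/AdamW diverges to NaN: clip gradients,
    # lower the learning rate, or raise the optimizer's eps (e.g. eps=1e-6).
    torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)
    optimizer.step()

    # Stop at the first NaN/inf loss so the offending step can be inspected.
    if not torch.isfinite(loss):
        print(f"loss became non-finite at step {step}")
        break
    print(f"step {step}: loss={loss.item():.4f}")
```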