Loss not decreasing on default config settings
@rowanz Hi, I am trying to train the model from scratch with the default config settings, but I cannot reproduce the reported results. Specifically, the loss does not decrease from epoch to epoch: after epoch 0, both training and validation loss stay at about 1.386 and accuracy stays at about 0.25. I ran training for 20 epochs and the log is below. Has anyone run into this issue or know a possible cause? Any suggestions would be a great help. Thank you.
TRAIN EPOCH 0: loss 1.356284 crl 0.144345 accuracy 0.311996 sec_per_batch 1.702358 hr_per_epoch 1.048369 dtype: float64
Val epoch 0 has acc 0.249 and loss 1.386
Best validation performance so far. Copying weights to 'saves/flagship_rationale/best.th'.
TRAIN EPOCH 1: loss 1.386393 crl 0.089470 accuracy 0.249471 sec_per_batch 2.008696 hr_per_epoch 1.237022 dtype: float64
Val epoch 1 has acc 0.249 and loss 1.386
TRAIN EPOCH 2: loss 1.386381 crl 0.075422 accuracy 0.251220 sec_per_batch 1.946174 hr_per_epoch 1.198519 dtype: float64
Epoch 2: reducing learning rate of group 0 to 1.0000e-04.
Val epoch 2 has acc 0.249 and loss 1.386
TRAIN EPOCH 3: loss 1.386379 crl 0.050537 accuracy 0.248640 sec_per_batch 1.870728 hr_per_epoch 1.152057 dtype: float64
Val epoch 3 has acc 0.249 and loss 1.386
TRAIN EPOCH 4: loss 1.386330 crl 0.042339 accuracy 0.250779 sec_per_batch 2.006369 hr_per_epoch 1.235589 dtype: float64
Val epoch 4 has acc 0.249 and loss 1.386
TRAIN EPOCH 5: loss 1.386332 crl 0.037035 accuracy 0.250581 sec_per_batch 1.735174 hr_per_epoch 1.068578 dtype: float64
Val epoch 5 has acc 0.249 and loss 1.386
TRAIN EPOCH 6: loss 1.386333 crl 0.032566 accuracy 0.249394 sec_per_batch 2.384569 hr_per_epoch 1.468497 dtype: float64
Epoch 6: reducing learning rate of group 0 to 5.0000e-05.
Val epoch 6 has acc 0.249 and loss 1.386
TRAIN EPOCH 7: loss 1.386345 crl 0.020694 accuracy 0.247829 sec_per_batch 2.088539 hr_per_epoch 1.286192 dtype: float64
Val epoch 7 has acc 0.249 and loss 1.386
TRAIN EPOCH 8: loss 1.386309 crl 0.017643 accuracy 0.251004 sec_per_batch 1.965981 hr_per_epoch 1.210717 dtype: float64
Val epoch 8 has acc 0.249 and loss 1.386
TRAIN EPOCH 9: loss 1.386299 crl 0.015537 accuracy 0.251415 sec_per_batch 1.872479 hr_per_epoch 1.153135 dtype: float64
Val epoch 9 has acc 0.249 and loss 1.386
TRAIN EPOCH 10: loss 1.386302 crl 0.014494 accuracy 0.251420 sec_per_batch 1.644809 hr_per_epoch 1.012928 dtype: float64
Epoch 10: reducing learning rate of group 0 to 2.5000e-05.
Val epoch 10 has acc 0.249 and loss 1.386
TRAIN EPOCH 11: loss 1.386306 crl 0.009551 accuracy 0.252025 sec_per_batch 1.408009 hr_per_epoch 0.867099 dtype: float64
Val epoch 11 has acc 0.249 and loss 1.386
TRAIN EPOCH 12: loss 1.386314 crl 0.007876 accuracy 0.250382 sec_per_batch 1.419217 hr_per_epoch 0.874001 dtype: float64
Val epoch 12 has acc 0.249 and loss 1.386
TRAIN EPOCH 13: loss 1.386337 crl 0.007333 accuracy 0.248957 sec_per_batch 1.800047 hr_per_epoch 1.108529 dtype: float64
Val epoch 13 has acc 0.249 and loss 1.386
TRAIN EPOCH 14: loss 1.386308 crl 0.006972 accuracy 0.251202 sec_per_batch 1.691500 hr_per_epoch 1.041682 dtype: float64
Epoch 14: reducing learning rate of group 0 to 1.2500e-05.
Val epoch 14 has acc 0.249 and loss 1.386
TRAIN EPOCH 15: loss 1.386294 crl 0.004941 accuracy 0.250033 sec_per_batch 1.976553 hr_per_epoch 1.217227 dtype: float64
Val epoch 15 has acc 0.249 and loss 1.386
TRAIN EPOCH 16: loss 1.386299 crl 0.004361 accuracy 0.250594 sec_per_batch 2.385966 hr_per_epoch 1.469357 dtype: float64
Val epoch 16 has acc 0.249 and loss 1.386
TRAIN EPOCH 17: loss 1.386329 crl 0.004206 accuracy 0.249658 sec_per_batch 2.463118 hr_per_epoch 1.516870 dtype: float64
Val epoch 17 has acc 0.249 and loss 1.386
TRAIN EPOCH 18: loss 1.386311 crl 0.003819 accuracy 0.249090 sec_per_batch 2.041939 hr_per_epoch 1.257494 dtype: float64
Epoch 18: reducing learning rate of group 0 to 6.2500e-06.
Val epoch 18 has acc 0.249 and loss 1.386
TRAIN EPOCH 19: loss 1.386334 crl 0.003092 accuracy 0.249248 sec_per_batch 1.784414 hr_per_epoch 1.098902 dtype: float64
Val epoch 19 has acc 0.249 and loss 1.386
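
One observation that may help narrow this down: the plateau value of 1.386 matches ln(4), the cross-entropy of a uniform guess over four classes, and the accuracy of about 0.25 is consistent with that. So the model appears to be predicting at chance rather than diverging. The sketch below is only a sanity check of that arithmetic in plain Python. It assumes the task is a four-way classification (inferred from the accuracy, not confirmed from the code), and the helper names `uniform_cross_entropy` and `cross_entropy` are made up for illustration; they are not functions from the repository.

```python
import math


def uniform_cross_entropy(num_classes: int) -> float:
    """Cross-entropy (in nats) of a uniform prediction over num_classes."""
    return -math.log(1.0 / num_classes)


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(logits, label):
    """Cross-entropy (in nats) of one example given raw logits and the true class index."""
    return -math.log(softmax(logits)[label])


# Assumption: four answer choices, as suggested by the ~0.25 accuracy in the log.
chance_loss = uniform_cross_entropy(4)
print(f"{chance_loss:.6f}")  # 1.386294

# Identical logits for every class give the same loss regardless of the label,
# which is what a collapsed (input-independent) output would produce.
print(f"{cross_entropy([0.0, 0.0, 0.0, 0.0], 2):.6f}")  # 1.386294
```

If that reading is right, the final logits are effectively the same for every answer choice, so the things worth checking are whether the inputs and labels are loaded correctly and whether gradients are reaching the model, rather than the learning-rate schedule alone.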