training log
Hi,
Would you be able to provide the training log? I'm especially interested in the point matching loss after a few epochs. I'm training on a Google Compute VM with 4 NVIDIA Tesla K80 GPUs, but the training speed is only about 2.2 frames/sec. The point matching loss has stayed between 10 and 11 for the last few hours (although the flowLoss and the maskLoss are showing a good downward trend).
Epoch[3] Batch [260] Speed: 2.26 samples/sec Train-Flow_L2Loss=0.321852, Flow_CurLoss=0.000000, PointMatchingLoss=10.396450, MaskLoss=0.108871,
Epoch[3] Batch [280] Speed: 2.33 samples/sec Train-Flow_L2Loss=0.309070, Flow_CurLoss=0.000000, PointMatchingLoss=10.542913, MaskLoss=0.109692,
Epoch[3] Batch [300] Speed: 2.31 samples/sec Train-Flow_L2Loss=0.309031, Flow_CurLoss=0.000000, PointMatchingLoss=10.530530, MaskLoss=0.109173,
Epoch[3] Batch [320] Speed: 2.36 samples/sec Train-Flow_L2Loss=0.299514, Flow_CurLoss=0.000000, PointMatchingLoss=10.616660, MaskLoss=0.108448,
Epoch[3] Batch [340] Speed: 2.37 samples/sec Train-Flow_L2Loss=0.291039, Flow_CurLoss=0.000000, PointMatchingLoss=10.707121, MaskLoss=0.107355,
Epoch[3] Batch [360] Speed: 2.34 samples/sec Train-Flow_L2Loss=0.280393, Flow_CurLoss=0.000000, PointMatchingLoss=10.794026, MaskLoss=0.107118,
Epoch[3] Batch [380] Speed: 2.35 samples/sec Train-Flow_L2Loss=0.276359, Flow_CurLoss=0.202416, PointMatchingLoss=10.857506, MaskLoss=0.110217,
Epoch[3] Batch [400] Speed: 2.33 samples/sec Train-Flow_L2Loss=0.265881, Flow_CurLoss=0.000000, PointMatchingLoss=11.151155, MaskLoss=0.109521,
Epoch[3] Batch [420] Speed: 2.36 samples/sec Train-Flow_L2Loss=0.256921, Flow_CurLoss=0.000000, PointMatchingLoss=11.267041, MaskLoss=0.110485,
Epoch[3] Batch [440] Speed: 2.32 samples/sec Train-Flow_L2Loss=0.250065, Flow_CurLoss=0.000000, PointMatchingLoss=11.396601, MaskLoss=0.111172,
Epoch[3] Batch [460] Speed: 2.32 samples/sec Train-Flow_L2Loss=0.244040, Flow_CurLoss=0.000000, PointMatchingLoss=11.543343, MaskLoss=0.110957,
Epoch[3] Batch [480] Speed: 2.31 samples/sec Train-Flow_L2Loss=0.239315, Flow_CurLoss=0.000000, PointMatchingLoss=11.643859, MaskLoss=0.114532,
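To compare the trend of each metric without eyeballing the raw log, something like the following could be used. It is only a minimal sketch: it assumes every entry has the same shape as the excerpt above, and `parse_log` and `slope` are hypothetical helper names, not part of the training code. The sample text reuses three entries from the log.

```python
import re

# One log entry: "Epoch[e] Batch [b] Speed: s samples/sec name=value, ..."
# (assumed format, taken from the excerpt above).
ENTRY_RE = re.compile(
    r"Epoch\[(?P<epoch>\d+)\]\s+Batch\s+\[(?P<batch>\d+)\]\s+"
    r"Speed:\s+(?P<speed>[\d.]+)\s+samples/sec\s+"
    r"(?P<metrics>(?:[\w-]+=[\d.]+,?\s*)+)"
)
METRIC_RE = re.compile(r"([\w-]+)=([\d.]+)")


def parse_log(text):
    """Return one dict per log entry, keyed by the metric names in the log."""
    entries = []
    for m in ENTRY_RE.finditer(text):
        entry = {
            "epoch": int(m["epoch"]),
            "batch": int(m["batch"]),
            "speed": float(m["speed"]),
        }
        for name, value in METRIC_RE.findall(m["metrics"]):
            entry[name] = float(value)
        entries.append(entry)
    return entries


def slope(xs, ys):
    """Least-squares slope of ys against xs (positive means increasing)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    num = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    den = sum((x - mean_x) ** 2 for x in xs)
    return num / den


SAMPLE = """\
Epoch[3] Batch [260] Speed: 2.26 samples/sec Train-Flow_L2Loss=0.321852, Flow_CurLoss=0.000000, PointMatchingLoss=10.396450, MaskLoss=0.108871,
Epoch[3] Batch [280] Speed: 2.33 samples/sec Train-Flow_L2Loss=0.309070, Flow_CurLoss=0.000000, PointMatchingLoss=10.542913, MaskLoss=0.109692,
Epoch[3] Batch [300] Speed: 2.31 samples/sec Train-Flow_L2Loss=0.309031, Flow_CurLoss=0.000000, PointMatchingLoss=10.530530, MaskLoss=0.109173,
"""

entries = parse_log(SAMPLE)
batches = [e["batch"] for e in entries]
for metric in ("Train-Flow_L2Loss", "PointMatchingLoss", "MaskLoss"):
    values = [e[metric] for e in entries]
    print(metric, "slope per batch:", slope(batches, values))
```

A positive slope for PointMatchingLoss alongside a negative slope for Train-Flow_L2Loss would confirm numerically what the log suggests. Running it over the full log file rather than a short excerpt gives a more reliable picture.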
The training finished for train_and_test_deepim_ape.sh. I'm getting an error during the test, which I'm debugging.
Traceback (most recent call last):
File "experiments/deepim/deepim_train_test.py", line 22, in
I will update once the test runs. Attached are the PointMatchingLoss and lr plots; do they match what you observe?

I'm facing the same kind of behavior. All other metrics show a significant decrease, while the point matching loss seems to increase at the same time.