mx-DeepIM training log

Hi,

Would you be able to provide the training log? I'm especially interested in the point matching loss after a few epochs. I'm training on a Google Compute VM with 4 NVIDIA Tesla K80 GPUs, but the training speed is only about 2.2 frames/sec. The point matching loss has stayed between 10 and 11 for the last few hours (although the flow loss and the mask loss are showing a good downward trend).

Epoch[3] Batch [260] Speed: 2.26 samples/sec Train-Flow_L2Loss=0.321852, Flow_CurLoss=0.000000, PointMatchingLoss=10.396450, MaskLoss=0.108871
Epoch[3] Batch [280] Speed: 2.33 samples/sec Train-Flow_L2Loss=0.309070, Flow_CurLoss=0.000000, PointMatchingLoss=10.542913, MaskLoss=0.109692
Epoch[3] Batch [300] Speed: 2.31 samples/sec Train-Flow_L2Loss=0.309031, Flow_CurLoss=0.000000, PointMatchingLoss=10.530530, MaskLoss=0.109173
Epoch[3] Batch [320] Speed: 2.36 samples/sec Train-Flow_L2Loss=0.299514, Flow_CurLoss=0.000000, PointMatchingLoss=10.616660, MaskLoss=0.108448
Epoch[3] Batch [340] Speed: 2.37 samples/sec Train-Flow_L2Loss=0.291039, Flow_CurLoss=0.000000, PointMatchingLoss=10.707121, MaskLoss=0.107355
Epoch[3] Batch [360] Speed: 2.34 samples/sec Train-Flow_L2Loss=0.280393, Flow_CurLoss=0.000000, PointMatchingLoss=10.794026, MaskLoss=0.107118
Epoch[3] Batch [380] Speed: 2.35 samples/sec Train-Flow_L2Loss=0.276359, Flow_CurLoss=0.202416, PointMatchingLoss=10.857506, MaskLoss=0.110217
Epoch[3] Batch [400] Speed: 2.33 samples/sec Train-Flow_L2Loss=0.265881, Flow_CurLoss=0.000000, PointMatchingLoss=11.151155, MaskLoss=0.109521
Epoch[3] Batch [420] Speed: 2.36 samples/sec Train-Flow_L2Loss=0.256921, Flow_CurLoss=0.000000, PointMatchingLoss=11.267041, MaskLoss=0.110485
Epoch[3] Batch [440] Speed: 2.32 samples/sec Train-Flow_L2Loss=0.250065, Flow_CurLoss=0.000000, PointMatchingLoss=11.396601, MaskLoss=0.111172
Epoch[3] Batch [460] Speed: 2.32 samples/sec Train-Flow_L2Loss=0.244040, Flow_CurLoss=0.000000, PointMatchingLoss=11.543343, MaskLoss=0.110957
Epoch[3] Batch [480] Speed: 2.31 samples/sec Train-Flow_L2Loss=0.239315, Flow_CurLoss=0.000000, PointMatchingLoss=11.643859, MaskLoss=0.114532
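For anyone who wants to compare trends numerically rather than by eye, here is a minimal sketch that pulls the metrics out of log entries shaped like the ones above. The function name `parse_log` and the regular expressions are my own assumptions based only on the format shown in this post; they are not part of mx-DeepIM or MXNet.

```python
import re

# Assumed entry layout, taken from the log excerpt in this thread:
#   Epoch[E] Batch [B] Speed: S samples/sec <name>=<value>, <name>=<value>, ...
# The pattern works whether entries sit on separate lines or run together.
ENTRY = re.compile(
    r"Epoch\[(\d+)\] Batch \[(\d+)\]\s+Speed: ([\d.]+) samples/sec\s+(.*?)(?=Epoch\[|\Z)",
    re.S,
)
# A metric is an identifier followed by '=' and a decimal number.
# The "Train-" prefix is skipped because '-' is not part of the identifier.
METRIC = re.compile(r"([A-Za-z_0-9]+)=(-?\d+(?:\.\d+)?)")


def parse_log(text):
    """Return one dict per log entry with epoch, batch, speed and all metrics."""
    rows = []
    for match in ENTRY.finditer(text):
        row = {
            "epoch": int(match.group(1)),
            "batch": int(match.group(2)),
            "speed": float(match.group(3)),
        }
        for name, value in METRIC.findall(match.group(4)):
            row[name] = float(value)
        rows.append(row)
    return rows


sample = (
    "Epoch[3] Batch [260] Speed: 2.26 samples/sec Train-Flow_L2Loss=0.321852, "
    "Flow_CurLoss=0.000000, PointMatchingLoss=10.396450, MaskLoss=0.108871, "
    "Epoch[3] Batch [480] Speed: 2.31 samples/sec Train-Flow_L2Loss=0.239315, "
    "Flow_CurLoss=0.000000, PointMatchingLoss=11.643859, MaskLoss=0.114532,"
)
rows = parse_log(sample)

# Report the change of each loss between the first and the last parsed entry.
for name in ("Flow_L2Loss", "PointMatchingLoss", "MaskLoss"):
    delta = rows[-1][name] - rows[0][name]
    print(f"{name}: {rows[0][name]:.4f} -> {rows[-1][name]:.4f} ({delta:+.4f})")
```

Running this over a whole log file (for example `parse_log(open(path).read())`, where `path` is wherever your log is written) gives a list that is easy to plot or to diff against someone else's run.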

May 26 '19 07:05 vshk42

The training finished for train_and_test_deepim_ape.sh. I'm getting an error during testing, which I'm debugging:

Traceback (most recent call last):
  File "experiments/deepim/deepim_train_test.py", line 22, in <module>
    test.main()
  File "experiments/deepim/../../deepim/test.py", line 210, in main
    test_deepim()
  File "experiments/deepim/../../deepim/test.py", line 203, in test_deepim
    pairdb=pairdb,
  File "experiments/deepim/../../deepim/core/tester.py", line 590, in pred_eval
    data_batch = update_data_batch(config, data_batch, update_package)
  File "experiments/deepim/../../lib/pair_matching/data_pair.py", line 79, in update_data_batch
    package = update_package[ctx_idx]
IndexError: list index out of range

I will update once the test runs. Attached are the PointMatchingLoss and lr plots. Do they match what you observe?

May 26 '19 18:05 vshk42

I'm seeing the same behavior. All other metrics decrease significantly, while the point matching loss increases over the same period.

Jun 11 '19 08:06 huberl