
Pretraining the network

Open isheetajha opened this issue 7 years ago • 15 comments

Hi @Yuliang-Zou Thanks for sharing the code for the reimplementation. I wanted to replicate the results as mentioned in the paper. Is it possible for you to share the code changes required for 2 frames? Can you also please share the results after pretraining the depth and flow network with 2 frame setup?

isheetajha avatar Dec 17 '18 17:12 isheetajha

@Yuliang-Zou In the kitti_5frame dataset you have train.txt and val.txt files. But I don't find the val.txt being used anywhere. Is that file used anywhere?

roboticsbala avatar Dec 18 '18 16:12 roboticsbala

@isheetajha Due to some system update, I cannot get access to the old code base. I will share the results soon.

Yuliang-Zou avatar Dec 20 '18 03:12 Yuliang-Zou

@roboticsbala You can use val.txt to test the depth estimation performance for model selection. Simply replace test.txt with val.txt.
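A minimal sketch of that model-selection loop: evaluate every checkpoint on the val split, keep the one with the lowest abs_rel, and only then run the test split once. The checkpoint names and abs_rel values below are placeholders for illustration; in the real workflow each value would come from running kitti_eval/eval_depth.py on val.txt.

```python
# Hypothetical val-based model selection (placeholder numbers, not real results).
# In practice, each abs_rel would be produced by kitti_eval/eval_depth.py
# with test.txt replaced by val.txt.
val_abs_rel = {
    "model-10000": 0.210,
    "model-20000": 0.185,
    "model-30000": 0.198,
}

# Pick the checkpoint with the lowest validation abs_rel.
best_ckpt = min(val_abs_rel, key=val_abs_rel.get)
print(best_ckpt)  # this is the checkpoint to evaluate once on test.txt
```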

Yuliang-Zou avatar Dec 20 '18 03:12 Yuliang-Zou

@Yuliang-Zou Thanks a lot. Will the losses for depth prediction remain the same for pretraining as well as joint training?

isheetajha avatar Dec 21 '18 01:12 isheetajha

@isheetajha I actually used the simple L2 photometric loss for the pre-training, since I found it easier to train the network.
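For reference, a simple L2 photometric loss of the kind mentioned here can be sketched in NumPy as follows. This is an illustrative stand-alone version, not the repo's actual implementation (which operates on warped source frames inside the TensorFlow graph):

```python
import numpy as np

def l2_photometric_loss(target, warped):
    """Mean squared per-pixel difference between the target frame and the
    source frame warped into the target view (simple L2 photometric loss)."""
    return float(np.mean((target - warped) ** 2))

# Identical images give zero loss; larger appearance differences give a
# proportionally larger loss.
a = np.zeros((4, 4, 3))
b = np.ones((4, 4, 3))
l2_photometric_loss(a, a)  # 0.0
l2_photometric_loss(a, b)  # 1.0
```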

Yuliang-Zou avatar Dec 27 '18 09:12 Yuliang-Zou

@Yuliang-Zou I tried your suggestion of replacing the ternary loss with the L2 photometric loss, but pretraining still does not work properly: even after several iterations the error does not change much. I also increased the learning rate to 0.0002. On testing the depth network I get a very high error. Any suggestions would be helpful.

abs_rel, sq_rel, rms, log_rms, d1_all, a1, a2, a3
0.4607, 5.0703, 12.4489, 0.5938, 0.0000, 0.2855, 0.5361, 0.7518

Earlier I was getting the following error:

abs_rel, sq_rel, rms, log_rms, d1_all, a1, a2, a3
0.4429, 4.7578, 12.0834, 0.5876, 0.0000, 0.3033, 0.560

isheetajha avatar Dec 30 '18 14:12 isheetajha

@isheetajha Did you use the val set to pick the best model? Or did you monitor the training progress with the TensorBoard visualization? The training loss will not obviously decrease in TensorBoard, so you need to monitor the visualization instead.

Yuliang-Zou avatar Dec 31 '18 04:12 Yuliang-Zou

@Yuliang-Zou I did not use the val set to pick the best model; I just tested all the checkpoint models using split=test. I also monitored the training progress using TensorBoard. Here are some snapshots:

(screenshots: pixel loss, total loss, and smoothness loss curves)

isheetajha avatar Jan 03 '19 14:01 isheetajha

@isheetajha The pixel loss curve looks good, but the smoothness loss curve clearly indicates that training has failed: it decreases to zero quickly, and if you look at the predicted depth visualization you should find it all white.
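To see why a smoothness loss at zero signals a degenerate (constant, "all white") prediction, consider a minimal first-order smoothness term. This is an illustrative form only; the repo's version may differ (e.g. be edge-aware or second-order):

```python
import numpy as np

def smoothness_loss(disp):
    """Mean absolute spatial gradient of a predicted disparity map."""
    dx = np.abs(disp[:, 1:] - disp[:, :-1])  # horizontal gradients
    dy = np.abs(disp[1:, :] - disp[:-1, :])  # vertical gradients
    return float(dx.mean() + dy.mean())

flat = np.full((8, 8), 0.5)   # a constant prediction ("all white" depth)
ramp = np.arange(64.0).reshape(8, 8)

smoothness_loss(flat)  # 0.0 -- the network has found a trivial minimum
smoothness_loss(ramp)  # > 0 -- any spatial variation costs something
```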

Actually, you should monitor the visualization to decide whether to early-stop or keep training; sometimes training fails due to the randomness of CUDA operations or inappropriate hyperparameters. Let me see if I can find my hyperparameter settings (I am out of town, so it might take some time).
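One way to automate the check described above is a small heuristic over the logged smoothness-loss values. This is only a sketch; the window size and threshold are arbitrary choices, not values from the authors:

```python
def smoothness_collapsed(loss_history, window=100, eps=1e-6):
    """Flag a run whose recent smoothness loss has flat-lined at ~zero,
    which in this thread correlates with all-white depth predictions."""
    recent = loss_history[-window:]
    return bool(recent) and max(recent) < eps

healthy = [0.5 - 0.001 * i for i in range(200)]   # slowly decreasing loss
collapsed = [0.5] * 50 + [1e-8] * 200             # loss stuck at ~zero

smoothness_collapsed(healthy)    # False -- keep training
smoothness_collapsed(collapsed)  # True  -- early-stop or restart the run
```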

Yuliang-Zou avatar Jan 04 '19 02:01 Yuliang-Zou

@Yuliang-Zou Thanks for your reply. If you could share the hyperparameters it would be great.

The smoothness loss very quickly becomes 0, but the pixel loss shows erratic behavior, which worries me more. Is this expected?

isheetajha avatar Jan 04 '19 14:01 isheetajha

@Yuliang-Zou Is it possible for you to share the hyperparameters for pretraining? I have been experimenting with different smoothness weights (0.5, 1, 1.5). Training proceeds for a while, but the smoothness loss then plummets to zero.

isheetajha avatar Jan 22 '19 12:01 isheetajha

Yes, definitely. I just got back to campus and will take a look over the weekend.

Yuliang-Zou avatar Jan 22 '19 15:01 Yuliang-Zou

@Yuliang-Zou @isheetajha I'm having the same issue: the smoothness loss falls to zero and stays there during depth pretraining. Any idea how to fix that?

shujonnaha avatar Feb 04 '19 14:02 shujonnaha

@Yuliang-Zou @isheetajha @shujonnaha Hi, sorry to bother you guys. I have the same problem: after training, the predicted depth map is all white. I also get a test error like:

root@mygpu:~/DF-Net# python kitti_eval/eval_depth.py --pred_file=./prediction/model-70000.npy --split='test'
abs_rel, sq_rel, rms, log_rms, d1_all, a1, a2, a3
0.4499, 4.8663, 12.4314, 0.5961, 0.0000, 0.2985, 0.5540, 0.7693

Even though I change the trained model from 10000 to 99999 iterations, I get the same test error.

If you have any idea how to solve this, please reply. I urgently need to reimplement this model. Thanks!

ReekiLee avatar Nov 24 '21 14:11 ReekiLee

I've found where the problem was. When training, I had deleted some lines from ./dataset/train.txt, which produced the results above. After re-training the model with the unmodified train.txt, I succeeded.

ReekiLee avatar Nov 29 '21 08:11 ReekiLee