
Could you provide the training scripts for the other datasets of PWCNet?

Open minghuiwsw opened this issue 3 years ago • 11 comments

Hi @hurjunhwa, I noticed you provide the training script for the baseline PWCNet. But is it for the FlyingChairsOcc dataset only, or is it applicable to other datasets as well? That is, if I want to train PWCNet on other datasets and reach the performance reported in the paper, do I need to change the hyperparameters in the config file pwcnet.sh? If yes, could you provide me with those files? Thanks a lot!

minghuiwsw avatar Apr 26 '23 12:04 minghuiwsw

Yes, it's possible to train on other datasets.

Please specify the directory of the custom dataset here as well as its name. Write a custom dataset file here and define the name of the dataset here.

Maybe it's easier to look at the existing example and start from there.

hurjunhwa avatar Apr 27 '23 01:04 hurjunhwa
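As a rough sketch of what such a custom dataset file typically looks like, here is a minimal pair-of-frames dataset. All names here (class name, directory layout, the dict keys) are illustrative assumptions, not irr's actual API; the real file additionally loads the images and the .flo ground truth.

```python
import os

class MyFlowDataset:
    """Illustrative sketch of a flow dataset: each sample is a pair of
    consecutive frames plus the path of a ground-truth flow file.
    Directory layout (root/images/*.png) is an assumption."""

    def __init__(self, root):
        frames = sorted(os.listdir(os.path.join(root, "images")))
        # consecutive frames form one training sample each
        self.pairs = list(zip(frames[:-1], frames[1:]))

    def __len__(self):
        return len(self.pairs)

    def __getitem__(self, idx):
        im1, im2 = self.pairs[idx]
        # real code would load the images and the .flo file here
        return {"im1": im1, "im2": im2,
                "flow": im1.rsplit(".", 1)[0] + ".flo"}
```

The dataset name referenced in the .sh config would then be registered wherever the repo maps names to dataset classes.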

Thanks a lot! Actually I tried exactly what you said yesterday, and the training is ongoing. Besides, I have another question. If I want to use another GPU for training instead of the default cuda:0, what should I do? I tried adding --cuda 5 on the command line, but it did not work. And what should I do if I want to use multiple GPUs for training? I noticed you commented out some code around line 47 of main.py; is that for this?

minghuiwsw avatar Apr 27 '23 01:04 minghuiwsw

https://github.com/visinf/irr/blob/dacd07b1dc963fb8d3db7c75b562691af33f47b2/main.py#L47 it is here

minghuiwsw avatar Apr 27 '23 01:04 minghuiwsw

Yes, you could uncomment those lines

https://github.com/visinf/irr/blob/dacd07b1dc963fb8d3db7c75b562691af33f47b2/main.py#L47-L53

and run the script with CUDA_VISIBLE_DEVICES (note: no spaces in the list). If you would like to use 4 GPUs on your machine, the command would be: CUDA_VISIBLE_DEVICES=0,1,2,3 IRR-FlowNet_flyingChairsOcc.sh

hurjunhwa avatar Apr 30 '23 18:04 hurjunhwa
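For the single-GPU case from the earlier question: CUDA_VISIBLE_DEVICES remaps physical GPUs to logical indices, so with CUDA_VISIBLE_DEVICES=5 the process sees physical GPU 5 as cuda:0 and no --cuda flag is needed. A small standalone sketch of how the variable's value is interpreted (a hypothetical helper, no torch dependency):

```python
import os

def visible_gpus(env=None):
    """Return the physical GPU ids the process may see, following the
    CUDA_VISIBLE_DEVICES convention: a comma-separated list, where
    logical device 0 maps to the first entry. None means unset
    (all GPUs visible)."""
    env = os.environ if env is None else env
    value = env.get("CUDA_VISIBLE_DEVICES")
    if value is None:
        return None
    return [int(x) for x in value.split(",") if x.strip() != ""]
```

So under `CUDA_VISIBLE_DEVICES=5`, the framework's cuda:0 is physical GPU 5; under `CUDA_VISIBLE_DEVICES=0,1,2,3`, four logical devices are exposed for the DataParallel lines mentioned above.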

Thanks! But I have two more questions.

1. Currently I am reproducing your PWC-IRR network and training it on Sintel. There are Sintel results for it in your paper, but in the provided code I only find the training script for flow_occ_v5. Given that for other models like IRR-PWC the Sintel training script has two stages and differs from the FlyingChairs script, what is the Sintel training script for PWC-IRR?
2. The IRR-PWC Sintel training script is two-stage, but in the second stage you do not use the checkpoint obtained from the first stage; you still use the original checkpoint. Why? In my opinion the second stage is finetuning based on the first stage's result, so the second-stage checkpoint should be inherited from the first stage.

Thanks again!

minghuiwsw avatar May 01 '23 14:05 minghuiwsw

Hi Jun @hurjunhwa, I tried to train PWC-IRR on Sintel using the pwc-irr.sh script, and I only changed the dataset-related items, without any change to the training strategy. The best_epe_avg I finally get is 5.5894, which I think is not good enough compared to the result in your paper. What is wrong with it: the training strategy, or the training/validation datasets? The command I used:

python ../main.py \
  --batch_size=$SIZE_OF_BATCH \
  --batch_size_val=$SIZE_OF_BATCH \
  --checkpoint=$CHECKPOINT \
  --lr_scheduler=MultiStepLR \
  --lr_scheduler_gamma=0.5 \
  --lr_scheduler_milestones="[108, 144, 180]" \
  --model=$MODEL \
  --num_workers=4 \
  --optimizer=Adam \
  --optimizer_lr=1e-4 \
  --optimizer_weight_decay=4e-4 \
  --save=$SAVE_PATH \
  --total_epochs=216 \
  --training_augmentation=RandomAffineFlowOccSintel \
  --training_augmentation_crop="[384,768]" \
  --training_dataset=SintelTrainingCombFull \
  --training_dataset_photometric_augmentations=True \
  --training_dataset_root=$SINTEL_HOME \
  --training_key=total_loss \
  --training_loss=$EVAL_LOSS \
  --validation_dataset=SintelTrainingCombValid \
  --validation_dataset_photometric_augmentations=False \
  --validation_dataset_root=$SINTEL_HOME \
  --validation_key=epe \
  --validation_loss=$EVAL_LOSS

minghuiwsw avatar May 02 '23 02:05 minghuiwsw
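As a sanity check on the schedule in that command: under MultiStepLR with gamma=0.5 and milestones [108, 144, 180], the learning rate at any epoch is optimizer_lr times gamma raised to the number of milestones already reached. A pure-Python sketch mirroring PyTorch's MultiStepLR closed form (the helper name is ours, not the repo's):

```python
def lr_at_epoch(epoch, base_lr=1e-4, milestones=(108, 144, 180), gamma=0.5):
    """Learning rate under MultiStepLR as configured in the command
    above: halve the rate each time a milestone epoch is reached."""
    passed = sum(epoch >= m for m in milestones)
    return base_lr * gamma ** passed
```

So training runs at 1e-4 until epoch 108, then 5e-5, 2.5e-5 and finally 1.25e-5 for the last 36 of the 216 epochs.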

Hi,

In the paper, we first train the model on the FlyingChairsOcc dataset from scratch. This is a pretraining step.

Then we finetune the model on Sintel or KITTI. This finetuning consists of two steps: (1) train the model on the train & valid split to find the number of iteration steps for finetuning, and (2) train the model on all images for the number of iteration steps found in (1).

Did you first pretrain the model on FlyingChairsOcc, or did you train the model on Sintel from scratch?

hurjunhwa avatar May 05 '23 20:05 hurjunhwa

Yes, I trained on Sintel from scratch at the beginning. These days, after first training on FlyingChairs and then finetuning on Sintel, it performs better, so I think that was the reason. But I have two more questions:

1. How did you get the checkpoint_best.ckpt in the folder /saved_check_point/PWCNet-irr? Currently I am using your PWCNet-irr rather than IRR-PWC, but I found there is only one checkpoint. Is it the best one for FlyingChairs, for Sintel, or for all datasets? And how did you train the model to get it?
2. I understand your two-step training strategy for Sintel, but why didn't you train the model on all images directly? The model will find the best checkpoint and save it as long as the number of iteration steps is large enough. I am a newcomer to this field and my opinion may well be wrong, so I am very glad that you have been so patient in replying to me so many times. Thanks a lot!

minghuiwsw avatar May 06 '23 06:05 minghuiwsw

Oh, actually the full training pipeline was FlyingChairs -> FlyingThings3D -> and then Sintel or KITTI finetuning.

I think the PWCNet-irr checkpoint is trained on FlyingChairs only. But you could double-check by running inference on Sintel/KITTI and comparing against the numbers in the paper.

If you train on all images directly, it's hard to know when the model starts to overfit, because the training loss keeps decreasing. So the first stage is about finding the stopping point where the validation EPE is lowest.

hurjunhwa avatar May 06 '23 14:05 hurjunhwa
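The two-stage logic described above can be sketched as: stage (1) picks the stopping epoch on the held-out split, and stage (2) reuses that number as a fixed budget on the full data. A hypothetical helper, not from the repo:

```python
def best_stop_epoch(val_epe_per_epoch):
    """Stage (1): the epoch index with the lowest validation EPE on the
    train/valid split. This count then becomes the fixed training
    budget for stage (2), which trains on all images and therefore has
    no validation signal of its own."""
    return min(range(len(val_epe_per_epoch)),
               key=val_epe_per_epoch.__getitem__)
```

For example, if validation EPE per epoch were [4.1, 3.2, 2.9, 3.0, 3.4], the EPE bottoms out at epoch 2 and stage (2) would train on all images for that many epochs.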

Thanks for your timely reply! I got it. Sorry to bother you again, but I have two more questions:

1. One difference I see between pwcnet.py and pwcnet_irr.py is that pwcnet_irr.py adds two rescale_flow operations. Why do we need to rescale the flow before and after the flow estimation module?
2. How did you test the model on a test dataset like SintelTestClean? I followed the validation procedure, i.e. I only changed the dataset name in the .sh file, but it did not work, and it seems there is no ground-truth flow to compare against for the test set. Do I need to save the inference results and submit them to the official Sintel website? Can you show me the detailed process?

Waiting for your reply. Best

minghuiwsw avatar May 06 '23 14:05 minghuiwsw

I did some experiments. When I delete the two rescale_flow operations, the training can still converge. But if I additionally reduce the search range from 4 to 2, the training does not converge; specifically, the training EPE stays nearly the same after 40 epochs (the progress right now; still ongoing). Is the rescale operation related to the search range? The rescale operation I mean is this: flow = rescale_flow(flow, self._div_flow, width_im, height_im, to_local=True) Best

minghuiwsw avatar May 07 '23 02:05 minghuiwsw