Questions about training from scratch
Hi. I used the provided code to train TimeCycle on some other video datasets. Finetuning the network from the provided checkpoint_14.pth.tar works fine, but when I train the network from scratch, neither the inlier loss nor the theta loss decreases. Are there any tips for training TimeCycle from scratch?
@gonglixue - when you visualised your training results, did you ever get blocky output? We're running into similar problems, and I'm wondering if this happened for you too.
same problem... :(
I have trained it from scratch successfully : )
First, set `detach_network=True` in `model_simple.py`, which freezes the feature extractor. Then set `detach_network=False` to train the whole network end-to-end.
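For anyone unsure what that flag controls, here is a minimal, self-contained sketch (not the repo's actual module; the layers and shapes are placeholders) of how `detach_network` freezes the feature extractor inside the forward pass:

```python
import torch
import torch.nn as nn

class CycleTimeSketch(nn.Module):
    """Toy stand-in for model_simple.py, illustrating the detach_network flag."""
    def __init__(self, detach_network=True):
        super().__init__()
        self.detach_network = detach_network
        self.encoder = nn.Conv2d(3, 64, 3, padding=1)  # placeholder feature extractor
        self.head = nn.Conv2d(64, 6, 1)                # placeholder transformation head

    def forward(self, x):
        feat = self.encoder(x)
        if self.detach_network:
            # Stage 1: cut the graph here so gradients never reach the encoder.
            feat = feat.detach()
        return self.head(feat)

model = CycleTimeSketch(detach_network=True)
model(torch.rand(1, 3, 32, 32)).sum().backward()
# Encoder gradients stay None while detached; the head still trains.
print(model.encoder.weight.grad is None, model.head.weight.grad is not None)
```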
Thanks! I will try.
I didn't come across the blocky-output problem. Using the code in `transformation.py` to transform an image with a given affine matrix works correctly for me.
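If you want a quick sanity check of the affine warping independent of the repo, here is a minimal sketch using PyTorch's `affine_grid`/`grid_sample`. This is only an analogue of what `transformation.py` does, not the repo's code:

```python
import torch
import torch.nn.functional as F

# Identity 2x3 affine matrix, shape (N, 2, 3); edit entries to test rotations/translations.
theta = torch.tensor([[[1.0, 0.0, 0.0],
                       [0.0, 1.0, 0.0]]])

img = torch.rand(1, 3, 64, 64)  # dummy image batch
grid = F.affine_grid(theta, img.size(), align_corners=False)
warped = F.grid_sample(img, grid, align_corners=False)

# With the identity transform, the warped output should match the input.
print(torch.allclose(img, warped, atol=1e-5))
```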
Can you please provide more details? Did you also set `can_detach=True` in the `forward_base` method? Did you first detach the encoder and train the transformation network for a few epochs, and then set `detach_network=False` and train for more epochs? Are the optimizer and its settings as stated in the paper? My `loss_targ_theta_skip` is very noisy and the `back_inliers` is vanishing very early... Thanks :)
My full training process is as follows:
- Completely detach the feature extractor. That means https://github.com/xiaolonw/TimeCycle/blob/16d33ac0fb0a08105a9ca781c7b1b36898e3b601/models/videos/model_simple.py#L166 always evaluates to `True`:

```python
# detach_network=True in __init__()
# if self.detach_network and can_detach:
if self.detach_network:
    x_pre = x_pre.detach()
```

In this step I set `lamda=0.3` and `lr=2e-4`, and the inlier loss only decreases a little bit.
- After step 1 converges, set `detach_network=False` to train the whole network; everything else stays the same as the original code:

```python
if self.detach_network and can_detach:
    x_pre = x_pre.detach()
```

In this step I find that the theta loss almost converges while the inlier loss decreases slowly, so I reduce the weight of the theta loss to `lamda=0.1` and use a larger learning rate (`lr=3e-4`).
- Use a smaller learning rate (`lr=2e-4`, `lamda=0.1`) to finetune.
My training process seems a little complicated; for some video data I had to adjust the hyperparameters back and forth... A rough sketch of the whole schedule is given below.
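To make the three stages concrete, here is a rough training-schedule sketch. The optimizer choice, the loop body, and the assumption that the model returns the two losses directly are all mine, not the repo's exact code; `lamda` weights the theta loss against the inlier loss as in the steps above:

```python
import torch

def train_stage(model, loader, lr, lamda, epochs):
    """One stage of the schedule above (simplified, hypothetical loop)."""
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)  # optimizer choice is an assumption
    for _ in range(epochs):
        for batch in loader:
            # Assumed interface: the model returns the inlier and theta losses.
            loss_inlier, loss_theta = model(batch)
            loss = loss_inlier + lamda * loss_theta
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()

# Stage 1: encoder frozen (detach_network=True), lamda=0.3, lr=2e-4
# Stage 2: end-to-end (detach_network=False),    lamda=0.1, lr=3e-4
# Stage 3: end-to-end finetuning,                lamda=0.1, lr=2e-4
```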
Thank you very much for the detailed answer, you are great!
Thanks so much for the help! Out of curiosity, how many epochs did each of the steps take? i.e., how much training did you do before you unfroze the feature extractor?