How to load the pretrained safetensors checkpoint and continue training?
Hello, thanks for sharing your code!
I am now trying to train stage 2 with the provided vista.safetensors,
so I changed the command to the following:
torchrun \
--nnodes=1 \
--nproc_per_node=8 \
train.py \
--base configs/training/vista_phase2_stage2.yaml \
--finetune ${PATH_TO_STAGE1_CKPT}/vista.safetensors \
--num_nodes 1 \
--n_devices 8
But there are lots of missing keys like:
And the loss, which I expected to be low, is not low in my observation:
I downloaded the sampled video "samples_mp4_epoch00_batch0000_step000001.mp4":
https://github.com/OpenDriveLab/Vista/assets/62542727/80f5237f-9d68-46f5-8d5b-9ec0b5587b63
What should I do to use the provided weights to start the phase 2 stage 2 training?
Sorry for the trouble. I haven't verified this resuming feature yet. It seems that some weights are randomly initialized after loading. Make sure the new weights are initialized as zeros. In addition, if there are any "unexpected" weights when loading the checkpoint, make sure all of them are remapped to the "missing" weights. This can be done by renaming the keys in the state dictionary and loading the dictionary into the model again.
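The remapping described above can be sketched as follows. This is a minimal illustration, not Vista's actual loading code: the helper name `remap_and_load` and the prefix-based rename map are assumptions; the real key names depend on the checkpoint you inspect.

```python
import torch

def remap_and_load(model, state, rename_map):
    """Rename checkpoint keys so "unexpected" entries match the model's
    expected names, load non-strictly, then zero-init anything still missing.

    state: a plain state dict (e.g. from safetensors.torch.load_file(ckpt_path)).
    rename_map: {old_key_prefix: new_key_prefix} mapping (an assumption here).
    """
    remapped = {}
    for k, v in state.items():
        for old, new in rename_map.items():
            if k.startswith(old):
                k = new + k[len(old):]
                break
        remapped[k] = v
    # strict=False returns the keys that could not be matched either way.
    missing, unexpected = model.load_state_dict(remapped, strict=False)
    # Weights still missing (e.g. newly added layers) start at zero so the
    # pretrained behavior is preserved at step 0, as suggested above.
    with torch.no_grad():
        named = dict(model.named_parameters())
        for name in missing:
            if name in named:
                named[name].zero_()
    return missing, unexpected
```

After loading, it is worth printing both returned lists: the goal is an empty `unexpected` list, with every remaining `missing` key being a genuinely new, zero-initialized module.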
@JunyuanDeng Hi, have you resolved this issue? Could you please share how you did it? Thank you!
@Little-Podi Hi, I want to confirm: do you mean that we need to change the code so that the missing keys are initialized as zeros in this case? When I set the values of these missing keys to zero, samples_mp4_epoch00_batch0000_step000001.mp4 is still in that strange form.
@Little-Podi Hi, thanks a lot for sharing this great work! I ran into the same issue. Could you share the checkpoint after stage 1 for continued training? Thanks a lot!
@zhoujiawei3 Hello, did you find an answer to this?
Upon inspection, vista.safetensors does not appear to contain the model_ema weights. I manually enabled init_ema in this line, and it seems to work.
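For context, re-initializing the EMA when the checkpoint lacks model_ema weights amounts to seeding the EMA shadow from the currently loaded parameters. A minimal sketch of that idea, assuming a standard exponential-moving-average update (the `SimpleEMA` class and `decay` default are illustrative, not Vista's actual API):

```python
import torch

class SimpleEMA:
    """Toy EMA tracker: the shadow starts as an exact copy of the loaded
    weights, so the EMA is well-defined even without model_ema in the
    checkpoint, then decays toward the live model on each update."""

    def __init__(self, model, decay=0.9999):
        self.decay = decay
        # Detached clones: the shadow must not share storage with the model.
        self.shadow = {k: v.detach().clone() for k, v in model.state_dict().items()}

    @torch.no_grad()
    def update(self, model):
        for k, v in model.state_dict().items():
            # shadow = decay * shadow + (1 - decay) * current
            self.shadow[k].mul_(self.decay).add_(v, alpha=1 - self.decay)
```

The key point for resuming is only the constructor: copying the current (pretrained) weights into the shadow avoids the random-EMA problem described above.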
@hungdche Hi, thanks for your response! Do you mean that we should always call "model.reinit_ema()"?
Hello, when I finetune the LoRA layers using the pretrained weights, I get this error:
RuntimeError: element 0 of tensors does not require grad and does not have a grad_fn
Do you know how to solve it? Sorry to bother you, and thank you very much!
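This RuntimeError typically means that no parameter in the computation graph requires grad, e.g. everything was frozen when loading the base model and the LoRA parameters were never unfrozen. A minimal reproduction and fix, with a plain `nn.Linear` standing in for the real model (the "lora"-in-name filter mentioned in the comment is an assumption about how the trainable layers are named):

```python
import torch

model = torch.nn.Linear(4, 4)
for p in model.parameters():
    p.requires_grad_(False)      # freeze everything, as when loading a base model

loss = model(torch.randn(1, 4)).sum()
assert not loss.requires_grad    # backward() here would raise the RuntimeError above

for name, p in model.named_parameters():
    p.requires_grad_(True)       # in practice: only params with e.g. "lora" in name

loss = model(torch.randn(1, 4)).sum()
loss.backward()                  # now succeeds
```

So the first thing to check is that `requires_grad` is re-enabled on the LoRA parameters after the pretrained weights are loaded, since some loading paths freeze the whole model.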