
Questions about VideoChat2_HD

Open LiJiaqi96 opened this issue 1 year ago • 35 comments

Hi, thanks for the VideoChat2_HD update! While trying the newly released code, I ran into some questions:

  • The MetaLoader_rs class in "train_it_ds.py" seems to be missing.
  • So I still used "train_it.py", but got the following error. I'm not sure whether it could be solved by using MetaLoader_rs.
RuntimeError: stack expects each tensor to be equal size, but got [8, 3, 224, 448] at entry 0 and [8, 3, 448, 672] at entry 1
  • Then I changed batch_size to 1, which resolved the previous error. But it seems the load_and_transform_media_data_image function does not accept the dynamic_config argument that "it_dataset_mistral.py" passes to it. I created a pull request to fix this part.
  • Is there any place to find the newly added dataset for VideoChat2_HD? I suppose the datasets are important to improve model performances.
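For context on the stack error above: PyTorch's default collate calls torch.stack on the per-sample tensors, which requires identical shapes, so variable-resolution HD clips in one batch fail. A minimal sketch of a custom collate_fn that pads each clip to the batch's max spatial size (the helper name and the [T, C, H, W] sample layout are illustrative assumptions, not the repo's actual code):

```python
import torch
import torch.nn.functional as F

def pad_collate(batch):
    """Pad variable-resolution clips to a common H x W so torch.stack works.

    Each sample is assumed to be a [T, C, H, W] tensor (e.g. 8 frames);
    this helper is a sketch, not part of the Ask-Anything repo.
    """
    max_h = max(x.shape[-2] for x in batch)
    max_w = max(x.shape[-1] for x in batch)
    padded = [
        # F.pad pads the last dims: (left, right, top, bottom); pad right/bottom only
        F.pad(x, (0, max_w - x.shape[-1], 0, max_h - x.shape[-2]))
        for x in batch
    ]
    return torch.stack(padded)

# The two shapes from the error message now stack fine:
batch = [torch.zeros(8, 3, 224, 448), torch.zeros(8, 3, 448, 672)]
print(pad_collate(batch).shape)  # torch.Size([2, 8, 3, 448, 672])
```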

LiJiaqi96 avatar Jun 12 '24 02:06 LiJiaqi96

Thanks for trying it out! I will fix it later~

Andy1621 avatar Jun 12 '24 03:06 Andy1621

@LiJiaqi96 Please have a try; I have updated the code. train_it_ds.py adds DeepSpeed support and needs some changes.

Andy1621 avatar Jun 12 '24 03:06 Andy1621

Thanks! I tried "train_it_ds.py" without using deepspeed, but it doesn't work. Is it possible to train without deepspeed? For now I would prefer not to use it.

LiJiaqi96 avatar Jun 12 '24 08:06 LiJiaqi96

Yes! You can run it without deepspeed. BTW, share your log so that I can fix the bug~

Andy1621 avatar Jun 12 '24 11:06 Andy1621

Sorry for the late reply. The log is here: train_log.txt. In "config_7b_hd_stage4.py" I set enable=False in the deepspeed settings,
and ran the code with:

torchrun    --nnodes=${NNODE} --nproc_per_node=${NUM_GPUS} \
    --rdzv_endpoint=${MASTER_NODE}:10068 \
    --rdzv_backend=c10d \
    tasks/train_it_ds.py \
    $(dirname $0)/config_7b_hd_stage4.py \
    output_dir ${OUTPUT_DIR}

LiJiaqi96 avatar Jun 13 '24 07:06 LiJiaqi96

I'm not sure whether it is caused by the deepspeed or pytorch versions. Here are my versions of the relevant packages:

torch                     1.13.1+cu117
torchaudio                0.13.1+cu117
torchnet                  0.0.4
torchvision               0.14.1+cu117
deepspeed                 0.14.2
transformers              4.40.1

BTW, sometimes you can fix the bug by changing find_unused_parameters to True or False.
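For reference, find_unused_parameters is a flag on PyTorch's DistributedDataParallel wrapper: True lets DDP tolerate parameters that receive no gradient in a step (common when parts of a model are frozen), while False is faster but errors out in that case. A minimal single-process sketch just to show where the flag goes (in the real scripts, torchrun sets the rendezvous environment variables for you):

```python
import os
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP

# Minimal single-process setup purely to illustrate the flag.
os.environ.setdefault("MASTER_ADDR", "127.0.0.1")
os.environ.setdefault("MASTER_PORT", "29500")
dist.init_process_group("gloo", rank=0, world_size=1)

model = torch.nn.Linear(4, 2)
# Toggling find_unused_parameters is a quick way to localize
# "expected to mark a variable ready only once"-style DDP bugs.
ddp_model = DDP(model, find_unused_parameters=True)

out = ddp_model(torch.randn(3, 4))
print(out.shape)  # torch.Size([3, 2])

dist.destroy_process_group()
```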

Andy1621 avatar Jun 13 '24 08:06 Andy1621

Thanks, I will create an environment with exactly the same packages and have a try.

LiJiaqi96 avatar Jun 13 '24 10:06 LiJiaqi96

Hi, I found that shared_utils_ds.py has a bug at line 58:

optimizer_params = create_optimizer(config.optimizer, model, return_group=True)

so optimizer.py may need to be updated as well.
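The repo's create_optimizer signature is project-specific, but a return_group=True path typically builds parameter groups (e.g. no weight decay for biases and norm scales) and returns the groups instead of an optimizer, so DeepSpeed can construct the optimizer itself. A generic sketch under those assumptions (names and defaults are illustrative, not the repo's actual API):

```python
import torch

def create_param_groups(model, weight_decay=0.02, lr=2e-5):
    """Split trainable parameters into decay / no-decay groups.

    Sketch of what a `return_group=True` code path usually returns, so a
    caller (e.g. deepspeed.initialize) can build the optimizer itself.
    """
    decay, no_decay = [], []
    for name, p in model.named_parameters():
        if not p.requires_grad:
            continue
        # Biases and 1-D params (norm scales) typically skip weight decay.
        (no_decay if p.ndim <= 1 or name.endswith(".bias") else decay).append(p)
    return [
        {"params": decay, "weight_decay": weight_decay, "lr": lr},
        {"params": no_decay, "weight_decay": 0.0, "lr": lr},
    ]

model = torch.nn.Sequential(torch.nn.Linear(8, 8), torch.nn.LayerNorm(8))
groups = create_param_groups(model)
optimizer = torch.optim.AdamW(groups)  # or hand `groups` to DeepSpeed instead
```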

yuanrr avatar Jun 13 '24 12:06 yuanrr

Thanks for your feedback. I have updated the code.

Andy1621 avatar Jun 13 '24 20:06 Andy1621

I used the new environment except for flash-attn: since I use CUDA 12.1, I can only use flash-attn==2.1.0. I ran "scripts/videochat_mistral/run_7b_stage4_hd.sh" with "tasks/train_it.py" and deepspeed enable=False, and got this error: train_log0618.txt. The error seems to be caused by flash-attn.
Is it possible to run videochat2_hd using the same environment as videochat2_mistral, without using deepspeed?

LiJiaqi96 avatar Jun 18 '24 04:06 LiJiaqi96

BTW, I tested running the code on a single GPU (i.e., python train_it.py) and it iterates normally.

LiJiaqi96 avatar Jun 18 '24 09:06 LiJiaqi96

Yes, it's okay to use it without deepspeed. I use DeepSpeed ZeRO to decrease the GPU memory~
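For readers unfamiliar with it: ZeRO partitions optimizer states (stage 1) and gradients (stage 2) across GPUs instead of replicating them, which is where the memory savings come from. A generic sketch of the kind of JSON config DeepSpeed consumes (this is not the repo's actual config, whose deepspeed settings live in the Python config files):

```json
{
  "train_micro_batch_size_per_gpu": 1,
  "gradient_accumulation_steps": 1,
  "bf16": { "enabled": true },
  "zero_optimization": {
    "stage": 2,
    "overlap_comm": true,
    "contiguous_gradients": true
  }
}
```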

Andy1621 avatar Jun 18 '24 10:06 Andy1621

I see. Is it possible to run on multiple GPUs without deepspeed, just as the model runs in videochat2_mistral?

LiJiaqi96 avatar Jun 20 '24 01:06 LiJiaqi96

Update: I managed to solve the previous issue by upgrading flash-attn to 2.5.9. When I use "train_it_ds.py" with deepspeed enable=True, I hit a new issue with the deepspeed config: trainlog_0621.txt
Could you please help me solve it?

LiJiaqi96 avatar Jun 21 '24 10:06 LiJiaqi96

Hi! Please try again with the new commit.

Andy1621 avatar Jun 22 '24 18:06 Andy1621

Thanks for your update! The code now runs with deepspeed enabled.
BTW, is there any place to find the newly added datasets for VideoChat2_HD? I suppose the datasets are important for improving model performance.

LiJiaqi96 avatar Jun 24 '24 06:06 LiJiaqi96

Almost all the datasets can be downloaded directly from their repos or homepages~

Give me feedback if you don't find them.

Andy1621 avatar Jun 25 '24 11:06 Andy1621

In "instruction_data.py", there are some newly added image datasets from M3IT, and some newly added video datasets. Is there any place to find those video datasets? Thanks!

LiJiaqi96 avatar Jun 26 '24 06:06 LiJiaqi96

These datasets are generated from ShareGPTVideo, VidLN, FAVD and TimeIT_didemo.

Andy1621 avatar Jun 26 '24 07:06 Andy1621

Thanks for your sharing!

LiJiaqi96 avatar Jun 26 '24 09:06 LiJiaqi96

Another question: how can I obtain the checkpoint after VideoChat2_HD training? In "demo_mistral_hd.ipynb" it is loaded with
state_dict = torch.load("your_model_path/videochat2/videochat2_hd_mistral_stage4.pth", "cpu")
but I noticed that there are several files in the "ckpt_latest.pth" folder; should I choose one of them?
Thanks!

LiJiaqi96 avatar Jun 28 '24 02:06 LiJiaqi96

These datasets are generated from ShareGPTVideo, VidLN, FAVD and TimeIT_didemo.

Hi, could you please help me find the instruction JSON files, such as f"{anno_root_it}/video/caption/sharegptvideo/train_300k.json"? I did not find them in the HF VideoChat2-IT repo.

LiJiaqi96 avatar Jun 28 '24 07:06 LiJiaqi96

Sorry for the late reply. For the checkpoint, you need to use the file named mp_xxx, which saves the weights. For the instruction data, I will upload it today.
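In DeepSpeed checkpoint folders, the model-states files (e.g. mp_rank_00_model_states.pt) typically store the weights under a "module" key. A hedged sketch of pulling out the bare state dict for use with the notebook's torch.load path (the helper name and the fallback behavior are assumptions, not the repo's loading code):

```python
import torch

def extract_model_weights(ckpt_path):
    """Load a DeepSpeed model-states file and return the bare state dict.

    Falls back to returning the checkpoint as-is when there is no
    "module" key; sketch only, not the repo's actual loading code.
    """
    ckpt = torch.load(ckpt_path, map_location="cpu")
    return ckpt.get("module", ckpt) if isinstance(ckpt, dict) else ckpt

# e.g. state_dict = extract_model_weights(
#     "your_model_path/ckpt_latest.pth/mp_rank_00_model_states.pt")
```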

Andy1621 avatar Jun 28 '24 23:06 Andy1621

@LiJiaqi96 Please check the data in HuggingFace~

Andy1621 avatar Jun 29 '24 04:06 Andy1621

Thanks for your reply! I will try it~

LiJiaqi96 avatar Jun 30 '24 10:06 LiJiaqi96

BTW, did you evaluate the effectiveness of VideoChat2_HD and the newly added datasets separately? I'm curious whether the training scheme or the dataset matters more for the improvement. Thanks!

LiJiaqi96 avatar Jul 01 '24 07:07 LiJiaqi96

We did not conduct rigorous comparisons, since we wanted to make good use of the pretrained models.

And I think both are important based on some experiments:

  • Stage4: Directly fine-tuning VideoChat2-Stage3 with HD on the original Stage3 dataset only improved results marginally.
  • Stage3: Fine-tuning VideoChat2-Stage2 with the Stage4 dataset led to a performance drop of ~3%.

Andy1621 avatar Jul 01 '24 08:07 Andy1621

My experiment is consistent with your findings. I directly fine-tuned VideoChat2-Stage3 (trained by myself from Stage2, 3 epochs) with HD on the original Stage3 dataset (1 epoch), and the score on MVBench dropped from 56 to 43...

LiJiaqi96 avatar Jul 02 '24 01:07 LiJiaqi96

Interesting! I think HD needs more high-resolution, high-quality data.

Andy1621 avatar Jul 03 '24 01:07 Andy1621

These datasets are generated from ShareGPTVideo, VidLN, FAVD and TimeIT_didemo.

Hi, while downloading the datasets, I could not find "infovqa". Could you please help me locate that dataset?

LiJiaqi96 avatar Aug 14 '24 10:08 LiJiaqi96