
Add more baselines!

gsig opened this issue 6 years ago • 4 comments

We need your help!

gsig avatar Feb 04 '19 12:02 gsig

hi~ Regarding your pre-trained models: does the name "aj_rgb_charades.pth" mean the same model as "aj_rgb_kinetics.pth", or is it an I3D model fine-tuned on the Charades dataset?

icyzhang0923 avatar Apr 25 '19 03:04 icyzhang0923

Ah, good catch. The readme was misleading; it's updated now.

aj_rgb_imagenet.pth is a model trained on ImageNet+Kinetics.

aj_rgb_charades.pth is the aj_rgb_imagenet.pth model further trained on Charades.

If you need a model that starts from just an "inflated ImageNet model", you can use '--arch', 'resnet50_3d' (or change this to whatever ResNet you want; see model/bases/resnet50_3d.py) together with '--pretrained',

This will inflate the PyTorch ResNet ImageNet model and use it as a starting point.
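
For reference, "inflation" in the I3D sense replicates each pretrained 2D kernel along a new temporal axis and rescales it so a constant-over-time input produces the same activations as the original 2D network. A minimal sketch of the idea in plain PyTorch (the helper name inflate_conv is hypothetical, not this repo's API):

    import torch.nn as nn

    def inflate_conv(conv2d: nn.Conv2d, time_dim: int = 3) -> nn.Conv3d:
        """Inflate a 2D conv into a 3D conv by repeating its kernel over
        time and dividing by the temporal extent."""
        conv3d = nn.Conv3d(
            conv2d.in_channels,
            conv2d.out_channels,
            kernel_size=(time_dim, *conv2d.kernel_size),
            stride=(1, *conv2d.stride),
            padding=(time_dim // 2, *conv2d.padding),
            bias=conv2d.bias is not None,
        )
        # (out, in, h, w) -> (out, in, t, h, w), averaged over time
        weight = conv2d.weight.data.unsqueeze(2).repeat(1, 1, time_dim, 1, 1)
        conv3d.weight.data.copy_(weight / time_dim)
        if conv2d.bias is not None:
            conv3d.bias.data.copy_(conv2d.bias.data)
        return conv3d

Averaging over time means a video of identical frames gives the same features as the 2D model on a single frame, which is the property the I3D paper uses to bootstrap from ImageNet weights.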

Best, Gunnar

gsig avatar Apr 25 '19 15:04 gsig

I have another problem: when I test the aj_rgb_charades.pth model, the mAP is 24.8%, but if I train for just one epoch (lr 0.1), the mAP is 32.9%. I also notice that the fine-tuning scripts for I3D-Inception cover the AVA and Something-Something datasets, but for Charades I don't know the hyperparameters. So I used the same settings as piergiaj: learning rate 0.1, dropped to 0.01 after 100 epochs, training for about 150 epochs. I find that the mAP after the first epoch is 0.09%, is that right? Could you share your scripts and logs for fine-tuning I3D-Inception on Charades?

icyzhang0923 avatar Apr 26 '19 13:04 icyzhang0923

It's been a while since I experimented with the I3D-Inception models on Charades, since I mostly switched to ResNet50-I3D, except for trying to replicate the AVA baseline. One thing to keep in mind is that the AJ pretrained models assume the images are normalized as 2*img - 1 instead of the usual ImageNet RGB mean/std; you should find that line commented out in the dataset script. The aj_ models are also not trained by me, so there may have been some secret sauce in training them, like multiple losses for the Inception architecture.
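
To make the normalization point concrete, here is a minimal sketch of the two conventions (plain PyTorch, not the repo's dataset code): the AJ models expect inputs scaled to [-1, 1] via 2*img - 1, while the usual torchvision recipe subtracts the per-channel ImageNet mean and divides by the std.

    import torch

    def normalize_i3d(img: torch.Tensor) -> torch.Tensor:
        """AJ/I3D convention: img in [0, 1] -> [-1, 1]."""
        return 2 * img - 1

    def normalize_imagenet(img: torch.Tensor) -> torch.Tensor:
        """Usual torchvision convention: per-channel ImageNet mean/std."""
        mean = torch.tensor([0.485, 0.456, 0.406]).view(3, 1, 1)
        std = torch.tensor([0.229, 0.224, 0.225]).view(3, 1, 1)
        return (img - mean) / std

Feeding ImageNet-normalized inputs to a model trained with the 2*img - 1 convention (or vice versa) can easily cost several mAP points, which may explain part of the gap you're seeing.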

I do have a script from an older version of this repo that obtained 'CharadesmAPvalvideo 0.3279434933766623'. It fine-tunes starting from aj_rgb_imagenet (which is trained on ImageNet+Kinetics); the experiment config was:

    '--name', 'i3d12',
    '--print-freq', '1',
    '--dataset', 'charades_video',
    '--arch', 'aj_i3d',
    '--lr', '.375',
    '--criterion', 'default_criterion',
    '--wrapper', 'default_wrapper',
    '--lr-decay-rate', '15,40',
    '--epochs', '50',
    '--batch-size', '5',
    '--video-batch-size', '10',
    '--train-size', '1.0',
    '--weight-decay', '0.0000001',
    '--window-smooth', '0',
    '--val-size', '0.2',
    '--cache-dir', '/nfs.yoda/gsigurds/caches/',
    '--data', '/scratch/gsigurds/Charades_v1_rgb/',
    '--train-file', '/home/gsigurds/Charades_v1_train.csv',
    '--val-file', '/home/gsigurds/Charades_v1_test.csv',
    '--pretrained',
    '--resume', '/nfs.yoda/gsigurds/charades_pretrained/aj_rgb_imagenet.pth',
    '--workers', '4',

and model_050.txt:

    CharadesmAPvalvideo 0.3279434933766623
    loss_train 0.06147414238721991
    loss_val 0.10441828272431283
    top1train 39.24310920490209
    top1val 47.027027027027025
    top5train 113.72161592658898
    top5val 150.27027027027026
    videotop1valvideo 63.92914653784219
    videotop5valvideo 233.5480407944176
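
For context, the Charades video-level mAP is the mean over the 157 action classes of the average precision of the per-video scores. A minimal sketch of that metric using sklearn (not the repo's own evaluation code):

    import numpy as np
    from sklearn.metrics import average_precision_score

    def charades_map(scores: np.ndarray, labels: np.ndarray) -> float:
        """scores, labels: (num_videos, num_classes); labels are 0/1.
        Mean of per-class AP, skipping classes with no positives."""
        aps = [
            average_precision_score(labels[:, c], scores[:, c])
            for c in range(labels.shape[1])
            if labels[:, c].any()
        ]
        return float(np.mean(aps))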

Log file: https://www.dropbox.com/s/7uc7e32tz56jrgy/i3d12.txt?dl=0. It looks like this was run with the repo at this commit: https://github.com/gsig/PyVideoResearch/commit/12d3579

Let me know if that helps, and definitely submit a pull request if you figure out how to improve the I3D-Inception baseline.

gsig avatar May 07 '19 01:05 gsig