Add more baselines!
We need your help!
hi~ In your pre-trained models, what does the name "aj_rgb_charades.pth" mean? Is it the same as "aj_rgb_kinetics.pth", or is it an I3D model fine-tuned on the Charades dataset?
Ah, good catch. The README was misleading; it's updated now.
aj_rgb_imagenet.pth is a model trained on ImageNet+Kinetics.
aj_rgb_charades.pth is the aj_rgb_imagenet.pth model further fine-tuned on Charades.
If you need a model that starts from just an "inflated ImageNet model", you can use '--arch', 'resnet50_3d' (or swap in whatever ResNet you want; see model/bases/resnet50_3d.py) together with '--pretrained'.
This will inflate the PyTorch ResNet ImageNet model and use it as a starting point.
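For intuition, inflation just repeats each 2D kernel along a new temporal axis and rescales it, so the 3D network initially behaves like the 2D network applied frame by frame. A minimal sketch of the idea (not the exact code in model/bases/resnet50_3d.py; the Conv3d hyperparameters here are just illustrative):

```python
import torch
import torch.nn as nn
import torchvision

def inflate_conv_weight(w2d, time_dim=3):
    # w2d: (out_c, in_c, k, k) kernel from the 2D ImageNet model.
    # Repeat it along a new temporal axis and divide by time_dim, so the
    # 3D conv initially computes the 2D conv output averaged over frames.
    return w2d.unsqueeze(2).repeat(1, 1, time_dim, 1, 1) / time_dim

# Example: inflate torchvision's ResNet-50 conv1 into a Conv3d.
r2d = torchvision.models.resnet50(pretrained=True)
conv1_3d = nn.Conv3d(3, 64, kernel_size=(3, 7, 7), stride=(1, 2, 2),
                     padding=(1, 3, 3), bias=False)
with torch.no_grad():
    conv1_3d.weight.copy_(inflate_conv_weight(r2d.conv1.weight, time_dim=3))
```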
Best, Gunnar
I have another problem. When I test the aj_rgb_charades.pth model, the mAP is 24.8%, but if I train for just one epoch (lr 0.1), the mAP is 32.9%. I also notice that the fine-tuning scripts for I3D-Inception cover the AVA dataset and others, but for Charades I don't know the hyperparameters. So I used the same settings as piergiaj: learning rate 0.1, dropped to 0.01 after 100 epochs, training for about 150 epochs. I find that the mAP after the first epoch is 0.09%; is that right? Could you share your scripts and logs for fine-tuning I3D-Inception on Charades?
It's been a while since I experimented with the I3D-Inception models on Charades, since I mostly switched to ResNet50-I3D, except for trying to replicate the AVA baseline. One thing to keep in mind is that the aj_ pretrained models assume the images are normalized as 2*img - 1 (i.e., scaled to [-1, 1]) instead of the usual RGB mean/std normalization; you should find that line commented out in the dataset script. The aj_ models were also not trained by me, so there may have been some secret sauce in training them, like multiple losses for the Inception architecture.
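For concreteness, the two normalizations look like this (a sketch; the exact line lives in the dataset scripts in this repo):

```python
from torchvision import transforms

# Usual ImageNet-style normalization (what the ResNet-I3D models expect):
imagenet_norm = transforms.Normalize(mean=[0.485, 0.456, 0.406],
                                     std=[0.229, 0.224, 0.225])

# Normalization the aj_ pretrained models expect: with pixel values
# already scaled to [0, 1], map them to [-1, 1].
def aj_norm(img):
    return 2 * img - 1
```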
I do have a script from an older version of this repo that obtained 'CharadesmAPvalvideo 0.3279434933766623'. This was fine-tuned starting from aj_rgb_imagenet.pth (which is trained on ImageNet+Kinetics); the experiment config was (a sketch of driving main() with these flags follows the list):
'--name', 'i3d12',
'--print-freq', '1',
'--dataset', 'charades_video',
'--arch', 'aj_i3d',
'--lr', '.375',
'--criterion', 'default_criterion',
'--wrapper', 'default_wrapper',
'--lr-decay-rate', '15,40',
'--epochs', '50',
'--batch-size', '5',
'--video-batch-size', '10',
'--train-size', '1.0',
'--weight-decay', '0.0000001',
'--window-smooth', '0',
'--val-size', '0.2',
'--cache-dir', '/nfs.yoda/gsigurds/caches/',
'--data', '/scratch/gsigurds/Charades_v1_rgb/',
'--train-file', '/home/gsigurds/Charades_v1_train.csv',
'--val-file', '/home/gsigurds/Charades_v1_test.csv',
'--pretrained',
'--resume', '/nfs.yoda/gsigurds/charades_pretrained/aj_rgb_imagenet.pth',
'--workers', '4',
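For what it's worth, the exp/ scripts in this repo are essentially thin Python wrappers that push flags like the above onto sys.argv and call main(); a sketch, assuming main.py exposes a main() that parses sys.argv:

```python
# Sketch of an exp/-style driver script. Assumes main.py exposes a main()
# that parses sys.argv, as the other exp scripts in this repo do.
import sys
sys.path.insert(0, '.')
from main import main

sys.argv.extend([
    '--name', 'i3d12',
    '--dataset', 'charades_video',
    '--arch', 'aj_i3d',
    '--lr', '.375',
    # ...remaining flags exactly as listed above...
    '--pretrained',
    '--resume', '/nfs.yoda/gsigurds/charades_pretrained/aj_rgb_imagenet.pth',
])
main()
```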
and model_050.txt:
CharadesmAPvalvideo 0.3279434933766623
loss_train 0.06147414238721991
loss_val 0.10441828272431283
top1train 39.24310920490209
top1val 47.027027027027025
top5train 113.72161592658898
top5val 150.27027027027026
videotop1valvideo 63.92914653784219
videotop5valvideo 233.5480407944176
Log file: https://www.dropbox.com/s/7uc7e32tz56jrgy/i3d12.txt?dl=0 Looks like this was run with the repo at this commit: https://github.com/gsig/PyVideoResearch/commit/12d3579
Let me know if that helps, and definitely submit a pull request if you figure out how to improve the I3D-Inception baseline.