TSM accuracy drops with a lower test batch size
Hi, I was running test_models.py on Something-Something-V2 but had to decrease the batch_size from 72 to 12 because I was hitting a CUDA out-of-memory error.
The accuracy is lower, as I expected, but I wanted to double-check that the results are correct: for TSM ResNet101 | 8 * 2clip I am getting 43.5, compared to your reported 63.3 obtained with batch size 72.
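That said, my understanding is that batch size at test time should only affect speed and memory, not per-sample predictions, which is part of why I want to confirm the numbers. A minimal sketch of that expectation (the stock torchvision ResNet below is just a stand-in for illustration, not the repo's TSN model):

```python
import torch
import torchvision.models as models

model = models.resnet18().eval()   # stand-in model; eval mode freezes BatchNorm stats
x = torch.randn(12, 3, 224, 224)   # 12 dummy inputs

with torch.no_grad():
    out_full = model(x)                                    # one batch of 12
    out_split = torch.cat([model(b) for b in x.split(4)])  # three batches of 4

# In eval mode BatchNorm uses its stored running statistics, so the two runs
# should agree up to floating-point noise regardless of batch size.
print(torch.allclose(out_full, out_split, atol=1e-5))      # expected: True
```

Here are the exact command and the full log: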
python test_models.py somethingv2 --weights=pretrained/TSM_something_RGB_resnet101_shift8_blockres_avg_segment8_e45.pth --test_segments=8 --batch_size=12 -j 24 --test_crops=3 --twice_sample
somethingv2: 174 classes
=> shift: True, shift_div: 8, shift_place: blockres
Initializing TSN with base model: resnet101.
TSN Configurations:
input_modality: RGB
num_segments: 8
new_length: 1
consensus_module: avg
dropout_ratio: 0.8
img_feature_dim: 256
=> base model: resnet101
Downloading: "https://download.pytorch.org/models/resnet101-5d3b4d8f.pth" to /home/liangyi/.cache/torch/checkpoints/resnet101-5d3b4d8f.pth
Adding temporal shift...
=> n_segment per stage: [8, 8, 8, 8]
=> Using n_round 2 to insert temporal shift
=> Processing stage with 3 blocks residual
=> Using fold div: 8
=> Using fold div: 8
=> Processing stage with 4 blocks residual
=> Using fold div: 8
=> Using fold div: 8
=> Processing stage with 23 blocks residual
=> Using fold div: 8
   (this line repeats 12 times, once per shifted block in this stage)
=> Processing stage with 3 blocks residual
=> Using fold div: 8
=> Using fold div: 8
=> Using twice sample for the dataset...
video number:8629
video 0 done, total 0/8629, average 2.820 sec/video, moving Prec@1 58.333 Prec@5 91.667
video 240 done, total 240/8629, average 0.264 sec/video, moving Prec@1 47.222 Prec@5 75.794
video 480 done, total 480/8629, average 0.181 sec/video, moving Prec@1 47.154 Prec@5 76.423
video 720 done, total 720/8629, average 0.152 sec/video, moving Prec@1 48.634 Prec@5 76.366
video 960 done, total 960/8629, average 0.138 sec/video, moving Prec@1 47.634 Prec@5 75.926
video 1200 done, total 1200/8629, average 0.129 sec/video, moving Prec@1 48.762 Prec@5 76.650
video 1440 done, total 1440/8629, average 0.123 sec/video, moving Prec@1 47.796 Prec@5 76.102
video 1680 done, total 1680/8629, average 0.119 sec/video, moving Prec@1 47.754 Prec@5 76.241
video 1920 done, total 1920/8629, average 0.116 sec/video, moving Prec@1 47.826 Prec@5 76.915
video 2160 done, total 2160/8629, average 0.114 sec/video, moving Prec@1 47.744 Prec@5 77.026
video 2400 done, total 2400/8629, average 0.112 sec/video, moving Prec@1 47.554 Prec@5 76.949
video 2640 done, total 2640/8629, average 0.110 sec/video, moving Prec@1 47.813 Prec@5 77.036
video 2880 done, total 2880/8629, average 0.109 sec/video, moving Prec@1 47.061 Prec@5 76.591
video 3120 done, total 3120/8629, average 0.108 sec/video, moving Prec@1 47.254 Prec@5 76.628
video 3360 done, total 3360/8629, average 0.107 sec/video, moving Prec@1 47.361 Prec@5 76.839
video 3600 done, total 3600/8629, average 0.106 sec/video, moving Prec@1 47.231 Prec@5 76.993
video 3840 done, total 3840/8629, average 0.105 sec/video, moving Prec@1 47.456 Prec@5 77.129
video 4080 done, total 4080/8629, average 0.105 sec/video, moving Prec@1 47.923 Prec@5 77.224
video 4320 done, total 4320/8629, average 0.104 sec/video, moving Prec@1 48.361 Prec@5 77.378
video 4560 done, total 4560/8629, average 0.104 sec/video, moving Prec@1 48.513 Prec@5 77.472
video 4800 done, total 4800/8629, average 0.103 sec/video, moving Prec@1 48.545 Prec@5 77.369
video 5040 done, total 5040/8629, average 0.103 sec/video, moving Prec@1 48.614 Prec@5 77.613
video 5280 done, total 5280/8629, average 0.103 sec/video, moving Prec@1 48.639 Prec@5 77.702
video 5520 done, total 5520/8629, average 0.102 sec/video, moving Prec@1 48.789 Prec@5 77.784
video 5760 done, total 5760/8629, average 0.102 sec/video, moving Prec@1 48.805 Prec@5 77.859
video 6000 done, total 6000/8629, average 0.102 sec/video, moving Prec@1 48.619 Prec@5 77.761
video 6240 done, total 6240/8629, average 0.101 sec/video, moving Prec@1 48.448 Prec@5 77.703
video 6480 done, total 6480/8629, average 0.101 sec/video, moving Prec@1 48.413 Prec@5 77.526
video 6720 done, total 6720/8629, average 0.101 sec/video, moving Prec@1 48.351 Prec@5 77.436
video 6960 done, total 6960/8629, average 0.101 sec/video, moving Prec@1 48.322 Prec@5 77.567
video 7200 done, total 7200/8629, average 0.101 sec/video, moving Prec@1 48.281 Prec@5 77.718
video 7440 done, total 7440/8629, average 0.100 sec/video, moving Prec@1 48.269 Prec@5 77.684
video 7680 done, total 7680/8629, average 0.100 sec/video, moving Prec@1 48.193 Prec@5 77.613
video 7920 done, total 7920/8629, average 0.100 sec/video, moving Prec@1 48.122 Prec@5 77.509
video 8160 done, total 8160/8629, average 0.100 sec/video, moving Prec@1 48.079 Prec@5 77.631
video 8400 done, total 8400/8629, average 0.100 sec/video, moving Prec@1 47.967 Prec@5 77.568
[0.81395349 0.42592593 0.38333333 0.48780488 0.16666667 0.40909091
0.73611111 0.5 0.45454545 0.38181818 0.42465753 0.40425532
0.19480519 0.5 0.63265306 0.26436782 0.325 0.28813559
0.17391304 0.30379747 0.25806452 0.54761905 0.3015873 0.37142857
0.54545455 0.5 0.5 0.5 0.69444444 0.37837838
0.8 0.76315789 0.78125 0.14285714 0.04166667 0.13636364
0.63291139 0.61111111 0.125 0.57142857 0.64285714 0.51724138
0.5375 0.40677966 0.6 0.44537815 0.40163934 0.43243243
0.46296296 0.56953642 0.63953488 0. 0.5 0.53333333
0.33333333 0.43478261 0.28301887 0.51219512 0. 0.65384615
0.61538462 0.26666667 0.78571429 0.05714286 0.2 0.42857143
0.08108108 0.28571429 0.38666667 0.3 0.5 0.
0.84210526 0.42857143 0.51282051 0.14285714 0.4 0.6
0.28571429 0.82608696 0.05555556 0.13636364 0.42857143 0.23529412
0.46666667 0.5 0.51470588 0.6122449 0.33333333 0.09375
0.26086957 0.81818182 0.35294118 0.6091954 0.81707317 0.31818182
0.33333333 0.19148936 0.30645161 0.33823529 0.39393939 0.61428571
0.3 0.45945946 0.58139535 0.63636364 0.48076923 0.5942029
0.30769231 0.4673913 0.38461538 0. 0.45714286 0.4
0.58878505 0.16666667 0.23076923 0.58974359 0.19047619 0.54285714
0.86363636 0.68 0.37179487 0.33333333 0.35714286 0.425
0.26470588 0.28205128 0.43434343 0.5 0.26190476 0.125
0.12903226 0.26086957 0.79661017 0.37735849 0.14285714 0.25
0.18421053 0.525 0.57647059 0.2 0.53846154 0.49484536
0.69230769 0.2892562 0.56349206 0.13513514 0.51136364 0.86597938
0.67391304 0.20634921 0.48717949 0.58974359 0.46938776 0.2
0.27777778 0.39705882 0.30769231 0.42105263 0.38235294 0.31818182
0.36956522 0.30434783 0.68217054 0.95238095 0.93023256 0.96078431
0.85714286 0.60714286 0.49230769 0.66666667 0.62025316 0.71153846]
upper bound: 0.45401718997740914
-----Evaluation is finished------
Class Accuracy 43.54%
Overall Prec@1 48.02% Prec@5 77.51%
Could you just confirm this?
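One detail I noticed when comparing numbers: if I read the output correctly, the long array above is per-class accuracy, "Class Accuracy 43.54%" is its mean (every class weighted equally), while "Overall Prec@1 48.02%" weights every video equally. A small sketch of the distinction (the function and variable names here are illustrative, not taken from test_models.py):

```python
import numpy as np

def summarize(preds, labels, num_classes=174):
    """Macro (per-class) vs. micro (per-video) top-1 accuracy."""
    preds, labels = np.asarray(preds), np.asarray(labels)
    overall_top1 = (preds == labels).mean()            # every video counts once
    per_class = np.array([(preds[labels == c] == c).mean()
                          for c in range(num_classes)
                          if (labels == c).any()])
    class_acc = per_class.mean()                       # every class counts once
    return overall_top1, class_acc
```

On an imbalanced test set the two can differ by several points, so if the reported 63.3 is an overall top-1 figure, the 48.02% Prec@1 above is the comparable number, not the 43.54% macro average.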
My batch size is 8, and using the pretrained model provided by the authors I was able to reproduce their reported accuracy. I have run into this problem before, too: it is most likely that your dataset preprocessing went wrong. I suggest you take a closer look at your dataset. It should be downloaded and extracted under Ubuntu rather than Windows!
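If preprocessing is the suspect, a quick integrity check can rule out missing or truncated frame folders before re-downloading everything. A rough sketch, assuming a TSN-style file list with one "<video_id> <num_frames> <label>" entry per line; the paths below are placeholders for your own layout, not the repo's defaults:

```python
import os

frames_root = "data/somethingv2/frames"            # one folder of JPEGs per video (placeholder)
val_list = "data/somethingv2/val_videofolder.txt"  # "<video_id> <num_frames> <label>" (placeholder)

missing, short = [], []
with open(val_list) as f:
    for line in f:
        vid, n_frames, _label = line.split()
        d = os.path.join(frames_root, vid)
        if not os.path.isdir(d):
            missing.append(vid)                    # folder never extracted
        elif len(os.listdir(d)) < int(n_frames):
            short.append(vid)                      # fewer frames on disk than the list claims

print(f"{len(missing)} missing video folders, {len(short)} with too few frames")
```

If either list comes back non-empty for an archive extracted under Windows, re-extracting under Linux (case-sensitive filenames, no path-length limits) is a reasonable first fix.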