About the epoch setting
When running run_training.py, the number of epochs is 10000. Is that necessary? 10000 seems very large, and MVTec is a small dataset, so I worry about overfitting. Can I set a smaller value, for example 4000 epochs? Thanks!
Note that the --epoch parameter takes the number of update steps and not the number of iterations over the dataset.
You can try to reduce the number of epochs and study its effect on the evaluation metrics.
Check out the following two graphs from the README; they show the evaluation metric over the update steps.
[Graph: CutPaste (without scar or 3-way)]
[Graph: CutPaste (scar)]

During training, the model only sees images without defects. Because we want the model to learn the characteristics of a good sample, it can even be desirable to "overfit" to these images.
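For intuition, here is a minimal sketch of a step-driven training loop (the names are hypothetical; the actual run_training.py may be organized differently). The point is that the loop counts parameter updates, not passes over the dataset, so a small class is simply re-iterated more often within the same number of steps:

```python
# Hypothetical sketch of a step-driven loop; not the repo's actual code.
def train(model, optimizer, loader, loss_fn, num_steps=10_000):
    step = 0
    while step < num_steps:
        # Restart the data loader whenever the (possibly small) dataset runs out.
        for batch, labels in loader:
            if step >= num_steps:
                break
            optimizer.zero_grad()
            loss = loss_fn(model(batch), labels)
            loss.backward()
            optimizer.step()
            step += 1
```

With this counting, an --epoch value of 10000 means 10000 optimizer updates, regardless of how many images the class has.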
Thanks for your explanation. So the results presented in the README are obtained at update step 10000, for every class in the MVTec dataset?
In the README, you mention: 'The --epoch parameter takes the number of update steps and not their definition of epochs.' Taking the class 'screw' as an example, there are 320 normal images in the train set, and with batch size 32 it takes 10 steps to iterate over all images. Since we train for 10000 steps, does that mean that, in the conventional definition of an epoch, we run 1000 epochs to get the results for the class 'screw'?
> Since we train for 10000 steps, does that mean that, in the conventional definition of an epoch, we run 1000 epochs to get the results for the class 'screw'?
Correct.
If you look at the paper on arXiv, page 12 (Appendix 3), they specify how many update steps they use:
- Number of training epochs ∈ {128, 192, 256, 320, 384}.⁵
⁵ Note that, unlike conventional definition for an epoch, we define 256 parameter update steps as one epoch.
Which makes me think they use something like $256 \times 256 = 65{,}536$ update steps (for 256 of their "epochs").
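To make the bookkeeping concrete, here is a small sketch of the steps-to-epochs conversion (it assumes a conventional epoch is one full pass over the training images and that a partial final batch still counts as an update step):

```python
import math

def steps_to_conventional_epochs(update_steps, num_train_images, batch_size=32):
    """How many full passes over the training set a given number of update steps amounts to."""
    steps_per_pass = math.ceil(num_train_images / batch_size)
    return update_steps / steps_per_pass

# 'screw' has 320 defect-free training images -> 10 update steps per pass:
print(steps_to_conventional_epochs(10_000, 320))  # 1000.0  (this repo's default)
print(steps_to_conventional_epochs(65_536, 320))  # 6553.6  (the paper's 256 * 256 steps)
```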
Thanks for your prompt reply!
Yes, I had noticed that in the paper, but I couldn't understand why they used this setup, because it means that, compared to the conventional setup, smaller datasets are effectively trained for longer. Taking the class 'toothbrush' as an example, there are only 60 images in the train set; training for 65536 steps with batch size 32 amounts to 32768 conventional epochs for 'toothbrush', but only about 6553 conventional epochs for 'screw'. Shouldn't we train longer for larger datasets?
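For reference, a quick sanity check of those numbers (again counting a partial final batch as one update step):

```python
import math

steps = 256 * 256  # 65,536 update steps, as estimated above
for cls, n_images in [("toothbrush", 60), ("screw", 320)]:
    steps_per_pass = math.ceil(n_images / 32)  # batch size 32
    print(cls, steps / steps_per_pass)  # toothbrush: 32768.0, screw: 6553.6
```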