Question regarding the learning rate
I am using 2 A100 GPUs with samples_per_gpu=5 and workers_per_gpu=10. What do you think would be the best learning rate for me? I am not able to reproduce the results reported in the paper.
I haven't tried such a setting, but I guess you should at least scale the learning rate and the number of iterations linearly with your batch size. In your case, you may try scaling the learning rate to 1/4 of the original one and training for 4x as many iterations.
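As a sanity check, the linear scaling rule described above can be sketched as a small helper. The base values below (`base_batch=40`, i.e. 8 GPUs x 5 samples, `base_lr=4e-6`, `base_iters=18000`) are only what the 1/4 and 4x factors in this thread imply, not values confirmed from the repo's config:

```python
def scale_schedule(base_lr, base_iters, base_batch, new_batch):
    """Scale the learning rate linearly with batch size and stretch the
    iteration budget so the total number of samples seen stays the same."""
    factor = new_batch / base_batch
    return base_lr * factor, int(base_iters / factor)

# 2 GPUs x samples_per_gpu=5 gives an effective batch size of 10,
# versus an assumed reference batch size of 40 (8 GPUs x 5).
new_lr, new_iters = scale_schedule(base_lr=4e-6, base_iters=18000,
                                   base_batch=40, new_batch=10)
# new_lr is 1/4 of the base (1e-6); new_iters is 4x the base (72000)
```

Note that with a smaller batch the gradient is noisier, so even after linear scaling some further tuning of the learning rate may be needed.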
@MendelXu thank you for the prompt reply. I'll set the learning rate to 1e-6. Can you please tell me where in the code I need to change the number of iterations?
The max_iters here. https://github.com/microsoft/SoftTeacher/blob/bef9a256e5c920723280146fc66b82629b3ee9d4/configs/soft_teacher/base.py#L264
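For illustration, the change would look something like the fragment below. This is a hedged sketch assuming the mmdetection-style `IterBasedRunner` config convention used by the linked `base.py`; the exact surrounding fields and original values should be checked against the file itself:

```python
# In configs/soft_teacher/base.py (sketch, field names assumed from
# the mmdetection IterBasedRunner convention):
runner = dict(type="IterBasedRunner", max_iters=72000)  # 4x the original budget

# If the schedule uses step decay, the milestone iterations would
# likely need the same 4x rescaling (values here are hypothetical):
lr_config = dict(policy="step", step=[48000, 64000])
```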
Ok. So a learning rate of 1e-6 and 72000 iterations is a good setting for me?
It might be (no promises).