Question about training log for validation results
I have tried to reproduce the result with batch size 8, but it is hard to get close to the result of your released checkpoint. Could I ask for the training log of this checkpoint, for research purposes? I'm new, so please tell me if I did something wrong. Thank you.
I do not have the training log any more. The main problem is the batch size: the model cannot be trained with such an extremely small batch size, which leads to strong overfitting. Please try a batch size of 32.
If your GPU cannot support a batch size of 32, you can:
- Set the batch size as large as possible.
- A trick may work, but I cannot guarantee it: if your largest batch size is b, re-scale the learning rate to lr*b/32. For example, if your largest batch size is 16 and our initial lr is 0.00125, set the lr to 0.00125*16/32 = 0.000625 (see the sketch below).
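To make the re-scaling trick concrete, here is a minimal Python sketch of the linear learning-rate scaling described above, assuming the reference setup from this thread (batch size 32, initial lr 0.00125). The helper name and defaults are illustrative only and are not part of the GUPNet codebase.

```python
# Minimal sketch of the linear LR scaling trick (lr * b / 32) from the reply above.
# Assumes the reference batch size 32 and initial lr 0.00125 mentioned in this thread;
# scale_lr is a hypothetical helper, not a function from the GUPNet repository.

def scale_lr(batch_size, base_lr=0.00125, base_batch_size=32):
    """Rescale the learning rate linearly with the actual total batch size."""
    return base_lr * batch_size / base_batch_size

if __name__ == "__main__":
    # e.g. a single GPU that only fits 16 images, or 2 GPUs holding 8 images each
    for b in (8, 16, 24, 32):
        print(f"total batch size {b:2d} -> lr = {scale_lr(b):.6f}")
```

Note that the rule depends on the total (effective) batch size across all GPUs; if 2 GPUs together still fit 32 images, no re-scaling should be needed.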
If I use 2 GPUs instead of 3, what should the batch size and lr be adjusted to? I am trying with 2 V100s.
@SuperMHP
I have tried with batch size 32, and the metric (AP3D) is always best at around epoch 120, whereas it should keep improving until epoch 140 as in your run. When I visualize the losses of your first 40 epochs, the initial 3D size loss is near 2 and decreases very quickly. Mine does not behave the same, and my 2D size loss is also unstable from epoch 30 to epoch 100. Have you ever encountered this case? My loss is below. Thank you.