
Question about train log in validation test

Open kaitoud906 opened this issue 2 years ago • 5 comments

I have tried to reproduce the result with batch size 8, but it is hard to get close to the result of your released checkpoint. Could I ask for the training log of this checkpoint, for research purposes? I'm new to this, so please tell me if I did something wrong. Thank you.

kaitoud906 avatar Apr 26 '23 08:04 kaitoud906

I do not have the training log anymore. The main problem is the batch size: the model cannot train under such an extremely small batch size, which leads to strong overfitting. Please try with a bsize of 32.

SuperMHP avatar Apr 26 '23 09:04 SuperMHP

If your GPU cannot support a bsize of 32, you can:

  1. Set the bsize as large as possible.
  2. Try a trick that may work, though I cannot guarantee it: if your largest bsize is b, rescale the learning rate to lr*b/32 (see the sketch after this list). For example, if your largest bsize is 16 and our initial lr is 0.00125, set the lr to 0.00125*16/32.
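
A minimal sketch of this linear scaling rule in Python; the constant and function names are illustrative and not from the GUPNet codebase:

```python
# Hypothetical helper for the linear LR-scaling trick described above.
# Reference values from the thread: base lr 0.00125 at batch size 32.
BASE_LR = 0.00125
BASE_BSIZE = 32

def scaled_lr(bsize: int) -> float:
    """Rescale the base learning rate in proportion to the batch size."""
    return BASE_LR * bsize / BASE_BSIZE

if __name__ == "__main__":
    print(scaled_lr(16))  # 0.000625, i.e. 0.00125 * 16 / 32
```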

SuperMHP avatar Apr 26 '23 09:04 SuperMHP

If I use 2 GPUs instead of 3, what should the batch size and lr be adjusted to? I am trying with 2 V100s.

loipct avatar Apr 27 '23 09:04 loipct

@SuperMHP

> If your GPU cannot support a bsize of 32, you can:
>
>   1. Set the bsize as large as possible.
>   2. Try a trick that may work, though I cannot guarantee it: if your largest bsize is b, rescale the learning rate to lr*b/32. For example, if your largest bsize is 16 and our initial lr is 0.00125, set the lr to 0.00125*16/32.

If I use 2 GPUs instead of 3, what should the batch size and lr be adjusted to? I am trying with 2 V100s.

loipct avatar Apr 27 '23 10:04 loipct

> If your GPU cannot support a bsize of 32, you can:
>
>   1. Set the bsize as large as possible.
>   2. Try a trick that may work, though I cannot guarantee it: if your largest bsize is b, rescale the learning rate to lr*b/32. For example, if your largest bsize is 16 and our initial lr is 0.00125, set the lr to 0.00125*16/32.

I have tried with batch size 32, and the metric (AP3D) always peaks around epoch 120, whereas yours keeps improving until epoch 140. When I visualize the losses from your first 40 epochs, the initial 3D size loss is near 2 and decreases very fast. Mine does not behave the same, and my 2D size loss is also unstable from epoch 30 to epoch 100. Have you ever encountered this case? My losses are below. Thank you.

[image: stage1-140 loss curves]

kaitoud906 avatar Jun 08 '23 08:06 kaitoud906