
Summary of differences between paper and code

Open t-taniai opened this issue 6 years ago • 3 comments

Thank you very much for your interesting work and useful code!

When I was reading the code, I noticed several differences between the descriptions in the paper and the implementations in the code. In the following, I try to summarize those differences.

  • Architecture:
    • Paper: GANet-15 (15x 3D-Convs and 5x GA layers from GitHub readme)
    • Code: GANet-Deep (22x 3D-Convs and 9x GA layers from GitHub readme)
    • This is somewhat confusing to me. The network architecture in the arXiv paper shows that GANet-15 has 3 SGA and 2 LGA layers, so I guess "5 GA layers" above means 3 SGA + 2 LGA layers. This makes sense because the GANet-Deep code has 7 SGA and 2 LGA layers, which matches the 9 GA layers. However, the GANet-11 code uses 4 SGA and 2 LGA layers (6 GA layers in total), so GANet-11 somehow uses more SGA layers than GANet-15... (?)
  • Loss y=f(|d-d'|)
    • Paper: Smooth L1: y=0.5*x**2 (x<1), x-0.5 (x>=1)
    • Code: y=x**2/a (x<a), 2x-(x-a)**2/(2b) - a (a<=x<a+b), x+b/2 (x>=a+b)
      • a=3 and b = 2.
  • Batch size & Crop size
    • Paper: 16 & 240×576
    • Code: 16 & 240x528 (stage 1) and 8 & 240x1248 (stage 2)
  • Finetuning strategy
    • Paper: 640 epochs (lr=0.001 until 300 ep and then lr=0.0001 in the rest)
    • Code: 800 (stage 1) + 8 (stage 2) epochs (lr=0.001 until 400 ep and then lr=0.0001 in the rest)
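For concreteness, the two losses in the summary above can be sketched as follows (a minimal NumPy sketch based on the formulas quoted in this thread, with a = 3 and b = 2; not copied from the official repository):

```python
import numpy as np

def smooth_l1(x):
    # Paper: standard smooth L1 on x = |d - d'|
    return np.where(x < 1, 0.5 * x**2, x - 0.5)

def code_loss(x, a=3.0, b=2.0):
    # Code: piecewise loss as summarized above (a = 3, b = 2)
    return np.where(
        x < a,
        x**2 / a,
        np.where(x < a + b, 2 * x - (x - a)**2 / (2 * b) - a, x + b / 2),
    )
```

The pieces of the code's loss agree at the breakpoints (for a = 3, b = 2: code_loss(3.0) == 3.0 and code_loss(5.0) == 6.0), so the function is continuous across the three regimes.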

It would be very helpful if the authors could confirm my summary and provide more information if there are any additional differences.

Thank you very much.

t-taniai avatar Mar 14 '20 15:03 t-taniai

Hello, I have the same question as yours, and I found that the crop_size of the pretrained model is 240×624. As far as I can tell, the crop_size somehow affects the accuracy. My question is: since I want to use the pretrained model, is it better to keep its crop_size while finetuning?

musuoliniao avatar Mar 17 '20 01:03 musuoliniao

Thanks for the summary. Some minor corrections:

  • Architecture:

    • Paper: GANet-15 (15x 3D-Convs and 5x GA layers from GitHub readme)
    • Code: GANet-Deep (22x 3D-Convs and 9x GA layers from GitHub readme)
    • GANet-15 is similar to GANet-deep but with fewer layers (removing 4 low-resolution SGA layers and some 3D conv layers). GANet-11 does not use the hourglass architecture.
  • Loss y=f(|d-d'|)

    • Paper: Smooth L1: y=0.5*x**2 (x<1), x-0.5 (x>=1)
    • Code: y=x**2/a (x<a), 2x-(x-a)**2/(2b) - a (a<=x<a+b), x+b/2 (x>=a+b)
      • a=3 and b = 2.
  • The new loss (the code uses a mixed L2 and threshold loss) aims for better threshold error rates in benchmark evaluations.

  • Batch size & Crop size

    • Paper: 16 & 240×576
    • Code: 16 & 240x528 (stage 1) and 8 & 240x1248 (stage 2)
    • released model: 8 & 240x624 and 4 & 240x1248 using four GPUs (22G)
  • Finetuning strategy

    • Paper: 640 epochs (lr=0.001 until 300 ep and then lr=0.0001 in the rest)
    • Code: 800 (stage 1) + 8 (stage 2) epochs (lr=0.001 until 400 ep and then lr=0.0001 in the rest)
    • Released model: 800 epochs.
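The code's finetuning schedule described above can be sketched as a simple step function (my reading of "lr=0.001 until 400 ep"; whether epoch 400 itself uses the lower rate is a guess):

```python
def finetune_lr(epoch: int) -> float:
    # Sketch of the schedule quoted above: lr = 0.001 up to epoch 400,
    # then lr = 0.0001 for the remaining epochs (boundary is assumed).
    return 1e-3 if epoch <= 400 else 1e-4
```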

feihuzhang avatar Apr 15 '20 11:04 feihuzhang

Hi Feihu, another difference worth mentioning: in 'GANet_deep.py' line 246 (class DispAgg), you use F.normalize() instead of softmax(). This also differs from the paper, which uses 'soft argmin' for the disparity regression. Could you please explain briefly? I found that with your code (using F.normalize()), the predicted disparity map can sometimes contain negative disparity values, because the probability of certain disparity candidates becomes negative after F.normalize(). Thanks a lot!
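A toy NumPy illustration of the point above (the cost values are hypothetical, not taken from the repository): L1-style normalization divides by the sum of absolute values and so preserves negative signs, letting the regressed disparity go negative, whereas softmax weights are strictly positive:

```python
import numpy as np

def softmax(x):
    # soft argmin weighting as in the paper: strictly positive weights
    e = np.exp(x - x.max())
    return e / e.sum()

def l1_normalize(x):
    # mimics F.normalize(x, p=1): divides by the L1 norm, keeps signs
    return x / np.abs(x).sum()

cost = np.array([1.0, -2.0, 0.5, -0.5])   # hypothetical per-candidate scores
disp = np.arange(len(cost), dtype=float)  # disparity candidates 0..3

d_soft = float((softmax(cost) * disp).sum())       # stays within [0, 3]
d_norm = float((l1_normalize(cost) * disp).sum())  # can be negative here
```

With these example values, d_norm comes out negative (-0.625) while d_soft stays inside the valid disparity range, which matches the behavior reported above.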

GoodStudyDayUpUp avatar Jul 19 '21 09:07 GoodStudyDayUpUp