About the datasets for metric depth estimation
Thanks for your great work!
I am curious why you use 10 datasets for relative depth training but only 2 datasets for metric depth training. Have you tried using more datasets for metric depth training? I wonder if all 12 datasets could be used for metric depth estimation. Looking forward to your response!
Relative depth training was done by the MiDaS team; the ZoeDepth authors just use the pretrained weights. Those datasets don't have metric ground truth.
In the report *MiDaS v3.1 – A Model Zoo for Robust Monocular Relative Depth Estimation*, Section 5 (Applications) says that ZoeD-M12-NK incorporates a MiDaS v3.1 architecture with the BEiT-L encoder and a newly proposed metric depth binning module appended to the decoder. Training combines relative depth training of the MiDaS architecture on the 5+12 dataset mix described in Sec. 3.3, followed by metric depth fine-tuning of the prediction heads in the bins module. Extensive results verify that ZoeDepth models benefit from relative depth training via MiDaS v3.1, enabling fine-tuning on two metric depth datasets at once (NYU Depth v2 and KITTI) as well as achieving unprecedented zero-shot generalization performance on a diverse set of unseen metric depth datasets.
@shariqfarooq123 @northagain @kwea123 Hello, I am a beginner in this field. I would like to ask how ZoeDepth can produce metric depth when trained on NYUv2, given that its ground truth comes as PNG files. It seems a PNG should not carry any metric (meter) meaning. Why do the authors claim it can produce absolute depth? The MiDaS paper also states that NYUv2 contains absolute depth, but after processing I only have PNG files for NYUv2, unlike KITTI, where I have NPY files containing a depth value for each pixel. Where am I misunderstanding this? Thank you in advance for your assistance.
Hello, I'm not sure whether my explanation will be clear (my English isn't great): PNG images can be stored in several modes; uint8 covers 0-255 and uint16 covers 0-65535. In some depth estimation dataloaders you can see that, during data pre-processing, the NYU PNG image is loaded and divided by 1000 to get the metric depth value. @uowei
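For illustration, a minimal sketch of that pre-processing step, assuming a hypothetical file path and the common convention that the 16-bit PNG stores depth in millimeters (as in typical NYU dataloaders):

```python
import numpy as np
from PIL import Image

# Hypothetical path to a 16-bit NYU Depth v2 depth map.
depth_png = Image.open("nyu_depth/0001.png")

# uint16 pixels (0-65535) encode depth in millimeters under this
# convention, so dividing by 1000 yields metric depth in meters.
depth_m = np.asarray(depth_png).astype(np.float32) / 1000.0
print(depth_m.min(), depth_m.max())  # indoor scenes: roughly 0-10 m
```

So the metric information is there; it is just packed into integer pixel values plus a known scale factor.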
@northagain Hello, first of all, thank you for your reply. Being able to communicate in Chinese is even better!
My current understanding is that the output of monocular depth estimation should always be relative depth. I don't really understand why the authors' bin-based approach can produce absolute depth. Don't AdaBins, BinsFormer, and similar methods also use this classification-plus-regression approach? Is what they obtain absolute depth as well?
The MiDaS paper also mentions that some datasets do not contain absolute depth, but don't those datasets have depth maps too? Following your explanation, couldn't they also be treated as having absolute depth?
Sorry, I am not yet very familiar with this field; if my understanding is wrong, please correct me! Thank you again!
@uowei The statement "monocular depth estimation should always produce relative depth" is *right, but not entirely right*. Before these models trained jointly on multiple datasets appeared, monocular estimation of absolute depth usually predicted a relative depth in (0, 1) via a sigmoid and then multiplied it by max_depth (which differs across datasets). So earlier work effectively treated absolute depth = relative depth × max_depth (although this description is not very accurate, and the generalization is clearly insufficient, because different datasets carry different priors).
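A minimal sketch of that convention, with a hypothetical network output and a dataset-specific max_depth:

```python
import torch

# Hypothetical raw network output (logits) for one image.
logits = torch.randn(1, 1, 480, 640)

# Dataset-specific prior, e.g. 10 m for NYU indoors, 80 m for KITTI.
max_depth = 10.0

# Relative depth in (0, 1) via sigmoid, scaled to "absolute" meters.
depth = torch.sigmoid(logits) * max_depth
```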
Take the bin-based methods: the AdaBins paper says the depth interval D = (d_min, d_max) is divided into N bins, and "this interval is fixed for a given dataset and is determined by dataset specification or manually set to a reasonable range". The range is set in advance; for example, 100 m is cut into many pieces, but that 100 m is fixed before both training and testing, so the output is absolute depth.
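A minimal sketch of that binning idea (not the actual AdaBins implementation; AdaBins predicts adaptive bin widths, and all shapes here are illustrative):

```python
import torch

d_min, d_max, n_bins = 1e-3, 10.0, 256  # range fixed per dataset

# Hypothetical per-pixel logits over the N bins from a network head.
logits = torch.randn(1, n_bins, 480, 640)

# Uniform bin centers inside the fixed metric range.
edges = torch.linspace(d_min, d_max, n_bins + 1)
centers = 0.5 * (edges[:-1] + edges[1:])                 # (n_bins,)

# Depth = probability-weighted sum of metric bin centers; the result
# is in meters because the bin range itself was set in meters.
probs = torch.softmax(logits, dim=1)                     # (1, N, H, W)
depth = (probs * centers.view(1, -1, 1, 1)).sum(dim=1)   # (1, H, W)
```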
MiDaS, if I remember correctly, mentions that by scale- and shift-invariance, absolute depth = scale × relative depth + shift; the scale here can roughly be understood as the max_depth above.
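A minimal sketch of recovering that scale and shift by least squares against metric ground truth (the standard alignment used when evaluating scale-/shift-invariant predictions; names and data are illustrative):

```python
import numpy as np

def align_scale_shift(pred_rel, gt_metric, mask):
    """Solve min over (s, t) of ||s * pred + t - gt||^2 on valid pixels."""
    p = pred_rel[mask].ravel()
    g = gt_metric[mask].ravel()
    A = np.stack([p, np.ones_like(p)], axis=1)   # columns: [pred, 1]
    (s, t), *_ = np.linalg.lstsq(A, g, rcond=None)
    return s * pred_rel + t                      # absolute depth estimate

# Usage with synthetic data: recover scale=2.0, shift=0.5 exactly.
pred = np.random.rand(480, 640).astype(np.float32)
gt = 2.0 * pred + 0.5
print(np.allclose(align_scale_shift(pred, gt, gt > 0), gt, atol=1e-4))
```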
This is my personal understanding and I am not sure my description is correct, so take it as a reference only. If you have any further questions, feel free to contact me by email so we can learn together! [email protected]
@northagain I also saw the same explanation from you in issue 10: https://github.com/isl-org/ZoeDepth/issues/10
What I am curious about is whether this means the setup for relative depth estimation is determined only by the dataset, or whether the scale-invariant loss the authors use is also one of the key factors?
I have contacted you by email; I am not sure whether you have seen it. My email is [email protected]. Thanks again for your explanation!