[JBHI'24] MonoLoT
Official PyTorch implementation of MonoLoT: Self-Supervised Monocular Depth Estimation in Low-Texture Scenes for Automatic Robotic Endoscopy.
Qi HE, Guang Feng, Sophia Bano, Danail Stoyanov, Siyang Zuo
[IEEE Xplore] [YouTube] [GitHub]
Overview
Our research introduces an approach that addresses the challenges of self-supervised monocular depth estimation in digestive endoscopy. We tackle two critical aspects: the limitations of self-supervised depth estimation in low-texture scenes, and the application of depth estimation to visual servoing for digestive endoscopy. Our investigation reveals that the struggles of self-supervised depth estimation in low-texture scenes stem from inaccurate photometric reconstruction losses. To overcome this, we introduce a point-matching loss that refines the reprojected points. Furthermore, during training, data augmentation is performed through a batch image shuffle loss, significantly improving the accuracy and generalisation capability of the depth model. Combined, the point-matching loss and batch image shuffle loss boost the baseline accuracy by at least 5% on both the C3VD and SimCol datasets, and surpass the generalisability of ground-truth depth-supervised baselines when applied to upper-GI datasets. Moreover, the successful implementation of our robotic platform for automatic intervention in digestive endoscopy demonstrates a practical application of monocular depth estimation technology.
Performance
Installation
We tested our code on a server with Ubuntu 18.04.6, CUDA 11.1, and GCC 7.5.0.
- Clone project
$ git clone https://github.com/howardchina/MonoLoT.git
$ cd MonoLoT
- Install environment
$ conda create --name monolot --file requirements.txt
$ conda activate monolot
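Optionally, verify that PyTorch can see the GPU before training (a quick check, assuming PyTorch and CUDA were installed through the requirements file):
$ python -c "import torch; print(torch.__version__, torch.cuda.is_available())"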
Data
First, create a data/ folder inside the project path by
$ mkdir data
The data structure will be organised as follows:
$ tree data
data
├── c3vd_v2
│ ├── imgs -> <c3vd_v2_img_dir>
│ └── matcher_results
│ ├── test.npy
│ ├── train.npy
│ └── val.npy
└── simcol_complete
├── imgs -> <simcol_img_dir>
└── matcher_results
├── test_352x352.npy
├── train_352x352.npy
└── val_352x352.npy
...
Second, some image preprocessing is necessary, such as undistorting frames, filtering static frames, and splitting the data.
Taking c3vd_v2 as an example, run the following notebooks:
- undistort frames: `playground\heqi\C3VD\data_preprocessing.ipynb`
- (optional) filter static frames and split the data by yourself: `playground\heqi\C3VD\gen_split.ipynb`
- (optional) generate `matcher_results` by yourself: `playground\heqi\C3VD\gen_corres.ipynb`
(optional) Similar image preprocessing should be applied to simcol_complete as well; check `playground\heqi\Simcol_complete`.
We provide two ways to obtain the matching results saved in the `matcher_results` folders:
- (recommended) download the `matcher_results` of `c3vd_v2` and `simcol_complete` from here.
- (not recommended) generate the `matcher_results` by yourself using the notebooks mentioned above, such as `playground\heqi\C3VD\gen_corres.ipynb`; this process takes about 2-4 hours.
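To sanity-check a downloaded `matcher_results` file, you can inspect it with NumPy (a minimal sketch; the exact array layout is defined by the generation notebooks, so treat the printed shape and dtype as informational only):
$ python -c "import numpy as np; d = np.load('data/c3vd_v2/matcher_results/train.npy', allow_pickle=True); print(d.shape, d.dtype)"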
Soft link (`->`) the prepared image folders into this workspace, for example as shown below.
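Assuming the extracted image folders live at <c3vd_v2_img_dir> and <simcol_img_dir> (placeholders for your own paths):
$ mkdir -p data/c3vd_v2/matcher_results data/simcol_complete/matcher_results
$ ln -s <c3vd_v2_img_dir> data/c3vd_v2/imgs
$ ln -s <simcol_img_dir> data/simcol_complete/imgs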
The image folder of c3vd_v2 (<c3vd_v2_img_dir>) will be organised as follows:
$ cd data/c3vd_v2
$ tree imgs
imgs
├── cecum_t1_a_under_review
│ ├── 0000_color.png
│ ├── 0000_depth.tiff
│ ├── 0001_color.png
│ ├── 0001_depth.tiff
│ ├── 0002_color.png
│ ├── 0002_depth.tiff
│ ├── 0003_color.png
│ ├── 0003_depth.tiff
│ ├── 0004_color.png
│ ├── 0004_depth.tiff
│ ├── ...
├── cecum_t1_b_under_review
│ ├── ...
├── cecum_t2_a_under_review
├── cecum_t2_b_under_review
├── cecum_t2_c_under_review
├── cecum_t3_a_under_review
├── cecum_t4_a_under_review
├── cecum_t4_b_under_review
├── desc_t4_a_under_review
├── sigmoid_t1_a_under_review
├── sigmoid_t2_a_under_review
├── sigmoid_t3_a_under_review
├── sigmoid_t3_b_under_review
├── trans_t1_a_under_review
├── trans_t1_b_under_review
├── trans_t2_a_under_review
├── trans_t2_b_under_review
├── trans_t2_c_under_review
├── trans_t3_a_under_review
├── trans_t3_b_under_review
├── trans_t4_a_under_review
└── trans_t4_b_under_review
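To verify that a sequence is readable, you can inspect one colour/depth pair (a minimal check, assuming Pillow and NumPy are available in the environment; depth value scaling conventions are defined by the C3VD release):
$ python -c "from PIL import Image; import numpy as np; d = np.array(Image.open('data/c3vd_v2/imgs/cecum_t1_a_under_review/0000_depth.tiff')); print(d.shape, d.dtype, d.min(), d.max())"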
The image folder of simcol_complete (<simcol_img_dir>) will be organised as follows:
$ cd ..
$ cd simcol_complete
$ tree imgs
imgs
├── SyntheticColon_I
│ ├── Test_labels
│ │ ├── Frames_S10
│ │ │ ├── Depth_0000.png
│ │ │ ├── Depth_0001.png
│ │ │ ├── Depth_0002.png
│ │ │ ├── Depth_0003.png
│ │ │ ├── Depth_0004.png
│ │ │ ├── Depth_0005.png
│ │ │ ...
│ │ │ ├── Depth_1200.png
│ │ │ ├── FrameBuffer_0000.png
│ │ │ ├── FrameBuffer_0001.png
│ │ │ ├── FrameBuffer_0002.png
│ │ │ ├── FrameBuffer_0003.png
│ │ │ ├── FrameBuffer_0004.png
│ │ │ ├── FrameBuffer_0005.png
│ │ │ ...
│ │ │ └── FrameBuffer_1200.png
│ │ ├── Frames_S15
│ │ └── Frames_S5
│ ├── Train
│ │ ├── Frames_S1
│ │ ├── Frames_S11
│ │ ├── Frames_S12
│ │ ├── Frames_S13
│ │ ├── Frames_S2
│ │ ├── Frames_S3
│ │ ├── Frames_S6
│ │ ├── Frames_S7
│ │ └── Frames_S8
│ └── Val
│ ├── Frames_S14
│ ├── Frames_S4
│ └── Frames_S9
├── SyntheticColon_II
│ ├── Test_labels
│ │ ├── Frames_B10
│ │ ├── Frames_B15
│ │ └── Frames_B5
│ ├── Train
│ │ ├── Frames_B1
│ │ ├── Frames_B11
│ │ ├── Frames_B12
│ │ ├── Frames_B13
│ │ ├── Frames_B2
│ │ ├── Frames_B3
│ │ ├── Frames_B6
│ │ ├── Frames_B7
│ │ └── Frames_B8
│ └── Val
│ ├── Frames_B14
│ ├── Frames_B4
│ └── Frames_B9
└── SyntheticColon_III
├── Test_labels
│ ├── Frames_O1
│ ├── Frames_O2
│ └── Frames_O3
└── Train
Public Data
For both training and inference: the C3VD and SimCol datasets (see the Acknowledgement section).
For inference only:
- The UpperGI dataset is available on Nutstore.
- The EndoSLAM dataset is available on Mendeley.
- The EndoMapper dataset is available on Synapse.
Training
To train the models, we provide the following training configs saved in the experiments folder (in the tables below, D denotes ground-truth depth supervision and M denotes monocular self-supervision):
Table IV: train and test on C3VD (training configs saved in the experiments\c3vd_v2\ folder)

| Method | Supervision | Config |
|---|---|---|
| Monodepth2 (baseline) | D | supervised_c3vd_v2_monodepth2.yml |
| Lite-mono (baseline) | D | supervised_c3vd_v2_litemono.yml |
| MonoViT (baseline) | D | supervised_c3vd_v2_monovit.yml |
| Monodepth2 (baseline) | M | baseline_c3vd_v2_monodepth2.yml |
| Lite-mono (baseline) | M | baseline_c3vd_v2_litemono.yml |
| MonoViT (baseline) | M | baseline_c3vd_v2_monovit.yml |
| Monodepth2 + $\mathcal{L}_{bis}^*$ + $\mathcal{L}_m$ (MonoLoT) | M | RCC_matching_cropalign_c3vd_v2_monodepth2.yml |
| Lite-mono + $\mathcal{L}_{bis}^+$ + $\mathcal{L}_m$ (MonoLoT) | M | RC_matching_c3vd_v2_litemono.yml |
| MonoViT + $\mathcal{L}_{bis}^+$ + $\mathcal{L}_m$ (MonoLoT) | M | RC_matching_c3vd_v2_monovit.yml |
Table V: train and test on SimCol (training configs saved in the experiments\simcol_complete\ folder)

| Method | Supervision | Config |
|---|---|---|
| Monodepth2 (baseline) | D | supervised_simcol_complete_monodepth2.yml |
| Lite-mono (baseline) | D | supervised_simcol_complete_litemono.yml |
| MonoViT (baseline) | D | supervised_simcol_complete_monovit.yml |
| Monodepth2 (baseline) | M | baseline_simcol_complete_monodepth2.yml |
| Lite-mono (baseline) | M | baseline_simcol_complete_litemono.yml |
| MonoViT (baseline) | M | baseline_simcol_complete_monovit.yml |
| Monodepth2 + $\mathcal{L}_{bis}^*$ + $\mathcal{L}_m$ (MonoLoT) | M | RCC_cropalign_matching_simcol_complete_monodepth2.yml |
| Lite-mono + $\mathcal{L}_{bis}^*$ + $\mathcal{L}_m$ (MonoLoT) | M | RCC_cropalign_matching_simcol_complete_litemono.yml |
| MonoViT + $\mathcal{L}_{bis}^+$ + $\mathcal{L}_m$ (MonoLoT) | M | RC_matching_simcol_complete_monovit.yml |
Table VI: ablation study on C3VD (training configs saved in the experiments\ablation_c3vd_v2\ folder)

| Model | $\mathcal{L}_{bis}^+$ | $\mathcal{L}_{bis}^*$ | $\mathcal{L}_m$ | Config |
|---|---|---|---|---|
| Monodepth2 (baseline) | | | | baseline_c3vd_v2_monodepth2.yml |
| | ✓ | | | RC_baseline_c3vd_v2_monodepth2.yml |
| | | ✓ | | RCC_cropalign_c3vd_v2_monodepth2.yml |
| | | | ✓ | matching_c3vd_v2_monodepth2.yml |
| | ✓ | | ✓ | RC_matching_c3vd_v2_monodepth2.yml |
| | | ✓ | ✓ | RCC_matching_cropalign_c3vd_v2_monodepth2.yml |
| Lite-mono (baseline) | | | | baseline_c3vd_v2_litemono.yml |
| | ✓ | | | RC_baseline_c3vd_v2_litemono.yml |
| | | ✓ | | RCC_cropalign_c3vd_v2_litemono.yml |
| | | | ✓ | matching_c3vd_v2_litemono.yml |
| | ✓ | | ✓ | RC_matching_c3vd_v2_litemono.yml |
| | | ✓ | ✓ | RCC_cropalign_matching_c3vd_v2_litemono.yml |
| MonoViT (baseline) | | | | baseline_c3vd_v2_monovit.yml |
| | ✓ | | | RC_baseline_c3vd_v2_monovit.yml |
| | | ✓ | | RCC_cropalign_c3vd_v2_monovit.yml |
| | | | ✓ | matching_c3vd_v2_monovit.yml |
| | ✓ | | ✓ | RC_matching_c3vd_v2_monovit.yml |
| | | ✓ | ✓ | RCC_cropalign_matching_c3vd_v2_monovit.yml |
Table VII: ablation study on SimCol (training configs saved in the experiments\ablation_simcol_complete\ folder)

| Model | $\mathcal{L}_{bis}^+$ | $\mathcal{L}_{bis}^*$ | $\mathcal{L}_m$ | Config |
|---|---|---|---|---|
| Monodepth2 (baseline) | | | | baseline_simcol_complete_monodepth2.yml |
| | ✓ | | | RC_simcol_complete_monodepth2.yml |
| | | ✓ | | RCC_cropalign_simcol_complete_monodepth2.yml |
| | | | ✓ | matching_simcol_complete_monodepth2.yml |
| | ✓ | | ✓ | RC_matching_simcol_complete_monodepth2.yml |
| | | ✓ | ✓ | RCC_cropalign_matching_simcol_complete_monodepth2.yml |
| Lite-mono (baseline) | | | | baseline_simcol_complete_litemono.yml |
| | ✓ | | | RC_simcol_complete_litemono.yml |
| | | ✓ | | RCC_cropalign_simcol_complete_litemono.yml |
| | | | ✓ | matching_simcol_complete_litemono.yml |
| | ✓ | | ✓ | RC_matching_simcol_complete_litemono.yml |
| | | ✓ | ✓ | RCC_cropalign_matching_simcol_complete_litemono.yml |
| MonoViT (baseline) | | | | baseline_simcol_complete_monovit.yml |
| | ✓ | | | RC_baseline_simcol_complete_monovit.yml |
| | | ✓ | | RCC_cropalign_simcol_complete_monovit.yml |
| | | | ✓ | matching_simcol_complete_monovit.yml |
| | ✓ | | ✓ | RC_matching_simcol_complete_monovit.yml |
| | | ✓ | ✓ | RCC_cropalign_matching_simcol_complete_monovit.yml |
Run them with:
CUDA_VISIBLE_DEVICES=0 python run.py --config experiments/<file_name>.yml --cfg_params '{"model_name": "<exp_name>"}' --seed 1243 --gpu 0 --num_workers 4
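For example, to train the self-supervised Monodepth2 baseline on C3VD (the experiment name passed through model_name is your own choice; c3vd_baseline_monodepth2 below is just a placeholder):
$ CUDA_VISIBLE_DEVICES=0 python run.py --config experiments/c3vd_v2/baseline_c3vd_v2_monodepth2.yml --cfg_params '{"model_name": "c3vd_baseline_monodepth2"}' --seed 1243 --gpu 0 --num_workers 4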
The code will automatically run training.
- Training results will be recorded in the `results` folder.
- Log files will be saved in `results/<exp_name>/logs` and can be monitored with TensorBoard, as shown below.
- Checkpoints will be saved in `results/<exp_name>/models`.
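For example, to watch the training curves:
$ tensorboard --logdir results/<exp_name>/logs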
Inference
Evaluation:
- Please refer to `playground\heqi\eval_c3vd.ipynb` and `playground\heqi\eval_simcol_complete_align_with_paper.ipynb`.
- Existing checkpoints are available at Nutstore.
- If you would like to further visualise these models, our visualisation code is also provided for reference at Nutstore.
Contact
- Qi HE: [email protected]
Citation
If you find our work helpful, please consider citing:
@ARTICLE{10587075,
author={He, Qi and Feng, Guang and Bano, Sophia and Stoyanov, Danail and Zuo, Siyang},
journal={IEEE Journal of Biomedical and Health Informatics},
title={MonoLoT: Self-Supervised Monocular Depth Estimation in Low-Texture Scenes for Automatic Robotic Endoscopy},
year={2024},
pages={1-14},
keywords={Estimation;Endoscopes;Training;Data models;Robots;Feature extraction;Image reconstruction;Monocular depth estimation;automatic intervention;digestive endoscopy},
doi={10.1109/JBHI.2024.3423791}}
LICENSE
This work is licensed under CC BY-NC-SA 4.0
Acknowledgement
- We thank all authors of C3VD for presenting such excellent work.
- We thank all authors of SimCol3D for presenting such excellent work.
- We thank all authors of EndoSLAM for presenting such excellent work.
- We thank all authors of EndoMapper for presenting such excellent work.