nope-nerf Training not converging well, Dataset available

Here are my modifications to the source code:

diff --git a/configs/Test/images.yaml b/configs/Test/images.yaml
index 81a5824..4435fb2 100644
--- a/configs/Test/images.yaml
+++ b/configs/Test/images.yaml
@@ -12,4 +12,5 @@ training:
   auto_scheduler: True
   eval_pose_every: -1
 extract_images:
-  resolution: [540, 960]
\ No newline at end of file
+  resolution: [3024, 4032]
+with_depth: False
diff --git a/configs/preprocess.yaml b/configs/preprocess.yaml
index c56b1fd..d3ec72c 100644
--- a/configs/preprocess.yaml
+++ b/configs/preprocess.yaml
@@ -1,9 +1,9 @@
 depth:
   type: DPT
 dataloading:
-  path: data/nerf_llff_data
-  scene: ['fern']
+  path: data/Test
+  scene: ['images']
   resize_factor: 
   load_colmap_poses: False
 training:
-  mode: 'all'
\ No newline at end of file
+  mode: 'all'
diff --git a/dataloading/dataset.py b/dataloading/dataset.py
index d40af73..846273d 100644
--- a/dataloading/dataset.py
+++ b/dataloading/dataset.py
@@ -82,11 +82,16 @@ class DataField(object):
         _, _, h, w = imgs.shape
 
         if customized_focal:
-            focal_gt = np.load(os.path.join(load_dir, 'intrinsics.npz'))['K'].astype(np.float32)
+            #focal_gt = np.load(os.path.join(load_dir, 'intrinsics.npz'))['K'].astype(np.float32)
+            FX_ = 13/35.0
+            CX_ = 4032
+            CY_ = 3024
+            FY_= FX_*(CY_/CX_)
+            focal_gt = [[FX_, 0, CX_], [0, FY_, CY_], [0, 0, 1]]
             if resize_factor is None:
                 resize_factor = 1
-            fx = focal_gt[0, 0] / resize_factor
-            fy = focal_gt[1, 1] / resize_factor
+            fx = focal_gt[0][0] / resize_factor
+            fy = focal_gt[1][1] / resize_factor
         else:
             if load_colmap_poses:
                 fx, fy = focal, focal
diff --git a/environment.yaml b/environment.yaml
index dfde749..a81a313 100644
--- a/environment.yaml
+++ b/environment.yaml
@@ -4,12 +4,13 @@ channels:
   - conda-forge
   - anaconda
   - defaults
+  - nvidia
 dependencies:
-  - python=3.9
-  - pytorch=1.7
-  - torchvision=0.8.2 
-  - torchaudio 
-  - cudatoolkit=10.1
+  - python
+  - pytorch=2.0.0
+  - torchvision=0.15.0
+  - torchaudio=2.0.0
+  - pytorch-cuda=11.8
   - cffi
   - cython
   - imageio
@@ -39,4 +40,4 @@ dependencies:
     - lpips
     - setuptools
     - kornia==0.5.0
-    - imageio-ffmpeg
\ No newline at end of file
+    - imageio-ffmpeg

And my dataset

https://drive.google.com/drive/folders/1ZZgZUrFrnP47rx8bN5K6yvYnSC50a-9G?usp=sharing

What I did to start training was put the images in

data/Test/images/images

then run the preprocess and train commands.

The resulting TensorBoard logs are attached here:

log.zip

Screenshot from 2023-09-03 13-07-27

Is this OK, or did I muck up the intrinsics?

I have attached a JPG so you can look at the EXIF information:

6063

Screenshot from 2023-09-03 13-09-33

I think it may be 14 rather than 13. I will try again.

Sep 03 '23 03:09 samhodge-aiml

Changed the focal value from 13 to 14, and "if customized_focal:" became "if customized_focal or True:".

Sep 03 '23 03:09 samhodge-aiml

Looks like this is a red hot tip: https://github.com/t-bence/exif-stats/blob/master/focal_stats.py#L44

Sep 03 '23 03:09 samhodge-aiml

Maybe all I needed was patience.

Screenshot from 2023-09-04 06-40-25

Sep 03 '23 21:09 samhodge-aiml

Does this seem correct?

@bianwenjing

I am worried that my modification with the width and height

diff --git a/dataloading/dataset.py b/dataloading/dataset.py
index d40af73..846273d 100644
--- a/dataloading/dataset.py
+++ b/dataloading/dataset.py
@@ -82,11 +82,16 @@ class DataField(object):
         _, _, h, w = imgs.shape
 
         if customized_focal:
-            focal_gt = np.load(os.path.join(load_dir, 'intrinsics.npz'))['K'].astype(np.float32)
+            #focal_gt = np.load(os.path.join(load_dir, 'intrinsics.npz'))['K'].astype(np.float32)
+            FX_ = 13/35.0
+            CX_ = 4032
+            CY_ = 3024
+            FY_= FX_*(CY_/CX_)
+            focal_gt = [[FX_, 0, CX_], [0, FY_, CY_], [0, 0, 1]]
             if resize_factor is None:
                 resize_factor = 1
-            fx = focal_gt[0, 0] / resize_factor
-            fy = focal_gt[1, 1] / resize_factor
+            fx = focal_gt[0][0] / resize_factor
+            fy = focal_gt[1][1] / resize_factor
         else:
             if load_colmap_poses:
                 fx, fy = focal, focal

is in error. I am also wondering whether CX_ should be 4032//2 and CY_ should be 3024//2.
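For reference, here is a minimal sketch of how pinhole intrinsics in pixel units could be derived from an EXIF 35 mm-equivalent focal length. It rests on several assumptions: that the 35 mm-equivalent value is defined against the diagonal of a 36 x 24 mm frame (about 43.27 mm), that pixels are square, and that the principal point sits at the image centre. The function name `intrinsics_from_35mm` is hypothetical and not part of nope-nerf, and the value 14 is only the guess mentioned earlier in this thread.

```python
import math

# Diagonal of a full-frame (36 x 24 mm) sensor, about 43.27 mm.
FULL_FRAME_DIAGONAL_MM = math.hypot(36.0, 24.0)


def intrinsics_from_35mm(focal_35mm, width_px, height_px, resize_factor=1.0):
    """Approximate a 3x3 pinhole matrix K in pixel units.

    Hypothetical helper: assumes square pixels, a centred principal
    point, and a diagonal-based 35 mm-equivalent focal length.
    """
    diagonal_px = math.hypot(width_px, height_px)
    # Scale the focal length by the ratio of the two diagonals.
    focal_px = focal_35mm * diagonal_px / FULL_FRAME_DIAGONAL_MM
    fx = fy = focal_px / resize_factor
    # Principal point at the image centre, scaled like the focal length.
    cx = width_px / (2.0 * resize_factor)
    cy = height_px / (2.0 * resize_factor)
    return [[fx, 0.0, cx], [0.0, fy, cy], [0.0, 0.0, 1.0]]


# 4032 x 3024 images with an assumed 14 mm equivalent focal length.
K = intrinsics_from_35mm(14.0, 4032, 3024)
print(K)
```

Under these assumptions fx and fy are equal and come out in the thousands of pixels, while CX_ and CY_ are half the width and half the height rather than the full image size.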

I am also wondering whether

diff --git a/configs/Test/images.yaml b/configs/Test/images.yaml
index 81a5824..4435fb2 100644
--- a/configs/Test/images.yaml
+++ b/configs/Test/images.yaml
@@ -12,4 +12,5 @@ training:
   auto_scheduler: True
   eval_pose_every: -1
 extract_images:
-  resolution: [540, 960]
\ No newline at end of file
+  resolution: [3024, 4032]

messes up the internals of the convolutions. Can I train at such a large resolution?
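If a smaller resolution is needed, one option is to divide both dimensions by the same integer so the 3:4 aspect ratio of the 3024 x 4032 photos is preserved. The sketch below assumes the config lists resolution as [height, width], which is inferred from the original [540, 960] entry and not confirmed here. The helper name `scaled_resolution` is hypothetical.

```python
def scaled_resolution(height, width, factor):
    """Divide both dimensions by an integer factor, keeping the aspect ratio.

    Hypothetical helper: raises if the factor does not divide both
    dimensions exactly, so the aspect ratio is never silently distorted.
    """
    if height % factor or width % factor:
        raise ValueError(f"factor {factor} does not divide {height}x{width}")
    return [height // factor, width // factor]


print(scaled_resolution(3024, 4032, 4))  # [756, 1008]
```

Any intrinsics expressed in pixels would need to be divided by the same factor.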

I am also wondering whether I could speed up training. The batch size of 1 seems to be underutilising resources: my 24 GB RTX 3090 is only using a fraction of its VRAM and a fraction of its GPU utilisation.

grep -rn batch configs/
configs/default.yaml:14:  batchsize: 1
configs/default.yaml:78:  batch_size: 1

Screenshot from 2023-09-05 06-24-51

Sep 04 '23 20:09 samhodge-aiml

Yeah nah, that didn't work. Trying again with different intrinsics values:

diff --git a/configs/Test/images.yaml b/configs/Test/images.yaml
index 81a5824..264c4cf 100644
--- a/configs/Test/images.yaml
+++ b/configs/Test/images.yaml
@@ -12,4 +12,5 @@ training:
   auto_scheduler: True
   eval_pose_every: -1
 extract_images:
-  resolution: [540, 960]
\ No newline at end of file
+  resolution: [765, 1008]
+with_depth: False
diff --git a/configs/default.yaml b/configs/default.yaml
index adb9cb0..92aae7b 100644
--- a/configs/default.yaml
+++ b/configs/default.yaml
@@ -75,7 +75,7 @@ training:
   load_distortion_dir: model_distortion.pt
   n_training_points: 1024
   scheduling_epoch: 10000
-  batch_size: 1
+  batch_size: 8
   learning_rate: 0.001
   focal_lr: 0.001
   pose_lr: 0.0005
diff --git a/configs/preprocess.yaml b/configs/preprocess.yaml
index c56b1fd..d3ec72c 100644
--- a/configs/preprocess.yaml
+++ b/configs/preprocess.yaml
@@ -1,9 +1,9 @@
 depth:
   type: DPT
 dataloading:
-  path: data/nerf_llff_data
-  scene: ['fern']
+  path: data/Test
+  scene: ['images']
   resize_factor: 
   load_colmap_poses: False
 training:
-  mode: 'all'
\ No newline at end of file
+  mode: 'all'
diff --git a/dataloading/dataset.py b/dataloading/dataset.py
index d40af73..717ce8d 100644
--- a/dataloading/dataset.py
+++ b/dataloading/dataset.py
@@ -81,12 +81,17 @@ class DataField(object):
         imgs = np.transpose(imgs, (0, 3, 1, 2))
         _, _, h, w = imgs.shape
 
-        if customized_focal:
-            focal_gt = np.load(os.path.join(load_dir, 'intrinsics.npz'))['K'].astype(np.float32)
+        if customized_focal or True:
+            #focal_gt = np.load(os.path.join(load_dir, 'intrinsics.npz'))['K'].astype(np.float32)
+            FX_ = 14.0/35.0
+            CX_ = 35.0/2.0
+            CY_ = (2032/3024) * CX_
+            FY_= FX_*(CY_/CX_)
+            focal_gt = [[FX_, 0, CX_], [0, FY_, CY_], [0, 0, 1]]
             if resize_factor is None:
                 resize_factor = 1
-            fx = focal_gt[0, 0] / resize_factor
-            fy = focal_gt[1, 1] / resize_factor
+            fx = focal_gt[0][0] / resize_factor
+            fy = focal_gt[1][1] / resize_factor
         else:
             if load_colmap_poses:
                 fx, fy = focal, focal
diff --git a/environment.yaml b/environment.yaml
index dfde749..a81a313 100644
--- a/environment.yaml
+++ b/environment.yaml
@@ -4,12 +4,13 @@ channels:
   - conda-forge
   - anaconda
   - defaults
+  - nvidia
 dependencies:
-  - python=3.9
-  - pytorch=1.7
-  - torchvision=0.8.2 
-  - torchaudio 
-  - cudatoolkit=10.1
+  - python
+  - pytorch=2.0.0
+  - torchvision=0.15.0
+  - torchaudio=2.0.0
+  - pytorch-cuda=11.8
   - cffi
   - cython
   - imageio
@@ -39,4 +40,4 @@ dependencies:
     - lpips
     - setuptools
     - kornia==0.5.0
-    - imageio-ffmpeg
\ No newline at end of file
+    - imageio-ffmpeg

Sep 05 '23 10:09 samhodge-aiml

This is with training from COLMAP and hallucinated depth maps

Screenshot from 2023-09-12 07-06-35

https://github.com/ActiveVisionLab/nope-nerf/assets/102564797/2b64c14b-40b8-4109-af7e-f1d6c770d4c3

What are the limitations on the input dataset?

Sep 11 '23 21:09 samhodge-aiml

Hi, sorry for my late reply. The input images I used are consecutive and closely sampled from a video. This is essential because the point cloud loss requires a dense matching between two views. I noticed that the images you provided are rather sparse, which might make the point cloud loss less effective.

Sep 24 '23 05:09 bianwenjing

Yeah, they are from photos rather than a video. I can try shooting the same location again with much denser sampling. Thanks for your response.

Sep 24 '23 05:09 samhodge