Training not converging well, Dataset available
Here are my modifications to the source code:
diff --git a/configs/Test/images.yaml b/configs/Test/images.yaml
index 81a5824..4435fb2 100644
--- a/configs/Test/images.yaml
+++ b/configs/Test/images.yaml
@@ -12,4 +12,5 @@ training:
auto_scheduler: True
eval_pose_every: -1
extract_images:
- resolution: [540, 960]
\ No newline at end of file
+ resolution: [3024, 4032]
+with_depth: False
diff --git a/configs/preprocess.yaml b/configs/preprocess.yaml
index c56b1fd..d3ec72c 100644
--- a/configs/preprocess.yaml
+++ b/configs/preprocess.yaml
@@ -1,9 +1,9 @@
depth:
type: DPT
dataloading:
- path: data/nerf_llff_data
- scene: ['fern']
+ path: data/Test
+ scene: ['images']
resize_factor:
load_colmap_poses: False
training:
- mode: 'all'
\ No newline at end of file
+ mode: 'all'
diff --git a/dataloading/dataset.py b/dataloading/dataset.py
index d40af73..846273d 100644
--- a/dataloading/dataset.py
+++ b/dataloading/dataset.py
@@ -82,11 +82,16 @@ class DataField(object):
_, _, h, w = imgs.shape
if customized_focal:
- focal_gt = np.load(os.path.join(load_dir, 'intrinsics.npz'))['K'].astype(np.float32)
+ #focal_gt = np.load(os.path.join(load_dir, 'intrinsics.npz'))['K'].astype(np.float32)
+ FX_ = 13/35.0
+ CX_ = 4032
+ CY_ = 3024
+ FY_= FX_*(CY_/CX_)
+ focal_gt = [[FX_, 0, CX_], [0, FY_, CY_], [0, 0, 1]]
if resize_factor is None:
resize_factor = 1
- fx = focal_gt[0, 0] / resize_factor
- fy = focal_gt[1, 1] / resize_factor
+ fx = focal_gt[0][0] / resize_factor
+ fy = focal_gt[1][1] / resize_factor
else:
if load_colmap_poses:
fx, fy = focal, focal
diff --git a/environment.yaml b/environment.yaml
index dfde749..a81a313 100644
--- a/environment.yaml
+++ b/environment.yaml
@@ -4,12 +4,13 @@ channels:
- conda-forge
- anaconda
- defaults
+ - nvidia
dependencies:
- - python=3.9
- - pytorch=1.7
- - torchvision=0.8.2
- - torchaudio
- - cudatoolkit=10.1
+ - python
+ - pytorch=2.0.0
+ - torchvision=0.15.0
+ - torchaudio=2.0.0
+ - pytorch-cuda=11.8
- cffi
- cython
- imageio
@@ -39,4 +40,4 @@ dependencies:
- lpips
- setuptools
- kornia==0.5.0
- - imageio-ffmpeg
\ No newline at end of file
+ - imageio-ffmpeg
And my dataset
https://drive.google.com/drive/folders/1ZZgZUrFrnP47rx8bN5K6yvYnSC50a-9G?usp=sharing
To start training, I put the images in
data/Test/images/images
then ran the preprocess and train commands.
The resulting TensorBoard output is attached here:
Is this OK, or did I muck up the intrinsics?
I have attached a JPG so you can look at the EXIF information.
I think the focal length value may be 14 rather than 13; I will try again.
I changed 13 to 14 and changed the condition from 'if customized_focal:' to 'if customized_focal or True:' so that branch always runs.
Looks like this is a red hot tip: https://github.com/t-bence/exif-stats/blob/master/focal_stats.py#L44
Maybe all I needed was patience.
Does this seem correct?
@bianwenjing
I am worried that my modification with the width and height
diff --git a/dataloading/dataset.py b/dataloading/dataset.py
index d40af73..846273d 100644
--- a/dataloading/dataset.py
+++ b/dataloading/dataset.py
@@ -82,11 +82,16 @@ class DataField(object):
_, _, h, w = imgs.shape
if customized_focal:
- focal_gt = np.load(os.path.join(load_dir, 'intrinsics.npz'))['K'].astype(np.float32)
+ #focal_gt = np.load(os.path.join(load_dir, 'intrinsics.npz'))['K'].astype(np.float32)
+ FX_ = 13/35.0
+ CX_ = 4032
+ CY_ = 3024
+ FY_= FX_*(CY_/CX_)
+ focal_gt = [[FX_, 0, CX_], [0, FY_, CY_], [0, 0, 1]]
if resize_factor is None:
resize_factor = 1
- fx = focal_gt[0, 0] / resize_factor
- fy = focal_gt[1, 1] / resize_factor
+ fx = focal_gt[0][0] / resize_factor
+ fy = focal_gt[1][1] / resize_factor
else:
if load_colmap_poses:
fx, fy = focal, focal
is in error. I am also wondering whether CX_ should be 4032//2 and CY_ should be 3024//2.
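For reference, here is a minimal sketch of how a pinhole intrinsics matrix is commonly built in pixel units, with the principal point at the image centre. It rests on several assumptions that are not confirmed anywhere in this thread: that the 13 in the diff is a 35mm-equivalent focal length (the EXIF FocalLengthIn35mmFilm tag), that the equivalence is defined on the sensor diagonal, that pixels are square, and that the loader expects K in pixels (which the division by resize_factor suggests but does not prove). The function name intrinsics_from_35mm_equiv is hypothetical and not part of the repository.

```python
import math

# A 35 mm film frame is 36 mm x 24 mm; its diagonal is about 43.27 mm.
FULL_FRAME_DIAGONAL_MM = math.hypot(36.0, 24.0)


def intrinsics_from_35mm_equiv(focal_35mm, width_px, height_px):
    """Approximate pinhole K in pixel units (hypothetical helper).

    Assumes focal_35mm is a 35mm-equivalent focal length defined on the
    diagonal, square pixels (fx == fy), and a centred principal point.
    """
    diag_px = math.hypot(width_px, height_px)
    f_px = focal_35mm * diag_px / FULL_FRAME_DIAGONAL_MM
    cx = width_px / 2.0   # principal point x: half the image width
    cy = height_px / 2.0  # principal point y: half the image height
    return [[f_px, 0.0, cx], [0.0, f_px, cy], [0.0, 0.0, 1.0]]


# Values from the diff: 4032 x 3024 images, focal length value 13.
K = intrinsics_from_35mm_equiv(13.0, 4032, 3024)
```

Under these assumptions the principal point comes out at half the width and height, and fx equals fy, rather than fy being scaled by the aspect ratio.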
Also wondering if
diff --git a/configs/Test/images.yaml b/configs/Test/images.yaml
index 81a5824..4435fb2 100644
--- a/configs/Test/images.yaml
+++ b/configs/Test/images.yaml
@@ -12,4 +12,5 @@ training:
auto_scheduler: True
eval_pose_every: -1
extract_images:
- resolution: [540, 960]
\ No newline at end of file
+ resolution: [3024, 4032]
messes up the internals of the convolutions. Can I use such a large resolution?
I am also wondering whether I could speed up training. The batch size of 1 seems to underutilise resources: my 24 GB RTX 3090 is using only a fraction of its VRAM and shows only a fraction of full GPU utilisation.
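On the resolution question, one thing to keep consistent when training on downsampled images is that pixel-unit intrinsics scale with the image. The sketch below shows that relationship only; scale_intrinsics is a hypothetical helper, the example K values are made up for illustration, and whether the repository applies the same scaling to the principal point is an assumption (the diff only shows fx and fy being divided by resize_factor).

```python
def scale_intrinsics(K, factor):
    """Return pixel-unit intrinsics for an image downsampled by `factor`.

    Hypothetical helper: focal lengths and principal point are all divided
    by the same factor, ignoring half-pixel centre conventions.
    """
    return [
        [K[0][0] / factor, 0.0, K[0][2] / factor],
        [0.0, K[1][1] / factor, K[1][2] / factor],
        [0.0, 0.0, 1.0],
    ]


# Illustrative full-resolution intrinsics for a 4032 x 3024 image.
K_full = [[1500.0, 0.0, 2016.0], [0.0, 1500.0, 1512.0], [0.0, 0.0, 1.0]]

# Downsampling by 4 gives a 1008 x 756 image.
K_small = scale_intrinsics(K_full, 4)
```

With a factor of 4, a 4032 x 3024 image becomes 1008 x 756, and the focal lengths and principal point shrink by the same factor.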
grep -rn batch configs/
configs/default.yaml:14: batchsize: 1
configs/default.yaml:78: batch_size: 1
Yeah nah, that didn't work. Trying again with different intrinsics values:
diff --git a/configs/Test/images.yaml b/configs/Test/images.yaml
index 81a5824..264c4cf 100644
--- a/configs/Test/images.yaml
+++ b/configs/Test/images.yaml
@@ -12,4 +12,5 @@ training:
auto_scheduler: True
eval_pose_every: -1
extract_images:
- resolution: [540, 960]
\ No newline at end of file
+ resolution: [765, 1008]
+with_depth: False
diff --git a/configs/default.yaml b/configs/default.yaml
index adb9cb0..92aae7b 100644
--- a/configs/default.yaml
+++ b/configs/default.yaml
@@ -75,7 +75,7 @@ training:
load_distortion_dir: model_distortion.pt
n_training_points: 1024
scheduling_epoch: 10000
- batch_size: 1
+ batch_size: 8
learning_rate: 0.001
focal_lr: 0.001
pose_lr: 0.0005
diff --git a/configs/preprocess.yaml b/configs/preprocess.yaml
index c56b1fd..d3ec72c 100644
--- a/configs/preprocess.yaml
+++ b/configs/preprocess.yaml
@@ -1,9 +1,9 @@
depth:
type: DPT
dataloading:
- path: data/nerf_llff_data
- scene: ['fern']
+ path: data/Test
+ scene: ['images']
resize_factor:
load_colmap_poses: False
training:
- mode: 'all'
\ No newline at end of file
+ mode: 'all'
diff --git a/dataloading/dataset.py b/dataloading/dataset.py
index d40af73..717ce8d 100644
--- a/dataloading/dataset.py
+++ b/dataloading/dataset.py
@@ -81,12 +81,17 @@ class DataField(object):
imgs = np.transpose(imgs, (0, 3, 1, 2))
_, _, h, w = imgs.shape
- if customized_focal:
- focal_gt = np.load(os.path.join(load_dir, 'intrinsics.npz'))['K'].astype(np.float32)
+ if customized_focal or True:
+ #focal_gt = np.load(os.path.join(load_dir, 'intrinsics.npz'))['K'].astype(np.float32)
+ FX_ = 14.0/35.0
+ CX_ = 35.0/2.0
+ CY_ = (2032/3024) * CX_
+ FY_= FX_*(CY_/CX_)
+ focal_gt = [[FX_, 0, CX_], [0, FY_, CY_], [0, 0, 1]]
if resize_factor is None:
resize_factor = 1
- fx = focal_gt[0, 0] / resize_factor
- fy = focal_gt[1, 1] / resize_factor
+ fx = focal_gt[0][0] / resize_factor
+ fy = focal_gt[1][1] / resize_factor
else:
if load_colmap_poses:
fx, fy = focal, focal
diff --git a/environment.yaml b/environment.yaml
index dfde749..a81a313 100644
--- a/environment.yaml
+++ b/environment.yaml
@@ -4,12 +4,13 @@ channels:
- conda-forge
- anaconda
- defaults
+ - nvidia
dependencies:
- - python=3.9
- - pytorch=1.7
- - torchvision=0.8.2
- - torchaudio
- - cudatoolkit=10.1
+ - python
+ - pytorch=2.0.0
+ - torchvision=0.15.0
+ - torchaudio=2.0.0
+ - pytorch-cuda=11.8
- cffi
- cython
- imageio
@@ -39,4 +40,4 @@ dependencies:
- lpips
- setuptools
- kornia==0.5.0
- - imageio-ffmpeg
\ No newline at end of file
+ - imageio-ffmpeg
This is with training from COLMAP and hallucinated depth maps
https://github.com/ActiveVisionLab/nope-nerf/assets/102564797/2b64c14b-40b8-4109-af7e-f1d6c770d4c3
What are the limitations on the input dataset?
Hi, sorry for my late reply. The input images I used are consecutive and closely sampled from a video. This is essential because the point cloud loss requires a dense matching between two views. I noticed that the images you provided are rather sparse, which might make the point cloud loss less effective.
Yeah, they are photos rather than video frames. I can try shooting the same location again with much denser sampling. Thanks for your response.