
Too Slow training

Open LaFeuilleMorte opened this issue 2 years ago • 8 comments

Hi, thanks for your great work and for open-sourcing the code. Training is very slow on my RTX 3090 machine: 50 iterations take 5~6 minutes (8000 iterations in total), so the whole training would take over 10 hours. That's far longer than what the paper reports. Am I missing something? image

LaFeuilleMorte avatar Dec 20 '23 10:12 LaFeuilleMorte

Hi LaFeuilleMorte,

Indeed, the training time seems very long. 50 iterations should be very fast at the beginning of training (about 0.06 minutes) and take at most 0.2 minutes once the surface regularization starts.
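To put the two sets of numbers side by side, here is a small back-of-the-envelope sketch. It assumes the time per 50 iterations stays constant over the whole run, which is a simplification, and the helper name `estimate_total_minutes` is made up for this illustration (it is not part of the SuGaR code).

```python
def estimate_total_minutes(minutes_per_chunk: float,
                           total_iterations: int,
                           chunk_size: int = 50) -> float:
    """Extrapolate total training time from the time taken by one chunk of iterations."""
    return minutes_per_chunk * total_iterations / chunk_size


# Reported in this issue: 5~6 minutes per 50 iterations, 8000 iterations in total.
# 5.5 is simply the midpoint of that range.
reported = estimate_total_minutes(5.5, 8000)

# Expected: about 0.06 minutes per 50 iterations early on,
# at most 0.2 minutes once surface regularization starts.
expected_low = estimate_total_minutes(0.06, 8000)
expected_high = estimate_total_minutes(0.2, 8000)

print(f"reported rate: about {reported / 60:.1f} hours")
print(f"expected rate: between {expected_low:.1f} and {expected_high:.1f} minutes")
```

At the reported rate the run extrapolates to well over 10 hours, while the expected rates give a total on the order of tens of minutes, so the gap is far more than an order of magnitude.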

I have several questions for you:

  1. Do you have the laptop or desktop version of the RTX 3090?
  2. How much memory does it have?
  3. How many Gaussians do you have in your initial Gaussian Splatting?

Anttwo avatar Dec 20 '23 10:12 Anttwo

My low-end computer:

(2016, personal GPU workstation) GPU: Titan-XP, 12 GB GPU memory; CPU: 12 cores; MEM: 32 GB; DISK: SSD

The training speed is acceptable: 15000 iterations take a few dozen minutes.

yuedajiong avatar Dec 20 '23 23:12 yuedajiong

Hi, thanks for the reply.

> Do you have the laptop or desktop version of the RTX 3090?

It's a desktop one.

> How much memory does it have?

24GB

> How many Gaussians do you have in your initial Gaussian Splatting?

image

LaFeuilleMorte avatar Dec 21 '23 01:12 LaFeuilleMorte

> My low-end computer:
>
> (2016, personal GPU workstation) GPU: Titan-XP, 12 GB GPU memory; CPU: 12 cores; MEM: 32 GB; DISK: SSD
>
> The training speed is acceptable: 15000 iterations take a few dozen minutes.

As far as I understand the code, most of the time is spent in the function "coarse_training_with_density_regularization".
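One way to check where the time goes inside a long function is to wrap its stages in a small timer. The sketch below is generic and does not use the SuGaR code: the `timed` helper, the labels, and the stand-in workloads are all hypothetical. On a CUDA machine, and assuming PyTorch is in use, one would pass `torch.cuda.synchronize` as `sync`, since GPU kernels run asynchronously and would otherwise be attributed to the wrong stage.

```python
import time
from contextlib import contextmanager

# Accumulated wall-clock seconds per label.
timings = {}


@contextmanager
def timed(label, sync=None):
    """Add the wall-clock time spent inside the `with` block to timings[label].

    `sync` is an optional callable invoked before each clock read
    (e.g. torch.cuda.synchronize on a CUDA machine; an assumption, not required here).
    """
    if sync is not None:
        sync()
    start = time.perf_counter()
    try:
        yield
    finally:
        if sync is not None:
            sync()
        timings[label] = timings.get(label, 0.0) + time.perf_counter() - start


# Stand-in workloads; in practice these would be the stages of the training loop.
with timed("data_loading"):
    sum(range(10_000))
with timed("training_step"):
    sum(range(100_000))

# Print the stages from slowest to fastest.
for label, seconds in sorted(timings.items(), key=lambda kv: -kv[1]):
    print(f"{label}: {seconds:.4f} s")
```

Because the timings accumulate per label, the same `with timed(...)` blocks can stay inside a loop and be printed once every 50 iterations.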

LaFeuilleMorte avatar Dec 21 '23 02:12 LaFeuilleMorte

Happy holidays!

Same problem here.

I'm trying with an NVIDIA GeForce GTX 1650: image

This is the information about my model: image

Thanks for this incredible work!

Sbector avatar Dec 25 '23 05:12 Sbector

I also have this very same issue.

However, I'm only using a GeForce RTX 2060, and it only has 14 GB of VRAM, so that might be my issue (as opposed to an issue with the repository).

DanielChaseButterfield avatar Feb 21 '24 01:02 DanielChaseButterfield

Looking into this issue a little more, I want to ask: @LaFeuilleMorte, what is your GPU utilization versus GPU memory usage?

When running my model, it seems that almost the entirety of the memory is used, but the GPU itself is doing almost no work at all. I theorize that this could be because the CPU isn't getting information to the GPU fast enough, and so the bottleneck is the CPU.

image
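For anyone who wants to compare the two numbers without a screenshot, here is a minimal sketch that reads GPU utilization and memory usage through `nvidia-smi`. The `--query-gpu` and `--format=csv,noheader,nounits` options are standard `nvidia-smi` flags; the function names and the 20% / 80% thresholds are arbitrary choices for this illustration, not values taken from the repository.

```python
import subprocess

QUERY = "utilization.gpu,memory.used,memory.total"


def parse_gpu_stats(line: str) -> dict:
    """Parse one CSV line such as '3, 23500, 24576' (percent, MiB, MiB)."""
    util, used, total = (float(field) for field in line.split(","))
    return {
        "util_percent": util,
        "mem_used_mib": used,
        "mem_total_mib": total,
        "mem_percent": 100.0 * used / total,
    }


def read_gpu_stats() -> list:
    """Query nvidia-smi once and return one stats dict per GPU.

    Requires an NVIDIA driver; fields reported as '[N/A]' are not handled here.
    """
    out = subprocess.run(
        ["nvidia-smi", f"--query-gpu={QUERY}", "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=True,
    ).stdout
    return [parse_gpu_stats(line) for line in out.splitlines() if line.strip()]


def looks_starved(stats: dict, util_below: float = 20.0, mem_above: float = 80.0) -> bool:
    """Heuristic (arbitrary thresholds): memory nearly full while compute sits idle."""
    return stats["util_percent"] < util_below and stats["mem_percent"] > mem_above
```

Calling `read_gpu_stats()` a few times during training and passing each result to `looks_starved` gives a rough indication of the pattern described above: memory almost full while the GPU does very little work.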

Looking at the code, it seems that the model is trained on only a single image at a time (i.e. the batch size is 1). I wonder if this is why the GPU has nothing to do. I tried setting the following parameter to a larger number of images, but it seems that at some point during development this value was fixed to 1, as I get the following error if I try to change it. image

image

DanielChaseButterfield avatar Mar 02 '24 17:03 DanielChaseButterfield

It looks like GS (Gaussian Splatting) does not support batching.

yuedajiong avatar Mar 03 '24 05:03 yuedajiong