CPU/CUDA results are very different

Open dkimanius opened this issue 4 years ago • 0 comments

Thanks for writing this up. I tried the code and got very good results with the CPU implementation. However, there seems to be an issue with the vertex pruning in the CUDA implementation.

Running the example from the README:

import time

import numpy as np
import torch
# marching_cubes is imported as shown in the README

N = 128
x, y, z = np.mgrid[:N, :N, :N]
x = (x / N).astype('float32')
y = (y / N).astype('float32')
z = (z / N).astype('float32')
f0 = (x - 0.35) ** 2 + (y - 0.35) ** 2 + (z - 0.35) ** 2
f1 = (x - 0.65) ** 2 + (y - 0.65) ** 2 + (z - 0.65) ** 2
u = 1.0 / f0 + 1.0 / f1
u = torch.from_numpy(u).cuda()

dt = time.time()
verts, faces = marching_cubes(u.to("cpu"), 15.0)
print(f"CPU: verts {verts.shape[0]}, faces {faces.shape[0]} in {round(time.time() - dt, 2)} s")

dt = time.time()
verts, faces = marching_cubes(u.to("cuda:0"), 15.0)
print(f"CUDA: verts {verts.shape[0]}, faces {faces.shape[0]} in {round(time.time() - dt, 2)} s")

Output:

CPU: verts 47208, faces 94016 in 0.15 s
CUDA: verts 282048, faces 94016 in 0.0 s
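
One thing that stands out in these numbers: the CUDA vertex count is exactly three times the face count (3 × 94016 = 282048), which is what you would expect if every triangle kept its own three vertices and none were merged. Below is a minimal sketch, using NumPy only, to test for that pattern and to merge duplicate vertices as a stopgap. `weld_vertices` and `looks_unwelded` are hypothetical helpers written for illustration, not part of this library, and the rounding tolerance is an assumption.

```python
import numpy as np


def weld_vertices(verts, faces, decimals=6):
    """Merge duplicate vertices and remap face indices (hypothetical helper).

    Returns the unique (rounded) vertex positions and the faces re-indexed
    into that unique set.
    """
    verts = np.asarray(verts, dtype=np.float64)
    faces = np.asarray(faces, dtype=np.int64)
    # Round so that vertices differing only by float noise compare equal.
    keys = np.round(verts, decimals)
    unique_verts, inverse = np.unique(keys, axis=0, return_inverse=True)
    # inverse maps each old vertex index to its row in unique_verts.
    inverse = inverse.reshape(-1)
    return unique_verts, inverse[faces]


def looks_unwelded(verts, faces):
    """True if there are exactly three vertices per face (no sharing)."""
    return len(verts) == 3 * len(faces)


# Toy example: two triangles sharing an edge, stored as a triangle soup.
soup_verts = np.array([
    [0, 0, 0], [1, 0, 0], [0, 1, 0],   # triangle 0
    [1, 0, 0], [1, 1, 0], [0, 1, 0],   # triangle 1
], dtype=np.float32)
soup_faces = np.arange(6).reshape(2, 3)

welded_verts, welded_faces = weld_vertices(soup_verts, soup_faces)
print(looks_unwelded(soup_verts, soup_faces))          # True
print(welded_verts.shape[0], welded_faces.shape[0])    # 4 2
```

To apply this to the tensors returned above, convert them first with `verts.cpu().numpy()` and `faces.cpu().numpy()`. This only works around the symptom on the caller's side; it says nothing about where the pruning goes wrong in the CUDA code.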

Also see the following example:

bz = 160
grid = np.zeros([bz, bz, bz])
idx = np.random.normal(loc=bz//2, scale=bz//16, size=[4000, 3])
idx = np.round(np.clip(idx, 0, bz-1)).astype(int)
for x, y, z in idx:
    grid[x, y, z] += 1.

grid_t = torch.Tensor(grid).to("cpu")
dt = time.time()
verts, faces = marching_cubes(grid_t, 1)
print(f"CPU: verts {verts.shape[0]}, faces {faces.shape[0]} in {round(time.time() - dt, 2)} s")

grid_t = torch.Tensor(grid).to("cuda:0")
dt = time.time()
verts, faces = marching_cubes(grid_t, 1)
print(f"CUDA: verts {verts.shape[0]}, faces {faces.shape[0]} in {round(time.time() - dt, 2)} s")

Output:

CPU: verts 4467, faces 35408 in 0.19 s
CUDA: verts 3, faces 1 in 0.0 s

The final output looks like an overflow issue. Running this with bz=80 gives results similar to the first example.

Nov 21 '21 02:11 dkimanius