Image-Adaptive-3DLUT

Segmentation fault while training

Open WEIIEW97 opened this issue 3 years ago • 6 comments

Every time I train this network it crashes: the CPU version gives 'Segmentation fault (core dumped)', and the CUDA version gives 'cudaCheckError() failed : invalid device function. Segmentation fault (core dumped)'. How can this be solved? Thanks in advance. error
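For the CUDA build, 'invalid device function' almost always means the trilinear extension was compiled for a GPU architecture different from the one actually installed. A quick check (assuming PyTorch is importable; the helper name is my own):

```python
import torch

def detected_arch():
    """Return the compute capability the extension must target,
    e.g. 'sm_75', or 'cpu-only' when no CUDA device is visible."""
    if torch.cuda.is_available():
        major, minor = torch.cuda.get_device_capability(0)
        return f"sm_{major}{minor}"
    return "cpu-only"

print(detected_arch())
```

If this differs from the architecture the extension was built for, rebuilding it with `TORCH_CUDA_ARCH_LIST` set to the reported capability before running `python setup.py install` may clear that part of the error.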

WEIIEW97 avatar Sep 02 '22 03:09 WEIIEW97

same problem

ironheads avatar Sep 06 '22 16:09 ironheads

same problem

I fixed this problem by porting the original CUDA code from C++ to Python (numba.cuda). Maybe that avoids some compilation bugs. (I tried pybind11 as well, but it also failed.)

WEIIEW97 avatar Sep 13 '22 09:09 WEIIEW97

@WEIIEW97 Hey. I actually dug into the CUDA C++ file and found that the kernel expects the input image tensor in shape [CHANNEL_SIZE(3), BATCH_SIZE, WIDTH, HEIGHT]. However, in the Python code the input image is [BATCH_SIZE, CHANNEL_SIZE(3), WIDTH, HEIGHT], so the RGB values are read incorrectly. Another thing: you should make sure the image pixel values are in [0, 1].
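As a small sketch of those two fixes (numpy used for illustration; the shapes and names are my own, not from the repo):

```python
import numpy as np

# a hypothetical batch in the layout the Python code produces: [batch, 3, W, H]
batch = np.random.randint(0, 256, size=(4, 3, 64, 64)).astype(np.float32)

# 1) scale pixel values into [0, 1] before feeding the LUT
batch /= 255.0

# 2) move channels to the front, giving [3, batch, W, H] as the kernel reads it
lut_input = np.ascontiguousarray(batch.transpose(1, 0, 2, 3))
print(lut_input.shape)  # (3, 4, 64, 64)
```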

Actually, I was using this code because another work depends on it, and I used it to re-implement that work, so I didn't dig into the Python code of this project itself. But after I changed the compose procedure (combining the results of the different LUTs) and the TrilinearInterpolationFunction in the Python code, and made sure the values of the image tensor are in [0, 1], the code of that dependent work seems to run normally.

I also changed some code in the C++ CUDA file, but I'm not sure whether that part helps.

I'd like to know whether you fixed this bug in your Python (numba.cuda) code, and it would be great if you could share your numba.cuda file.

Looking forward to your reply.

Thanks. trilinear_cpp.zip

ironheads avatar Sep 13 '22 11:09 ironheads

@ironheads hello,

Thanks for your response and for sharing! I did not dive in as deeply as you did, so I really appreciate the information. Attached below is my Python (numba.cuda) code. I cannot guarantee robustness since it is just a basic migration.

trilinearcuda.zip

WEIIEW97 avatar Sep 16 '22 03:09 WEIIEW97

@WEIIEW97 Thanks for sharing. If it is a basic migration, I think some code in the original training/evaluation Python scripts also needs to be modified. The modifications are as follows.

# LUT0 = Generator3DLUT_identity()
# LUT1 = Generator3DLUT_zero()
# LUT2 = Generator3DLUT_zero()
#...
#...
# img = a batch of images from the dataset with shape [batch_size,3,width,height]; make sure the values are in the range [0,1] (my segmentation fault came from this)
# the following code comes from image_adaptive_lut_train_paired.py
pred = classifier(img).squeeze() # the img is still [batch_size,3,width,height]
# then you should modify the codes as following
new_img = img.permute(1,0,2,3).contiguous()
gen_A0 = LUT0(new_img)
gen_A1 = LUT1(new_img)
gen_A2 = LUT2(new_img)
combine_A = new_img.new(new_img.size())
for b in range(new_img.size(1)):
    combine_A[:,b,:,:] = pred[b,0] * gen_A0[:,b,:,:] + pred[b,1] * gen_A1[:,b,:,:] + pred[b,2] * gen_A2[:,b,:,:] #+ pred[b,3] * gen_A3[:,b,:,:] + pred[b,4] * gen_A4[:,b,:,:]

result_A = combine_A.permute(1,0,2,3) #get the [batch_size,3,width,height] combined image
# the key is to give the LUT its input in shape [3,batch_size,width,height]; other code that uses the LUT may also need to be modified. I don't list it all because I don't use all the Python code in this work.

# another important modification is TrilinearInterpolationFunction; it should be changed as follows.
class TrilinearInterpolationFunction(torch.autograd.Function):
    @staticmethod
    def forward(ctx, lut, x):
        x = x.contiguous()
        output = x.new(x.size())
        dim = lut.size()[-1]
        shift = dim ** 3
        binsize = 1.000001 / (dim-1)
        W = x.size(2)
        H = x.size(3)
        batch = x.size(1) # this changes
        # print(batch)
        assert 1 == trilinear.forward(lut, 
                                      x, 
                                      output,
                                      dim, 
                                      shift, 
                                      binsize, 
                                      W, 
                                      H, 
                                      batch)

        int_package = torch.IntTensor([dim, shift, W, H, batch])
        float_package = torch.FloatTensor([binsize])
        variables = [lut, x, int_package, float_package]
        
        ctx.save_for_backward(*variables)
        
        return lut, output
    
    @staticmethod
    def backward(ctx, lut_grad, x_grad):
        
        lut, x, int_package, float_package = ctx.saved_tensors  # saved_variables is deprecated
        dim, shift, W, H, batch = int_package
        dim, shift, W, H, batch = int(dim), int(shift), int(W), int(H), int(batch)
        binsize = float(float_package[0])
            
        assert 1 == trilinear.backward(x, 
                                       x_grad, 
                                       lut_grad,
                                       dim, 
                                       shift, 
                                       binsize, 
                                       W, 
                                       H, 
                                       batch)
        return lut_grad, x_grad

All the modifications aim to make the LUT's input shape [3,batch_size,width,height] and then reshape the LUT's result back to [batch_size,3,width,height].

Another important thing is that the input values should be in the range [0,1].
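For reference, here is a plain-numpy sketch of what one trilinear LUT lookup computes, using the same `binsize = 1.000001 / (dim - 1)` convention as TrilinearInterpolationFunction (the helper and the identity-LUT demo are my own illustration, not code from the repo):

```python
import numpy as np

def trilinear_sample(lut, r, g, b):
    """Look up one RGB value (each component in [0, 1]) in a [3, dim, dim, dim]
    LUT, where channel c at grid point (b_id, g_id, r_id) is lut[c, b_id, g_id, r_id]."""
    dim = lut.shape[-1]
    binsize = 1.000001 / (dim - 1)  # slightly > 1/(dim-1) so indices stay in range
    ir, ig, ib = int(r / binsize), int(g / binsize), int(b / binsize)
    fr, fg, fb = r / binsize - ir, g / binsize - ig, b / binsize - ib
    out = np.zeros(3)
    for dr in (0, 1):               # blend the 8 surrounding grid points
        for dg in (0, 1):
            for db in (0, 1):
                w = ((fr if dr else 1 - fr) *
                     (fg if dg else 1 - fg) *
                     (fb if db else 1 - fb))
                out += w * lut[:, ib + db, ig + dg, ir + dr]
    return out

# demo: an identity LUT maps every colour to itself
dim = 17
nodes = np.arange(dim) * (1.000001 / (dim - 1))
identity = np.zeros((3, dim, dim, dim))
identity[0] = nodes[None, None, :]   # red varies along the last axis
identity[1] = nodes[None, :, None]
identity[2] = nodes[:, None, None]
print(trilinear_sample(identity, 0.3, 0.5, 0.7))  # ~ [0.3, 0.5, 0.7]
```

This also shows why the values must be in [0, 1]: an out-of-range component produces a grid index outside the LUT, which in the C++/CUDA kernel is an out-of-bounds read.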

I don't know whether this helps if your code only fails when batch_size > 1, but if you set batch_size > 1 you do need these modifications, because with batch_size > 1 the original code is not a correct implementation of a LUT.
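The batch_size > 1 failure mode can be seen with a tiny numpy sketch (my own illustration): the kernel strides through the buffer as if it were [3, batch, W, H], so a [batch, 3, W, H] buffer is only read correctly when batch_size == 1:

```python
import numpy as np

# two 2x2 RGB images in the layout the Python code passes: [batch, 3, W, H]
imgs = np.arange(2 * 3 * 2 * 2).reshape(2, 3, 2, 2)

# the kernel interprets the same memory as [3, batch, W, H]
misread = imgs.reshape(3, 2, 2, 2)

# with batch_size == 1 the two layouts coincide, hiding the bug
one = np.arange(1 * 3 * 2 * 2).reshape(1, 3, 2, 2)
assert np.array_equal(one.reshape(3, 1, 2, 2)[:, 0], one[0])

# with batch_size == 2 the "red channel" the kernel sees is wrong
print(np.array_equal(misread[0], imgs[:, 0]))  # False
```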

ironheads avatar Sep 16 '22 04:09 ironheads

@ironheads Thank you so much! I will do further research on it.

WEIIEW97 avatar Sep 16 '22 09:09 WEIIEW97