SDV

A model trained on GPU cannot be loaded/used without GPU

Open surenius opened this issue 4 years ago • 5 comments

Environment Details

Please indicate the following details about the environment in which you found the bug:

SDV version: 0.8.0
Python version: 3.6
Operating System: Debian

Error Description

I trained a CTGAN model in a GPU/CUDA environment, saved the model, and then tried to load and use it in a non-GPU environment. Loading fails with this error:

/usr/local/lib/python3.6/site-packages/torch/serialization.py in _cuda_deserialize(obj, location)
    149 def _cuda_deserialize(obj, location):
    150     if location.startswith('cuda'):
--> 151         device = validate_cuda_device(location)
    152         if getattr(obj, "_torch_load_uninitialized", False):
    153             storage_type = getattr(torch.cuda, type(obj).__name__)

/usr/local/lib/python3.6/site-packages/torch/serialization.py in validate_cuda_device(location)
    133 
    134     if not torch.cuda.is_available():
--> 135         raise RuntimeError('Attempting to deserialize object on a CUDA '
    136                            'device but torch.cuda.is_available() is False. '
    137                            'If you are running on a CPU-only machine, '

RuntimeError: Attempting to deserialize object on a CUDA device but torch.cuda.is_available() is False. If you are running on a CPU-only machine, please use torch.load with map_location=torch.device('cpu') to map your storages to the CPU.
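The error message's advice applies to data read with torch.load: the map_location argument tells torch where to place storages as they are deserialized. The sketch below uses an in-memory buffer as a stand-in for a checkpoint file. On a CPU-only machine the round-trip is trivial, but the same map_location=torch.device('cpu') argument is what remaps storages that were saved from a CUDA device. The workaround later in this thread suggests that CTGAN.load unpickles the model instead of calling torch.load itself, so there is no map_location parameter to pass there directly.

```python
import io

import torch

# Serialize a small tensor into an in-memory buffer (stand-in for a saved checkpoint).
buffer = io.BytesIO()
torch.save(torch.arange(4, dtype=torch.float32), buffer)
buffer.seek(0)

# map_location remaps every storage onto the CPU while deserializing, which is
# what lets a checkpoint written on a CUDA device load on a CPU-only machine.
restored = torch.load(buffer, map_location=torch.device("cpu"))
print(restored.device, restored.tolist())
```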

The model should not require a GPU for inference/data generation.

Steps to reproduce

See above. Sketch:

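from sdv.tabular import CTGAN

# run this in a GPU-enabled env (data is your training DataFrame):
ctgan_model = CTGAN()
ctgan_model.fit(data)
ctgan_model.save('ctgan_model.pkl')

# run this in a non-GPU env; loading raises the RuntimeError above
ctgan_model = CTGAN.load('ctgan_model.pkl')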

surenius, Mar 05 '21 19:03

Thanks for reporting this @surenius !

PR #334 changes the way the CUDA device is set and may help solve this problem, but something else may also be required.

What do you think, @fealho? We may need to add a set_device sequence similar to the one done inside CTGAN's load and save.

csala, Mar 11 '21 19:03
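The set_device idea can be sketched as follows: move the model's parameters to the CPU before pickling, then choose a device based on what the loading machine actually has. TinySynthesizer, its _device and _generator attributes, and its save/load methods are hypothetical and only mirror the pattern under discussion; this is not CTGAN's actual code.

```python
import io
import pickle

import torch
from torch import nn


class TinySynthesizer:
    """Hypothetical stand-in for a synthesizer that owns a torch module."""

    def __init__(self):
        self._device = torch.device("cpu")
        self._generator = nn.Linear(4, 4)

    def set_device(self, device):
        # Record the target device and move the module's parameters onto it.
        if isinstance(device, str):
            device = torch.device(device)
        self._device = device
        self._generator.to(device)

    def save(self, fileobj):
        # Pickle from the CPU so the stream holds no CUDA storages, then
        # move back to whichever device the model was using before.
        original_device = self._device
        self.set_device("cpu")
        pickle.dump(self, fileobj)
        self.set_device(original_device)

    @classmethod
    def load(cls, fileobj):
        model = pickle.load(fileobj)
        # Pick a device based on what the loading machine actually has.
        model.set_device("cuda" if torch.cuda.is_available() else "cpu")
        return model


buffer = io.BytesIO()
TinySynthesizer().save(buffer)
buffer.seek(0)
restored = TinySynthesizer.load(buffer)
print(restored._device, next(restored._generator.parameters()).device)
```

Keeping the CPU move inside save means callers never have to remember it, which is the "clean method" later requested in this thread.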

Adding the pending review label back to validate whether this continues to be an issue or not.

csala, Aug 27 '21 11:08

@csala To load the model directly, you can use a helper class like this:

import io
import pickle
import torch

class CPU_Unpickler(pickle.Unpickler):
    def find_class(self, module, name):
        # Pickled torch storages call _load_from_bytes; map them onto the CPU
        if module == 'torch.storage' and name == '_load_from_bytes':
            return lambda b: torch.load(io.BytesIO(b), map_location=torch.device('cpu'))
        return super().find_class(module, name)

with open('./model.pkl', 'rb') as f:
    model = CPU_Unpickler(f).load()

The problem is in sampling from the model on a non-GPU device. Being able to use the set_device function (which I see in CTGANSynthesizer) would be a nice option. I don't know whether this would work; I didn't test it because I'm not a Python specialist. Or am I missing something obvious here?

rguikers, Oct 25 '21 11:10
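The CPU_Unpickler helper works because pickle resolves every global named in the stream (for a pickled tensor storage, torch.storage._load_from_bytes) through Unpickler.find_class, so overriding that method lets you substitute a CPU-mapped loader before it is ever called. The stdlib-only sketch below reproduces the same redirection without torch; load_on_gpu, load_on_cpu, and FakeStorage are hypothetical stand-ins for torch's loader and storage objects.

```python
import io
import pickle


def load_on_gpu(payload):
    # Hypothetical loader that fails on machines without a CUDA device.
    raise RuntimeError("Attempting to deserialize object on a CUDA device")


def load_on_cpu(payload):
    # Hypothetical replacement loader that keeps everything on the CPU.
    return ("cpu", payload)


class FakeStorage:
    """Pickles itself as a call to load_on_gpu, like a tensor storage would."""

    def __init__(self, payload):
        self.payload = payload

    def __reduce__(self):
        return (load_on_gpu, (self.payload,))


class CPUUnpickler(pickle.Unpickler):
    def find_class(self, module, name):
        # Every global referenced in the stream is resolved here, so the
        # GPU loader can be swapped for the CPU one during unpickling.
        if module == load_on_gpu.__module__ and name == "load_on_gpu":
            return load_on_cpu
        return super().find_class(module, name)


blob = pickle.dumps(FakeStorage(b"weights"))
restored = CPUUnpickler(io.BytesIO(blob)).load()
print(restored)
```

A plain pickle.loads(blob) would call load_on_gpu and raise, which is the analogue of the RuntimeError in the original report.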

Workaround:

import io
import pickle
import torch

class CPU_Unpickler(pickle.Unpickler):
    def find_class(self, module, name):
        # Pickled torch storages call _load_from_bytes; map them onto the CPU
        if module == 'torch.storage' and name == '_load_from_bytes':
            return lambda b: torch.load(io.BytesIO(b), map_location=torch.device('cpu'))
        return super().find_class(module, name)

with open('./model.pkl', 'rb') as f:
    model = CPU_Unpickler(f).load()

# Point the underlying CTGAN synthesizer at the CPU before sampling
model._model.set_device(torch.device('cpu'))
samples = model.sample(10)
print(samples)

rguikers, Oct 25 '21 15:10

@rguikers The line model._model.set_device(torch.device('cpu')) should instead be model._model.device = torch.device('cpu').

loafthecomputerphile, Jul 29 '22 22:07

ETA to fix this?

davebulaval, Jan 22 '23 20:01

ETA to fix this?

Have you tried my fix? It worked for me. I had to scour the codebase to find it.

loafthecomputerphile, Jan 23 '23 20:01

I call model._model.set_device(torch.device('cpu')) before saving to switch the model to the CPU, but for a model released in a Python package this is not ideal. It would be nice to have a clean method for this.

davebulaval, Jan 23 '23 21:01

This issue has been solved by https://github.com/sdv-dev/CTGAN/pull/271. The changes will ship in CTGAN release 0.7.1. Once that version is publicly available, you will be able to fit a CTGAN model from SDV, save it, and load it on a different platform.

pvk-developer, Feb 23 '23 18:02