A model trained on GPU cannot be loaded/used without GPU
Environment Details
Please indicate the following details about the environment in which you found the bug:
- SDV version: 0.8.0
- Python version: 3.6
- Operating System: Debian
Error Description
I trained a CTGAN model in a GPU/CUDA environment, saved the model, then tried to load and use it in a non-GPU environment. Loading fails with this error:
```
/usr/local/lib/python3.6/site-packages/torch/serialization.py in _cuda_deserialize(obj, location)
    149 def _cuda_deserialize(obj, location):
    150     if location.startswith('cuda'):
--> 151         device = validate_cuda_device(location)
    152         if getattr(obj, "_torch_load_uninitialized", False):
    153             storage_type = getattr(torch.cuda, type(obj).__name__)

/usr/local/lib/python3.6/site-packages/torch/serialization.py in validate_cuda_device(location)
    133
    134     if not torch.cuda.is_available():
--> 135         raise RuntimeError('Attempting to deserialize object on a CUDA '
    136                            'device but torch.cuda.is_available() is False. '
    137                            'If you are running on a CPU-only machine, '

RuntimeError: Attempting to deserialize object on a CUDA device but torch.cuda.is_available() is False. If you are running on a CPU-only machine, please use torch.load with map_location=torch.device('cpu') to map your storages to the CPU.
```
The model should not require a GPU to do inference/data generation.
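For reference, the remedy the error message itself suggests looks like the snippet below for a plain torch checkpoint. SDV saves the whole synthesizer with pickle, so this call alone does not make `CTGAN.load` work, but it shows the CPU remapping that is needed (the file name is just a placeholder):

```python
import torch

# What the error message suggests for a raw torch checkpoint:
# remap all CUDA storages onto the CPU while deserializing.
state = torch.load('checkpoint.pt', map_location=torch.device('cpu'))
```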
Steps to reproduce
See above. Sketch:
```python
# run this in a GPU-enabled environment
from sdv.tabular import CTGAN

ctgan_model = CTGAN()
ctgan_model.fit(data)  # `data` is the training table
ctgan_model.save('ctgan_model.pkl')
```

```python
# run this in a non-GPU environment
from sdv.tabular import CTGAN

ctgan_model = CTGAN.load('ctgan_model.pkl')  # raises the RuntimeError above
```
Thanks for reporting this, @surenius!
PR #334 changes the way the CUDA device is set and may help solve this problem, but something else may still be required.
What do you think, @fealho? It is possible that we need to add a set_device sequence similar to what is done inside the CTGAN load and save.
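For illustration, a rough sketch of what such a sequence could look like. The `set_device` method and `_device` attribute below mirror the CTGANSynthesizer API mentioned later in this thread, but the details here are assumptions, not the actual implementation:

```python
import pickle

import torch


def save_on_cpu(model, path):
    # Move the underlying network to CPU before pickling so the file can be
    # unpickled on machines without CUDA, then restore the original device.
    device_backup = model._device            # assumed attribute
    model.set_device(torch.device('cpu'))    # assumed method
    with open(path, 'wb') as f:
        pickle.dump(model, f)
    model.set_device(device_backup)


def load_on_available_device(path):
    with open(path, 'rb') as f:
        model = pickle.load(f)
    # Pick whichever device is actually available where the model is loaded.
    device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
    model.set_device(device)
    return model
```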
Adding the pending review label back to validate whether this continues to be an issue or not.
@csala For loading the model directly you can use a helper class like this:

```python
import io
import pickle

import torch


class CPU_Unpickler(pickle.Unpickler):
    def find_class(self, module, name):
        # Remap CUDA storages to CPU while unpickling.
        if module == 'torch.storage' and name == '_load_from_bytes':
            return lambda b: torch.load(io.BytesIO(b), map_location=torch.device('cpu'))
        return super().find_class(module, name)


with open('./model.pkl', 'rb') as f:
    model = CPU_Unpickler(f).load()
```
The problem is in sampling from the model on a non-GPU device. Being able to use the set_device function (which I see in the CTGANSynthesizer) would be a nice option. I don't know if this would work; I didn't test it because I'm not a Python specialist. Or am I missing something obvious here?
Workaround:

```python
import io
import pickle

import torch


class CPU_Unpickler(pickle.Unpickler):
    def find_class(self, module, name):
        # Remap CUDA storages to CPU while unpickling.
        if module == 'torch.storage' and name == '_load_from_bytes':
            return lambda b: torch.load(io.BytesIO(b), map_location=torch.device('cpu'))
        return super().find_class(module, name)


with open('./model.pkl', 'rb') as f:
    model = CPU_Unpickler(f).load()

model._model.set_device(torch.device('cpu'))
samples = model.sample(10)
print(samples)
```
This line: `model._model.set_device(torch.device('cpu'))` should instead be `model._model.device = torch.device('cpu')` @rguikers
ETA to fix this?
Have you tried my fix? It worked for me. I had to scour the codebase to get the fix.
I use `model._model.set_device(torch.device('cpu'))` before saving to change to CPU, but for a model released in a Python package this is not the best approach. It would be nice to have a clean method.
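Concretely, a minimal sketch of that save-side workaround; it relies on the private `_model` attribute as in the comments above, and `data` stands for whatever training table you used:

```python
import torch
from sdv.tabular import CTGAN

model = CTGAN()
model.fit(data)  # `data` is your training DataFrame

# Move the underlying CTGAN network to CPU before saving so the resulting
# pickle can be loaded on machines without CUDA.
model._model.set_device(torch.device('cpu'))
model.save('ctgan_model.pkl')
```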
This issue has been solved by https://github.com/sdv-dev/CTGAN/pull/271. The changes will be included in the next CTGAN release, version 0.7.1. Once that version is publicly available, you will be able to fit a CTGAN model from SDV and then save and load it across different platforms.