torch dataloader crashed after `set_up_backend`

Open jk4011 opened this issue 3 years ago • 0 comments

Issue

Problem Description

I'm using this torch integration to train a deep learning model. If I add the line `set_up_backend("torch")` or `set_up_backend("torch", data_type="float32")`, the torch DataLoader raises an error. I have no idea why this line breaks the DataLoader. Do you have any idea what causes this error?

Error message

  File "/home/wlsgur4011/torchlevy/test/data_loader_conflict.py", line 21, in test_data_loader_confliict
    for x in train_loader:
  File "/home/wlsgur4011/.anaconda3/envs/torchlevy/lib/python3.8/site-packages/torch/utils/data/dataloader.py", line 681, in __next__
    data = self._next_data()
  File "/home/wlsgur4011/.anaconda3/envs/torchlevy/lib/python3.8/site-packages/torch/utils/data/dataloader.py", line 720, in _next_data
    index = self._next_index()  # may raise StopIteration
  File "/home/wlsgur4011/.anaconda3/envs/torchlevy/lib/python3.8/site-packages/torch/utils/data/dataloader.py", line 671, in _next_index
    return next(self._sampler_iter)  # may raise StopIteration
  File "/home/wlsgur4011/.anaconda3/envs/torchlevy/lib/python3.8/site-packages/torch/utils/data/sampler.py", line 247, in __iter__
    for idx in self.sampler:
  File "/home/wlsgur4011/.anaconda3/envs/torchlevy/lib/python3.8/site-packages/torch/utils/data/sampler.py", line 133, in __iter__
    yield from torch.randperm(n, generator=generator).tolist()[:self.num_samples % n]
RuntimeError: Expected a 'cuda' device type for generator but found 'cpu'
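
A possible workaround, sketched under an assumption I have not confirmed: that `set_up_backend("torch")` switches the default tensor device to CUDA, so the sampler's `torch.randperm` call expects a CUDA generator while the DataLoader supplies a CPU one. Under that assumption, passing the DataLoader a generator created on the current default device may avoid the mismatch. The helper name `make_shuffling_loader` and the toy dataset are made up for illustration.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset


def make_shuffling_loader(dataset, batch_size=4):
    """Hypothetical helper: build a shuffling DataLoader whose generator
    lives on the same device as newly created tensors."""
    # A freshly created tensor lands on whatever default device is in effect.
    device = torch.empty(0).device
    # Create the generator on that device so the sampler's randperm call
    # does not see a CPU generator while expecting a CUDA one.
    generator = torch.Generator(device=device)
    return DataLoader(dataset, batch_size=batch_size, shuffle=True, generator=generator)


# Toy dataset: ten one-element samples holding the values 0..9.
dataset = TensorDataset(torch.arange(10, dtype=torch.float32).reshape(10, 1))
train_loader = make_shuffling_loader(dataset)

# Iterate once and collect every sample value to check nothing is lost.
seen = sorted(int(x.item()) for (batch,) in train_loader for x in batch)
```

On a CPU-only setup this behaves like a normal DataLoader. I have only reasoned about the CUDA case, not verified it against this library, and it keeps `num_workers` at its default of 0.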

Alternatively, could you explain how `set_up_backend("torch")` works? I could not work out what it does from reading the code. I found that the line `logger.info("Active CUDA Device: GPU" + str(torch.cuda.current_device()))` influences the GPU setting. If I remove this line, it works on CPU but not on GPU. It looks like plain logging code, so I cannot understand how it influences the GPU setting.
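
One thing that may be relevant to the logging line: `torch.cuda.current_device()` is not a pure read, because PyTorch initializes CUDA lazily and this call triggers that initialization. The small check below only shows that side effect. Whether it explains the behavior I see is an assumption, and the function name `cuda_init_state_around_query` is made up for illustration.

```python
import torch


def cuda_init_state_around_query():
    """Hypothetical helper: report whether CUDA is initialized before and
    after asking PyTorch for the current device index."""
    before = torch.cuda.is_initialized()
    if torch.cuda.is_available():
        # Querying the current device forces PyTorch's lazy CUDA setup.
        torch.cuda.current_device()
    after = torch.cuda.is_initialized()
    return before, after


before, after = cuda_init_state_around_query()
print(f"CUDA initialized before query: {before}, after query: {after}")
```

On a machine without a GPU the query is skipped and both values stay `False`.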

Thank you for reading this poor bug report.

Expected Behavior

What Needs to be Done

How Can It Be Tested or Reproduced

Aug 31 '22 04:08 jk4011