Using Seq2Rel with CUDA
Here's how I pass the device argument to `Seq2Rel`:
```python
from seq2rel import Seq2Rel
from seq2rel.common import util

model = 'model.tar.gz'
kwargs = {'cuda_device': 1}
seq2rel = Seq2Rel(model, **kwargs)
```
and got this error:
```
---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
<ipython-input-15-d951d9bfd1c5> in <cell line: 2>()
      1 kwargs = {'cuda_device': 1}
----> 2 seq2rel = Seq2Rel(model, **kwargs)

12 frames
/content/drive/MyDrive/seq2rel/seq2rel/seq2rel.py in __init__(self, pretrained_model_name_or_path, **kwargs)
     86         if "overrides" in kwargs:
     87             overrides.update(kwargs.pop("overrides"))
---> 88         archive = load_archive(pretrained_model_name_or_path, overrides=overrides, **kwargs)
     89         self._predictor = Predictor.from_archive(archive, predictor_name="seq2seq")
     90

/usr/local/lib/python3.9/dist-packages/allennlp/models/archival.py in load_archive(archive_file, cuda_device, overrides, weights_file)
    233             config.duplicate(), serialization_dir
    234         )
--> 235         model = _load_model(config.duplicate(), weights_path, serialization_dir, cuda_device)
    236
    237         # Load meta.

/usr/local/lib/python3.9/dist-packages/allennlp/models/archival.py in _load_model(config, weights_path, serialization_dir, cuda_device)
    277
    278 def _load_model(config, weights_path, serialization_dir, cuda_device):
--> 279     return Model.load(
    280         config,
    281         weights_file=weights_path,

/usr/local/lib/python3.9/dist-packages/allennlp/models/model.py in load(cls, config, serialization_dir, weights_file, cuda_device)
    436         # get_model_class method, that recurses whenever it finds a from_archive model type.
    437         model_class = Model
--> 438         return model_class._load(config, serialization_dir, weights_file, cuda_device)
    439
    440     def extend_embedder_vocab(self, embedding_sources_mapping: Dict[str, str] = None) -> None:

/usr/local/lib/python3.9/dist-packages/allennlp/models/model.py in _load(cls, config, serialization_dir, weights_file, cuda_device)
    341         # in sync with the weights
    342         if cuda_device >= 0:
--> 343             model.cuda(cuda_device)
    344         else:
    345             model.cpu()

/usr/local/lib/python3.9/dist-packages/torch/nn/modules/module.py in cuda(self, device)
    686             Module: self
    687         """
--> 688         return self._apply(lambda t: t.cuda(device))
    689
    690     def xpu(self: T, device: Optional[Union[int, device]] = None) -> T:

/usr/local/lib/python3.9/dist-packages/torch/nn/modules/module.py in _apply(self, fn)
    576     def _apply(self, fn):
    577         for module in self.children():
--> 578             module._apply(fn)
    579
    580     def compute_should_use_set_data(tensor, tensor_applied):

(... the _apply frame above repeats four more times as the call recurses into child modules ...)

/usr/local/lib/python3.9/dist-packages/torch/nn/modules/module.py in _apply(self, fn)
    599             # `with torch.no_grad():`
    600             with torch.no_grad():
--> 601                 param_applied = fn(param)
    602             should_use_set_data = compute_should_use_set_data(param, param_applied)
    603             if should_use_set_data:

/usr/local/lib/python3.9/dist-packages/torch/nn/modules/module.py in <lambda>(t)
    686             Module: self
    687         """
--> 688         return self._apply(lambda t: t.cuda(device))
    689
    690     def xpu(self: T, device: Optional[Union[int, device]] = None) -> T:

RuntimeError: CUDA error: invalid device ordinal
CUDA kernel errors might be asynchronously reported at some other API call,so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
```
What is the correct way of using Seq2Rel with CUDA? Thank you!

seq2rel is just PyTorch, really, so this suggests a problem with your environment. `cuda_device` is a zero-based GPU index, and "invalid device ordinal" means no device exists at the index you passed. Does device 1 (i.e. a second GPU) actually exist? Can you see it when you run nvidia-smi? Any chance you meant `'cuda_device': 0`?
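As a quick sanity check (plain PyTorch, nothing seq2rel-specific), you can list the devices that are actually visible before picking an ordinal:

```python
import torch

# Valid values for cuda_device are 0 .. device_count() - 1 (or -1 for CPU).
print(torch.cuda.is_available())   # False means no usable GPU at all
print(torch.cuda.device_count())   # e.g. 1 on a single-GPU Colab instance
for i in range(torch.cuda.device_count()):
    print(i, torch.cuda.get_device_name(i))
```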
I tried the following:
```python
from seq2rel import Seq2Rel
from seq2rel.common import util

model = 'model.tar.gz'
kwargs = {'cuda_device': 0}
seq2rel = Seq2Rel(model, **kwargs)
```
It worked, but I'm not sure the model is actually loaded on CUDA, because running examples seems very slow (much slower than validation during training). Other than this, is there any way to run inference quickly with Seq2Rel?
To see if the GPU is being utilized, you can run nvidia-smi on the machine this code is running on and check whether GPU utilization is >0%.
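You can also check from Python which device the weights ended up on. This sketch pokes at a private attribute (`_predictor._model`, where the AllenNLP `Predictor` stores the loaded model), so it may break between versions, but it works as a quick check:

```python
import torch
from seq2rel import Seq2Rel

seq2rel = Seq2Rel("model.tar.gz", **{"cuda_device": 0})

# All of the model's parameters live on one device, so inspecting
# the first parameter tells you where the whole model was placed.
device = next(seq2rel._predictor._model.parameters()).device
print(device)  # expect something like "cuda:0"
```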
In general, inference speed will depend on quite a few arguments that you can control with the config or kwargs. max_length and beam_size have the largest effect, so I would try reducing either to see if you get a speed boost without hurting performance. use_amp will really help if your GPU supports mixed precision (and can actually slow things down if it doesn't). The larger the batch_size the better; for inference, use the largest size that doesn't cause OOM errors. A sketch of passing these at load time follows below.
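As a rough sketch, these can be passed through the `overrides` kwarg that `Seq2Rel.__init__` forwards to `load_archive` (visible in the traceback above). The dotted override keys below are assumptions for illustration; the real keys depend on your model, so open the `config.json` inside the archive and adjust them to match:

```python
from seq2rel import Seq2Rel

# NOTE: the dotted paths below are hypothetical; check the config.json
# inside model.tar.gz for the actual keys your model uses.
kwargs = {
    "cuda_device": 0,
    "overrides": {
        "model.beam_search.beam_size": 1,   # smaller beam => faster decoding
        "model.beam_search.max_steps": 96,  # i.e. max_length; shorter => faster
        "model.use_amp": True,              # only if the GPU supports mixed precision
    },
}
seq2rel = Seq2Rel("model.tar.gz", **kwargs)
```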
The Seq2Rel interface is meant to be just a convenience. If you are trying to evaluate on a bunch of examples, it might make more sense to use allennlp evaluate; see the AllenNLP documentation for how to use that command. The advice above on setting parameters for the fastest inference is the same.
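A hypothetical invocation (the file names are placeholders; run `allennlp evaluate --help` for the full list of flags):

```bash
# Evaluate the archived model on a test file, on GPU 0.
# --include-package makes AllenNLP register seq2rel's custom components.
allennlp evaluate model.tar.gz test.tsv \
    --cuda-device 0 \
    --batch-size 32 \
    --include-package seq2rel
```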