Using Seq2Rel with CUDA
Here's how I pass the device argument to `Seq2Rel`:
```python
from seq2rel import Seq2Rel
from seq2rel.common import util

model = 'model.tar.gz'
kwargs = {'cuda_device': 1}
seq2rel = Seq2Rel(model, **kwargs)
```
and got this error:
```
---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
<ipython-input-15-d951d9bfd1c5> in <cell line: 2>()
      1 kwargs = {'cuda_device': 1}
----> 2 seq2rel = Seq2Rel(model, **kwargs)

12 frames
/content/drive/MyDrive/seq2rel/seq2rel/seq2rel.py in __init__(self, pretrained_model_name_or_path, **kwargs)
     86         if "overrides" in kwargs:
     87             overrides.update(kwargs.pop("overrides"))
---> 88         archive = load_archive(pretrained_model_name_or_path, overrides=overrides, **kwargs)
     89         self._predictor = Predictor.from_archive(archive, predictor_name="seq2seq")
     90

/usr/local/lib/python3.9/dist-packages/allennlp/models/archival.py in load_archive(archive_file, cuda_device, overrides, weights_file)
    233             config.duplicate(), serialization_dir
    234         )
--> 235         model = _load_model(config.duplicate(), weights_path, serialization_dir, cuda_device)
    236
    237         # Load meta.

/usr/local/lib/python3.9/dist-packages/allennlp/models/archival.py in _load_model(config, weights_path, serialization_dir, cuda_device)
    277
    278 def _load_model(config, weights_path, serialization_dir, cuda_device):
--> 279     return Model.load(
    280         config,
    281         weights_file=weights_path,

/usr/local/lib/python3.9/dist-packages/allennlp/models/model.py in load(cls, config, serialization_dir, weights_file, cuda_device)
    436         # get_model_class method, that recurses whenever it finds a from_archive model type.
    437         model_class = Model
--> 438         return model_class._load(config, serialization_dir, weights_file, cuda_device)
    439
    440     def extend_embedder_vocab(self, embedding_sources_mapping: Dict[str, str] = None) -> None:

/usr/local/lib/python3.9/dist-packages/allennlp/models/model.py in _load(cls, config, serialization_dir, weights_file, cuda_device)
    341         # in sync with the weights
    342         if cuda_device >= 0:
--> 343             model.cuda(cuda_device)
    344         else:
    345             model.cpu()

/usr/local/lib/python3.9/dist-packages/torch/nn/modules/module.py in cuda(self, device)
    686             Module: self
    687         """
--> 688         return self._apply(lambda t: t.cuda(device))
    689
    690     def xpu(self: T, device: Optional[Union[int, device]] = None) -> T:

/usr/local/lib/python3.9/dist-packages/torch/nn/modules/module.py in _apply(self, fn)
    576     def _apply(self, fn):
    577         for module in self.children():
--> 578             module._apply(fn)
    579
    580     def compute_should_use_set_data(tensor, tensor_applied):

(... the _apply frame above repeats four more times as the call recurses into child modules ...)

/usr/local/lib/python3.9/dist-packages/torch/nn/modules/module.py in _apply(self, fn)
    599             # `with torch.no_grad():`
    600             with torch.no_grad():
--> 601                 param_applied = fn(param)
    602             should_use_set_data = compute_should_use_set_data(param, param_applied)
    603             if should_use_set_data:

/usr/local/lib/python3.9/dist-packages/torch/nn/modules/module.py in <lambda>(t)
    686             Module: self
    687         """
--> 688         return self._apply(lambda t: t.cuda(device))
    689
    690     def xpu(self: T, device: Optional[Union[int, device]] = None) -> T:

RuntimeError: CUDA error: invalid device ordinal
CUDA kernel errors might be asynchronously reported at some other API call,so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
```
What is the correct way of using Seq2Rel with CUDA? Thank you!

seq2rel is just PyTorch, really, so this suggests a problem with your environment. `cuda_device` is a zero-based GPU index, and "invalid device ordinal" means no device exists at the index you passed. Does device 1 (i.e. a second GPU) actually exist? Can you see it when you run nvidia-smi? Any chance you meant `'cuda_device': 0`?
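As a quick sanity check (plain PyTorch, nothing seq2rel-specific), you can list the devices that are actually visible before picking an ordinal:

```python
import torch

# Valid values for cuda_device are 0 .. device_count() - 1 (or -1 for CPU).
print(torch.cuda.is_available())   # False means no usable GPU at all
print(torch.cuda.device_count())   # e.g. 1 on a single-GPU Colab instance
for i in range(torch.cuda.device_count()):
    print(i, torch.cuda.get_device_name(i))
```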
I tried the following:
```python
from seq2rel import Seq2Rel
from seq2rel.common import util

model = 'model.tar.gz'
kwargs = {'cuda_device': 0}
seq2rel = Seq2Rel(model, **kwargs)
```
It worked, but I'm not sure the model is actually loaded on CUDA, because running examples seems very slow (much slower than validation during training). Other than this, is there any way to run inference quickly with Seq2Rel?
To see if the GPU is being utilized, you can run nvidia-smi on the machine this code is running on and check whether GPU utilization is >0%.
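You can also check from Python which device the weights ended up on. This sketch pokes at a private attribute (`_predictor._model`, where the AllenNLP `Predictor` stores the loaded model), so it may break between versions, but it works as a quick check:

```python
import torch
from seq2rel import Seq2Rel

seq2rel = Seq2Rel("model.tar.gz", **{"cuda_device": 0})

# All of the model's parameters live on one device, so inspecting
# the first parameter tells you where the whole model was placed.
device = next(seq2rel._predictor._model.parameters()).device
print(device)  # expect something like "cuda:0"
```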
In general, inference speed will depend on quite a few arguments that you can control with the config or kwargs. max_length and beam_size have the largest effect, so I would try reducing either to see if you get a speed boost without hurting performance. use_amp will really help if your GPU supports mixed precision (and can actually slow things down if it doesn't). The larger the batch_size the better; for inference, use the largest size that doesn't cause OOM errors. A sketch of passing these at load time follows below.
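As a rough sketch, these can be passed through the `overrides` kwarg that `Seq2Rel.__init__` forwards to `load_archive` (visible in the traceback above). The dotted override keys below are assumptions for illustration; the real keys depend on your model, so open the `config.json` inside the archive and adjust them to match:

```python
from seq2rel import Seq2Rel

# NOTE: the dotted paths below are hypothetical; check the config.json
# inside model.tar.gz for the actual keys your model uses.
kwargs = {
    "cuda_device": 0,
    "overrides": {
        "model.beam_search.beam_size": 1,   # smaller beam => faster decoding
        "model.beam_search.max_steps": 96,  # i.e. max_length; shorter => faster
        "model.use_amp": True,              # only if the GPU supports mixed precision
    },
}
seq2rel = Seq2Rel("model.tar.gz", **kwargs)
```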
The Seq2Rel interface is meant to be just a convenience. If you are trying to evaluate on a bunch of examples, it might make more sense to use allennlp evaluate; see the AllenNLP documentation for how to use that command. The advice above on setting parameters for the fastest inference is the same.
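A hypothetical invocation (the file names are placeholders; run `allennlp evaluate --help` for the full list of flags):

```bash
# Evaluate the archived model on a test file, on GPU 0.
# --include-package makes AllenNLP register seq2rel's custom components.
allennlp evaluate model.tar.gz test.tsv \
    --cuda-device 0 \
    --batch-size 32 \
    --include-package seq2rel
```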