VectorizedMultiAgentSimulator

Error when running on Apple Silicon with device = "mps"

Open KennyOrellana opened this issue 2 years ago • 6 comments

Hi, I'm running the example use_vmas_env. It runs well on an Apple M1 Max with device = "cpu"; however, I get an error when I change it to device = "mps".

I installed PyTorch for Apple Silicon following the documentation https://developer.apple.com/metal/pytorch/

Could you help me figure out how to fix this problem?

Here is the console log:

/Users/kenny/.conda/envs/pythonProject/bin/python /Users/kenny/Projects/Pycharm/pythonProject/main.py 
Step 1
/Users/kenny/Projects/Pycharm/pythonProject/VectorizedMultiAgentSimulator/vmas/simulator/core.py:1594: UserWarning: The operator 'aten::linalg_vector_norm' is not currently supported on the MPS backend and will fall back to run on the CPU. This may have performance implications. (Triggered internally at /Users/runner/work/pytorch/pytorch/pytorch/aten/src/ATen/mps/MPSFallback.mm:12.)
  torch.linalg.vector_norm(a.state.pos - b.state.pos, dim=1)
Traceback (most recent call last):
  File "/Users/kenny/Projects/Pycharm/pythonProject/main.py", line 93, in <module>
    use_vmas_env(render=True, save_render=False)
  File "/Users/kenny/Projects/Pycharm/pythonProject/main.py", line 73, in use_vmas_env
    env.render(
  File "/Users/kenny/Projects/Pycharm/pythonProject/VectorizedMultiAgentSimulator/vmas/simulator/environment/environment.py", line 491, in render
    self.viewer.set_bounds(
  File "/Users/kenny/Projects/Pycharm/pythonProject/VectorizedMultiAgentSimulator/vmas/simulator/rendering.py", line 131, in set_bounds
    self.bounds = np.array([left, right, bottom, top])
  File "/Users/kenny/.conda/envs/pythonProject/lib/python3.10/site-packages/torch/_tensor.py", line 970, in __array__
    return self.numpy()
TypeError: can't convert mps:0 device type tensor to numpy. Use Tensor.cpu() to copy the tensor to host memory first.

Process finished with exit code 1
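
The last frame of the traceback shows np.array([left, right, bottom, top]) being called on values that still live on the mps device, and the error message itself suggests copying them to host memory first. Below is a minimal sketch of that idea, assuming each bound is a single-element tensor (the log does not confirm this). `to_host_float` and `bounds_to_list` are hypothetical helpers, not part of vmas, and they are duck-typed so they also accept plain Python numbers.

```python
def to_host_float(value):
    """Return a plain Python float for a number or a single-element tensor.

    Hypothetical helper, not part of vmas. Tensor-like values (anything
    exposing .detach() and .cpu()) are copied to host memory first.
    """
    if hasattr(value, "detach") and hasattr(value, "cpu"):
        value = value.detach().cpu()
    return float(value)


def bounds_to_list(left, right, bottom, top):
    """Convert the four viewer bounds to host floats, safe to pass to np.array."""
    return [to_host_float(v) for v in (left, right, bottom, top)]
```

With this, the failing line could in principle build its array from bounds_to_list(left, right, bottom, top) instead of from the raw tensors.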

KennyOrellana avatar Mar 19 '23 17:03 KennyOrellana

Hello,

mps is not yet fully supported as a torch device, so we haven't tested vmas on it.

My suggestion is to use the cpu on Mac until they fix all the issues with mps (this is a torch issue rather than a vmas one).

matteobettini avatar Mar 19 '23 17:03 matteobettini

Is it feasible to replace PyTorch with TensorFlow? Or does TensorFlow have some limitations?

KennyOrellana avatar Mar 19 '23 19:03 KennyOrellana

The whole simulator is based on pytorch so I do not think it is feasible.

I’ll look into testing vmas with mps and fixing what I can, and will let you know if there are any improvements.

matteobettini avatar Mar 20 '23 08:03 matteobettini

Thanks, that would be really helpful 🙌🏽 .

KennyOrellana avatar Mar 20 '23 08:03 KennyOrellana

There are a lot of operators which are not yet supported, like the 'norm' one. These are core to vmas and are used everywhere.

UserWarning: The operator 'aten::linalg_vector_norm' is not currently supported on the MPS backend and will fall back to run on the CPU. This may have performance implications. (Triggered internally at /Users/runner/work/pytorch/pytorch/pytorch/aten/src/ATen/mps/MPSFallback.mm:11.)
  torch.linalg.vector_norm(a.state.pos - b.state.pos, dim=1)

I think for now we will just have to wait for torch to support all these operators on mps.

In the meantime, the Mac M1/M2 cpus are really fast, so you can use those.
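
To see which operators trigger the fallback warning quoted above, one option is to record Python warnings while a piece of code runs and extract the operator names from the message text. This is only a sketch: `collect_mps_fallbacks` is a hypothetical helper, the pattern is copied from the warning in this thread, and torch may emit each such warning only once per process, so a given operator might only show up the first time it is hit.

```python
import re
import warnings

# Pattern taken from the warning text quoted in this thread.
_FALLBACK = re.compile(
    r"The operator '([^']+)' is not currently supported on the MPS backend"
)


def collect_mps_fallbacks(fn, *args, **kwargs):
    """Run fn and return (result, sorted operator names that warned).

    Hypothetical helper, not part of torch or vmas.
    """
    with warnings.catch_warnings(record=True) as caught:
        # Make sure Python's own warning filters do not hide anything.
        warnings.simplefilter("always")
        result = fn(*args, **kwargs)
    ops = set()
    for w in caught:
        match = _FALLBACK.search(str(w.message))
        if match:
            ops.add(match.group(1))
    return result, sorted(ops)
```

For example, wrapping a single environment step in collect_mps_fallbacks would return the step's result together with the list of operators that fell back during that call.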

matteobettini avatar Mar 20 '23 09:03 matteobettini

The simulator now seems to run fine with device="mps". The problem is just that many operations, like the norm mentioned above, are extremely slow because they are not supported on mps and fall back to the cpu.
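
A rough way to check how slow a given operation is on each device is to time it directly. The sketch below is a generic timing helper; `time_call` is a hypothetical name, and the optional `sync` callback is there because GPU-style backends can run asynchronously, so the device should be synchronized before the clock is read.

```python
import time


def time_call(fn, repeats=10, sync=None):
    """Return the mean wall-clock seconds per call of fn over `repeats` runs.

    Hypothetical helper. `sync`, if given, is called before and after the
    timed loop so that pending device work is included in the measurement.
    """
    fn()  # warm-up call, not timed
    if sync is not None:
        sync()
    start = time.perf_counter()
    for _ in range(repeats):
        fn()
    if sync is not None:
        sync()
    return (time.perf_counter() - start) / repeats
```

Assuming PyTorch 2.0 or later, where torch.mps.synchronize is available, the norm could be timed on mps with time_call(lambda: torch.linalg.vector_norm(x, dim=1), sync=torch.mps.synchronize) for a tensor x on that device, and compared against the same call on a cpu tensor without sync.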

matteobettini avatar Jul 27 '23 08:07 matteobettini