I had an issue when I trained Huggy, and solved it locally (HuggingFace Deep Reinforcement Learning Course)

Open tomervazana opened this issue 1 year ago • 1 comment

When I ran the command !mlagents-learn ./config/ppo/Huggy.yaml --env=./trained-envs-executables/linux/Huggy/Huggy --run-id="Huggy2" --no-graphics on Colab, I got this error:

Traceback (most recent call last):
  File "/usr/local/bin/mlagents-learn", line 33, in <module>
    sys.exit(load_entry_point('mlagents', 'console_scripts', 'mlagents-learn')())
  File "/usr/local/bin/mlagents-learn", line 25, in importlib_load_entry_point
    return next(matches).load()
  File "/usr/lib/python3.10/importlib/metadata/__init__.py", line 171, in load
    module = import_module(match.group('module'))
  File "/usr/lib/python3.10/importlib/__init__.py", line 126, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
  File "<frozen importlib._bootstrap>", line 1050, in _gcd_import
  File "<frozen importlib._bootstrap>", line 1027, in _find_and_load
  File "<frozen importlib._bootstrap>", line 1006, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 688, in _load_unlocked
  File "<frozen importlib._bootstrap_external>", line 883, in exec_module
  File "<frozen importlib._bootstrap>", line 241, in _call_with_frames_removed
  File "/content/ml-agents/ml-agents/mlagents/trainers/learn.py", line 2, in <module>
    from mlagents import torch_utils
  File "/content/ml-agents/ml-agents/mlagents/torch_utils/__init__.py", line 1, in <module>
    from mlagents.torch_utils.torch import torch as torch  # noqa
  File "/content/ml-agents/ml-agents/mlagents/torch_utils/torch.py", line 63, in <module>
    set_torch_config(TorchSettings(device=None))
  File "/content/ml-agents/ml-agents/mlagents/torch_utils/torch.py", line 56, in set_torch_config
    torch.set_default_dtype(torch.cuda.FloatTensor)
  File "/usr/local/lib/python3.10/dist-packages/torch/__init__.py", line 1009, in set_default_dtype
    _C._set_default_dtype(d)
TypeError: invalid dtype object: only floating-point types are supported as the default type

I solved it by changing the function set_torch_config(torch_settings: TorchSettings) -> None in the file /content/ml-agents/ml-agents/mlagents/torch_utils/torch.py (on Colab) to:

def set_torch_config(torch_settings: TorchSettings) -> None:
    global _device

    if torch_settings.device is None:
        device_str = "cuda" if torch.cuda.is_available() else "cpu"
    else:
        device_str = torch_settings.device

    _device = torch.device(device_str)

    if _device.type == "cuda":
        torch.set_default_device(_device.type)
        # Set default tensor type to CUDA tensors
        torch.set_default_tensor_type(torch.cuda.FloatTensor)
    else:
        # Set default tensor type to CPU tensors
        torch.set_default_tensor_type(torch.FloatTensor)
    # Set default dtype to float32 for consistency
    torch.set_default_dtype(torch.float32)

    # Add a print statement to confirm execution
    print(f"set_torch_config called. Default device: {_device}, Default dtype: {torch.get_default_dtype()}")

    logger.debug(f"default Torch device: {_device}")

This solved the problem, and the run printed: set_torch_config called. Default device: cuda, Default dtype: torch.float32
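
Note on the fix: the traceback shows torch.set_default_dtype being called with torch.cuda.FloatTensor, which is a tensor type rather than a dtype, and that is why PyTorch raises the TypeError. The patch above works by passing torch.float32 instead. It still calls torch.set_default_tensor_type, which PyTorch has marked as deprecated since version 2.1 in favor of torch.set_default_dtype and torch.set_default_device. The following is a minimal sketch of the same idea without the deprecated call. The helper names resolve_device_str and apply_torch_defaults are hypothetical (they are not part of ML-Agents), torch.set_default_device is assumed to be available (PyTorch 2.0 or newer), and the sketch has not been tested against ML-Agents itself.

```python
from typing import Optional


def resolve_device_str(requested: Optional[str], cuda_available: bool) -> str:
    """Pick a device string: honor an explicit request, else prefer CUDA when present.

    Hypothetical helper for illustration; not part of ML-Agents.
    """
    if requested is not None:
        return requested
    return "cuda" if cuda_available else "cpu"


def apply_torch_defaults(requested: Optional[str] = None) -> str:
    """Set PyTorch's default device and dtype without set_default_tensor_type.

    Hypothetical helper for illustration; assumes PyTorch 2.0 or newer.
    """
    import torch  # imported lazily so resolve_device_str works without PyTorch

    device_str = resolve_device_str(requested, torch.cuda.is_available())
    # New tensors are created on this device by default.
    torch.set_default_device(device_str)
    # Pass a dtype (torch.float32), not a tensor type such as torch.cuda.FloatTensor.
    torch.set_default_dtype(torch.float32)
    return device_str
```

Calling apply_torch_defaults() with no argument selects "cuda" when torch.cuda.is_available() returns True and "cpu" otherwise, then returns the chosen device string. Keeping the device choice in a separate pure function makes that decision easy to check without a GPU.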

tomervazana, Sep 22 '24 19:09

I ran into the same issue (#6144) and solved it by modifying the code that sets the default type. Nice solution as well!

kuds, Sep 24 '24 05:09

This issue is stale because it has been open for 30 days with no activity.

github-actions[bot], Oct 24 '24 08:10

This issue was closed because it has been inactive for 14 days since being marked as stale. Please open a new issue for related bugs.

github-actions[bot], Nov 07 '24 12:11