[Bug Report] `--checkpoint` broken

Open VladimirFokow opened this issue 1 year ago • 2 comments

Describe the bug

Here: https://isaac-orbit.github.io/orbit/source/setup/sample.html#reinforcement-learning

In this example:

./orbit.sh -p source/standalone/workflows/rsl_rl/play.py --task Isaac-Reach-Franka-v0 --num_envs 32 --checkpoint /PATH/TO/model.pth

we are supposed to provide /PATH/TO/model.pth

  • First of all, in my logs the models are saved with the .pt extension, not .pth.

  • Secondly, it would be nice if the docs explained how to find this path, so that the user doesn't have to dig through the code for the logic. (Is the path to the model the same as the path to the logs described on that page?)

  • Thirdly, most importantly, it's completely broken even if you provide the path:

---------------------------------
Traceback (most recent call last):
  File "/home/fokow/work/Orbit/source/standalone/workflows/rsl_rl/play.py", line 106, in <module>
    main()
  File "/home/fokow/work/Orbit/source/standalone/workflows/rsl_rl/play.py", line 74, in main
    resume_path = get_checkpoint_path(log_root_path, agent_cfg.load_run, agent_cfg.load_checkpoint)
  File "/home/fokow/work/Orbit/source/extensions/omni.isaac.orbit_tasks/omni/isaac/orbit_tasks/utils/parse_cfg.py", line 209, in get_checkpoint_path
    raise ValueError(f"No checkpoints in the directory: '{run_path}' match '{checkpoint}'.")
ValueError: No checkpoints in the directory: '/home/fokow/work/Orbit/logs/rsl_rl/franka_reach/2024-03-15_21-08-21' match '/home/fokow/work/Orbit/logs/rsl_rl/franka_reach/2024-03-15_21-08-21'.

In this line, the checkpoint is the full path of the checkpoint that was provided, which is why no matches are found.
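The failure mode can be reproduced in isolation. Below is a minimal sketch, assuming the lookup treats the checkpoint argument as a pattern that is matched against the bare file names inside the run directory. That assumption is inferred from the error message and is not a copy of get_checkpoint_path; `find_checkpoints` is a hypothetical helper, and the two files are empty stand-ins.

```python
import os
import re
import tempfile


def find_checkpoints(run_path: str, checkpoint: str) -> list[str]:
    """Return the file names in run_path that match the pattern `checkpoint`."""
    return sorted(f for f in os.listdir(run_path) if re.match(checkpoint, f))


with tempfile.TemporaryDirectory() as run_path:
    # Create two empty stand-in checkpoint files.
    for name in ("model_0.pt", "model_300.pt"):
        open(os.path.join(run_path, name), "w").close()

    # A full path is compared against bare file names, so nothing matches.
    by_full_path = find_checkpoints(run_path, "/PATH/TO/model_300.pt")
    # A bare file name matches as expected.
    by_file_name = find_checkpoints(run_path, "model_300.pt")

print("full path:", by_full_path)
print("file name:", by_file_name)
```

Under this assumption, any value that contains directory components can never match, which is consistent with the ValueError above.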

System Info

  • Commit: 475b3f7

Checklist

  • [x] I have checked that there is no similar issue in the repo (required)
  • [x] I have checked that the issue is not in running Isaac Sim itself and is related to the repo

A related issue was discussed here, where they said:

Documentation is not the same as code. It does get a bit confusing.

Acceptance Criteria

This task is considered done if:

  • the function get_checkpoint_path works
  • the examples in the docs work (that use the --checkpoint flag)
  • the docs describe where to find this path

VladimirFokow Mar 15 '24 20:03

What should the logic be:

  • the latest checkpoint automatically selected,
  • or exactly the one specified by the user?

The code attempts to do both 😑 and achieves neither.
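One way to support both behaviours is sketched below. It is only an illustration of the question above, not the repository's code: `resolve_checkpoint` is a hypothetical function, and the `model_<iteration>.pt` naming is an assumption based on the file names mentioned in this thread.

```python
import os
import re
import tempfile
from typing import Optional


def resolve_checkpoint(run_path: str, checkpoint: Optional[str] = None) -> str:
    """Return the named checkpoint if one is given, otherwise the latest one."""
    if checkpoint is not None:
        # Accept either a bare file name or an absolute path to the file.
        candidate = checkpoint if os.path.isabs(checkpoint) else os.path.join(run_path, checkpoint)
        if not os.path.isfile(candidate):
            raise FileNotFoundError(f"Checkpoint not found: '{candidate}'")
        return candidate

    # No checkpoint named: pick the file with the highest iteration number.
    numbered = []
    for name in os.listdir(run_path):
        found = re.fullmatch(r"model_(\d+)\.pt", name)
        if found:
            numbered.append((int(found.group(1)), name))
    if not numbered:
        raise FileNotFoundError(f"No checkpoints in the directory: '{run_path}'")
    return os.path.join(run_path, max(numbered)[1])


with tempfile.TemporaryDirectory() as run_path:
    # Empty stand-in files; the iteration numbers are made up for the demo.
    for name in ("model_50.pt", "model_300.pt", "model_1000.pt"):
        open(os.path.join(run_path, name), "w").close()

    latest = os.path.basename(resolve_checkpoint(run_path))
    exact = os.path.basename(resolve_checkpoint(run_path, "model_300.pt"))

print("latest:", latest)
print("exact:", exact)
```

Comparing iteration numbers as integers avoids the lexical ordering problem, where "model_50.pt" would sort after "model_1000.pt".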

VladimirFokow Mar 16 '24 12:03

  1. You are right about pth. That is an error.
  2. You need to specify the "load_run" and the "checkpoint" file (as the args say). The paths are resolved automatically:
./orbit.sh -p source/standalone/workflows/rsl_rl/play.py --task Isaac-Rough-Anymal-C-Play-v0 --load_run 2024-03-11_16-11-38 --checkpoint model_300.pt

I agree. This is not very intuitive. Feel free to send an MR with the required clarifications :)
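For anyone who already has a full checkpoint path, the two values can be derived from it. This is a minimal sketch, assuming the checkpoint file sits directly inside the run directory (as the error message in the report suggests); `split_checkpoint_path` is a hypothetical helper and EXPERIMENT is a placeholder for the experiment directory name.

```python
from pathlib import PurePosixPath


def split_checkpoint_path(path: str) -> tuple[str, str]:
    """Split '<...>/<run_dir>/<file>' into (run directory name, file name)."""
    p = PurePosixPath(path)
    return p.parent.name, p.name


# EXPERIMENT is a placeholder, not a real directory name.
load_run, checkpoint = split_checkpoint_path(
    "logs/rsl_rl/EXPERIMENT/2024-03-11_16-11-38/model_300.pt"
)
print(f"--load_run {load_run} --checkpoint {checkpoint}")
```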

Mayankm96 Mar 19 '24 09:03