Domain randomization for MJX

Open alexhansson opened this issue 1 year ago • 7 comments

Hi,

I'm a student trying to use MuJoCo to train RL policies for space systems. According to the Google Colab notebook linked in the MuJoCo documentation, one can simulate and randomize multiple environments in parallel with this code:

rng = jax.random.PRNGKey(0)
rng = jax.random.split(rng, 4096)
batch = jax.vmap(lambda rng: mjx_data.replace(qpos=jax.random.uniform(rng, (1,))))(rng)

jit_step = jax.jit(jax.vmap(mjx.step, in_axes=(None, 0)))
batch = jit_step(mjx_model, batch)

However, this only varies the initial position of your object. I'm wondering whether the environments can be varied more flexibly, for example by randomizing the location and dimensions of collision objects in the scene. I know certain parameters can't be changed once the XML has been compiled, but would it in theory be possible to have multiple XML files and "concatenate" them in some way for parallel simulation?

And as a side question, is it possible to visualize the parallel environments the way Isaac Gym does?

Thanks a lot for your help.

alexhansson avatar May 21 '24 22:05 alexhansson

Hi @alexhansson , in the same notebook, there is a demo varying other parameters in the model to train an RL policy with domain randomization. Search for "Training a Policy with Domain Randomization".

RE rendering: There isn't an out-of-the-box way to render parallel environments. Here are some relevant discussions (some of which you have already commented on)

https://github.com/google-deepmind/mujoco/issues/1682 https://github.com/google-deepmind/mujoco/issues/1604 https://github.com/google-deepmind/mujoco/issues/1356

btaba avatar May 21 '24 23:05 btaba

Thanks for answering @btaba

I have seen the "Training a Policy with Domain Randomization" section. However, only some parameters like friction and actuator gain/bias are randomized.
In our application we want to simulate multiple environments in parallel where we vary the number, shape and size of the collision objects that our agent has to maneuver around. I see two ways this might be possible (though I don't know whether they work like this):

  1. Create an XML file parser and "combine" multiple randomized environments (i.e. convert each XML file into a model and data object) into some sort of array for parallel simulation.
  2. Add properties like positions (but not shape or size) of collision objects spread across the environment.

Could you provide more insight on this? I'm also happy to explain this in more detail if needed.

RE rendering: Thanks for linking the relevant issues. I take from it that you're working on it, which is very cool.

Thanks for your time!

alexhansson avatar May 21 '24 23:05 alexhansson

Hi @alexhansson, varying shape and size of collision objects can be done similarly to the demo in the colab. See https://mujoco.readthedocs.io/en/stable/XMLreference.html#body-geom-size as an example. With meshes, this is a bit harder, but will be viable (working on a current commit), see the relevant issue: https://github.com/google-deepmind/mujoco/issues/1655

To vary the number of objects, [1] and [2] both sound like good ideas. For [1] you won't be able to stack Models if they have different shapes, so you'll have to dispatch to each scene on the host. For [2] you'd load all the objects into the same environment, and "mask" the objects you don't want to interact with at reset, potentially by moving those objects out of the main scene. Hope this helps for now.

btaba avatar May 21 '24 23:05 btaba
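
For readers following along, here is a minimal sketch of the "masking" idea in [2], written in plain JAX with no MJX calls. Everything named here is an assumption for illustration: `M`, `PARK_Z`, the workspace bounds and `reset_obstacles` do not come from the thread or from MuJoCo. It only shows how per-environment positions and an active mask could be sampled with a fixed array shape, so that all environments can still be batched with `jax.vmap`.

```python
import jax
import jax.numpy as jp

M = 5            # assumed maximum number of obstacles loaded into the one shared model
PARK_Z = -100.0  # hypothetical "far away" height used to park unused obstacles


def reset_obstacles(rng):
    """Samples obstacle positions and parks a random subset outside the scene."""
    k_pos, k_num = jax.random.split(rng)
    # Random x/y inside an assumed 2 m x 2 m workspace, at a height of 0.5 m.
    xy = jax.random.uniform(k_pos, (M, 2), minval=-1.0, maxval=1.0)
    pos = jp.concatenate([xy, jp.full((M, 1), 0.5)], axis=1)
    # Number of active obstacles in this environment, between 1 and M.
    num_active = jax.random.randint(k_num, (), 1, M + 1)
    active = jp.arange(M) < num_active
    # Inactive obstacles keep their x/y but are moved far below the scene.
    parked = pos.at[:, 2].set(PARK_Z)
    return jp.where(active[:, None], pos, parked), active


keys = jax.random.split(jax.random.PRNGKey(0), 16)
obstacle_pos, active_mask = jax.vmap(reset_obstacles)(keys)
print(obstacle_pos.shape, active_mask.shape)
```

In a real environment these positions would then have to be written into whatever state holds the obstacle poses, which requires the obstacles to be movable (for example through a free joint).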

Hey @btaba Sounds good. So I could do the following: Create one environment with 1 agent and M collision objects/shape. Before running the episodes I would randomize the position, shape and size (so following [2]) for N parallel environments. And after that I am able to execute the envs in parallel with the M obstacles placed at randomized locations?

alexhansson avatar May 22 '24 00:05 alexhansson

Hey @alexhansson ,

Sounds good for resetting the object positions at environment reset time. The positions are part of the mjx.Data which is part of the environment State.

For randomizing the size/shape (esp. easy for primitives), I would do that with the domain randomization utility on geom_size, which modifies the mjx.Model (so that each parallelized env gets a different size). You can follow the example in the colab. Let us know if you have any trouble.

btaba avatar May 22 '24 00:05 btaba
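
A rough sketch of the pattern described above, assuming it mirrors the colab's domain randomization: one model field gets a leading batch axis and `in_axes` tells `jax.vmap` which fields are batched. To keep it runnable without MuJoCo, `ToyModel` and `toy_step` are hypothetical stand-ins for `mjx.Model` and `mjx.step`, and the scale range is arbitrary.

```python
from typing import NamedTuple

import jax
import jax.numpy as jp


class ToyModel(NamedTuple):
    """Hypothetical stand-in for mjx.Model with only two fields."""
    geom_size: jax.Array      # (ngeom, 3)
    geom_friction: jax.Array  # (ngeom, 3)


def randomize_geom_size(model, rng, scale_min=0.5, scale_max=1.5):
    """Returns a model with batched geom_size plus the matching in_axes tree."""
    ngeom = model.geom_size.shape[0]

    @jax.vmap
    def sample(key):
        # One scale factor per geom, applied to all three size parameters.
        scale = jax.random.uniform(key, (ngeom, 1), minval=scale_min, maxval=scale_max)
        return model.geom_size * scale

    sizes = sample(rng)  # (num_envs, ngeom, 3)
    # Only geom_size carries a batch axis; every other field is shared.
    in_axes = ToyModel(geom_size=0, geom_friction=None)
    return model._replace(geom_size=sizes), in_axes


def toy_step(model, x):
    """Placeholder for mjx.step so the batching can be exercised."""
    return x + model.geom_size.sum() * model.geom_friction[0, 0]


num_envs = 8
model = ToyModel(geom_size=jp.ones((3, 3)), geom_friction=jp.ones((3, 3)))
keys = jax.random.split(jax.random.PRNGKey(0), num_envs)
batched_model, in_axes = randomize_geom_size(model, keys)
out = jax.vmap(toy_step, in_axes=(in_axes, 0))(batched_model, jp.zeros(num_envs))
print(batched_model.geom_size.shape, out.shape)
```

The design point is that unbatched fields stay shared across environments, so only the randomized field grows with the number of environments.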

Sounds good. Will reply in this thread if there are any issues down the line. Thanks!

Edit: One more question. What exactly do you mean by reset time? Do you mean the custom method that runs when I call env.reset(), or are you referring to a built-in one? @btaba

alexhansson avatar May 22 '24 00:05 alexhansson

Hey @alexhansson , by reset time, I mean in the environment reset function. The relevant example in the tutorial is how a sample_command is randomly sampled during the environment reset. Also notice how pipeline_state = self.pipeline_init(self._init_q, jp.zeros(self._nv)) sets the position of the joints during the reset, but you could equally add some noise to init_q and get random starting joint positions.

btaba avatar May 22 '24 00:05 btaba
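
As an illustration of the last point, a minimal sketch of adding noise to the initial joint positions at reset. The names `reset_qpos` and `noise_scale` and the values in `init_q` are made up for this example; in the tutorial environment the result would be what gets passed to `pipeline_init` in place of `self._init_q`.

```python
import jax
import jax.numpy as jp


def reset_qpos(rng, init_q, noise_scale=0.05):
    """Returns init_q perturbed by uniform noise in [-noise_scale, noise_scale)."""
    noise = jax.random.uniform(
        rng, init_q.shape, minval=-noise_scale, maxval=noise_scale
    )
    return init_q + noise


# Hypothetical nominal pose with hinge/slide joints only (no quaternions).
init_q = jp.array([0.0, 0.9, -1.8, 0.0, 0.9, -1.8, 0.0])

# One key per parallel environment; init_q is shared, so its in_axes entry is None.
keys = jax.random.split(jax.random.PRNGKey(0), 4)
batched_qpos = jax.vmap(reset_qpos, in_axes=(0, None))(keys, init_q)
print(batched_qpos.shape)
```

If `init_q` contains a free joint or ball joint, the quaternion entries would need to be renormalized (or perturbed differently) after adding noise.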

Hi @btaba, you mentioned difficulties in varying meshes across parallel environments.

I want to simulate a dexterous manipulation task where there are multiple environments, each defined by a different object. The objects are geometrically complex and represented by meshes (each object may be composed of a varying number of convex submeshes that the hand can make contact with) - the stanford bunny is an example of one such object.

Is it possible to simulate different objects of this kind in parallel in mjx?

jamesheald avatar Jan 17 '25 22:01 jamesheald

Hi @jamesheald, currently MuJoCo MJX does not support "heterogeneous" models.

I can think of tricks where you spawn a Model with 5 meshes, and set the qpos differently in each environment s.t. some of the meshes are outside of the workspace and are "frozen" (repeatedly set the qpos after each physics step). I haven't tried this so YMMV.

We've been working on getting past JAX limitations, but no big updates yet.

btaba avatar Jan 17 '25 22:01 btaba
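
A sketch of what the "frozen" trick could look like, under the same caveat that it is untested against MJX itself. `toy_step` is a hypothetical placeholder for `mjx.step`, and state is reduced to a 3D position and velocity per object. The point is only the pattern: step, then overwrite the state of the objects that should stay parked.

```python
import jax
import jax.numpy as jp


def toy_step(qpos, qvel, dt=0.01):
    """Placeholder for mjx.step: every object falls under gravity."""
    qvel = qvel.at[:, 2].add(-9.81 * dt)
    return qpos + qvel * dt, qvel


def step_with_frozen(qpos, qvel, frozen, parked_qpos):
    """Steps the physics, then restores the state of the frozen objects."""
    qpos, qvel = toy_step(qpos, qvel)
    qpos = jp.where(frozen[:, None], parked_qpos, qpos)
    qvel = jp.where(frozen[:, None], 0.0, qvel)
    return qpos, qvel


num_envs, num_objects = 4, 5
# In environment i only object i is in the workspace; all others are frozen.
frozen = jp.arange(num_envs)[:, None] != jp.arange(num_objects)[None, :]
parked = jp.tile(jp.array([0.0, 0.0, -100.0]), (num_envs, num_objects, 1))
qpos = jp.where(frozen[..., None], parked, jp.zeros((num_envs, num_objects, 3)))
qvel = jp.zeros_like(qpos)

batched_step = jax.jit(jax.vmap(step_with_frozen))
for _ in range(10):
    qpos, qvel = batched_step(qpos, qvel, frozen, parked)
print(qpos[0, :, 2])
```

Note that the frozen objects are still simulated on every step and only reset afterwards, so this keeps them out of the way without saving any computation.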

Hi @btaba. Thanks for the quick response.

So if I understand correctly, say I have 100 objects, the idea would be to have all 100 objects always present, but in each of the parallel environments, a different one of the 100 objects would be positioned in the workspace?

Is there a way to do this such that computation is not wasted on the potentially large number of extraneous objects?

jamesheald avatar Jan 17 '25 23:01 jamesheald

"Is there a way to do this such that computation is not wasted on the potentially large number of extraneous objects?"

You can set max_contact_points and max_geom_pairs. If the objects are far enough from each other, some computation will be saved in collision detection, but unfortunately not in other parts of the engine. This is mainly due to a limitation of JAX.

btaba avatar Jan 17 '25 23:01 btaba
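
For reference, a sketch of how these two limits have been specified in MJX scene files, assuming the MJX version in use still reads them as custom numeric elements; check the MJX documentation for your release. The values and the surrounding scene are placeholders, and the Python wrapper only checks that the fragment is well-formed XML.

```python
import xml.etree.ElementTree as ET

# Placeholder scene; the limits (8) are arbitrary and should be tuned per task.
MJCF = """
<mujoco>
  <custom>
    <numeric name="max_contact_points" data="8"/>
    <numeric name="max_geom_pairs" data="8"/>
  </custom>
  <worldbody>
    <geom name="floor" type="plane" size="5 5 0.1"/>
  </worldbody>
</mujoco>
"""

# Parse the fragment and collect the custom numeric limits by name.
root = ET.fromstring(MJCF)
limits = {n.get("name"): int(n.get("data")) for n in root.iter("numeric")}
print(limits)
```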

That's good to know. Thank you.

There is no way to specify an object as static, so that its state is not updated across time steps?

jamesheald avatar Jan 17 '25 23:01 jamesheald

For an object to be static, it needs to be part of the world body

btaba avatar Jan 17 '25 23:01 btaba

I guess this can't be done adaptively on environment reset?

jamesheald avatar Jan 17 '25 23:01 jamesheald

You'd have to recompile the model

btaba avatar Jan 17 '25 23:01 btaba

Ok I will experiment. Thank you so much for these helpful insights and pointers.

jamesheald avatar Jan 17 '25 23:01 jamesheald

This thread has been a very valuable resource so far.

I am currently working on envs with a fixed maximum of M collision objects, where only a fixed number of objects is added at each step.

Applying the trick, I "mask" the unseen collision objects in reset, then "unmask" them at step. Is there a good way to avoid the computational overhead of simulating the "masked" objects, whose qpos doesn't change?

@btaba you mentioned something:

outside of the workspace and are "frozen"

Is there maybe a way to define a max x, y, z "cube" that is considered for collision computation, so that anything outside this cube is just "skipped"?

MoritzSchwerdt avatar Feb 13 '25 14:02 MoritzSchwerdt

I'm also having trouble randomizing the position of a static object in MJX. I'm able to randomize movable objects by using an extra joint / free joint, but there doesn't seem to be a clean way to do this for static objects, since MJX prevents directly setting the object position at reset.

Is there a simple workaround for this? Specifically, I'm aiming to randomize the position of a door at reset.

fionalluo avatar Aug 10 '25 03:08 fionalluo

Hi @fionalluo, not sure if this is still relevant for you, but I managed to get this to work using mocap bodies. Here is an overview: mocap bodies in mujoco. Those bodies can be moved (their position is part of the state) but are not moved by physics (they are not affected by gravity and do not move on collision). Maybe this helps?

finnBsch avatar Sep 21 '25 09:09 finnBsch
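
A minimal sketch of the mocap approach for the door case, assuming the door body is declared with mocap="true" in the XML and is the first mocap body. `ToyData` is a hypothetical stand-in for `mjx.Data` that keeps only `qpos` and `mocap_pos`, and `reset_door`, `nominal_pos` and `max_offset` are made-up names for illustration.

```python
from typing import NamedTuple

import jax
import jax.numpy as jp


class ToyData(NamedTuple):
    """Hypothetical stand-in for mjx.Data with only the fields used here."""
    qpos: jax.Array       # (nq,)
    mocap_pos: jax.Array  # (nmocap, 3)


def reset_door(rng, data, nominal_pos, max_offset=0.2):
    """Shifts mocap body 0 (assumed to be the door) by a random x/y offset."""
    offset = jax.random.uniform(rng, (2,), minval=-max_offset, maxval=max_offset)
    door_pos = nominal_pos.at[:2].add(offset)
    return data._replace(mocap_pos=data.mocap_pos.at[0].set(door_pos))


nominal_pos = jp.array([1.0, 0.0, 1.0])
data = ToyData(qpos=jp.zeros(3), mocap_pos=nominal_pos[None, :])

# One key per environment; the template data and nominal pose are shared.
keys = jax.random.split(jax.random.PRNGKey(0), 8)
batch = jax.vmap(reset_door, in_axes=(0, None, None))(keys, data, nominal_pos)
print(batch.mocap_pos.shape)
```

Because the mocap position lives in the per-environment data rather than in the model, it can differ across parallel environments without recompiling anything.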