MultiBench: What is the meaning of the modalities in the MUJOCO PUSH dataset?

Hi, I recently tried the MUJOCO PUSH dataset, but I cannot figure out the concrete meaning of its modalities. The paper states:

The multimodal inputs are gray-scaled images (1 × 32 × 32) from an RGB camera, forces (and binary contact information) from a force/torque sensor, and the 3D position of the robot end-effector.

I found that the modalities in the dataset are "control", "image", "sensor", and "pos". How do these correspond to the modalities described in the paper (i.e., what does each one mean)?

May 26 '22 13:05 mrbeann

Someone else can confirm, but here's how I think of things:
-> The "image" modality refers to the gray-scale images.
-> The "pos" modality refers to the 3D position of the end-effector.
-> The "sensor" modality refers to the forces/binary contact information.
-> The "control" modality refers to what the controller is sending the arm itself. (This one I'm the least sure about.)
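One way to check this guess against your own copy of the data is to print the shape of each modality and compare it with the paper's description. The snippet below is only a rough sketch: `MODALITY_GUESS`, `nested_shape`, `describe`, and `toy_sample` are hypothetical names made up for illustration, and the shapes in `toy_sample` (apart from the 1 x 32 x 32 image and the 3D position from the paper) are placeholders, not the dataset's real dimensions. Swap in a real sample from the dataset to see the actual shapes.

```python
# Hypothetical mapping from dataset keys to the paper's description,
# based on the guess in this thread; "control" is the least certain entry.
MODALITY_GUESS = {
    "image": "gray-scale camera image (1 x 32 x 32 in the paper)",
    "pos": "3D position of the robot end-effector",
    "sensor": "forces and binary contact information",
    "control": "what the controller sends to the arm (unconfirmed)",
}


def nested_shape(value):
    """Return the shape of a NumPy-like array or a rectangular nested list."""
    shape = getattr(value, "shape", None)
    if shape is not None:
        return tuple(shape)
    dims = []
    while isinstance(value, (list, tuple)):
        dims.append(len(value))
        if not value:
            break
        value = value[0]
    return tuple(dims)


def describe(sample):
    """Map each modality key in `sample` to (shape, guessed meaning)."""
    return {
        key: (nested_shape(value), MODALITY_GUESS.get(key, "unknown"))
        for key, value in sample.items()
    }


# Toy sample with placeholder shapes, only to show the output format.
# The lengths used for "sensor" and "control" are made up.
toy_sample = {
    "image": [[[0.0] * 32 for _ in range(32)]],
    "pos": [0.0, 0.0, 0.0],
    "sensor": [0.0] * 7,
    "control": [0.0] * 7,
}

for key, (shape, meaning) in describe(toy_sample).items():
    print(f"{key:8s} shape={shape} -> {meaning}")
```

If the printed shapes for the real data do not line up with the paper (for example, a "pos" entry that is not 3-dimensional), that would be a sign the mapping above needs correcting.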

May 27 '22 18:05 arav-agarwal2

I agree with your interpretation, but it does not seem to match the paper. See Figure 8, for example.

May 28 '22 02:05 mrbeann