What do the modalities in the MUJOCO PUSH dataset mean?
Hi, I recently tried the MUJOCO PUSH dataset, but I cannot figure out what each modality concretely represents. The paper says:
The multimodal inputs are gray-scaled images (1 × 32 × 32) from an RGB camera, forces (and binary contact information) from a force/torque sensor, and the 3D position of the robot end-effector.
I found that the modalities in the dataset are "control", "image", "sensor", and "pos". How do these correspond to the inputs described in the paper, i.e. what does each modality mean?
Someone else can confirm, but here's how I think of it:
-> "image" refers to the gray-scale images.
-> "pos" refers to the 3D position of the end-effector.
-> "sensor" refers to the forces and binary contact information.
-> "control" refers to what the controller is sending to the arm itself. (This is the one I'm least sure about.)
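One way to sanity-check this guess is to look at the shape of each modality and see which feature columns only ever take the values 0 and 1, since that would point to the binary contact flag. Below is a minimal sketch, not the dataset's official loader: `summarize_modalities` is a hypothetical helper, and the `fake` sample is synthetic stand-in data. The 1 x 32 x 32 image and the 3D position come from the paper's description, but the widths of "sensor" and "control" and the position of the contact column are assumptions made up for illustration.

```python
import numpy as np


def summarize_modalities(sample):
    """Summarize a dict that maps modality names to arrays (hypothetical helper).

    For 2-D arrays (time x features) it also lists the feature columns whose
    values are all 0 or 1, which is what a binary contact flag would look like.
    """
    summary = {}
    for name, value in sample.items():
        arr = np.asarray(value)
        binary_columns = []
        if arr.ndim == 2:
            binary_columns = [
                i for i in range(arr.shape[1])
                if np.isin(arr[:, i], (0, 1)).all()
            ]
        summary[name] = {
            "shape": arr.shape,
            "dtype": str(arr.dtype),
            "binary_columns": binary_columns,
        }
    return summary


# Synthetic stand-in for one trajectory of T timesteps.
# ASSUMPTION: "sensor" = 6 force/torque values + 1 contact flag, "control" = 7 values.
# These widths are illustrative only; check them against the real data.
rng = np.random.default_rng(0)
T = 5
fake = {
    "image": rng.random((T, 1, 32, 32)),   # gray-scale image, 1 x 32 x 32 per step
    "pos": rng.normal(size=(T, 3)),        # 3D end-effector position
    "sensor": np.concatenate(
        [rng.normal(size=(T, 6)), rng.integers(0, 2, size=(T, 1))], axis=1
    ),
    "control": rng.normal(size=(T, 7)),
}

summary = summarize_modalities(fake)
for name, info in summary.items():
    print(name, info)
```

Running the same helper on a real sample (whatever dict your loader returns for the four keys) should show whether "pos" really has three columns and whether "sensor" contains a 0/1 column, which would support or contradict the mapping above.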
I agree with your interpretation, but it does not seem to match the paper. See Figure 8, for example.