What is the role of the latent code?

Open xjturobocon opened this issue 3 years ago • 7 comments

In the pre-training stage, the latent code z is sampled according to the prior p(z), and I noticed that p(z) is a Gaussian distribution. I am confused about the "random" latent code. If z is sampled randomly from p(z), the same skill motion may map to different z's, while the same or similar z may map to different skill motions. So after pre-training, if we randomly sample a latent code z from p(z), what motion does it imitate? Walk? Strike? Jump? Thank you for your work, and I hope for your reply.

xjturobocon avatar Dec 21 '22 11:12 xjturobocon

Yes, the latents are sampled randomly during pre-training. Because of the objective used during pre-training, the model will learn to assign different behaviors to different latents automatically. This is similar to what happens in unsupervised reinforcement learning. We do not need to explicitly specify which skills a particular latent produces. Instead the GAN and unsupervised RL objective will automatically learn a skill embedding where different zs will be mapped to different behaviors that resemble the dataset. If you want more details, you can take a look at the paper for a more in-depth explanation.

xbpeng avatar Jan 10 '23 04:01 xbpeng
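
For concreteness, here is a minimal sketch (assuming PyTorch and a hypothetical 64-dimensional latent; this is not the repository's actual code) of how latents are drawn from the prior and handed to the low-level policy during pre-training. In the ASE paper the prior is a uniform distribution over the unit hypersphere, obtained by normalizing a Gaussian sample:

```python
import torch

def sample_latents(num_envs: int, latent_dim: int) -> torch.Tensor:
    """Draw one latent per environment from the prior p(z)."""
    z = torch.randn(num_envs, latent_dim)        # z ~ N(0, I)
    return z / z.norm(dim=-1, keepdim=True)      # project onto the unit hypersphere

# Each environment conditions the low-level policy pi(a | s, z) on its own random z.
# No skill label is attached to any z; the discriminator and the mutual-information
# objective determine which behavior each z ends up producing.
z = sample_latents(num_envs=4096, latent_dim=64)
```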

Thanks for your reply. Assuming the latent code is 1-dimensional (ranging from 0 to 1), then for a specific skill, for example 'jump', the high-level stage should produce continuously changing z's (0.1, 0.11, 0.12, ...) over sequential frames, right? Because the input to the task policy network is continuous, its output z's should also be continuous. However, during pre-training, for the 'jump' skill I noticed that every 10 frames the amp_obs sequence is paired with a new random z, so the motion clip can be mapped to very different z's (0.1, 0.5, 0.9). I think that may make it hard to generate the 'jump' skill stably.

I guess an ideal mapping between z's and skills would be for a skill (motion clip) to map to a cluster of z's, rather than to scattered, unrelated z's. I'd like to hear your opinion!

xjturobocon avatar Jan 10 '23 05:01 xjturobocon
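
For reference, here is a hedged sketch of the latent-holding schedule described above: during pre-training each environment keeps its latent for some number of steps and then resamples from the prior, while during task training the high-level policy emits the next z from its own output head. The hold interval, names, and dimensions below are illustrative assumptions, not the values used in the released code:

```python
import torch

def resample_expired_latents(z, steps_held, hold_steps=150, latent_dim=64):
    """Resample z only for environments whose holding period has expired."""
    expired = steps_held >= hold_steps                      # boolean mask over envs
    if expired.any():
        new_z = torch.randn(int(expired.sum()), latent_dim)
        z[expired] = new_z / new_z.norm(dim=-1, keepdim=True)
        steps_held[expired] = 0
    return z, steps_held
```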

Sorry, I'm not sure I understand your question.

xbpeng avatar Jan 13 '23 05:01 xbpeng

Sorry, I didn't ask clearly. What confuses me is: after pre-training, is the latent space structured? Is it like the first picture or the second picture below, where each color represents a specific skill? (Attached images: "clustered" vs. "not clustered".)

xjturobocon avatar Jan 13 '23 06:01 xjturobocon

There is some structure in the latent space. Latents that are close in the latent space will typically correspond to similar behaviors.

xbpeng avatar Jan 17 '23 18:01 xbpeng
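
One way to probe this empirically (a sketch under the assumption that latents lie on the unit hypersphere, as in the paper; function names are hypothetical) is to spherically interpolate between two sampled latents, roll out the pre-trained low-level policy at each intermediate z, and compare the resulting motions:

```python
import torch

def slerp(z0: torch.Tensor, z1: torch.Tensor, t) -> torch.Tensor:
    """Spherical interpolation between two unit-norm latent codes."""
    omega = torch.arccos(torch.clamp((z0 * z1).sum(-1), -1.0, 1.0))
    so = torch.sin(omega)
    return (torch.sin((1.0 - t) * omega) / so) * z0 + (torch.sin(t * omega) / so) * z1

z0 = torch.randn(64); z0 = z0 / z0.norm()
z1 = torch.randn(64); z1 = z1 / z1.norm()
intermediate = [slerp(z0, z1, t) for t in torch.linspace(0.0, 1.0, 5)]
```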

So how does it end up with that structure? During pre-training a latent code is sampled randomly at every step, so there is no guarantee that similar behaviors correspond to similar latent codes, is that right?

xjturobocon avatar Jan 18 '23 04:01 xjturobocon

Yes, that's right. The latents are sampled randomly during training, and there's no guarantee that similar behaviors correspond to similar latent codes. But in practice we do see that similar latent codes often lead to similar behaviors. This is likely due in part to the smoothness of the function approximator and to the mutual-information objective.

xbpeng avatar Jan 24 '23 19:01 xbpeng
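
To make the last point concrete, here is a hedged sketch of the encoder-based reward that supplies the mutual-information signal. In ASE the encoder q(z | s, s') is modeled as a von Mises-Fisher distribution over the hypersphere, so its log-likelihood reduces (up to constants) to a dot product between the normalized encoder prediction and the latent that generated the transition; the class names, network sizes, and scale factor below are assumptions:

```python
import torch
import torch.nn as nn

class TransitionEncoder(nn.Module):
    """Predicts a unit-norm latent direction from a state transition (s, s')."""
    def __init__(self, obs_dim: int, latent_dim: int, hidden: int = 512):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(2 * obs_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, latent_dim),
        )

    def forward(self, s, s_next):
        mu = self.net(torch.cat([s, s_next], dim=-1))
        return mu / mu.norm(dim=-1, keepdim=True)

def encoder_reward(encoder, s, s_next, z, scale=5.0):
    """High when the latent z can be recovered from the transition it produced."""
    with torch.no_grad():
        mu = encoder(s, s_next)
    return scale * (mu * z).sum(dim=-1)
```

Pushing this reward up forces transitions generated under different z's to be distinguishable, which, together with the smoothness of the policy network, is what gives nearby latents a tendency to share behaviors and distant latents a tendency to differ.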