Reproduce results with RoboTurk dataset
Could you please provide the hyperparameters and instructions for learning options with the RoboTurk dataset?
By the way, where is the "SawyerViz" environment?
https://github.com/facebookresearch/CausalSkillLearning/blob/d3f2006498cea8104501217df725970334bb5601/Experiments/Visualizers.py#L28
Hi @YeeCY,
Thanks for your interest in the repository! We're in the process of adding instructions on how to set up the additional visualization environments ("SawyerViz" and "BaxterViz"), as well as how to run pre-training and full skill training.
Please keep an eye out for changes to the readme of the repo for these instructions - they should be out this week.
#8 provides instructions on how to set up the repository and the "SawyerViz" environment. Here is a quick link to the instruction document.
If you'd like to proceed without the SawyerViz environment, you can use any of the existing Sawyer environments in Robosuite (just note that the skills may be interrupted by the table/objects, and the robot trajectory may be occluded by them).
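For reference, here is a minimal sketch of loading a stock Sawyer environment from Robosuite for offscreen visualization. Environment names and keyword arguments differ between Robosuite versions (e.g. "SawyerLift" in the pre-1.0 API vs. `suite.make("Lift", robots="Sawyer")` in 1.x), so treat this as an assumption to adapt, not the exact setup used by the repo:

```python
import robosuite as suite

# Any existing Sawyer task can stand in for the visualization environment;
# the table and objects in these scenes may occlude or interrupt the arm.
# Written against the pre-1.0 Robosuite API ("SawyerLift" etc.); adapt the
# environment name and kwargs to your installed version.
env = suite.make(
    "SawyerLift",
    has_renderer=False,           # no on-screen window
    has_offscreen_renderer=True,  # needed for rendering frames
    use_camera_obs=False,
    ignore_done=True,
)

env.reset()
# Render a single frame from the front camera via the underlying MuJoCo sim.
frame = env.sim.render(width=128, height=128, camera_name="frontview")
```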
Thanks for your feedback, @tanmayshankar! I will check the incoming updates, try to reproduce the results in the paper, and report any issues. Consider closing this issue when a reproducible version is available.
After reading your code, I found that the implementation of the variational network for termination prediction is not the same as the one described in the paper, so I have some questions.
https://github.com/facebookresearch/CausalSkillLearning/blob/b840101102017455d79a4e6bfa21af929c9cf4de/Experiments/PolicyNetworks.py#L1150
What are variance_factor, variance_activation_bias, and epsilon doing in the line above?
In addition,
https://github.com/facebookresearch/CausalSkillLearning/blob/b840101102017455d79a4e6bfa21af929c9cf4de/Experiments/PolicyNetworks.py#L1142
The b_probability_factor is confusing as well.
https://github.com/facebookresearch/CausalSkillLearning/blob/b840101102017455d79a4e6bfa21af929c9cf4de/Experiments/PolicyNetworks.py#L1173
Also, why do we need a prior value to predict termination, and what is the meaning of the outputs from termination_output_layer, since you just add them to the prior before the activation?
https://github.com/facebookresearch/CausalSkillLearning/blob/b840101102017455d79a4e6bfa21af929c9cf4de/Experiments/PolicyNetworks.py#L1176
Hi @YeeCY,
Here are the answers to your questions:
- Epsilon in this case is the minimum variance that we allow the Gaussian distribution to take on. It is possible that the network by itself learns to predict near-zero variances, which can cause irregularities in the gradients of losses with respect to the distribution. This epsilon prevents that. The variance factor simply scales the variance prediction by some constant term. This helps the initial values of the variance predicted by the network be more varied, since otherwise small changes in the network's parameters don't lead to a large change in the predicted variance. The variance activation bias is simply set to 0 here, but was used when exploring other activation functions that can predict negative values by themselves; when added, the bias forces the variance to be non-negative.
- The b_probability_factor is another learning trick used to encourage the network to predict reasonable values during the initial phases of training. By setting b_probability_factor to a value like 0.01, this branch of the network receives smaller gradients and therefore changes more slowly, allowing the prior we impose on the b_probability prediction to bias learning.
- The prior values help bias the termination probability values towards reasonable skill lengths. Without the prior, two degenerate cases can arise: either the skills are all of length 1 timestep, or the entire trajectory is encoded as a single skill. The prior helps avoid this. The termination_output_layer is simply an activation layer (tanh, relu, etc.) that we explored various configurations of. Think of the probability prediction as the addition of a prediction function from the network (the result of the termination_output_layer) and a handcrafted prior term. A rough sketch of how these pieces fit together is given after this list.
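To make the above concrete, here is a minimal, self-contained sketch of how these pieces could fit together in the variational network's output head. It is an assumption-laden illustration, not the code in PolicyNetworks.py: the layer shapes, the softplus/tanh activations, and the default values are placeholders.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class VariationalTerminationHead(nn.Module):
    """Illustrative sketch of the tricks described above, NOT the repo's exact
    code: layer shapes, activations, and default values are placeholders."""

    def __init__(self, hidden_size=64, z_dim=16,
                 variance_factor=0.01, variance_activation_bias=0.0,
                 epsilon=1e-4, b_probability_factor=0.01):
        super().__init__()
        self.mean_layer = nn.Linear(hidden_size, z_dim)
        self.variance_layer = nn.Linear(hidden_size, z_dim)
        self.termination_layer = nn.Linear(hidden_size, 2)
        # "Activation layer" applied to the termination branch (tanh here;
        # other activations were explored in the repo).
        self.termination_output_layer = nn.Tanh()
        self.variance_factor = variance_factor
        self.variance_activation_bias = variance_activation_bias
        self.epsilon = epsilon
        self.b_probability_factor = b_probability_factor

    def forward(self, hidden, termination_prior_logits):
        # Gaussian parameters over the latent skill z.
        mean = self.mean_layer(hidden)
        # variance_factor scales the raw prediction so small parameter changes
        # early in training produce visibly different variances;
        # variance_activation_bias (0 here) keeps other activations
        # non-negative; epsilon floors the variance away from zero so the
        # log-likelihood gradients stay well behaved.
        variance = self.epsilon + self.variance_factor * F.softplus(
            self.variance_layer(hidden) + self.variance_activation_bias)

        # Termination (b) probability: the network branch is downscaled by
        # b_probability_factor so it receives small gradients early on, and a
        # handcrafted prior is added before the final softmax, biasing skills
        # towards reasonable lengths.
        branch = self.termination_output_layer(self.termination_layer(hidden))
        termination_logits = (self.b_probability_factor * branch
                              + termination_prior_logits)
        b_probability = F.softmax(termination_logits, dim=-1)
        return mean, variance, b_probability
```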
@tanmayshankar Thanks for your reply. I will double-check the code. When I run the code with the commands in your README.md, it produces GIFs in the TensorBoard logs. The variational rollout reconstructs the whole episode as expected, but the latent rollout just stays around the initial arm pose with some jitter. Any idea why?
Hmm, that's unusual - are you running the joint training on the MIME or Roboturk dataset? One possible reason I can think of: the joint training waits for something like 200k steps before training the latent policy, to ensure the variational encoder is well trained first, because the variational encoder provides "supervision" to the latent policy. If the latent policy hasn't been trained yet, it's expected that the rollout just stays near the initial pose, because it is predicting useless skills.
That said, I am not sure why this is happening - let me see if I can replicate this behavior.
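For what it's worth, a rough sketch of the delayed schedule described above (the 200k threshold comes from the comment; the function names are hypothetical stand-ins, not the repo's actual training entry points):

```python
# Hypothetical sketch of the joint-training schedule described above; the
# update functions are placeholders, not names from the repository.
LATENT_POLICY_TRAINING_START = 200_000


def run_joint_training(total_steps, update_variational_encoder, update_latent_policy):
    """Train the variational encoder from step 0, but hold off on latent-policy
    updates until the encoder is good enough to provide skill 'supervision'."""
    for step in range(total_steps):
        update_variational_encoder(step)
        if step >= LATENT_POLICY_TRAINING_START:
            # Before this point latent rollouts predict uninformative skills,
            # which is why the arm hovers near its initial pose.
            update_latent_policy(step)
```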