dv3 Imagination notebook revisited
I can't find a button to reopen the issue. Regarding #272, I have a few questions!
- Shouldn't the initial `stochastic_state` be flattened, i.e.

```python
stochastic_state = player.stochastic_state.view(1, 1, -1).clone()
```

in

```python
if i == initial_steps - imagination_steps - 1:
    stochastic_state = player.stochastic_state.clone()
    recurrent_state = player.recurrent_state.clone()
```
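To show why I think the flattening matters, here's a minimal sketch (the latent shapes are my assumption, purely for illustration):

```python
import torch

# Assumed shapes, for illustration only: a discrete stochastic state of
# shape (1, batch, discrete, classes) and a flat recurrent state.
stochastic_state = torch.zeros(1, 1, 32, 32)
recurrent_state = torch.zeros(1, 1, 512)

# Without flattening, the concatenation used to build the latent state
# fails: the stochastic state has 4 dims, the recurrent state has 3.
flat = stochastic_state.view(1, 1, -1)           # -> (1, 1, 1024)
latent = torch.cat((flat, recurrent_state), -1)  # -> (1, 1, 1536)
print(latent.shape)
```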
- Shouldn't 0.5 also be added to the imagined reconstructed observations, i.e.

```python
step_data["rgb"] = rec_obs["rgb"].unsqueeze(0).detach().cpu().numpy() + 0.5
```

in

```python
# imagination step
stochastic_state, recurrent_state = world_model.rssm.imagination(stochastic_state, recurrent_state, actions)
# update current state
imagined_latent_states = torch.cat((stochastic_state.view(1, 1, -1), recurrent_state), -1)
rec_obs = world_model.observation_model(imagined_latent_states)
step_data["rgb"] = rec_obs["rgb"].unsqueeze(0).detach().cpu().numpy()
```
- Also, `-0.5` is missing here. This is quite critical in the test phase: when testing with the notebook, I couldn't get a return similar to the one seen during training, and this was the main cause. It should be

```python
preprocessed_obs[k] = preprocessed_obs[k] / 255.0 - 0.5
```

in

```python
preprocessed_obs[k] = torch.as_tensor(v[np.newaxis], dtype=torch.float32, device=fabric.device)
if k in cfg.algo.cnn_keys.encoder:
    preprocessed_obs[k] = preprocessed_obs[k] / 255.0
mask = {k: v for k, v in preprocessed_obs.items() if k.startswith("mask")}
```
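Put together, a sketch of the corrected preprocessing (the helper name and signature are mine, just to illustrate; it mirrors what the training loop does):

```python
import numpy as np
import torch

def preprocess(obs, cnn_keys, device):
    # Hypothetical helper, for illustration: CNN observations must be scaled
    # to [0, 1] AND centered with -0.5, otherwise the encoder sees inputs
    # shifted by +0.5 w.r.t. what it saw during training.
    preprocessed_obs = {}
    for k, v in obs.items():
        t = torch.as_tensor(v[np.newaxis], dtype=torch.float32, device=device)
        if k in cnn_keys:
            t = t / 255.0 - 0.5
        preprocessed_obs[k] = t
    return preprocessed_obs
```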
- Shouldn't the fixes below be made to align the buffers' timeline? I'm a bit confused about this, though. I think `-1` for actions is correct:

```python
# actions actually played by the agent
actions = torch.tensor(
    rb_initial["actions"][-imagination_steps + i - 1],  # (anthony) subtract 1
    device=fabric.device,
    dtype=torch.float32,
)[None]
```

I think the original `-imagination_steps + i` should be correct together with the `-1` for actions, but instead I got the desired trajectory with `-imagination_steps + i + 1`... confusing. See the toy timeline after this snippet:

```python
# reconstruct the observations from the latent states used when interacting with the environment
played_latent_states = torch.cat(
    (
        torch.tensor(rb_initial["stochastic_state"][-imagination_steps + i + 1], device=fabric.device),  # (anthony) add 1
        torch.tensor(rb_initial["recurrent_state"][-imagination_steps + i + 1], device=fabric.device),  # (anthony) add 1
    ),
    -1,
)
```
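To make the off-by-one concrete, here's a toy timeline of how I currently picture the buffer (the names and the exact shift are my reading and may well be wrong):

```python
# Toy sketch, illustrative names only. My reading: because the initial latent
# state is also buffered, the latent stored at index t was computed from
# obs[t - 1] and used to generate actions[t - 1].
latents = ["z_init", "z0", "z1", "z2"]  # index t holds the latent computed from obs[t - 1]
obs     = ["o0",     "o1", "o2", "o3"]
actions = ["a0",     "a1", "a2", "a3"]  # actions[t] was generated from latents[t + 1]

# Pairing a played action with the latent that produced it therefore needs a shift:
for t in range(len(actions) - 1):
    print(f"{actions[t]} was generated from {latents[t + 1]}")
```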
With the changes above, I now get the desired performance & results!
Hi @anthony0727,
1. The player's stochastic state is already flattened, as you can see here: https://github.com/Eclectic-Sheep/sheeprl/blob/40035066a55b76fd9f9dc4d92ee5a749e079e6b1/sheeprl/algos/dreamer_v3/agent.py#L655 or https://github.com/Eclectic-Sheep/sheeprl/blob/40035066a55b76fd9f9dc4d92ee5a749e079e6b1/sheeprl/algos/dreamer_v3/agent.py#L686
2. You are right, the reconstructed observation should be incremented by 0.5.
3. Same as 2., we modified Dreamer and these two modifications escaped us.
4. `+1` is correct because there is a mismatch between actions and latent states (due to the inclusion of the initial latent state in the buffer). The problem with `+1` is that the last action is the one in position `0`.
I will fix these problems as soon as possible, thanks for reporting them.
EDIT:
4. As it stands, the latent states at index `i` are generated by the observation at index `i-1` and are used to generate the action at index `i-1`. This is because the initial latent state is buffered. I will modify it because it only creates confusion.
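To make the intended alignment concrete, a tiny sketch (names are illustrative, not the actual buffer code):

```python
# Toy sketch of the alignment after the planned change: the latent state
# stored at index t is the one computed from obs[t] and used to generate
# actions[t], so all buffers share the same timestep index.
buffer = {"obs": [], "latents": [], "actions": []}

for t in range(3):
    buffer["obs"].append(f"o{t}")
    buffer["latents"].append(f"z{t}")   # computed from o{t}
    buffer["actions"].append(f"a{t}")   # generated from z{t}

# Aligned: no i-1 / i+1 corrections needed when replaying the trajectory.
assert all(len(v) == 3 for v in buffer.values())
```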
@anthony0727 @michele-milesi thank you both!
Effectively, we save the stochastic and recurrent states that refer to the previous action instead of the ones that have just been generated.
Hi @anthony0727, we created a branch for fixing this issue, can you check if it works? (https://github.com/Eclectic-Sheep/sheeprl/tree/fix/dv3-imagination-notebook)
Thanks
I'll look into it & the new dv3 soon, after my deadline thing!!