
dv3 Imagination notebook re-visited

anthony0727 opened this issue 1 year ago • 4 comments

I can't find a button to reopen the issue, so, regarding #272, I have two questions!

  1. Shouldn't the initial stochastic_state be flattened, i.e.

        stochastic_state = player.stochastic_state.view(1, 1, -1).clone()

in

    if i == initial_steps - imagination_steps - 1:
        stochastic_state = player.stochastic_state.clone()
        recurrent_state = player.recurrent_state.clone()
  2. Shouldn't 0.5 also be added to the imagined reconstructed observations (see the normalization sketch at the end of this comment), i.e.

        step_data["rgb"] = rec_obs["rgb"].unsqueeze(0).detach().cpu().numpy() + 0.5

in

 # imagination step
        stochastic_state, recurrent_state = world_model.rssm.imagination(stochastic_state, recurrent_state, actions)
        # update current state
        imagined_latent_states = torch.cat((stochastic_state.view(1, 1, -1), recurrent_state), -1)
        rec_obs = world_model.observation_model(imagined_latent_states)
        step_data["rgb"] = rec_obs["rgb"].unsqueeze(0).detach().cpu().numpy()

anthony0727 · May 12 '24 11:05

  3. Also, the -0.5 is missing here. This is quite critical at test time: when testing with the notebook I couldn't reproduce the returns seen during training, and this was the main cause.

        preprocessed_obs[k] = preprocessed_obs[k] / 255.0 - 0.5

in

 preprocessed_obs[k] = torch.as_tensor(v[np.newaxis], dtype=torch.float32, device=fabric.device)
            if k in cfg.algo.cnn_keys.encoder:
                preprocessed_obs[k] = preprocessed_obs[k] / 255.0 
        mask = {k: v for k, v in preprocessed_obs.items() if k.startswith("mask")}
  4. Shouldn't the fixes below be made to align the buffers' timelines? I'm a bit confused about this one, though. I think the -1 for the actions is correct:
            # actions actually played by the agent
            actions = torch.tensor(
                rb_initial["actions"][-imagination_steps + i - 1], # (anthony) subtract 1
                device=fabric.device,
                dtype=torch.float32,
            )[None]

I thought the original -imagination_steps + i was correct (together with the -1 for the actions), but instead I got the desired trajectory with -imagination_steps + i + 1 for the latent states... confusing. A toy version of the index arithmetic follows the snippet below.

        # reconstruct the observations from the latent states used when interacting with the environment
        played_latent_states = torch.cat(
            (
                torch.tensor(rb_initial["stochastic_state"][-imagination_steps + i + 1], device=fabric.device), # (anthony) add 1
                torch.tensor(rb_initial["recurrent_state"][-imagination_steps + i + 1], device=fabric.device), # (anthony) add 1
            ),
            -1,
        )
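
To make the index arithmetic concrete, here's a toy version of the loop (the value of imagination_steps is made up for illustration):

    imagination_steps = 5
    for i in range(imagination_steps):
        action_idx = -imagination_steps + i - 1  # action actually played by the agent
        latent_idx = -imagination_steps + i + 1  # latent states matching this step's obs
        print(i, action_idx, latent_idx)
    # At the last iteration (i == imagination_steps - 1) latent_idx is 0, which
    # Python resolves to the *first* element of the buffer, not the last one.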

anthony0727 · May 12 '24 12:05

With the changes above, I now get the desired performance & results!

anthony0727 · May 12 '24 12:05

Hi @anthony0727,

  1. The player's stochastic state is already flattened, as you can see here: https://github.com/Eclectic-Sheep/sheeprl/blob/40035066a55b76fd9f9dc4d92ee5a749e079e6b1/sheeprl/algos/dreamer_v3/agent.py#L655 or https://github.com/Eclectic-Sheep/sheeprl/blob/40035066a55b76fd9f9dc4d92ee5a749e079e6b1/sheeprl/algos/dreamer_v3/agent.py#L686 (see the shape sketch after this list).
  2. You are right: the reconstructed observations should be incremented by 0.5.
  3. Same as 2.: we modified Dreamer and these two modifications escaped us.
  4. The +1 is correct because there is a mismatch between the actions and the latent states (due to the inclusion of the initial latent state in the buffer). The problem with the +1 is that the last action is the one in position 0.
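
For 1., here is a minimal shape sketch; the sizes are assumed (following the usual DreamerV3 (discrete, classes) categorical state), not taken from the config:

    import torch

    discrete, classes, recurrent_size = 32, 32, 512
    stochastic_state = torch.zeros(1, 1, discrete, classes)  # as sampled by the RSSM
    recurrent_state = torch.zeros(1, 1, recurrent_size)

    # The player already keeps the flattened view, so no extra .view(1, 1, -1) is needed:
    flat = stochastic_state.view(1, 1, -1)           # (1, 1, discrete * classes)
    latent = torch.cat((flat, recurrent_state), -1)  # (1, 1, discrete * classes + recurrent_size)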

I will fix these problems as soon as possible; thanks for reporting them.

EDIT (re: 4.): as it stands, the latent states at index i are generated from the observation at index i-1 and are used to generate the action at index i-1. This happens because the initial latent state is also buffered. I will modify it, because it only creates confusion; see the toy example below.
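
A tiny self-contained toy of that shift (purely illustrative data, not the actual sheeprl buffer):

    import numpy as np

    T = 5
    obs = np.arange(T)        # obs[t] is observed at step t
    latent = np.full(T, -1)   # index 0 holds the initial latent state, which is also buffered
    actions = np.full(T, -1)

    for t in range(1, T):
        latent[t] = obs[t - 1]      # latent[t] is computed from obs[t - 1] ...
        actions[t - 1] = latent[t]  # ... and used to generate actions[t - 1]

    # Pairing obs[t] with the latent computed from it therefore requires latent[t + 1]:
    assert all(latent[t + 1] == obs[t] for t in range(T - 1))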

michele-milesi · May 13 '24 08:05

@anthony0727 @michele-milesi thank you both! Indeed, we save the stochastic and recurrent states that refer to the previous action instead of the ones that have just been generated.

belerico · May 13 '24 08:05

Hi @anthony0727, we created a branch to fix this issue; can you check whether it works? (https://github.com/Eclectic-Sheep/sheeprl/tree/fix/dv3-imagination-notebook)

Thanks

michele-milesi · May 20 '24 10:05

I'll look into it & the new dv3 soon, after my deadline thing!!

anthony0727 · May 23 '24 14:05