Augusto Tagle comments

Augusto Tagle

Why choose total reward of entire trajectory as label?

Hi! I think conditioning on the total trajectory reward is done to sample actions that not only attempt to maximize the reward over the current sampling horizon but also aim...
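The comment contrasts two possible conditioning labels: the total reward of the entire trajectory versus the reward collected over the current sampling horizon only. The sketch below computes both for a toy reward sequence. It is an illustration under assumptions, not code from the repository in question. The function names `total_trajectory_label` and `horizon_labels`, the reward values, and the horizon length are all hypothetical.

```python
from typing import List


def total_trajectory_label(rewards: List[float]) -> float:
    """Hypothetical helper: one label shared by every step, the sum of all rewards in the trajectory."""
    return float(sum(rewards))


def horizon_labels(rewards: List[float], horizon: int) -> List[float]:
    """Hypothetical helper: per-step label, the sum of rewards over the next `horizon` steps only.

    Windows near the end of the trajectory are truncated, so they cover fewer steps.
    """
    return [float(sum(rewards[t:t + horizon])) for t in range(len(rewards))]


# Toy trajectory (made-up values): a small reward at step 2, a large one at step 4.
rewards = [0.0, 0.0, 1.0, 0.0, 5.0]

print(total_trajectory_label(rewards))     # 6.0
print(horizon_labels(rewards, horizon=2))  # [0.0, 1.0, 1.0, 5.0, 5.0]
```

With the horizon label, step 0 is tagged 0.0 because nothing is earned in its two-step window. With the total-trajectory label, it is tagged 6.0, the same value as every other step. A model conditioned on the total label is therefore told what the whole trajectory achieved, not just what the next few steps will earn.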