Michael Volpp

Results 3 comments of Michael Volpp

Hi, thank you for your question! The implementation uses `n_workers` parallel processes to sample objective functions and generate transitions using the actions coming from the policy. Each worker has it's...

Hi, - Yes, one task corresponds to one objective function for optimization. - Yes, as you want to evaluate only one task you set n_episodes = 1 (in RL-terms, one...

Hi vamp-ire-tap! Thank you for your question! - In our paper we decided to switch off as many confounding factors as possible and therefore optimized the GP hyperparameters offline for...