Devanshu Shah
@Ram81 @RishabhJain2018 Is this something you would want me to implement? Sorry, I was not sure if I should tag you or not, I understand that you would be busy....
Hi! Sure, I am not working on it. Although I am not sure if the team wants it. So best to ask them :)
Hi! I do not have much experience in distributed algorithms but I really like them and am learning them. I think it'll be really great if I could work on...
Thanks for the reply, @sdesrozis! To confirm that I understand correctly: we want to use `ray.tune` and other distributed utilities provided by Ray and see how it...
Hi! Could you give a brief idea as to what should be included in those files?
I'd like to write the docs for VPG, if that's okay
What is the definition of `timestep` which is displayed on the console during training? Is it the time from the start of training (I think it looked like it), or...
```python
import gym
from genrl import VPG
from genrl.deep.common import OnPolicyTrainer
from genrl.environments import VectorEnv

env = VectorEnv("CartPole-v1")
agent = VPG('mlp', env)
trainer = OnPolicyTrainer(agent, env, epochs=1000)
trainer.train()
```

I...
> At the end, add a `trainer.evaluate()`. That will make sure that the greedy policy is followed each time. Should give 500.

Okay, but from what I understand, it'll use...
Oh okay, thanks! I'll look into the implementation again. My question was: even if it is following a stochastic policy, the policy should improve over time (over the course of time...
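
To make the greedy vs. stochastic distinction discussed above concrete, here is a minimal sketch in plain Python. It is not GenRL code: `softmax`, `sample_action`, `greedy_action`, and the example logits are hypothetical names and values chosen only for illustration. It assumes a discrete policy that outputs one logit per action, as a categorical policy for CartPole might.

```python
import math
import random


def softmax(logits):
    """Convert raw action scores into a probability distribution."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def sample_action(probs, rng):
    """Stochastic policy: draw an action in proportion to its probability."""
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


def greedy_action(probs):
    """Greedy policy: always pick the most probable action."""
    return max(range(len(probs)), key=lambda i: probs[i])


# Hypothetical logits for a two-action environment (illustration only).
logits = [2.0, 1.0]
probs = softmax(logits)

rng = random.Random(0)  # seeded so repeated runs draw the same samples
sampled = [sample_action(probs, rng) for _ in range(1000)]
greedy = [greedy_action(probs) for _ in range(1000)]

print("action probabilities:", probs)
print("sampled action counts:", sampled.count(0), sampled.count(1))
print("greedy action counts:", greedy.count(0), greedy.count(1))
```

Under these assumptions, the sampled policy still picks the less likely action some of the time, while the greedy one never does. That is one possible reason why rewards logged while sampling can sit below what a greedy evaluation of the same policy reports.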