Videh Raj Nema

Results: 3 comments by Videh Raj Nema

Thanks, @alexis-jacq. The variance due to the Monte Carlo rollouts is very high, and I think better advantage estimators could make the algorithm more robust. Unfortunately, the LOLA-DiCE objective, by...

> Thank you very much for releasing the code. It looks like the current code only supports scripted teachers. Is there any plan to also release the part to support...