Videh Raj Nema
Results: 3 comments of Videh Raj Nema
Yes, please. I am looking for the same thing.
Thanks, @alexis-jacq. The variance due to the Monte Carlo rollouts is very high, and I think using better advantage estimators can make the algorithm more robust. Unfortunately, the LOLA-DiCE objective, by...
> Thank you very much for releasing the code. It looks like the current code only supports scripted teachers. Is there any plan to also release the part to support...