Zichun Yu issues


Results: 2 issues



PPO implementation

Could you provide the PPO codebase that reproduces the results in the paper? I could not find it in this repo. Thank you!

Question about the initialization of the perdomain_scores

Hi, I noticed that the perdomain_scores are initialized with np.log(len(tokenizer)). Is this because you assume that a randomly initialized model outputs a uniform distribution over the vocabulary, so that its per-token cross-entropy loss is log of the vocabulary size? Thank you! https://github.com/sangmichaelxie/doremi/blob/7cde52d1848737aa967ecbdb9e643cf334de160d/doremi/train.py#L273
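The arithmetic behind this question can be checked directly. If every logit is equal, softmax assigns probability 1/V to each of the V tokens, so the cross-entropy for any target token is -log(1/V) = log V. The minimal NumPy sketch below illustrates only that arithmetic and is not taken from the DoReMi codebase. The helper name `cross_entropy_from_logits` and the vocabulary size of 50257, which stands in for len(tokenizer), are hypothetical.

```python
import numpy as np


def cross_entropy_from_logits(logits: np.ndarray, target: int) -> float:
    """Per-token cross-entropy (natural log) for a single target token id."""
    # Numerically stable log-softmax: subtract the max logit before exponentiating.
    shifted = logits - logits.max()
    log_probs = shifted - np.log(np.exp(shifted).sum())
    return float(-log_probs[target])


# Hypothetical vocabulary size standing in for len(tokenizer).
vocab_size = 50257

# Equal logits mean every token gets probability 1 / vocab_size.
uniform_logits = np.zeros(vocab_size)

loss = cross_entropy_from_logits(uniform_logits, target=123)
print(loss, np.log(vocab_size))  # the two values agree up to floating-point error
```

A real randomly initialized model produces logits that are only approximately equal, so its initial loss is close to log V rather than exactly equal to it.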