2 issues by Zichun Yu
Could you provide the PPO codebase that can reproduce the results of the paper? I have not found it in this repo. Thank you!
Hi, I noticed that perdomain_scores is initialized with np.log(len(tokenizer)). Is that because you assume a randomly initialized model produces a uniform distribution over the vocabulary? Thank you!
https://github.com/sangmichaelxie/doremi/blob/7cde52d1848737aa967ecbdb9e643cf334de160d/doremi/train.py#L273
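For readers following the question, here is a minimal sketch of the arithmetic behind it: a model that assigns probability 1/V to each of V tokens has a per-token cross-entropy of log(V) nats, which matches the form of np.log(len(tokenizer)). The helper name uniform_cross_entropy and the vocabulary size below are hypothetical and not taken from the doremi code, and this only checks the math; it does not confirm the authors' actual reasoning.

```python
import numpy as np


def uniform_cross_entropy(vocab_size: int) -> float:
    """Per-token cross-entropy (in nats) of a uniform prediction over the vocabulary."""
    # Hypothetical helper for illustration; not part of the doremi codebase.
    probs = np.full(vocab_size, 1.0 / vocab_size)
    target = 0  # under a uniform prediction, every target token gives the same loss
    return float(-np.log(probs[target]))


vocab_size = 50257  # assumed example size; the real value would come from len(tokenizer)
loss = uniform_cross_entropy(vocab_size)

# The loss of a uniform predictor equals log(vocab_size).
assert np.isclose(loss, np.log(vocab_size))
```

Under that reading, the initial value would correspond to the loss of a model that knows nothing about any domain, but that interpretation is the question's assumption rather than something stated here.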