Daniel Paleka comments

Results: 6 comments of Daniel Paleka

Support logprob in OpenAI API

How complex is it to generalize the code in the PR to logprobs > 1? Is it just a matter of making `token_logprobs` a list of lists in inference.py?
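For illustration only, here is a minimal sketch of what a "list of lists" shape could look like when more than one logprob is kept per generated position. It does not reflect the actual code in the PR or in inference.py; the function name `top_k_logprobs`, its parameters, and the example logits are all hypothetical.

```python
import math


def top_k_logprobs(logits_per_position, k):
    """For each position, return the k most likely (token_id, logprob) pairs.

    Hypothetical helper: `logits_per_position` is a list with one list of raw
    logits per generated position. The return value is a list of lists.
    """
    result = []
    for logits in logits_per_position:
        # Log-softmax, subtracting the max logit for numerical stability.
        m = max(logits)
        log_z = m + math.log(sum(math.exp(x - m) for x in logits))
        pairs = [(token_id, x - log_z) for token_id, x in enumerate(logits)]
        # Keep the k highest log-probabilities for this position.
        pairs.sort(key=lambda p: p[1], reverse=True)
        result.append(pairs[:k])
    return result


# Two positions over a made-up three-token vocabulary, keeping the top 2 each.
token_logprobs = top_k_logprobs([[2.0, 1.0, 0.1], [0.5, 0.5, 3.0]], k=2)
```

With `k=1` each inner list has a single entry, so the single-logprob case is just a special case of the same structure.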

How to attribute reward to multiple model runs in the same trajectory with PPO

Feedback from a conversation with @LouisCastricato and @Dahoas: This is not supported in the current TRLX version. The closest thing available is attributing the reward only to the first call...

How to attribute reward to multiple model runs in the same trajectory with PPO

> > > As of now, TRLX supports only RL setups where all "actions" to attribute the reward to are done before the reward function is called. > > @dpaleka,...

Exit proofgame after it finds a list of moves leading to a position

Aha, so if I specify `-o result`, it will create files `result00`, `result01`, ..., and if it finds something, the last one will be `resultXY`, containing the proof. Thanks, this...

Cannot supply enabled_if parameters in CLI

To be honest, I'm not sure how I hacked around it; I don't have access to the code anymore. Sorry!

Command doesn't run after "Do you want to continue? [y/N]: y"

Same behaviour in `0.1.16` as described in the issue, except of course it's now `Execute this command? [Y/n]`. One additional weird thing that happens: when I enter `exit` the first...