Daniel Paleka
How complex is it to generalize the code in the PR to logprobs > 1? Is it just a matter of making `token_logprobs` a list of lists in inference.py?
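For illustration only, here is a minimal sketch of the "list of lists" shape the question has in mind: one inner list per generated token, holding the top-k `(token_id, logprob)` pairs. The helper names and the plain-list logits input are hypothetical assumptions, not the PR's or TRLX's actual code.

```python
import math
from typing import List, Sequence, Tuple


def log_softmax(logits: Sequence[float]) -> List[float]:
    # Subtract the max before exponentiating for numerical stability.
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]


def top_k_token_logprobs(
    step_logits: Sequence[Sequence[float]], k: int
) -> List[List[Tuple[int, float]]]:
    """Hypothetical helper: for each generation step, return the k most
    likely (token_id, logprob) pairs, most likely first."""
    result = []
    for logits in step_logits:
        lp = log_softmax(logits)
        ranked = sorted(range(len(lp)), key=lambda i: lp[i], reverse=True)
        result.append([(i, lp[i]) for i in ranked[:k]])
    return result
```

With `k=1` this reduces to the single-logprob case, just wrapped in one extra list level, so downstream code that indexes `token_logprobs[t]` would need to take `token_logprobs[t][0]` instead.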
Feedback from a conversation with @LouisCastricato and @Dahoas: This is not supported in the current TRLX version. The closest thing available is attributing the reward only to the first call...
> > > As of now, TRLX supports only RL setups where all "actions" to attribute the reward to are done before the reward function is called. > > @dpaleka,...
Aha, so if I specify `-o result`, it will create files `result00`, `result01`, ..., and if it finds something, the last one will be `resultXY`, containing the proof. Thanks, this...
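Given the naming scheme described above (`result00`, `result01`, ..., with the last one holding the proof), a small script can locate that final file. This is a hypothetical helper sketched under the assumption that "last" means the highest numeric suffix; it does not use any option of the tool itself.

```python
import re
from pathlib import Path
from typing import Optional


def last_result_file(directory: str, prefix: str = "result") -> Optional[Path]:
    """Hypothetical helper: return the highest-numbered `<prefix>NN` file
    in `directory`, or None if there is none."""
    pattern = re.compile(re.escape(prefix) + r"(\d+)")
    numbered = []
    for path in Path(directory).iterdir():
        m = pattern.fullmatch(path.name)
        if m and path.is_file():
            numbered.append((int(m.group(1)), path))
    if not numbered:
        return None
    # Pick the file with the largest numeric suffix.
    return max(numbered, key=lambda item: item[0])[1]
```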
To be honest, I'm not sure how I hacked around it; I don't have access to the code anymore. Sorry!
Same behaviour in `0.1.16` as described in the issue, except of course it's now `Execute this command? [Y/n]`. One additional weird thing that happens: when I enter `exit` the first...