Is it possible to conduct RLHF from environment interaction?

Open SHITIANYU-hue opened this issue 2 years ago • 1 comments

Thanks for open-sourcing the AgentTuning code. I am quite interested in training the model, but I see that the training framework is not open-sourced (https://github.com/THUDM/AgentTuning/issues/1).

The discussion mentioned that it could support P-Tuning or LoRA. I am wondering whether it could also support RLHF.

I recently read a paper (https://arxiv.org/abs/2312.14878), and I am curious how AgentLM would perform if we let it learn from interacting with environments (see "Finetune type II" in that paper).

Jan 15 '24 23:01 SHITIANYU-hue

We haven't integrated RLHF methods into AgentTuning yet, and we don't plan to release related experimental results in the near future. I believe that would be an awesome thing to try out.

Jan 20 '24 07:01 Btlmd