[Feat.] Separate Model Paths for Actor and Reference Models in actor_rollout_ref Configuration

Open • ian00000 opened this issue 9 months ago • 1 comment

I'm currently experimenting with the KDRL method (Unified Knowledge Distillation and Reinforcement Learning, https://arxiv.org/abs/2506.02208). My goal is to train a student model (the "actor") guided by a larger, more capable teacher model, which takes the place of the reference model in the KL divergence calculation used for distillation.

However, the current verl configuration for actor_rollout_ref seems to assume that the actor model and the reference model share the same model.path.

I'd like to propose adding a configuration option that allows for specifying a separate model path for the reference model within the actor_rollout_ref section.
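A minimal sketch of the requested behavior, written against plain dictionaries rather than verl's actual config classes. The `ref.model.path` key, the helper name `resolve_ref_model_path`, and the example model names are all hypothetical and only illustrate the proposal; only `actor_rollout_ref` and `model.path` come from the current configuration.

```python
from typing import Any, Mapping, Optional


def resolve_ref_model_path(config: Mapping[str, Any]) -> str:
    """Return the reference model path, falling back to the shared model path.

    Hypothetical helper: verl does not necessarily expose this function.
    """
    section = config["actor_rollout_ref"]
    shared_path: str = section["model"]["path"]

    # Hypothetical key proposed in this issue: actor_rollout_ref.ref.model.path
    ref_model = (section.get("ref") or {}).get("model") or {}
    ref_path: Optional[str] = ref_model.get("path")

    # Keep today's behavior (shared path) when no separate path is given.
    return ref_path if ref_path else shared_path


# Current behavior: actor and reference share one path.
student_only = {"actor_rollout_ref": {"model": {"path": "org/student-model"}}}

# Proposed behavior: a separate (teacher) path for the reference model.
with_teacher = {
    "actor_rollout_ref": {
        "model": {"path": "org/student-model"},
        "ref": {"model": {"path": "org/teacher-model"}},
    }
}
```

Falling back to the shared path when the new key is absent would keep existing configurations working unchanged.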

ian00000 • Jul 22 '25 06:07

This PR may be relevant: https://github.com/volcengine/verl/pull/2050

DavidHarrison • Nov 20 '25 17:11