[Feat.] Separate Model Paths for Actor and Reference Models in actor_rollout_ref Configuration
I'm currently experimenting with KDRL (Unified Knowledge Distillation and Reinforcement Learning, https://arxiv.org/abs/2506.02208). My goal is to train a student model (the "actor") guided by a larger, more capable teacher model, which plays the role of the reference model in the KL divergence term used for distillation.
However, the current verl configuration for actor_rollout_ref seems to assume that the actor and the reference model are loaded from the same model.path, so the reference model cannot be a different, larger checkpoint.
I'd like to propose adding a configuration option that allows for specifying a separate model path for the reference model within the actor_rollout_ref section.
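To make the proposal concrete, here is a minimal sketch of the fallback behaviour I have in mind. It uses plain dicts rather than verl's actual config objects. The key actor_rollout_ref.ref.model.path and the helper resolve_ref_model_path are hypothetical names for illustration, not existing verl options, and the model names are placeholders.

```python
from typing import Any, Mapping


def resolve_ref_model_path(config: Mapping[str, Any]) -> str:
    """Return the path the reference model should be loaded from.

    Prefers the HYPOTHETICAL key actor_rollout_ref.ref.model.path and falls
    back to the shared actor_rollout_ref.model.path when it is absent or empty,
    so existing configs keep their current behaviour.
    """
    section = config["actor_rollout_ref"]
    shared_path = section["model"]["path"]
    # Tolerate a missing or empty "ref" / "ref.model" block.
    ref_model = (section.get("ref") or {}).get("model") or {}
    return ref_model.get("path") or shared_path


# Placeholder model names, chosen for illustration only.
kdrl_config = {
    "actor_rollout_ref": {
        "model": {"path": "student-model"},           # actor (student)
        "ref": {"model": {"path": "teacher-model"}},  # reference (teacher)
    }
}
default_config = {
    "actor_rollout_ref": {
        "model": {"path": "student-model"},
        "ref": {},  # no override: reference shares the actor's path
    }
}

print(resolve_ref_model_path(kdrl_config))
print(resolve_ref_model_path(default_config))
```

Falling back to the shared path would keep the option backward compatible. One caveat: a token-level KL term between student and teacher presumably requires both models to share the same tokenizer and vocabulary.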
https://github.com/volcengine/verl/pull/2050 may be relevant.