verl
Support Training with Both Function-Based Reward and DPO Reward Simultaneously
Hello,
I would like to confirm whether the current implementation supports training with a function-based reward and a DPO reward at the same time. If not, are there any planned updates or workarounds to achieve this?
Thank you!