verl
Logits temperature scaling affects the rollout_actor_probs_pearson_corr metric
System Info
verl: latest
python: 3.10.6
Information
- [x] The official example scripts
- [x] My own modified scripts
Tasks
- [x] An officially supported task in the `examples` folder (such as GLUE/SQuAD, ...)
- [ ] My own task or dataset (give details below)
Reproduction
When using vLLM for rollout, the returned rollout_log_probs are computed from logits that are *not* scaled by temperature. On the training side, however, the logits are divided by temperature before the log-probs are computed. Comparing these two quantities yields an incorrect value for the metric "rollout_actor_probs_pearson_corr". Generally, a correlation coefficient closer to 1 is taken to indicate better alignment between training and inference, so this mismatch makes the metric misleading.
The FSDP worker has a similar issue.
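A minimal NumPy sketch (toy logits, not verl's actual code) illustrating the mismatch: correlating log-probs taken from raw logits against log-probs taken from temperature-scaled logits gives a coefficient below 1 even though both come from the *same* model, so the metric no longer measures training/inference divergence.

```python
import numpy as np

def log_softmax(x):
    # numerically stable log-softmax over the last axis
    x = x - x.max(axis=-1, keepdims=True)
    return x - np.log(np.exp(x).sum(axis=-1, keepdims=True))

rng = np.random.default_rng(0)
logits = rng.normal(size=(256, 32))     # toy (tokens, vocab) logits
labels = rng.integers(0, 32, size=256)  # sampled token ids
temperature = 0.5

# rollout side (vLLM-style): log-probs from the raw logits
rollout_lp = log_softmax(logits)[np.arange(256), labels]
# training side: logits are divided by temperature first
actor_lp = log_softmax(logits / temperature)[np.arange(256), labels]

corr = np.corrcoef(rollout_lp, actor_lp)[0, 1]
print(f"pearson corr between rollout and actor log-probs: {corr:.4f}")
```

Even with identical weights on both sides, the correlation is strictly below 1 here, purely because of the inconsistent temperature handling.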
```python
# verl/workers/actor/megatron_actor.py
def logits_processor(logits, label, label_mask):
    assert logits.shape[:2] == label.shape[:2]
    assert label.shape == label_mask.shape
    logits.div_(temperature)  #### this affects rollout_actor_probs_pearson_corr
    ret = {}
    if calculate_entropy:
        logits_bak = logits.clone()
        logger.warning_once(
            "For memory-efficient computation, enable fused kernels via "
            "`actor_rollout_ref.model.use_fused_kernels=True`. "
            "The current `clone()` operation ensures correctness but increases memory usage."
        )
        entropy = vocab_parallel_entropy(logits)
        ret["entropy"] = entropy
    else:
        logits_bak = logits
    log_probs = vocab_parallel_log_probs_from_logits(logits_bak, label)
    log_probs = log_probs.masked_fill(~label_mask, 0.0)
    ret["log_probs"] = log_probs
    return ret
```
Expected behavior
When computing the log-probs used for this metric, training should not apply temperature scaling to the logits (or both sides should apply the same scaling). That keeps the training-side log-probs aligned with the inference-side ones, making the calculated rollout_actor_probs_pearson_corr meaningful.
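A sketch of the consistent behavior (toy code under the same assumptions as above, not the verl implementation): when both sides gather log-probs with the same temperature treatment, identical models yield a correlation of exactly 1, which is what the metric is supposed to report.

```python
import numpy as np

def log_softmax(x):
    # numerically stable log-softmax over the last axis
    x = x - x.max(axis=-1, keepdims=True)
    return x - np.log(np.exp(x).sum(axis=-1, keepdims=True))

def gather_log_probs(logits, labels, temperature=1.0):
    # hypothetical helper: apply the SAME temperature on both sides
    lp = log_softmax(logits / temperature)
    return lp[np.arange(len(labels)), labels]

rng = np.random.default_rng(1)
logits = rng.normal(size=(128, 32))
labels = rng.integers(0, 32, size=128)

# rollout and actor use identical weights and identical scaling
rollout_lp = gather_log_probs(logits, labels, temperature=0.7)
actor_lp = gather_log_probs(logits, labels, temperature=0.7)
corr = np.corrcoef(rollout_lp, actor_lp)[0, 1]
print(f"corr with consistent scaling: {corr:.4f}")
```

With consistent scaling, any deviation of the coefficient from 1 reflects a genuine training/inference gap rather than the temperature-division artifact.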