Jack Lanchantin

7 issues by Jack Lanchantin

**What does this PR do? Please describe:** Allows for logging rollouts during online training and validation. Add the following...

CLA Signed

**What does this PR do? Please describe:** A summary of the change or the issue that is fixed. Fixes #{issue number} **Does your PR introduce any breaking changes? If yes,...

CLA Signed

**What does this PR do? Please describe:** - [ ] Adds GrpoLossConfig `adv_std_normarlization` (for DrGRPO) - [ ] Adds GrpoLossConfig `loss_token_mean` for normalizing over all tokens - [ ] Adds...

CLA Signed

**What does this PR do? Please describe:** Removes the unused vLLM import (`Worker`), which was removed in v0.11.0. **Does...

CLA Signed

**What does this PR do? Please describe:** A summary of the change or the issue that is fixed. Fixes #{issue number} **Does your PR introduce any breaking changes? If yes,...

CLA Signed

**What does this PR do? Please describe:** Fixes #{issue number}. Right now, we pass all prompts to the LLM...

CLA Signed

**What does this PR do? Please describe:** - Logs entropy separately for chosen and rejected sequences in online DPO training. - A few other small changes. **Check list:** - [x]...

CLA Signed