cmunley1

Results 18 comments of cmunley1

The above is actually reward hacking by calling more and more tools; changing the reward structure to exact match.
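As a rough illustration of the switch to exact match mentioned above, here is a minimal sketch. Both functions are hypothetical and not taken from any NeMo Gym or TRL code; `tool_count_reward` in particular is only an assumed stand-in for a reward that can be inflated by extra tool calls.

```python
def _normalize(text: str) -> str:
    """Lowercase and collapse whitespace so trivial formatting differences are ignored."""
    return " ".join(text.strip().lower().split())


def exact_match_reward(completion: str, reference: str) -> float:
    """Return 1.0 only when the normalized answer equals the reference, else 0.0.

    The number of tool calls does not enter the score, so calling more
    tools cannot raise the reward.
    """
    return 1.0 if _normalize(completion) == _normalize(reference) else 0.0


def tool_count_reward(num_tool_calls: int, per_call_bonus: float = 0.1) -> float:
    """Hypothetical hackable reward (an assumption): grows with every tool call."""
    return per_call_bonus * num_tool_calls
```

With a reward like `tool_count_reward`, a policy can score higher simply by issuing more calls, whereas `exact_match_reward` depends only on the final answer.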

I think we could just treat it as a system message.

Unsloth currently [does not support a custom rollout function](https://github.com/unslothai/unsloth/issues/3573) in its patched version of TRL's GRPOTrainer, it seems, making it difficult to fully use NeMo Gym as a rollout tool. We...

Hey @mmathew23, do you have a timeline for the custom rollout function? For vLLM server mode, I think that operating like TRL is sufficient, but an async vLLM engine with OpenAI...

TRL has a [custom rollout function](https://github.com/huggingface/trl/blob/main/trl/trainer/grpo_trainer.py#L214) and a [vLLM server mode](https://github.com/huggingface/trl/blob/main/trl/scripts/vllm_serve.py) that make the integration easier. The vLLM server is not a typical AsyncLLMEngine; it does not have OpenAI chat completions/responses...

I took a stab at this [here](https://github.com/NVIDIA-NeMo/Gym/compare/main...cmunley1/reload). It seems to work but has not been tested extensively.

@zhiqihuang I am having the same issue; removing `device_map="auto"` did not solve it. Did you find a solution? I am following the guide here: https://huggingface.co/docs/trl/main/en/vllm_integration but having the same...