cmunley1 comments

Results 18 comments of


                                            cmunley1

Salesforce xlam-function-calling-60k resources server

above is actually reward hacking by calling more and more tools, changing reward structure to exact match.

Salesforce xlam-function-calling-60k resources server

Explicitly don't support Responses API instructions

I think we could just treat as system message

Unsloth Integration

Unsloth currently [does not support custom rollout function](https://github.com/unslothai/unsloth/issues/3573) in their patched version of TRL GRPOTrainer it seems, making it difficult to fully use NeMo Gym as a rollout tool. We...

Unsloth Integration

Hey @mmathew23 do you have a timeline for custom rollout function? For vllm server mode, I think that operating like trl is sufficient, but an async vllm engine with openai...

feat: TRL Integration

TRL has a [custom rollout function](https://github.com/huggingface/trl/blob/main/trl/trainer/grpo_trainer.py#L214) and [vllm server mode](https://github.com/huggingface/trl/blob/main/trl/scripts/vllm_serve.py) that makes the integration easier. The vllm server is not a typical AsyncLLMEngine, it does not have openai chat completions/responses...

Hot reload enabled for native servers

I took a stab at this [here](https://github.com/NVIDIA-NeMo/Gym/compare/main...cmunley1/reload ) It seems to work but not tested extensively

[GRPO] How to train model using vLLM and model parallelism on one node?

@zhiqihuang I am having the same issue, removing device_map auto did not solve it, did you find a solution? I am following the guide here: https://huggingface.co/docs/trl/main/en/vllm_integration but having the same...