Hi @mrm8488, that would be great! @LouisCastricato There were a few issues a while back, but I think they are no longer relevant since Colab upgraded to 3.8 a week ago, the same version as...
It might be possible to initialize the reward model separately with ZeRO-Inference[1]. Afaik Accelerate by itself doesn't support a second DeepSpeed config. [1] https://www.deepspeed.ai/2022/09/09/zero-inference.html
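For reference, a rough sketch (not tested against trlx's setup) of what wrapping a reward model with ZeRO-Inference could look like; the checkpoint name, config values, and `reward_fn` wiring are just placeholders:

```python
# Hypothetical sketch: wrap a reward model with DeepSpeed ZeRO-Inference
# (stage-3 parameter partitioning + CPU offload), kept separate from the
# Accelerate config that drives the policy model.
import torch
import deepspeed
from transformers import AutoModelForSequenceClassification, AutoTokenizer

reward_model_name = "reward-model"  # placeholder checkpoint
tokenizer = AutoTokenizer.from_pretrained(reward_model_name)
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token
model = AutoModelForSequenceClassification.from_pretrained(reward_model_name)

ds_config = {
    "train_micro_batch_size_per_gpu": 1,  # required key even for inference-only use
    "fp16": {"enabled": True},
    "zero_optimization": {
        "stage": 3,
        "offload_param": {"device": "cpu", "pin_memory": True},
    },
}

# deepspeed.initialize returns (engine, optimizer, dataloader, lr_scheduler);
# only the engine is needed for forward passes here
engine, *_ = deepspeed.initialize(model=model, config=ds_config)
engine.module.eval()

@torch.no_grad()
def reward_fn(samples):
    batch = tokenizer(samples, return_tensors="pt", padding=True, truncation=True)
    batch = {k: v.to(engine.device) for k, v in batch.items()}
    return engine(**batch).logits.squeeze(-1)
```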
Hey @RobertKirk, for larger reward models you can adapt the code at https://github.com/CarperAI/trlx/blob/main/examples/hh/ppo_hh.py#L113-L183. With it you either host a reward model behind Triton server's gRPC endpoint or dedicate a separate GPU...
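For context, a stripped-down sketch of the Triton gRPC client pattern used there might look roughly like this; the model name and the `input_ids`/`rewards` tensor names are placeholders that depend on your Triton model config:

```python
# Query a reward model hosted behind Triton's gRPC endpoint.
import numpy as np
import tritonclient.grpc as triton

client = triton.InferenceServerClient(url="localhost:8001", verbose=False)

def reward_fn(token_ids: np.ndarray) -> np.ndarray:
    # tensor names must match the deployed Triton model config (placeholders here)
    inp = triton.InferInput("input_ids", list(token_ids.shape), "INT32")
    inp.set_data_from_numpy(token_ids.astype(np.int32))
    result = client.infer("reward_model", [inp])
    return result.as_numpy("rewards")
```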
@LouisCastricato yes with #156
Hello! The reward function currently expects the whole `sample = prompt + output` as input, so in the case of summarization you could split it on "TL;DR" to recover prompts,...
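Something along these lines (the exact `TL;DR:` delimiter and the `score` helper are just placeholders for illustration):

```python
# Recover (prompt, summary) pairs from full samples by splitting on the delimiter,
# then score each pair with whatever reward model you are using.
def reward_fn(samples, **kwargs):
    rewards = []
    for sample in samples:
        prompt, _, summary = sample.partition("TL;DR:")
        rewards.append(score(prompt.strip(), summary.strip()))  # `score` is a placeholder
    return rewards
```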
Hi @xwjiang2010, thanks for dropping this! In short, we're lacking knowledge of how to do hyperparameter optimization with Tune in the way that's least invasive to our codebase. Currently we have a...
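For example, a least-invasive wrapper might look something like this (purely a sketch, not our actual integration; `run_training` and the searched parameters are placeholders):

```python
# Wrap an existing training entry point as a Ray Tune trainable and search
# over a couple of hyperparameters.
from ray import tune

def trainable(params):
    metrics = run_training(lr=params["lr"], batch_size=params["batch_size"])  # placeholder
    tune.report(reward_mean=metrics["reward_mean"])

analysis = tune.run(
    trainable,
    config={
        "lr": tune.loguniform(1e-6, 1e-4),
        "batch_size": tune.choice([32, 64, 128]),
    },
    num_samples=8,
)
print(analysis.get_best_config(metric="reward_mean", mode="max"))
```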
Hey, thanks for giving us some pointers! It seems like the most that could be achieved with Ray Train would be DDP handled internally by Ray. @ayulockin Maybe a more lightweight...
@cat-state something like that? https://github.com/vwxyzjn/cleanrl/pull/307
> Sure, although you might need compute?

@reciprocated maybe we should make a single-node config version that can be fine-tuned on a single GPU quickly? FWIW, {ppo,ilql}_config.yml were meant to...
Resolved with https://github.com/CarperAI/trlx/pull/357