InferenceMAX Enable GPTOSS GB200 DISAGG

This MR enables disaggregation for GPTOSS on GB200.

Adds GPTOSS to the disagg runners and workflow.

Successful tests here: https://github.com/InferenceMAX/InferenceMAX/actions/runs/19353241086/job/55369372877

Nov 14 '25 20:11 jgangani

thanks for this contribution @jgangani

Can you explain what this means? Are all of the datapoints just 4 GPUs for prefill only and then 4 GPUs for decode only? If not, can you explain the parallelism config and the concurrency for each datapoint?

    ./submit_disagg.sh mtp=off tp 1 1 1 512 20000 "0.9" 0 0 "128 256 512"
    ./submit_disagg.sh mtp=off tp 1 1 2 1024 20000 "0.9" 0 0 "64 128 256"
    ./submit_disagg.sh mtp=off tep 1 1 2 1024 20000 "0.9" 0 0 "64 256"
    ./submit_disagg.sh mtp=off tp 1 1 4 2048 20000 "0.9" 0 0 "8 16 32 64 128"
    ./submit_disagg.sh mtp=off tp 1 1 8 2048 20000 "0.9" 0 0 "1 2 4 8 16"

Nov 14 '25 21:11 functionstackx

Also, @jgangani, please merge this into the main branch/release candidate instead of a side branch: https://github.com/ai-dynamo/dynamo/compare/release/0.5.1-rc0.20251105...jthomson04/gpt-oss-disagg-slurm

Nov 14 '25 22:11 functionstackx

> Can you explain what this means? Are all of the datapoints just 4 GPUs for prefill only and then 4 GPUs for decode only? If not, can you explain the parallelism config and the concurrency for each datapoint?

The argument order is:
<gen_server_config> <ctx_num> <gen_num_servers> <gen_tp_size> <gen_bs> <gen_max_num_tokens>. 1 GPU for prefill; 2/4/8 for decode.
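The Python sketch below makes that mapping concrete. It splits each submit_disagg.sh line from the question into named fields, using the argument order given in this reply. The names `DisaggPoint` and `parse_submit_line` are hypothetical. Several mappings are assumptions that the thread does not state:

- the `mtp=` flag comes first;
- the `"0.9" 0 0` arguments are left unnamed;
- the final quoted argument is the concurrency sweep;
- decode GPUs equal `gen_num_servers * gen_tp_size`.

```python
import shlex
from dataclasses import dataclass


@dataclass
class DisaggPoint:
    """One submit_disagg.sh invocation mapped to named fields (hypothetical names)."""
    mtp: str
    gen_server_config: str    # e.g. "tp" or "tep"
    ctx_num: int
    gen_num_servers: int
    gen_tp_size: int
    gen_bs: int
    gen_max_num_tokens: int
    other_args: list[str]     # the "0.9" 0 0 arguments; meaning not stated, left unnamed
    concurrencies: list[int]  # assumption: the final quoted argument is the concurrency sweep

    @property
    def decode_gpus(self) -> int:
        # Assumption: each decode server spans gen_tp_size GPUs.
        return self.gen_num_servers * self.gen_tp_size


def parse_submit_line(line: str) -> DisaggPoint:
    # shlex keeps quoted arguments together, so "128 256 512" stays one token.
    args = shlex.split(line)[1:]    # drop the script name "./submit_disagg.sh"
    mtp = args[0].split("=", 1)[1]  # "mtp=off" -> "off"
    config, ctx, n_srv, tp, bs, max_tok = args[1:7]
    return DisaggPoint(
        mtp=mtp,
        gen_server_config=config,
        ctx_num=int(ctx),
        gen_num_servers=int(n_srv),
        gen_tp_size=int(tp),
        gen_bs=int(bs),
        gen_max_num_tokens=int(max_tok),
        other_args=args[7:-1],
        concurrencies=[int(c) for c in args[-1].split()],
    )


COMMANDS = [
    './submit_disagg.sh mtp=off tp 1 1 1 512 20000 "0.9" 0 0 "128 256 512"',
    './submit_disagg.sh mtp=off tp 1 1 2 1024 20000 "0.9" 0 0 "64 128 256"',
    './submit_disagg.sh mtp=off tep 1 1 2 1024 20000 "0.9" 0 0 "64 256"',
    './submit_disagg.sh mtp=off tp 1 1 4 2048 20000 "0.9" 0 0 "8 16 32 64 128"',
    './submit_disagg.sh mtp=off tp 1 1 8 2048 20000 "0.9" 0 0 "1 2 4 8 16"',
]

if __name__ == "__main__":
    for cmd in COMMANDS:
        p = parse_submit_line(cmd)
        print(f"{p.gen_server_config}: ctx_num={p.ctx_num}, decode GPUs={p.decode_gpus}, "
              f"gen_bs={p.gen_bs}, conc={p.concurrencies}")
```

Under these assumptions, the five lines give decode GPU counts of 1, 2, 2, 4 and 8. Together they sweep 3 + 3 + 2 + 5 + 5 = 18 concurrency points.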

Nov 14 '25 22:11 jgangani

> Also, @jgangani, please merge this into the main branch/release candidate instead of a side branch: https://github.com/ai-dynamo/dynamo/compare/release/0.5.1-rc0.20251105...jthomson04/gpt-oss-disagg-slurm

Yes, that was the goal. I wanted to test the MR before merging it into the release branch. Will update.

Nov 14 '25 22:11 jgangani

@jgangani thanks! Can you please enable 1k/8k and 1k/1k on GPTOSS GB200 in this PR too? Thanks!

Nov 14 '25 23:11 functionstackx

@functionstackx Switched to the Dynamo release branch.

Nov 15 '25 03:11 jgangani

> @jgangani thanks! Can you please enable 1k/8k and 1k/1k on GPTOSS GB200 in this PR too? Thanks!

I am working on the 1k1k DISAGG Pareto configs next. 1k8k DISAGG will probably be on par with AGG, since that workload is predominantly decode. Hence, I recommend we merge this MR first. Does that make sense?
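The Python sketch below is a back-of-the-envelope check of the "predominantly decode" point. For each sequence-length pair mentioned in this thread, it counts what fraction of a request's tokens are produced by decode. It assumes "1k"/"8k" mean 1024/8192 tokens, written in input/output order, and the `decode_share` helper is hypothetical. Token counts are only a rough proxy. Prefill processes the whole prompt in parallel, so actual time or GPU share will differ, and this arithmetic alone does not show that 1k8k DISAGG matches AGG.

```python
def decode_share(isl: int, osl: int) -> float:
    """Fraction of a request's tokens that are generated in the decode phase."""
    return osl / (isl + osl)


# Assumption: names are <input><output>, with "1k" = 1024 and "8k" = 8192 tokens.
WORKLOADS = {"1k1k": (1024, 1024), "1k8k": (1024, 8192), "8k1k": (8192, 1024)}

if __name__ == "__main__":
    for name, (isl, osl) in WORKLOADS.items():
        # The prompt is prefilled in one pass; each output token needs its own decode step.
        print(f"{name}: {decode_share(isl, osl):.1%} decode tokens, "
              f"{osl} decode steps per request")
```

On this count, 1k8k is about 89% decode tokens, 1k1k is 50%, and 8k1k is about 11%.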

Nov 16 '25 19:11 jgangani

If you can, please submit GB200 AGG for 1k/8k in this PR too.

Nov 16 '25 20:11 functionstackx

We're going to hold off on this until #251 gets merged this week.

Dec 03 '25 21:12 cquil11

@jgangani so sorry, brother, but can you please rebase onto main following the conventions set forth in https://github.com/InferenceMAX/InferenceMAX/pull/251?

Dec 07 '25 21:12 cquil11

Yes, I am working on it. I will open another MR based on main after the #251 merge.

Dec 07 '25 21:12 jgangani

@jgangani hi! Where are we on this?

Dec 17 '25 00:12 cquil11

> @jgangani hi! Where are we on this?

GB200 DISAGG for 8k1k is ready with the refactored code, and I can create an MR right away if needed. I am still working through the 1k1k config exploration and will need a few more days for 1k1k.

Dec 17 '25 07:12 jgangani