Hubert Lu issues

Results: 6 issues



Run JAX on multi-host platforms (such as TPUs)

Hi, I am currently trying to run my program (which works on a Cloud TPU v2-8/v3-8) on a Cloud TPU v2-32, which has 4 hosts, using JAX (jax==0.1.65, jaxlib==0.1.45)....

P0 (urgent)

NVIDIA GPU

Comparison of native AllReduce and compressed AllReduce in DeepSpeed

Hi, I tested the native AllReduce (deepspeed.comm.all_reduce) and the compressed AllReduce (backend.compressed_allreduce) in DeepSpeed with [this test script](https://github.com/microsoft/DeepSpeed/blob/master/tests/onebit/test_nccl_perf.py). On a ROCm system, we observed a 414% performance improvement from switching from...
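For context on what the compressed path trades away: the test script lives under tests/onebit. 1-bit compression of this kind typically sends only the sign of each element plus one scale, and carries the dropped remainder into the next step (error feedback). The plain-Python sketch below simulates that idea for a few workers. It is not DeepSpeed's implementation, and the helper names are hypothetical. DeepSpeed's backend also keeps a server-side error buffer for a second compression stage, and that stage is omitted here.

```python
from typing import List, Tuple


def compress(vec: List[float]) -> Tuple[float, List[int]]:
    """Keep one sign per element plus a single scale (mean absolute value)."""
    scale = sum(abs(x) for x in vec) / len(vec)
    signs = [1 if x >= 0 else -1 for x in vec]
    return scale, signs


def decompress(scale: float, signs: List[int]) -> List[float]:
    """Rebuild an approximate vector from the scale and the signs."""
    return [scale * s for s in signs]


def compressed_allreduce_avg(
    grads: List[List[float]], errors: List[List[float]]
) -> List[float]:
    """Simulate an error-compensated 1-bit average over len(grads) workers.

    errors[i] is worker i's residual from the previous call; it is updated in place.
    Hypothetical helper for illustration, not DeepSpeed's compressed_allreduce.
    """
    n = len(grads)
    total = [0.0] * len(grads[0])
    for i in range(n):
        # Error feedback: add back what compression dropped last time.
        corrected = [g + e for g, e in zip(grads[i], errors[i])]
        approx = decompress(*compress(corrected))
        # Store the part of this step's signal that was not transmitted.
        errors[i] = [c - a for c, a in zip(corrected, approx)]
        # Sum the decompressed contributions (what an allreduce would sum).
        total = [t + a for t, a in zip(total, approx)]
    return [t / n for t in total]


if __name__ == "__main__":
    grads = [[1.0, -2.0], [3.0, 0.5]]
    errors = [[0.0, 0.0], [0.0, 0.0]]
    print(compressed_allreduce_avg(grads, errors))  # [1.625, 0.125]
    print(errors)  # [[-0.5, -0.5], [1.25, -1.25]]
```

In this example, two workers hold [1.0, -2.0] and [3.0, 0.5]. Each worker sends only two signs and one scale, so the averaged result is [1.625, 0.125] rather than the exact [2.0, -0.75]. The residuals stored in `errors` are added back on the next call, which lets the dropped signal be recovered over later steps.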

Enable custom AR for AMD GPUs and maintain it in sgl-kernel

## Motivation SGLang on AMD GPUs currently fails to leverage vLLM's custom AR (all-reduce). To further remove the dependency on vLLM's custom AR in SGLang, we plan to maintain the...

[AMD] Support --enable-aiter-allreduce-fusion on AMD GPUs

## Motivation To enable aiter's fused allreduce kernel, please add `--enable-aiter-allreduce-fusion`. **With `--enable-aiter-allreduce-fusion`**: `void aiter::reduce_scatter_cross_device_store(aiter::RankData*, aiter::RankSignals, aiter::Signal*, int, int)` and `void aiter::local_device_load_rmsnorm_512n(aiter::RankSignals, __hip_bfloat16*, __hip_bfloat16*, __hip_bfloat16*, __hip_bfloat16*, float, int, int, int)`...

documentation

amd
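The two kernel names in the aiter entry suggest that the fused path splits the work into two phases. The first is a reduce-scatter (`reduce_scatter_cross_device_store`). The second loads the reduced data and applies RMSNorm (`local_device_load_rmsnorm_512n`). That reading of the names is an assumption. The plain-Python simulation below checks that such a two-phase schedule gives the same output as an unfused allreduce followed by a separate RMSNorm. The function names and the chunking scheme are hypothetical and are not aiter's implementation.

```python
import math
from typing import List


def rmsnorm(x: List[float], weight: List[float], eps: float = 1e-6) -> List[float]:
    """y_i = x_i / sqrt(mean(x^2) + eps) * w_i over one hidden-state row."""
    inv_rms = 1.0 / math.sqrt(sum(v * v for v in x) / len(x) + eps)
    return [v * inv_rms * w for v, w in zip(x, weight)]


def unfused(partials: List[List[float]], weight: List[float]) -> List[float]:
    """Reference: full allreduce (element-wise sum over ranks), then RMSNorm."""
    summed = [sum(col) for col in zip(*partials)]
    return rmsnorm(summed, weight)


def two_phase_fused(
    partials: List[List[float]], weight: List[float]
) -> List[List[float]]:
    """Hypothetical two-phase schedule; returns the normalized row seen by each rank."""
    world = len(partials)
    dim = len(partials[0])
    assert dim % world == 0, "toy version: hidden size must divide evenly"
    chunk = dim // world
    # Phase 1 (reduce-scatter): rank r reduces only its own slice across all ranks.
    reduced = [
        [sum(p[j] for p in partials) for j in range(r * chunk, (r + 1) * chunk)]
        for r in range(world)
    ]
    # Phase 2 (load + norm): every rank reads all reduced slices and normalizes
    # the full row, since RMSNorm needs the sum of squares over the whole row.
    full_row = [v for part in reduced for v in part]
    return [rmsnorm(full_row, weight) for _ in range(world)]


if __name__ == "__main__":
    partials = [[1.0, 2.0, 3.0, 4.0], [0.5, -1.0, 1.0, 0.0]]
    weight = [1.0, 1.0, 1.0, 1.0]
    ref = unfused(partials, weight)
    for out in two_phase_fused(partials, weight):
        assert all(math.isclose(a, b) for a, b in zip(out, ref))
    print("fused schedule matches unfused reference")
```

The usual motivation for this kind of fusion is to apply the norm inside the second communication phase. That avoids a separate kernel launch and an extra round trip of the full activation through memory. The toy version only checks that the decomposition is numerically equivalent.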

[AMD] Add more tests to AMD CI and add diffusion dependencies

## Motivation Added more tests to AMD CI and added diffusion dependencies, along with a placeholder for a diffusion-related test in AMD CI. ## Modifications ## Accuracy Tests ## Benchmarking and Profiling...

amd

dependencies

run-ci

[AMD] Add 8-GPU MI35X test running the DSR1-MXFP4 model for AMD CI

## Motivation Add an 8-GPU MI35X test to AMD CI that runs [amd/DeepSeek-R1-MXFP4-Preview](https://huggingface.co/amd/DeepSeek-R1-MXFP4-Preview) with and without speculative decoding (MTP). ## Modifications ## Accuracy Tests ## Benchmarking and Profiling ## Checklist...

amd

dependencies

deepseek

run-ci