Li Ruixiao

Results: 6 comments by Li Ruixiao

This seems to be related to the version of the CUDA driver. Have you tried upgrading the CUDA driver to a higher version?

For my configuration (triton 2.2.0, cuda 12.2, nvcc 11.8 on A100), this error was resolved by upgrading the NVIDIA driver from 470.57.02 to 535.104.05. The minimum version required by *cuda*...
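A quick way to sanity-check a driver upgrade like this is to compare dotted version strings numerically rather than as text. The snippet below is only a minimal sketch: the helper names `parse_driver_version` and `driver_is_at_least` are made up for illustration, and the required version is whatever your CUDA release notes state, not a value taken from this comment. The installed version can usually be read with `nvidia-smi --query-gpu=driver_version --format=csv,noheader`.

```python
def parse_driver_version(text):
    """Turn a dotted driver version such as '535.104.05' into a tuple of ints."""
    return tuple(int(part) for part in text.strip().split("."))


def driver_is_at_least(installed, required):
    """Return True if the installed driver is the same as or newer than `required`."""
    return parse_driver_version(installed) >= parse_driver_version(required)


# The two driver versions mentioned in the comment above.
old_driver = "470.57.02"
new_driver = "535.104.05"

# Hypothetical usage: pass the minimum from the CUDA release notes as `required`.
print(driver_is_at_least(new_driver, old_driver))
```

Comparing tuples of integers avoids the pitfalls of plain string comparison, where the outcome depends on character order rather than numeric value.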

Could you please provide some code snippets that can reliably reproduce the issue? 🙂

Any progress? I am quite interested in this `Shared Memory Usage` issue. Would it be possible for you to share your code?

Maybe this could help? I'm not sure 🤔 https://github.com/alibaba/Pai-Megatron-Patch/tree/main/megatron_patch/fixes/optimizer_offloading

Same issue here, using verl commit ID `c5b189a1af496d0bc68320cd1d5bd7a1f1e3638a`