[REQUEST] [TRITON] Upgrade Sparse Attention by Using Triton > 2.1
Hi, we are looking at deepspeed.ops.sparse_attention and have found that the current sparse attention (SA) implementation is based on triton==1.0.0, which is an old version. The current triton release is 2.x, and 2.x is the version we support. May I know whether there is any plan to upgrade the triton dependency to 2.x and keep maintaining the sparse attention kernels?
My error stack mainly points at deepspeed.ops.sparse_attention.matmul:
import triton._C.libtriton as libtriton

segmented = libtriton.superblock(layout.data_ptr(),
                                 layout.shape[0],
                                 layout.shape[1],
                                 layout.shape[2],
                                 start_width)
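For reference, a quick check (just a sketch, assuming triton >= 2.0 is installed alongside DeepSpeed) that shows the symbol the kernel relies on is gone from the C extension; the import itself still succeeds, as the AttributeError later in this thread suggests:

# Sketch: confirm that triton 2.x no longer exposes `superblock`,
# which is what make_sdd_lut calls.
import triton
import triton._C.libtriton as libtriton

print(triton.__version__)                # e.g. 2.1.0
print(hasattr(libtriton, "superblock"))  # False on triton 2.x, True on triton==1.0.0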
Thanks!
Is https://github.com/microsoft/DeepSpeed/pull/4071 related to this request?
Yes, but besides changing the triton version, the kernels need updates as well.
Hi @YizhouZ , what specific kernel error did you encounter? Is it a common error that people hit when they upgrade to triton 2.1?
I got:
File "python3.9/site-packages/deepspeed/ops/sparse_attention/matmul.py", line 276, in make_sdd_lut segmented = libtriton.superblock(layout.data_ptr(), layout.shape[0], layout.shape[1], layout.shape[2], AttributeError: module 'triton._C.libtriton' has no attribute 'superblock'
Yes, it is. In fact, Triton has dropped support for those ops in 2.0.
Reference: https://github.com/openai/triton/issues/1395#issuecomment-1483725777
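Until the kernels are ported, something like the following version guard could fail fast with a clearer message than the AttributeError above. This is only a sketch, not existing DeepSpeed code; the `packaging` dependency and the error message are assumptions:

# Hypothetical guard around the sparse-attention import path (not in DeepSpeed today):
# refuse to run the old kernels when a triton 2.x install is detected.
from packaging import version
import triton

if version.parse(triton.__version__) >= version.parse("2.0.0"):
    raise RuntimeError(
        "deepspeed.ops.sparse_attention currently targets triton==1.0.0; "
        f"found triton {triton.__version__}, which no longer ships the old libtriton ops"
    )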