DeepSpeed [REQUEST] [TRITON] Upgrade Sparse Attention by Using Triton

Hi, we are looking at deepspeed.ops.sparse_attention and found that the current sparse attention (SA) implementation is based on triton==1.0.0, which is an old version. The current Triton release is 2.x, and that is the version we support. Is there any plan to upgrade the Triton version to 2.x and maintain the sparse attention kernel?

My error stack is mainly in deepspeed.ops.sparse_attention.matmul:

import triton._C.libtriton as libtriton

segmented = libtriton.superblock(layout.data_ptr(),
                                 layout.shape[0],
                                 layout.shape[1],
                                 layout.shape[2],
                                 start_width)

Thanks!

Dec 21 '23 07:12 YizhouZ

Is https://github.com/microsoft/DeepSpeed/pull/4071 related to this request?

Dec 21 '23 07:12 delock

Is #4071 related to this request?

Yes, but besides changing the Triton version, the kernel needs updates as well.

Dec 21 '23 08:12 YizhouZ

Is #4071 related to this request?

Yes, but besides changing the Triton version, the kernel needs updates as well.

Hi @YizhouZ, what specific kernel error did you hit? Is it a common error that people encounter when they upgrade to triton 2.1?

Dec 21 '23 15:12 delock

Got:

File "python3.9/site-packages/deepspeed/ops/sparse_attention/matmul.py", line 276, in make_sdd_lut
    segmented = libtriton.superblock(layout.data_ptr(), layout.shape[0], layout.shape[1], layout.shape[2],
AttributeError: module 'triton._C.libtriton' has no attribute 'superblock'
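For anyone who wants to detect this condition before it surfaces as an AttributeError, here is a minimal sketch of a guard. The function name triton_has_superblock is hypothetical and not part of DeepSpeed or Triton; the only thing it relies on is the module path triton._C.libtriton and the attribute name superblock, both taken from the traceback above.

```python
import importlib


def triton_has_superblock() -> bool:
    """Return True only if the installed Triton still exposes libtriton.superblock.

    Hypothetical helper for illustration; not a DeepSpeed or Triton API.
    """
    try:
        # Same module that deepspeed.ops.sparse_attention.matmul imports.
        libtriton = importlib.import_module("triton._C.libtriton")
    except (ImportError, OSError):
        # Triton is not installed, or its native extension failed to load.
        return False
    # Per the traceback above, this attribute is missing on newer Triton builds.
    return hasattr(libtriton, "superblock")


if __name__ == "__main__":
    if triton_has_superblock():
        print("libtriton.superblock is available")
    else:
        print("libtriton.superblock is missing; sparse attention matmul will fail")
```

A check like this only reports the problem earlier with a clearer message; it does not make the sparse attention kernels work on a Triton build that lacks superblock.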

Dec 22 '23 09:12 A-Cepheus

Is #4071 related to this request?

Yes, but besides changing the Triton version, the kernel needs updates as well.

Hi @YizhouZ, what specific kernel error did you hit? Is it a common error that people encounter when they upgrade to triton 2.1?

Yes, it is. In fact, Triton has dropped support for ops in 2.0.

Reference: https://github.com/openai/triton/issues/1395#issuecomment-1483725777
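Until the kernels are updated, a version check can at least make the incompatibility explicit. The sketch below assumes, based only on what is reported in this thread, that the sparse attention kernels work with Triton 1.x and break on 2.x. Both function names are hypothetical.

```python
def triton_major_version(version: str) -> int:
    """Extract the major version number from a string such as '2.1.0' or '2.1.0+cu118'."""
    return int(version.split(".")[0])


def sparse_attention_compatible(version: str) -> bool:
    """Assumption from this thread: the kernels target triton==1.0.0 and fail on 2.x."""
    return triton_major_version(version) < 2


if __name__ == "__main__":
    for v in ("1.0.0", "2.0.0", "2.1.0"):
        print(v, sparse_attention_compatible(v))
```

With Triton installed, the version string can be read from triton.__version__ and passed to sparse_attention_compatible.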

Feb 09 '24 08:02 BurhanUlTayyab
