[REQUEST] [TRITON] Upgrade Sparse Attention by Using Triton > 2.1

Open YizhouZ opened this issue 2 years ago • 6 comments

Hi, we are looking at deepspeed.ops.sparse_attention and found that the current sparse attention (SA) implementation is based on triton==1.0.0, which is an old version. The current Triton release is 2.x, and 2.x is the version we support. Is there any plan to upgrade to Triton 2.x and maintain the sparse attention kernel?

The error I hit is mainly in deepspeed.ops.sparse_attention.matmul:

import triton._C.libtriton as libtriton

segmented = libtriton.superblock(layout.data_ptr(),
                                 layout.shape[0],
                                 layout.shape[1],
                                 layout.shape[2],
                                 start_width)

Thanks!

YizhouZ avatar Dec 21 '23 07:12 YizhouZ

Is https://github.com/microsoft/DeepSpeed/pull/4071 related to this request?

delock avatar Dec 21 '23 07:12 delock

Is #4071 related to this request?

Yes, but besides changing the Triton version, the kernel needs updates as well.

YizhouZ avatar Dec 21 '23 08:12 YizhouZ

Is #4071 related to this request?

Yes, but besides changing the Triton version, the kernel needs updates as well.

Hi @YizhouZ, what specific kernel error did you encounter? Is it a common error that people hit when they upgrade to Triton 2.1?

delock avatar Dec 21 '23 15:12 delock

I got:

  File "python3.9/site-packages/deepspeed/ops/sparse_attention/matmul.py", line 276, in make_sdd_lut
    segmented = libtriton.superblock(layout.data_ptr(), layout.shape[0], layout.shape[1], layout.shape[2],
AttributeError: module 'triton._C.libtriton' has no attribute 'superblock'
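To check whether a given environment would hit this same AttributeError before calling into the sparse attention code, a small feature probe may help. This is only a sketch: the helper name has_superblock is hypothetical and is not part of DeepSpeed or Triton. It relies only on the module path and attribute name shown in the traceback.

```python
import importlib


def has_superblock(module_name: str = "triton._C.libtriton") -> bool:
    """Return True if the module imports and exposes a `superblock` attribute.

    `has_superblock` is a hypothetical helper for illustration only; the
    default module path and the attribute name come from the traceback.
    """
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        # Triton is not installed, or the module could not be imported.
        return False
    return hasattr(module, "superblock")


if __name__ == "__main__":
    if has_superblock():
        print("libtriton.superblock is available")
    else:
        print("libtriton.superblock is missing; make_sdd_lut would fail as in the traceback")
```

Probing for the attribute rather than comparing version strings keeps the check tied to the symbol that matmul.py actually calls.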

A-Cepheus avatar Dec 22 '23 09:12 A-Cepheus

Is #4071 related to this request?

Yes, but besides changing the Triton version, the kernel needs updates as well.

Hi @YizhouZ, what specific kernel error did you encounter? Is it a common error that people hit when they upgrade to Triton 2.1?

Yes, it is. In fact, Triton dropped support for ops in 2.0.

Reference: https://github.com/openai/triton/issues/1395#issuecomment-1483725777
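
If the break really does start at 2.0, as described above, a version guard is one way to fail early with a clearer message. The sketch below uses only the standard library; parse_major and installed_major are hypothetical helper names, and the assumption that the installed distribution is named "triton" is mine.

```python
from importlib import metadata
from typing import Optional


def parse_major(version_string: str) -> int:
    """Extract the major version number from a string such as '2.1.0'."""
    return int(version_string.split(".")[0])


def installed_major(distribution: str = "triton") -> Optional[int]:
    """Return the installed major version, or None if the package is absent.

    Assumes the distribution is published under the name 'triton'.
    """
    try:
        return parse_major(metadata.version(distribution))
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    major = installed_major()
    if major is None:
        print("Triton is not installed")
    elif major >= 2:
        # Threshold taken from the comment above, not verified independently.
        print("Triton >= 2.0 detected; the triton==1.0.0 based sparse attention path is expected to fail")
    else:
        print("Triton 1.x detected")
```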

BurhanUlTayyab avatar Feb 09 '24 08:02 BurhanUlTayyab