Update triton_ops.py from triton/python/tutorials/06-fused-attention.py
Update deepspeed/ops/transformer/inference/triton_ops.py with the latest triton/python/tutorials/06-fused-attention.py.

One intentional deviation from the tutorial: num_stages is set to 1 in deepspeed/ops/transformer/inference/triton_ops.py, whereas the tutorial uses num_stages=2. With num_stages=2, running Stable Diffusion inference through the DeepSpeed inference engine gives an out-of-memory error; setting either num_stages=1 or BLOCK=64 avoids it.

With this update, Stable Diffusion inference with the DeepSpeed inference engine works on the latest Triton on an A100 (with either num_stages=1 or BLOCK=64). However, the output image quality is not as good as with the older Triton version, which also still works.
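For context, the functional deviation from the tutorial is only in the kernel launch configuration. A hypothetical sketch of the change (argument and kernel names are illustrative, not copied from triton_ops.py):

```python
# Illustrative excerpt only: how the fused-attention kernel launch here
# differs from the Triton tutorial. Names are assumptions, not the
# actual signature in triton_ops.py.
_fwd_kernel[grid](
    q, k, v, sm_scale, output,
    BLOCK_M=128,    # tutorial value; reducing BLOCK to 64 also avoids OOM
    BLOCK_N=128,
    num_warps=4,
    num_stages=1,   # tutorial uses num_stages=2, which runs out of memory
                    # under the DeepSpeed inference engine on Stable Diffusion
)
```

num_stages controls software pipelining depth: more stages means more shared-memory buffers in flight, which is why lowering it (or shrinking the block size) reduces memory pressure.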
@microsoft-github-policy-service agree company="AMD"