Lei Zhang comments

Results: 4 comments of Lei Zhang

HDMI not working

> Hi all, I tested the current configuration in this repo on macOS 12.4 over the past few days and found some bugs.
>
> 1. Sometimes booting the system may fail,...

latency result slower than tensorrt fp16

Thanks for your reply. My development environment is TVM 0.14.dev0 and TensorRT 8.6.1.

Hi, I use triton==2.1.0. N_CTX_2 is the sequence_length of K/V, and N_CTX is the sequence_length of Q. I checked the tutorial `https://triton-lang.org/main/getting-started/tutorials/06-fused-attention.html` and https://github.com/openai/triton/blob/main/python/triton/ops/flash_attention.py#L42. Both implementations use a transpose. And...

BUG in flash attention kernel

@yiakwy-xpu-ml-framework-team You mentioned that N_CTX_2 (batches x hiddens x sequence_length) is introduced in the PR to support Hopper TMA, but the version of triton I'm using (or referencing) doesn't reference the...