Godlovecui

Results 12 issues of Godlovecui

### System Info L20, 8 cards, 8x48G memory, TensorRT-LLM version: 0.11.0.dev2024051400 ### Who can help? @Tra ### Information - [X] The official example scripts - [ ] My own modified...

bug
triaged

### System Info rtx4090 ### Who can help? @ ### Information - [ ] The official example scripts - [ ] My own modified scripts ### Tasks - [ ]...

bug

### System Info 8*RTX4090, 24G tensorrt_llm version: 0.11.0.dev2024051400 ### Who can help? @T ### Information - [X] The official example scripts - [ ] My own modified scripts ### Tasks...

bug
triaged

**Description** I ran a benchmark of Meta-Llama-3-8B-Instruct on 8x RTX 4090. ![image](https://github.com/triton-inference-server/server/assets/68674291/1a0fd341-8d8f-4893-973c-ed1ed3b74aca) With 16 requests, an input sequence length of 1024, and an output sequence length of 1024, the TTFT (time to first token) is...

investigating

### System Info RTX 8*4090 version: TensorRT-LLM: v0.9.0 tensorrtllm_backend: v0.9.0 ### Who can help? @kaiyux @BY ### Information - [X] The official example scripts - [ ] My own modified...

bug
stale
waiting for feedback

FP8 is very useful for LLM training and inference. Does flash attention support FP8? Thank you~

# 🚀 Feature FP8 is very useful for LLM training and inference. Does xformers support FP8? Thank you~

When I run 06-fused-attention.py on an RTX 4090, it raises the error below. How can I fix it? Thank you! triton version: 2.3.0 cuda: 12.4 root@GPU-RTX4090-4-8:/workspaces/triton/python/tutorials# python 06-fused-attention.py Traceback (most recent call last):...

ENV: 8x RTX 4090. I want to test FP8 inference with TransformerEngine on Llama 3 (from Hugging Face). I cannot find any instructions for inference. Can you share some code? Thank you~

Hi: I'd like to test FP8 on an RTX 4090. I can find BF16 functions such as SM80_16x8x8_F32BF16BF16F32_TN in cutlass/include/cute/arch/mma_sm80.hpp; however, I can't find FP8 functions such as SM80_16x8x8_F32E4M3E4M3FP32_TN. So, how...

question
? - Needs Triage
inactive-30d