Maosheng Liao
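To get your problem resolved quickly, before opening an issue, please first search for similar problems via: [past issues](https://github.com/PaddlePaddle/Paddle-Lite/issues), [FAQ documentation](https://paddle-lite.readthedocs.io/zh/develop/quick_start/faq.html), [official documentation](https://paddle-lite.readthedocs.io/zh/develop/guide/introduction.html) When opening an issue, to help resolve it quickly, please provide the following information according to your usage: - Title: a concise, precise description of your problem, e.g. "ssd model conversion error" - Version and environment information: 1) Paddle Lite version: v2.11 2) Host environment: macOS Monterey - Model information 1) Model name [3x3s2_dw.onnx.zip](https://github.com/PaddlePaddle/Paddle-Lite/files/9598723/3x3s2_dw.onnx.zip) Reproduction: ``` from x2paddle.convert...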
Sorry to post the question here. According to the paper, after `all_to_all`, every device will hold a `1/P` part of the heads, and then it will be sent to perform local attention...
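To make the question above concrete, here is a minimal single-process sketch of the redistribution it describes, assuming each of `P` devices starts with a contiguous `1/P` slice of the sequence for all heads. It is plain Python with no communication library; the function name `all_to_all_seq_to_head` and the nested-list layout are hypothetical and chosen only for illustration, not taken from the paper or any implementation.

```python
def all_to_all_seq_to_head(shards, num_heads):
    """Simulate an all-to-all that turns sequence sharding into head sharding.

    shards[p] is the list of tokens held by device p (hypothetical layout);
    each token is a list with one entry per attention head.
    Returns out[p]: the full sequence, restricted to device p's 1/P of the heads.
    """
    num_devices = len(shards)
    assert num_heads % num_devices == 0, "heads must divide evenly across devices"
    heads_per_device = num_heads // num_devices

    out = []
    for dst in range(num_devices):
        lo = dst * heads_per_device
        hi = lo + heads_per_device
        received = []
        # Every source device sends dst the head slice [lo, hi) of its local tokens;
        # concatenating in source order restores the global token order.
        for src in range(num_devices):
            for token in shards[src]:
                received.append(token[lo:hi])
        out.append(received)
    return out


# Toy setup: P = 2 devices, N = 4 tokens, H = 4 heads.
# Each entry is labeled (token_index, head_index) so the movement is visible.
P, N, H = 2, 4, 4
shards = [
    [[(t, h) for h in range(H)] for t in range(p * N // P, (p + 1) * N // P)]
    for p in range(P)
]
result = all_to_all_seq_to_head(shards, H)
```

In this toy setup, each device ends up with all `N` tokens but only `H / P` heads, which is what lets attention for those heads run locally over the full sequence.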
https://github.com/Dao-AILab/flash-attention/blob/3669b25206d5938e3cc74a5f7860e31c38af8204/csrc/flash_attn/flash_api.cpp#L314-L319 For example, `out_accum` could be freed once it goes out of scope at the end of this code block or function, yet we still use the pointer `oaccum_ptr`. Is this valid?
### Suggestion Description I am frustrated trying to find tutorials/docs for programming on AMD devices. I COULDN'T find any docs like: https://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html This really makes users mad when they want...
### Problem Description For example, when I hipify a `.cu` file that contains an fp8 datatype, the fp8 datatype isn't converted to the HIP fp8 type after running the `hipify-clang` command. For...
I encountered a problem when using the int8 GEMM CUTLASS kernel: https://github.com/NVIDIA/TensorRT-LLM/issues/2351 For shape [16,6144,4096], I measured `14us` in my unit-test benchmark, but in real models, I got `25us`....
## Motivation Enable flashinfer backend for deepseekv2. ## Modifications Only one line. ## Checklist - [x] Format your code according to the [Code Formatting with Pre-Commit](https://docs.sglang.ai/references/contribution_guide.html#code-formatting-with-pre-commit). - [x] Add unit...