Baole Ai
This is an impressive project! It demonstrates superior performance compared to Ulysses. Have you also compared FastSeq with RingAttention or Context Parallel in TransformerEngine?
## 🚀 Feature
Support inputs with dynamic shapes.
## Motivation
During the training of Large Language Models (LLMs), the sequence lengths of the input data are typically variable, necessitating padding prior...
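For context, a minimal sketch of the padding the motivation refers to, assuming a PyTorch token pipeline; `pad_sequence` is the standard utility, and the sequence lengths here are illustrative:

```python
import torch
from torch.nn.utils.rnn import pad_sequence

# Three sequences of different lengths, as is typical for LLM training data.
seqs = [torch.randint(0, 1000, (n,)) for n in (5, 9, 3)]

# Without dynamic-shape support, every sequence must be padded to the
# longest length in the batch before they can be stacked into one tensor.
batch = pad_sequence(seqs, batch_first=True, padding_value=0)
print(batch.shape)  # torch.Size([3, 9])
```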
Hi, there is no "Memory" view in TensorBoard after setting `profile_memory=True`. The test environment is PyTorch 1.8.1 + torch_tb_profiler 0.4.0.
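For reproduction, here is a minimal sketch of the setup being described, assuming the `torch.profiler` API introduced in PyTorch 1.8; the model, input, and log directory are placeholders:

```python
import torch
from torch.profiler import profile, ProfilerActivity, tensorboard_trace_handler

model = torch.nn.Linear(128, 128)
inputs = torch.randn(32, 128)

# profile_memory=True should add a "Memory" view to the TensorBoard plugin.
with profile(
    activities=[ProfilerActivity.CPU],
    profile_memory=True,
    on_trace_ready=tensorboard_trace_handler("./log"),
) as prof:
    model(inputs)
```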
It seems like benchmark_pipe only supports the [shm|uv] transports and the [basic] channel. Does benchmark_pipe also support the ibv transport and the cuda channel? Is there a complete example of different transport and channel...
Hi, when I built as follows
```
$ cd tensorpipe
$ mkdir build
$ cd build
$ cmake ../ -GNinja -DTP_BUILD_TESTING=ON
$ ninja
```
I got the following errors: FAILED:...
Hi Team, `_flash_attn_forward` now supports fp8, but `_flash_attn_varlen_forward` does not support fp8 yet (https://github.com/Dao-AILab/flash-attention/blob/main/hopper/flash_api.cpp#L440). I would like to ask if there are any plans to implement support for `_flash_attn_varlen_forward` using...
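For reference, a sketch of the varlen path in question via the public `flash_attn_varlen_func` wrapper; the shapes here are illustrative, and the fp16 dtype marks exactly what fp8 would replace on this path:

```python
import torch
from flash_attn import flash_attn_varlen_func

# Two sequences of lengths 5 and 3 packed into one unpadded tensor.
cu_seqlens = torch.tensor([0, 5, 8], dtype=torch.int32, device="cuda")
q = torch.randn(8, 4, 64, dtype=torch.float16, device="cuda")  # (total, nheads, headdim)
k, v = torch.randn_like(q), torch.randn_like(q)

# The varlen forward currently accepts fp16/bf16; accepting fp8 inputs
# on this path is what the question asks about.
out = flash_attn_varlen_func(q, k, v, cu_seqlens, cu_seqlens, 5, 5, causal=True)
```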
Hi @Aaryan0404, while reviewing the implementation of the attention kernel in h100.cu, I noticed the following scaling of norm_vec based on the dimension D:
```
if constexpr (D == 64)...
```
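For context, and purely as an assumption about what the D-dependent constant encodes: attention kernels conventionally scale the logits by $1/\sqrt{D}$ before the softmax, so the factor naturally differs between head dimensions:

$$
\mathrm{Attention}(Q,K,V)=\mathrm{softmax}\!\left(\frac{QK^\top}{\sqrt{D}}\right)V,\qquad \frac{1}{\sqrt{64}}=0.125,\quad \frac{1}{\sqrt{128}}\approx 0.0884.
$$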