wangyuxin87
Does 'dcn_v2_psroi_pooling_forward' mean deformable position-sensitive RoI pooling?
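Question about test accuracy
Hello author, I would like to ask whether this TensorFlow version can reach the accuracy reported in the original paper. Is it slightly lower or slightly higher?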
Can Tutel be used with Megatron-DeepSpeed?
Thanks for your excellent work. However, GAU is slower than the original MHSA in my implementation, **3.5s vs 0.7s**. As I simply use "from flash_pytorch import GAU" with the default...
https://github.com/datamllab/LongLM
Thanks for your great work. When will the code be released?