Yunlei

Results: 4 comments by Yunlei

This is the nsys result for the OneFlow framework: Using report1.sqlite export for stats reports. Exporting [/opt/nvidia/nsight-systems/2020.4.3/target-linux-x64/reports/cudaapisum.py report1.sqlite] to console... Time(%) Total Time (ns) Num Calls Average Minimum Maximum Name ------- --------------- --------- ----------- ------- ---------- ---------------------------- 46.1...

This is the nsys result for the PyTorch framework: Using report1.sqlite export for stats reports. Exporting [/opt/nvidia/nsight-systems/2020.4.3/target-linux-x64/reports/cudaapisum.py report1.sqlite] to console... Time(%) Total Time (ns) Num Calls Average Minimum Maximum Name ------- --------------- --------- ---------- ------- ---------- ---------------------------- 37.9...
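For comparing two such `cudaapisum` summaries side by side, here is a minimal sketch. It assumes the column layout shown in the headers above (Time(%), Total Time (ns), Num Calls, Average, Minimum, Maximum, Name) and that each table was saved as plain text. The helper names `parse_cudaapisum` and `compare` are hypothetical, and the rows in `SAMPLE_A` / `SAMPLE_B` are made-up placeholders, not values from the reports in this thread.

```python
def parse_cudaapisum(text):
    """Parse data rows of an nsys cudaapisum table into a dict keyed by API name."""
    rows = {}
    for line in text.splitlines():
        parts = line.split()
        # A data row has six numeric columns followed by the API name.
        if len(parts) < 7:
            continue
        try:
            pct = float(parts[0])
            total_ns = int(parts[1].replace(",", ""))
            calls = int(parts[2].replace(",", ""))
        except ValueError:
            continue  # header, separator, or log line
        name = " ".join(parts[6:])
        rows[name] = {"pct": pct, "total_ns": total_ns, "calls": calls}
    return rows


def compare(a, b):
    """Return (name, calls_a, calls_b, total_ns_a, total_ns_b), largest combined time first."""
    empty = {"calls": 0, "total_ns": 0}
    out = []
    for name in set(a) | set(b):
        ra, rb = a.get(name, empty), b.get(name, empty)
        out.append((name, ra["calls"], rb["calls"], ra["total_ns"], rb["total_ns"]))
    out.sort(key=lambda r: r[3] + r[4], reverse=True)
    return out


# Placeholder tables (hypothetical numbers, for illustration only).
SAMPLE_A = """
 Time(%)  Total Time (ns)  Num Calls  Average  Minimum  Maximum  Name
 -------  ---------------  ---------  -------  -------  -------  ----------------
    60.0             6000         30    200.0      100      300  cudaMemcpyAsync
    40.0             4000         10    400.0      200      600  cudaLaunchKernel
"""
SAMPLE_B = """
 Time(%)  Total Time (ns)  Num Calls  Average  Minimum  Maximum  Name
 -------  ---------------  ---------  -------  -------  -------  ----------------
    70.0             7000         10    700.0      500      900  cudaLaunchKernel
    30.0             3000          5    600.0      400      800  cudaMemcpyAsync
"""

if __name__ == "__main__":
    for row in compare(parse_cudaapisum(SAMPLE_A), parse_cudaapisum(SAMPLE_B)):
        print(row)
```

In practice the two sample strings would be replaced by the saved console output of each run. A large gap in call counts or total time for copy or synchronization APIs between the two frameworks may be a hint worth following up, though it is not conclusive on its own.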

> From https://oneflow-test.oss-cn-beijing.aliyuncs.com/NeuS/nsys/report1.qdrep you can see ![image](https://user-images.githubusercontent.com/688197/184575737-92c6f205-8d88-4440-a21c-9262852ebd00.png) that there appear to be many CPU ops between the CUDA kernels. Perhaps some code hard-codes the cpu device type.

OK, I'll track down where the initialization uses cpu. Thanks.

I ran into the same problem when running mixtral-8x7b-v0.1. Is there an estimate for when this will be supported?