Fuyan Yuan

Results: 28 comments by Fuyan Yuan

I've adapted a template for the undergraduate graduation thesis here; I hope it helps at least a few people. https://github.com/TheRainstorm/CQUThesis/tree/for-bachelor/bachelor

> I'm happy to discuss the details and would be ready to make the needed changes to the codebase in a pull request once I receive the all clear that...

Thank you for your work! I've tested this PR, and the new CMakeLists.txt is significantly clearer compared to the original version. However, I've found that the compiled static libraries, libdecode_kernels.a...

I ran into the same problem.

> Resolved

How did you resolve it? I don't see any new commits in the code.

Thank you for your quick reply. I tried synchronizing using all_reduce before each dispatch measurement, but the phenomenon described above still exists. My main code is as below (modified in...

> I highly recommend you to count the GPU time using CUDA event or CUPTI (PyTorch profiler), but not `time.perf_counter`. The current code snippet is only counting the dispatch barrier...
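To illustrate the point in the quote above, here is a minimal sketch of why a host-side timer placed around an asynchronous launch undercounts the real work. It does not use a GPU: `fake_kernel`, `time_launch_only`, and `time_with_sync` are hypothetical names, and a worker thread stands in for the device queue. In PyTorch the equivalent fix would be to record `torch.cuda.Event(enable_timing=True)` objects around the call and read `elapsed_time` after synchronizing.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fake_kernel(duration_s):
    """Hypothetical stand-in for GPU work: it only sleeps."""
    time.sleep(duration_s)


def time_launch_only(executor, duration_s):
    """Time only the submission, like perf_counter around an async launch."""
    start = time.perf_counter()
    future = executor.submit(fake_kernel, duration_s)  # returns immediately
    elapsed = time.perf_counter() - start
    future.result()  # drain the queue so later measurements start clean
    return elapsed


def time_with_sync(executor, duration_s):
    """Time submission plus completion, like timing after a synchronize."""
    start = time.perf_counter()
    future = executor.submit(fake_kernel, duration_s)
    future.result()  # wait until the work has actually finished
    return time.perf_counter() - start


if __name__ == "__main__":
    with ThreadPoolExecutor(max_workers=1) as executor:
        launch_s = time_launch_only(executor, 0.2)
        synced_s = time_with_sync(executor, 0.2)
    print(f"launch only: {launch_s * 1e3:.2f} ms")
    print(f"with sync:   {synced_s * 1e3:.2f} ms")
```

The first number reflects only how long it took to enqueue the work, while the second includes the work itself. The same gap appears when a GPU call is timed on the host without waiting for the device to finish.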

> If the problem still exists, I guess you should dump the profiling timeline.

The problem still exists. I will profile it with Nsight. Thank you for your help.

> If the problem still exists, I guess you should dump the profiling timeline.

I profiled the program using nsys and found that the issue is in `notify_dispatch`. When run...

> Have you resolved the problem?

This might not be an actual issue, as my previous testing methodology was flawed. As LyricZhao pointed out, the dispatch interface internally consists...