侯奇
In _gexf.py, line 480 should be changed from ``` self.spells = spells ``` to match line 633: ``` self.spells = Spells(spells) ```
sacn -> scan
Will tiny-membench support multi-threaded memory bandwidth measurement, and if so, when?
Fix compilation with `cmake .. -DCUTLASS_ENABLE_TESTS=ON -DCUTLASS_TEST_LEVEL=2`
I was reading the https://github.com/triton-lang/triton/blob/9a0a7c2ccc6e6fd5f98c06476a0ca591b65758cf/include/triton/Dialect/TritonGPU/IR/TritonGPUAttrDefs.td and found something confusing: ```markdown 2. Multiple rows per phase #shared [ 0, 1, 2, 3], // phase 0 (xor with 0) [ 4, 5,...
## The problem The FP8 GEMM result is incorrect:  ## How to reproduce * the code: see below * triton version: 3.0.0 * CUDA version: 12.4 * machine: L20 ```python...
### Suggestion Description Can rocshmem add some perf tests so that we can check its performance, as NVSHMEM does? ### Operating System _No response_ ### GPU _No response_ ### ROCm Component...
### Description of errors It seems that there are many more environment variables than documented: this is the document from README.md this is from the getenv ### Attach any links,...