侯奇

Results 8 issues of 侯奇

Line 480 of _gexf.py should be changed from ``` self.spells = spells ``` to match line 633: ``` self.spells = Spells(spells) ```

Will tiny-membench support multi-threaded memory bandwidth measurement, and if so, when?

Fix compilation with `cmake .. -DCUTLASS_ENABLE_TESTS=ON -DCUTLASS_TEST_LEVEL=2`

inactive-30d
inactive-90d

I was reading https://github.com/triton-lang/triton/blob/9a0a7c2ccc6e6fd5f98c06476a0ca591b65758cf/include/triton/Dialect/TritonGPU/IR/TritonGPUAttrDefs.td and found something confusing: ```markdown 2. Multiple rows per phase #shared [ 0, 1, 2, 3], // phase 0 (xor with 0) [ 4, 5,...

## The problem The FP8 GEMM result is incorrect: ![image](https://github.com/user-attachments/assets/5a0f6973-1f4d-4287-b602-f33fa5cc799d) ## How to reproduce * the code: see below * triton version: 3.0.0 * CUDA version: 12.4 * machine: L20 ```python...

### Suggestion Description Can rocSHMEM add some perftests so that we can check performance, as NVSHMEM does? ### Operating System _No response_ ### GPU _No response_ ### ROCm Component...

Feature Request

### Description of errors It seems that there are many more environment variables than are documented: this is the documentation from README.md; this is from getenv ### Attach any links,...

documentation
Under Investigation