Genghan Zhang
I created two classes, `LowererImplC` and `LowererImplCUDA`. They separately implement `lowerForall`, `lowerWhere`, and the helper functions these call, one for CPU code and one for CUDA code. This is the starting point of...
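To illustrate the split, here is a hypothetical Python sketch of the pattern. TACO's actual lowering code is C++ with a different interface and IR. A shared base dispatches each statement, and each backend subclass emits its own code for `forall` and `where`. The toy `dict` statements and method names are assumptions made for illustration.

```python
from abc import ABC, abstractmethod


class LowererImpl(ABC):
    """Hypothetical base: shared dispatch, backend-specific lowering hooks."""

    def lower(self, stmt: dict) -> str:
        # Dispatch on a toy statement kind; real TACO IR nodes differ.
        kind = stmt["kind"]
        if kind == "forall":
            return self.lower_forall(stmt)
        if kind == "where":
            return self.lower_where(stmt)
        raise ValueError(f"unknown statement kind: {kind}")

    @abstractmethod
    def lower_forall(self, stmt: dict) -> str: ...

    @abstractmethod
    def lower_where(self, stmt: dict) -> str: ...


class LowererImplC(LowererImpl):
    def lower_forall(self, stmt: dict) -> str:
        v = stmt["var"]
        # CPU backend: emit a plain serial loop.
        return f"for (int {v} = 0; {v} < {stmt['extent']}; {v}++) {{ ... }}"

    def lower_where(self, stmt: dict) -> str:
        return "/* producer into CPU temporary, then consumer */"


class LowererImplCUDA(LowererImpl):
    def lower_forall(self, stmt: dict) -> str:
        v = stmt["var"]
        # CUDA backend: map the loop index to a global thread index.
        return (f"int {v} = blockIdx.x * blockDim.x + threadIdx.x; "
                f"if ({v} < {stmt['extent']}) {{ ... }}")

    def lower_where(self, stmt: dict) -> str:
        return "/* producer into GPU workspace, __syncthreads(), then consumer */"
```

Keeping dispatch in the base class means a new backend only overrides the hooks whose generated code actually differs.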
I've tried to reproduce the results in Automatic Generation of Efficient Sparse Tensor Format Conversion Routines (PLDI '20), but encountered these errors: CSR_CSC: Not supported [http://tensor-compiler.org/codegen.html?expr=A(i,j)%20=%20B(i,j)%20&format=A:ds:0,1;B:ds:1,0](http://tensor-compiler.org/codegen.html?expr=A(i,j)%20=%20B(i,j)%20&format=A:ds:0,1;B:ds:1,0); CSR/COO/CSC_DIA: "Offset"...
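For reference, CSR-to-CSC conversion (the case the web tool reports as not supported) amounts to transposing the sparsity structure. Below is a minimal hand-written Python sketch that uses a counting sort over column indices, assuming the standard `indptr`/`indices`/`data` arrays. It is not the code the PLDI '20 system generates, and the function name is hypothetical.

```python
def csr_to_csc(n_rows, n_cols, indptr, indices, data):
    """Convert a CSR matrix to CSC with a counting sort on column indices."""
    # 1. Count nonzeros per column (stored shifted by one).
    col_ptr = [0] * (n_cols + 1)
    for j in indices:
        col_ptr[j + 1] += 1
    # 2. Prefix sum turns the counts into each column's start offset.
    for j in range(n_cols):
        col_ptr[j + 1] += col_ptr[j]
    # 3. Scatter entries; next_pos holds the next free slot in each column.
    next_pos = col_ptr[:n_cols]  # slicing makes a copy
    row_idx = [0] * len(indices)
    vals = [0] * len(data)
    for i in range(n_rows):  # visiting rows in order keeps rows sorted per column
        for p in range(indptr[i], indptr[i + 1]):
            j = indices[p]
            q = next_pos[j]
            row_idx[q] = i
            vals[q] = data[p]
            next_pos[j] += 1
    return col_ptr, row_idx, vals
```

For example, `[[1, 0, 2], [0, 3, 0]]` in CSR (`indptr=[0, 2, 3]`, `indices=[0, 2, 1]`, `data=[1, 2, 3]`) becomes `([0, 1, 2, 3], [0, 1, 0], [1, 3, 2])` in CSC.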
This is the first version of alpha-beta estimation. **What can it do?** Given a logical device mesh, it aims to return the device IDs and the corresponding alpha and beta values. **How to...
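One plausible way to organize this, offered as an assumption rather than how the PR necessarily does it: for each dimension of a 2D logical mesh, enumerate the 1D device groups that communicate along that dimension. Each group can then be profiled to obtain that dimension's alpha and beta. The function name and the row-major device layout are hypothetical.

```python
def mesh_dim_groups(device_ids, mesh_shape):
    """Split a flat device-id list into per-dimension communication groups
    of a 2D logical mesh (row-major layout assumed)."""
    n0, n1 = mesh_shape
    if len(device_ids) != n0 * n1:
        raise ValueError("mesh_shape does not match the number of devices")
    # Lay the devices out as n0 rows of n1 devices each.
    mesh = [device_ids[r * n1:(r + 1) * n1] for r in range(n0)]
    # Along mesh dim 0: devices in the same column communicate.
    dim0 = [[mesh[r][c] for r in range(n0)] for c in range(n1)]
    # Along mesh dim 1: devices in the same row communicate.
    dim1 = [list(row) for row in mesh]
    return {0: dim0, 1: dim1}
```

For 8 devices on a `(2, 4)` mesh, dimension 0 yields the pairs `[0, 4]`, `[1, 5]`, `[2, 6]`, `[3, 7]`, and dimension 1 yields `[0, 1, 2, 3]` and `[4, 5, 6, 7]`.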
**What's new?** Given a 1D list of devices, return $\alpha$ and $\beta$.
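Under the common alpha-beta model, sending an $n$-byte message takes $T(n) = \alpha + \beta n$, where $\alpha$ is the latency and $\beta$ is the inverse bandwidth. The minimal sketch below recovers $\alpha$ and $\beta$ by ordinary least squares from timings of one 1D group at several message sizes. The function name and the choice of plain least squares are assumptions; the PR may estimate the parameters differently.

```python
def fit_alpha_beta(sizes, times):
    """Least-squares fit of T(n) = alpha + beta * n.

    sizes: message sizes in bytes; times: measured seconds.
    Returns (alpha, beta): latency in s and inverse bandwidth in s/byte.
    """
    if len(sizes) != len(times) or len(sizes) < 2:
        raise ValueError("need at least two (size, time) pairs")
    k = len(sizes)
    mean_n = sum(sizes) / k
    mean_t = sum(times) / k
    var_n = sum((n - mean_n) ** 2 for n in sizes)
    if var_n == 0:
        raise ValueError("message sizes must not all be equal")
    cov = sum((n - mean_n) * (t - mean_t) for n, t in zip(sizes, times))
    beta = cov / var_n             # slope: seconds per byte
    alpha = mean_t - beta * mean_n  # intercept: fixed latency
    return alpha, beta
```

Spreading the sizes over several orders of magnitude helps separate the two terms: small messages are dominated by $\alpha$ and large ones by $\beta$.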
**What's new?** Add a one-step transformation called ***mix-gather*** for:

|Src|Dst|
|---|---|
|S0S1|RR|
|S1S0|RR|
|S01R|RR|
|RS01|RR|

**Why do we need this?** To reduce communication cost. Assume $\beta_1 \gt \beta_0$, $M$ is...
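As context for the cost argument, here is a hedged sketch of the two-step baseline that a one-step mix-gather would replace for S0S1 → RR on an $n_0 \times n_1$ mesh. It assumes the alpha-beta model with a ring all-gather: gathering along a mesh dimension of $p$ devices, so that each device ends up with $S$ bytes, costs $\alpha + \beta \frac{p-1}{p} S$. The function names and this ring cost formula are assumptions, and the truncated derivation above may use a different model.

```python
def all_gather_cost(alpha, beta, p, out_bytes):
    """Assumed ring all-gather cost: each of p devices ends with out_bytes,
    having received (p - 1) / p of it."""
    return alpha + beta * (p - 1) / p * out_bytes


def two_step_s0s1_to_rr(M, n, alpha, beta):
    """Cost of S0S1 -> RR done as two all-gathers, for both orders.

    M: total tensor size in bytes; n = (n0, n1): mesh shape;
    alpha, beta: per-mesh-dimension parameters, indexed by dimension.
    """
    n0, n1 = n
    # Gather along dim 1 first (S0S1 -> S0R), then along dim 0 (S0R -> RR).
    dim1_first = (all_gather_cost(alpha[1], beta[1], n1, M / n0)
                  + all_gather_cost(alpha[0], beta[0], n0, M))
    # Gather along dim 0 first (S0S1 -> RS1), then along dim 1 (RS1 -> RR).
    dim0_first = (all_gather_cost(alpha[0], beta[0], n0, M / n1)
                  + all_gather_cost(alpha[1], beta[1], n1, M))
    return {"dim1_first": dim1_first, "dim0_first": dim0_first}
```

With $\beta_1 > \beta_0$, gathering along mesh dimension 1 first sends less data over the slower dimension. For example, `two_step_s0s1_to_rr(8, (2, 2), (0, 0), (1, 2))` gives `8.0` for `dim1_first` and `10.0` for `dim0_first`.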
Hello! FlexGen is a brilliant project, but there may be a locking issue. I ran the command `python3 bench_suite.py 6b7_1x1`, but it threw a soft lockup BUG: How can...
Can splatt support SpMTTKRP, i.e., MTTKRP where the two input matrices are also sparse? Thank you!
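To make the request concrete: mode-1 MTTKRP on a third-order tensor computes $M(i,r) = \sum_{j,k} X(i,j,k)\,B(j,r)\,C(k,r)$. When $B$ and $C$ are also sparse, only ranks $r$ that are nonzero in both $B(j,:)$ and $C(k,:)$ contribute. Below is a minimal dictionary-based Python sketch of that idea. The function name and data layout are assumptions, not splatt's API or storage format.

```python
from collections import defaultdict


def sparse_mttkrp_mode1(X, B, C):
    """Mode-1 MTTKRP M(i, r) = sum_{j,k} X(i,j,k) * B(j,r) * C(k,r).

    X: dict {(i, j, k): value} (COO sparse tensor).
    B, C: sparse factor matrices as dict {row: {col: value}}.
    Returns M as dict {i: {r: value}} holding only entries that were touched.
    """
    M = defaultdict(lambda: defaultdict(float))
    for (i, j, k), x in X.items():
        b_row = B.get(j)
        c_row = C.get(k)
        if not b_row or not c_row:
            continue  # an empty factor row contributes nothing
        # Iterate over the shorter row and probe the longer one:
        # only ranks nonzero in both rows contribute.
        small, large = (b_row, c_row) if len(b_row) <= len(c_row) else (c_row, b_row)
        for r, v in small.items():
            w = large.get(r)
            if w is not None:
                M[i][r] += x * v * w
    return {i: dict(row) for i, row in M.items()}
```

The row intersection is what sparse factors add over the usual dense case. A real implementation would likely use sorted index arrays (for example CSR) instead of hash maps.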
Hi all, I followed the installation process and can run `1-2-3-qm9.py`. However, I ran into a dataset problem when running `1-2-3-proteins.py`. I printed `dataset.data`:

> Data(edge_index=[2, 156166],...
Change the initial state of the reduction symbol for the "none" relation.