Genghan Zhang issues

Results: 9 issues of Genghan Zhang

Implement separate lowerers for C and CUDA

I created two classes: LowererImplC and LowererImplCUDA. They implement lowerForall, lowerWhere, and the helper functions these call, separately for CPU code and CUDA code. This is the starting point of...

Failed tests on format conversion

I tried to reproduce the results in "Automatic Generation of Efficient Sparse Tensor Format Conversion Routines" (PLDI '20), but I ran into these errors:

CSR_CSC: Not supported [http://tensor-compiler.org/codegen.html?expr=A(i,j)%20=%20B(i,j)%20&format=A:ds:0,1;B:ds:1,0](http://tensor-compiler.org/codegen.html?expr=A(i,j)%20=%20B(i,j)%20&format=A:ds:0,1;B:ds:1,0)

CSR/COO/CSC_DIA: "Offset"...

[autoparallel] Add alpha-beta estimation

This is the first version of alpha-beta estimation. **What can it do?** It aims to return the device IDs and the corresponding alpha-beta values for a given logical device mesh. **How to...

[autoparallel] Add alpha-beta estimation

**What's new?** Given a list of 1D devices, return $\alpha$ and $\beta$

[autoparallel] Draft for mix gather

**What's new?** Add a one-step transformation called ***mix-gather*** for:

|Src|Dst|
|---|---|
|S0S1|RR|
|S1S0|RR|
|S01R|RR|
|RS01|RR|

**Why do we need this?** Reduce the communication cost. Assume $\beta_1 \gt \beta_0$, $M$ is...

Soft lockup after running flex_opt

Hello! FlexGen is a brilliant project, but there might be some locking issues. I ran the command `python3 bench_suite.py 6b7_1x1`, but it threw a soft lockup BUG: ![image](https://user-images.githubusercontent.com/58754328/222419912-aebce7e6-0ef7-45be-9fe9-7a8ff2f7f5e0.png) How can...

Support for MTTKRP with 2 sparse matrices

Num_features is 0 for PROTEINS dataset

Fix spmvGPU and ttvGPU