Andy Lo

Results 5 issues of Andy Lo

Just want to mention that cuBLAS (via the newer `cuBLASLt` API) does offer an interface that fuses matmul with bias addition: [`cublasLtMatmul()`](https://docs.nvidia.com/cuda/cublas/#cublasltmatmul), which computes `D = A @ B +...

It is really hard to use CUTLASS because of its large (nested) template classes, which have poor IDE support (e.g. autocompletion). [C++20 concepts](https://en.cppreference.com/w/cpp/language/constraints) are meant to be a solution...

feature request
? - Needs Triage

Some of the names in the B-operand functions were copied directly from their A-operand counterparts. This change fixes the variable names and comments to improve clarity.

inactive-30d

Equations 6 and 7 from the paper suggest that the scores are computed from $\hat{x}\_{t\_i}$ (**not** $\hat{x}'\_{t\_i}$). ![image](https://github.com/yang-song/score_inverse_problems/assets/66584117/af11180c-8849-4cd5-a3c9-42cd2a339396) However, in the implementation, the update (Eq. 6) is applied to `x`...

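https://github.com/artidoro/qlora/blob/7f4e95a68dc076bea9b3a413d2b512eca6d004e5/qlora.py#L248-L259 I think the `names[0] if len(names) == 1 else names[-1]` expression on L254 is redundant: when `len(names) == 1`, `names[0]` and `names[-1]` are the same element, so the expression always equals `names[-1]`. It should just use `names[-1]`.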
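
The redundancy can be checked directly. This is a minimal sketch, assuming `names` is a non-empty list of strings; the function names and sample values are hypothetical and not taken from `qlora.py`:

```python
def pick_original(names):
    # The conditional expression quoted in the issue.
    return names[0] if len(names) == 1 else names[-1]


def pick_simplified(names):
    # Proposed replacement: always take the last element.
    return names[-1]


# Hypothetical inputs: lists of one, two, and several name components.
samples = [
    ["lm_head"],
    ["layers", "q_proj"],
    ["model", "layers", "0", "q_proj"],
]

for names in samples:
    # For a one-element list, names[0] and names[-1] are the same item,
    # so both functions agree on every non-empty input.
    assert pick_original(names) == pick_simplified(names)
```

Both versions raise `IndexError` on an empty list, so the simplification does not change behaviour in that case either.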