Yu You

Results 13 comments of Yu You

Thanks. It is SkylakeX with AVX-512. I tried input sizes from `m=n=5000` to `m=n=15000`. I'm on OpenBLAS 0.3.7, which should have the latest improvements, right? I can create a plot and profile DGEMM.

Good to know, thanks! I'll try 0.3.10 and report back.

Well, it turns out that I was using 0.3.10. But I have some more observations, shown in the plots below. `dgetrf` tests (left panel) were run with a `5000x5000`...

Yes, I noticed that OpenBLAS `DGETRF` is much faster than the netlib implementation, but as we see here, it is still not as fast as MKL.

Need to update `include/cuda/std/detail/libcxx/include/version` and define `__cpp_lib_span`. Otherwise, this worked for my `mdspan` tests. Thanks!

Similar issue in constructor of `layout_{left|right}` that takes a `layout_stride::mapping`, where `size_t stride = 1;` is compared against the stride of the input mapping, which could be signed.
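To make the signed/unsigned concern concrete, here is a minimal sketch of one way such a comparison could be made safe. The helper name `stride_matches` is hypothetical and is not part of `mdspan` or libcu++; it only illustrates the pitfall that a plain `expected == actual` converts a negative signed stride to a huge unsigned value before comparing.

```cpp
#include <cstddef>
#include <type_traits>

// Hypothetical helper (not from the mdspan sources): compares an unsigned
// expected stride against a stride whose index type may be signed.
template <class Index>
constexpr bool stride_matches(std::size_t expected, Index actual) {
    static_assert(std::is_integral_v<Index>, "Index must be an integer type");
    if constexpr (std::is_signed_v<Index>) {
        // A negative stride can never equal an unsigned expected value.
        // Without this check, -1 would convert to SIZE_MAX in the comparison.
        if (actual < 0) {
            return false;
        }
    }
    // Safe now: actual is non-negative, so the cast preserves its value
    // (assuming Index is no wider than std::size_t).
    return expected == static_cast<std::size_t>(actual);
}
```

In C++20, `std::cmp_equal` from `<utility>` performs the same kind of value-preserving comparison, so it may be an alternative where that standard level is available.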

> Hey @youyu3 sorry for the last minute change, but based on some internal conversation, I think we can drop the `experimental` namespace for `mdspan`.

Okay. Will do. Then I...

I pulled commits from the `span` PR. There are further changes in those files, I believe.

> Commit history appears to be broken.

I'll try to resolve this locally and...

Great pseudocode! It would be great to have something that works for arbitrarily nested conjugated/transposed/scaled (if we don't limit the recursion depth). One quick question: don't we need to check...

@fnrizzi I believe that the following functions are missing the in-place interface as well: `triangular_matrix_vector_solve`, `triangular_matrix_matrix_left_solve`, `triangular_matrix_matrix_right_solve`.