stdBLAS Draft: Attempt to specialize matrix_vector_product for parallel_policy

@crtrott @dalg24

@amklinv-nnl has been working on parallel specialization of stdblas algorithms. The two of us tried to specialize matrix_vector_product for std::execution::parallel_policy, but keep getting run-time recursion. We're guessing that the compiler thinks the generic ExecutionPolicy&& overload is "more specialized."

The test lives in tests/native/gemv_no_ambig.cpp. We're building using the following CMake options:

-DLINALG_ENABLE_TESTS=ON -DLINALG_ENABLE_EXAMPLES=ON -DLINALG_ENABLE_TBB=ON -DTBB_DIR=<PATH_TO_TBB_INSTALLATION>

Please don't merge this branch, btw; it will almost certainly conflict with other PRs.

Jun 20 '23 17:06 mhoemmen

FYI, TBB can be built and installed from scratch using the following repo: https://github.com/oneapi-src/oneTBB . @amklinv-nnl tested with GCC 13; alas, its parallel algorithms still require TBB to compile.

Jun 20 '23 19:06 mhoemmen

@amklinv-nnl Christian Trott explained offline how specializations for different policies work:

1. Don't try to specialize *_is_avail.
2. Only write specializations for an internal execution policy. Never write specializations for any of the Standard policies.
3. If needed, overload execpolicy_mapper to map from std::execution::parallel_policy to an internal policy (e.g., impl::some_happy_parallel_policy). Then, overload matrix_vector_product to take impl::some_happy_parallel_policy.

I don't have time to try this at the moment, but it sounds like this should fix the recursion issue (because users wouldn't ever pass in one of the internal execution policies).
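For reference, here is a rough, standalone sketch of that pattern as I understand it. It is not the actual stdBLAS code, and nearly everything in it is an assumption made for illustration: the `demo` namespace, the `matrix` struct (the real library uses mdspan), the `last_backend` tracer, the `gemv_loop` helper, and the stand-in `sequenced_policy` / `parallel_policy` types, which replace the `std::execution` ones so the sketch compiles without TBB. Only the names `execpolicy_mapper`, `matrix_vector_product`, and `impl::some_happy_parallel_policy` come from the discussion above. It needs C++17 for the inline variable.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

namespace demo {

// Stand-ins for std::execution::sequenced_policy and
// std::execution::parallel_policy, so the sketch builds without TBB.
struct sequenced_policy {};
struct parallel_policy {};

// Hypothetical dense row-major matrix; the real library uses mdspan.
struct matrix {
  std::size_t rows, cols;
  std::vector<double> data;  // element (i, j) lives at data[i * cols + j]
};

// Records which implementation ran. For illustration only.
inline std::string last_backend;

namespace impl {

// Internal policies. Users are never expected to pass these in.
struct inline_exec_t {};
struct some_happy_parallel_policy {};

// Default mapping: any policy maps to the internal serial policy.
template <class Policy>
inline_exec_t execpolicy_mapper(Policy) { return {}; }

// Overload: the parallel policy maps to an internal parallel policy.
inline some_happy_parallel_policy execpolicy_mapper(parallel_policy) { return {}; }

// Shared loop, y = A * x.
inline void gemv_loop(const matrix& A, const std::vector<double>& x,
                      std::vector<double>& y) {
  assert(x.size() == A.cols && y.size() == A.rows);
  for (std::size_t i = 0; i < A.rows; ++i) {
    double sum = 0.0;
    for (std::size_t j = 0; j < A.cols; ++j) sum += A.data[i * A.cols + j] * x[j];
    y[i] = sum;
  }
}

}  // namespace impl

// Implementations are written only for internal policies.
inline void matrix_vector_product(impl::inline_exec_t, const matrix& A,
                                  const std::vector<double>& x,
                                  std::vector<double>& y) {
  last_backend = "serial";
  impl::gemv_loop(A, x, y);
}

inline void matrix_vector_product(impl::some_happy_parallel_policy,
                                  const matrix& A, const std::vector<double>& x,
                                  std::vector<double>& y) {
  last_backend = "parallel";
  impl::gemv_loop(A, x, y);  // a real backend would parallelize over rows here
}

// Generic entry point: maps the caller's policy to an internal one, then
// forwards. The non-template overloads above win for internal policies,
// so this call does not come back to this template.
template <class ExecutionPolicy>
void matrix_vector_product(ExecutionPolicy&& exec, const matrix& A,
                           const std::vector<double>& x, std::vector<double>& y) {
  matrix_vector_product(impl::execpolicy_mapper(exec), A, x, y);
}

}  // namespace demo
```

In this sketch, `demo::matrix_vector_product(demo::parallel_policy{}, A, x, y)` goes through the generic overload exactly once and lands in the `some_happy_parallel_policy` implementation, while an unmapped policy such as `demo::sequenced_policy{}` falls back to the serial one. The mapper overloads take the policy by value so that const and non-const arguments resolve to the same overload; that is a choice made for this sketch, not necessarily what the library does.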

Jun 20 '23 20:06 mhoemmen
