Make it easier to tell when libmathdx fails to compile due to insufficient shared memory
## Description
Over at https://github.com/google-deepmind/mujoco_warp/issues/47 and #602, we see a situation where `warp.build.build_lto_solver()` fails due to insufficient shared memory capacity.

Currently, users need to jump through some hoops to see what's going wrong. All they get is `libmathdx cuSOLVER error: 3 on /home/eshi/code-projects/warp/warp/native/warp.cu:3281` followed by `Failed to compile tile_cholesky`.

If they also set `LIBMATHDX_LOG_LEVEL=5` (undocumented) and rerun, they get more information:
```
[2025-03-31 10:11:17][libmathdx][23816][Api][cusolverCreate] handle=0x7ffcddd244a0
[2025-03-31 10:11:17][libmathdx][23816][Api][cusolverSetOperatorInt64Array] handle=0 op=CUSOLVER_OPERATOR_SIZE count=2 array=570,570
[2025-03-31 10:11:17][libmathdx][23816][Api][cusolverSetOperatorInt64Array] handle=0 op=CUSOLVER_OPERATOR_BLOCK_DIM count=3 array=256,1,1
[2025-03-31 10:11:17][libmathdx][23816][Api][cusolverSetOperatorInt64] handle=0 op=CUSOLVER_OPERATOR_TYPE value=0
[2025-03-31 10:11:17][libmathdx][23816][Api][cusolverSetOperatorInt64] handle=0 op=CUSOLVER_OPERATOR_API value=0
[2025-03-31 10:11:17][libmathdx][23816][Api][cusolverSetOperatorInt64] handle=0 op=CUSOLVER_OPERATOR_FUNCTION value=2
[2025-03-31 10:11:17][libmathdx][23816][Api][cusolverSetOperatorInt64] handle=0 op=CUSOLVER_OPERATOR_EXECUTION value=1
[2025-03-31 10:11:17][libmathdx][23816][Api][cusolverSetOperatorInt64] handle=0 op=CUSOLVER_OPERATOR_PRECISION value=5
[2025-03-31 10:11:17][libmathdx][23816][Api][cusolverSetOperatorInt64] handle=0 op=CUSOLVER_OPERATOR_FILL_MODE value=0
[2025-03-31 10:11:17][libmathdx][23816][Api][cusolverSetOperatorInt64] handle=0 op=CUSOLVER_OPERATOR_SM value=860
[2025-03-31 10:11:17][libmathdx][23816][Api][cusolverSetOptionStr] handle=0 opt=0 value=potrf_570_570_86_5
[2025-03-31 10:11:17][libmathdx][23816][Api][cusolverGetLTOIRSize] handle=0 lto_size=0x7ffcddd244a8
[2025-03-31 10:11:17][libmathdx][23816][Info][cusolverGetLTOIRSize] Decompression done in 136.686 ms.
[2025-03-31 10:11:17][libmathdx][23816][Info][cusolverGetLTOIRSize] NVRTC 12.8 compiling for SM 8.6
[2025-03-31 10:11:17][libmathdx][23816][Info][cusolverGetLTOIRSize] NVRTC options --relocatable-device-code=true,--device-as-default-execution-space,--std=c++17,-DCUFFTDX_DETAIL_USE_CUDA_STL=1,-dlto,--gpu-architecture=compute_86
[2025-03-31 10:11:17][libmathdx][23816][Info][cusolverGetLTOIRSize] NVRTC compilation finished in 415.333 ms.
[2025-03-31 10:11:17][libmathdx][23816][Error][cusolverGetLTOIRSize] NVRTC compilation failed, error message:
cusolverdx/detail/solver_description.hpp(183): error: static assertion failed with "Provided combination of data type and sizes makes this problem not fit into shared memory available on the specified architecture"
      static_assert((not has_sm) or valid_shared_size,
      ^
          detected during instantiation of class "cusolverdx::detail::solver_description<Operators...> [with Operators=<cusolverdx::Size<570U, 570U, 1U>, cusolverdx::Precision<float, float, float>, cusolverdx::Type<(cusolverdx::type)0>, cusolverdx::Function<cusolverdx::function::potrf>, cusolverdx::SM<860U>>]" at line 3 of cb8750fb693d2344.cu

cusolverdx/detail/solver_description.hpp(183): error: static assertion failed with "Provided combination of data type and sizes makes this problem not fit into shared memory available on the specified architecture"
      static_assert((not has_sm) or valid_shared_size,
      ^
          detected during instantiation of class "cusolverdx::detail::solver_description<Operators...> [with Operators=<cusolverdx::Size<570U, 570U, 1U>, cusolverdx::Precision<float, float, float>, cusolverdx::Type<(cusolverdx::type)0>, cusolverdx::Function<cusolverdx::function::potrf>, cusolverdx::SM<860U>, cusolverdx::FillMode<cusolverdx::fill_mode::upper>>]" at line 3 of cb8750fb693d2344.cu

cusolverdx/detail/solver_description.hpp(183): error: static assertion failed with "Provided combination of data type and sizes makes this problem not fit into shared memory available on the specified architecture"
      static_assert((not has_sm) or valid_shared_size,
      ^
          detected during instantiation of class "cusolverdx::detail::solver_description<Operators...> [with Operators=<cusolverdx::Size<570U, 570U, 1U>, cusolverdx::Precision<float, float, float>, cusolverdx::Type<(cusolverdx::type)0>, cusolverdx::Function<cusolverdx::function::potrf>, cusolverdx::SM<860U>, cusolverdx::FillMode<cusolverdx::fill_mode::upper>, cusolverdx::BlockDim<256U, 1U, 1U>>]" at line 3 of cb8750fb693d2344.cu

cusolverdx/detail/solver_description.hpp(183): error: static assertion failed with "Provided combination of data type and sizes makes this problem not fit into shared memory available on the specified architecture"
      static_assert((not has_sm) or valid_shared_size,
      ^
          detected during:
            instantiation of class "cusolverdx::detail::solver_description<Operators...> [with Operators=<cusolverdx::Size<570U, 570U, 1U>, cusolverdx::Precision<float, float, float>, cusolverdx::Type<(cusolverdx::type)0>, cusolverdx::Function<cusolverdx::function::potrf>, cusolverdx::SM<860U>, cusolverdx::FillMode<cusolverdx::fill_mode::upper>, cusolverdx::BlockDim<256U, 1U, 1U>, cusolverdx::Block>]" at line 29 of cusolverdx/detail/solver_execution.hpp
            instantiation of class "cusolverdx::detail::solver_execution<Operators...> [with Operators=<cusolverdx::Size<570U, 570U, 1U>, cusolverdx::Precision<float, float, float>, cusolverdx::Type<(cusolverdx::type)0>, cusolverdx::Function<cusolverdx::function::potrf>, cusolverdx::SM<860U>, cusolverdx::FillMode<cusolverdx::fill_mode::upper>, cusolverdx::BlockDim<256U, 1U, 1U>, cusolverdx::Block>]" at line 67 of cusolverdx/detail/solver_execution.hpp
            instantiation of class "cusolverdx::detail::block_execution<Operators...> [with Operators=<cusolverdx::Size<570U, 570U, 1U>, cusolverdx::Precision<float, float, float>, cusolverdx::Type<(cusolverdx::type)0>, cusolverdx::Function<cusolverdx::function::potrf>, cusolverdx::SM<860U>, cusolverdx::FillMode<cusolverdx::fill_mode::upper>, cusolverdx::BlockDim<256U, 1U, 1U>, cusolverdx::Block>]" at line 3 of cb8750fb693d2344.cu

4 errors detected in the compilation of "cb8750fb693d2344.cu".
[2025-03-31 10:11:17][libmathdx][23816][Error][cusolverGetLTOIRSize] Generated sources passed to NVRTC:
#include <cusolverdx.hpp>
using namespace cusolverdx;
using Solver = decltype(Size<570,570 >() + Precision<float >() + Type<type::real >() + Function<potrf >() + SM<860 >() + FillMode<fill_mode::upper >() + BlockDim<256,1,1 >() + Block());
extern "C" { __device__ void potrf_570_570_86_5(Solver::a_data_type* smem, unsigned int ld) {
    Solver::status_type info;
    Solver().execute(smem, ld, &info);
};
}
[2025-03-31 10:11:17][libmathdx][23816][Error][cusolverGetLTOIRSize] NVRTC error NVRTC_ERROR_COMPILATION at /home/jenkins/agent/workspace/mathdx/helpers/libmathdx/release-0.1.2/L1_Nightly/build/libmathdx/src/nvrtc_helper.cpp:230
[2025-03-31 10:11:17][libmathdx][23816][Error][cusolverGetLTOIRSize] ERROR at /home/jenkins/agent/workspace/mathdx/helpers/libmathdx/release-0.1.2/L1_Nightly/build/libmathdx/src/libcommondx.cpp:49
[2025-03-31 10:11:17][libmathdx][23816][Error][cusolverGetLTOIRSize] ERROR at /home/jenkins/agent/workspace/mathdx/helpers/libmathdx/release-0.1.2/L1_Nightly/build/libmathdx/src/libcommondx.cpp:74
```
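For reference, the verbose trace above was obtained by setting the undocumented log-level variable before running the reproducer (level 5 is an observed behavior of libmathdx 0.1.2, not a documented contract):

```shell
# LIBMATHDX_LOG_LEVEL is not documented; level 5 was observed to emit the
# Api/Info/Error trace shown above.
export LIBMATHDX_LOG_LEVEL=5
```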
We should do something to make it clear to the user what's going wrong, without requiring them to discover an undocumented log level.

One possibility is to do some basic sanity checks against the partially exposed `g_devices[i].max_smem_bytes` before calling into `warp.context.runtime.core.cuda_compile_solver()`, and raise a descriptive Python-level error when the tile cannot fit.
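As an illustration, such a pre-flight check might look like the sketch below. The function name, the shared-memory estimate, and the ~99 KiB SM 8.6 limit are assumptions for illustration only, not Warp's actual internals; the real footprint would come from libmathdx itself.

```python
def check_solver_smem(n: int, dtype_size: int, max_smem_bytes: int) -> None:
    """Hypothetical pre-flight check: fail early, with a readable message,
    when a tile solver cannot fit into shared memory."""
    # cuSOLVERDx stages the full n x n tile in shared memory, so the
    # footprint is at least n * n * sizeof(dtype).
    required = n * n * dtype_size
    if required > max_smem_bytes:
        raise RuntimeError(
            f"tile_cholesky: a {n}x{n} tile needs ~{required} bytes of "
            f"shared memory, but the device provides at most {max_smem_bytes}. "
            "Use a smaller tile or a device with more shared memory."
        )

# The failing case from the log above: a 570x570 float32 potrf on SM 8.6
# (~99 KiB of shared memory per block, an assumed figure for this sketch).
try:
    check_solver_smem(570, 4, 99 * 1024)
except RuntimeError as err:
    print(err)
```

An error like this surfaces the root cause immediately, instead of a bare `libmathdx cuSOLVER error: 3`.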
## Context

mujoco_warp, tile API