[BUG]: Review cub cache usage and validity of plan reuse between commands
We uncovered a sensitivity to the reuse of cub plans in the #276 branch: when the tests are compiled with NVTX flags on, some of them fail because of bad cub executions. The failing test passes when run standalone, and compute-sanitizer, serialized kernel execution, and manual debugging found no bugs in it. This therefore looks like an operational problem caused by reusing cached data that may or may not be valid to reuse.
Failed in match: 0 != 3.6e+06
/home/tallen/scratch/MatX/test/00_operators/ReductionTests.cu:150: Failure
Value of: MatXUtils::MatXTypeCompare( t0(), (TypeParam)(t4.Size(0) * t4.Size(1) * t4.Size(2) * t4.Size(3)))
  Actual: false
Expected: true
[ FAILED ] ReductionTestsNumericNoHalf/5.Sum, where TypeParam = double (2 ms)
[----------] 1 test from ReductionTestsNumericNoHalf/5 (2 ms total)
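
To make the concern concrete, here is a host-only C++ sketch of one general way that reusing a cached plan can produce a stale or unwritten result. It is not a diagnosis of this failure and does not use the real MatX or cub types. Every name in it (`SumPlan`, `StatelessSumPlan`, `sum_unsafe`, `sum_safe`, and both caches) is hypothetical. The assumption being illustrated is that a plan captures per-call state (here, an output pointer) while the cache key only covers the input size.

```cpp
#include <cassert>
#include <cstddef>
#include <numeric>
#include <unordered_map>
#include <vector>

// Hypothetical stand-in for a cached reduction plan. It captures the output
// pointer it was created with. This is NOT the real MatX/cub plan type.
struct SumPlan {
    double* out;  // per-call state captured at plan creation

    void exec(const std::vector<double>& in) const {
        assert(out != nullptr);
        *out = std::accumulate(in.begin(), in.end(), 0.0);
    }
};

// Under-specified cache: keyed on the input size only, although the plan
// also depends on the output pointer.
static std::unordered_map<std::size_t, SumPlan> unsafe_cache;

void sum_unsafe(const std::vector<double>& in, double* out) {
    auto it = unsafe_cache.find(in.size());
    if (it == unsafe_cache.end()) {
        it = unsafe_cache.emplace(in.size(), SumPlan{out}).first;
    }
    // On a cache hit this writes to the FIRST caller's output buffer.
    it->second.exec(in);
}

// Alternative: the cached plan holds only shape-dependent state, and all
// per-call pointers are supplied at execution time.
struct StatelessSumPlan {
    std::size_t n;  // the only cached state

    void exec(const std::vector<double>& in, double* out) const {
        assert(in.size() == n && out != nullptr);
        *out = std::accumulate(in.begin(), in.end(), 0.0);
    }
};

static std::unordered_map<std::size_t, StatelessSumPlan> safe_cache;

void sum_safe(const std::vector<double>& in, double* out) {
    auto it = safe_cache.find(in.size());
    if (it == safe_cache.end()) {
        it = safe_cache.emplace(in.size(), StatelessSumPlan{in.size()}).first;
    }
    it->second.exec(in, out);
}
```

Calling `sum_unsafe` twice with same-sized inputs but different output locations leaves the second output untouched, because the cached plan still points at the first one. `sum_safe` writes to the correct location on both calls. If the cub cache has a similar mismatch between what the key covers and what the plan captures, a test could pass standalone and fail only when an earlier test has already populated the cache.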