MatX Implement argminmax

Implement a new argminmax function based on cub_reduce that calculates the following:

an index of the data's minimum
an index of the data's maximum
the data's max
the data's min

An example call would be:

(matx::mtie(minVal, minIdx, maxVal, maxIdx) = matx::argminmax(inFlattened)).run();
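For reference, the four requested outputs can be sketched on the host as a single pass over the flattened data. This is only a plain C++ illustration of the intended semantics, not the MatX or cub_reduce implementation. The names `ArgMinMax` and `argminmax_reference` are hypothetical, and resolving ties to the lowest index is an assumption the issue does not state.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical result type for illustration only; not a MatX type.
struct ArgMinMax {
  float minVal;
  std::size_t minIdx;
  float maxVal;
  std::size_t maxIdx;
};

// Single pass over flattened data, tracking both extrema and their indices.
// Assumption: ties resolve to the lowest index, because only strict
// comparisons update the running result.
ArgMinMax argminmax_reference(const std::vector<float>& data) {
  assert(!data.empty());  // a min/max of empty input is undefined here
  ArgMinMax r{data[0], 0, data[0], 0};
  for (std::size_t i = 1; i < data.size(); ++i) {
    if (data[i] < r.minVal) {
      r.minVal = data[i];
      r.minIdx = i;
    }
    if (data[i] > r.maxVal) {
      r.maxVal = data[i];
      r.maxIdx = i;
    }
  }
  return r;
}
```

A reference like this could be used to check the values and indices produced by a GPU reduction on small inputs.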

Aug 06 '24 21:08 tylera-nvidia

@tylera-nvidia this will need a docs page too

Aug 09 '24 03:08 cliffburdick

@tylera-nvidia can we close this one given #778?

Oct 24 '24 15:10 cliffburdick

@tylera-nvidia I saw you just committed another patch, but is this OBE given #778?

Nov 06 '24 16:11 cliffburdick

@cliffburdick I was doing some performance comparisons with @tmartin-gh to make sure things were roughly equivalent. I ran into some performance issues when the return tensor is size {1} versus size {0}: runtime goes from ~40us to >1ms, but we still get the right answer. My implementation did not exhibit that behavior, but it looks like that was due to my "dumb" pointer arithmetic in writing out the output.

After resolving that, performance seems roughly comparable, so I'm going to close this PR. It looks like caching is currently broken in main, and we have some unprotected conditions for weird outputs, but those should all be fixed with new branches off main, not based on my old implementation.

Nov 06 '24 21:11 tylera-nvidia