Implement argminmax
Implement a new argminmax function based on cub_reduce that calculates the following:
- an index of the data's minimum
- an index of the data's maximum
- the data's max
- the data's min
An example call would be:
(matx::mtie(minVal, minIdx, maxVal, maxIdx) = matx::argminmax(inFlattened)).run();
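For reference, here is a minimal host-side sketch of the four outputs requested above. It uses only the C++ standard library, not MatX or cub_reduce, so it is meant as a correctness baseline rather than the proposed implementation. The names `ArgMinMaxResult` and `argminmax_reference` are hypothetical, and keeping the first occurrence on ties is an assumption of this sketch, not something the issue specifies.

```cpp
#include <cstddef>
#include <stdexcept>
#include <vector>

// Hypothetical result type for illustration only; not part of MatX.
struct ArgMinMaxResult {
  float min_val;
  std::size_t min_idx;
  float max_val;
  std::size_t max_idx;
};

// Single pass over flattened data, tracking the value and index of both
// extremes. On ties the first occurrence is kept (an assumption of this
// sketch; a parallel reduction may need an explicit tie-break rule).
inline ArgMinMaxResult argminmax_reference(const std::vector<float>& data) {
  if (data.empty()) {
    throw std::invalid_argument("argminmax_reference: input must be non-empty");
  }
  ArgMinMaxResult r{data[0], 0, data[0], 0};
  for (std::size_t i = 1; i < data.size(); ++i) {
    if (data[i] < r.min_val) {
      r.min_val = data[i];
      r.min_idx = i;
    }
    if (data[i] > r.max_val) {
      r.max_val = data[i];
      r.max_idx = i;
    }
  }
  return r;
}
```

A baseline like this could be used in a unit test to check the values and indices produced by the GPU path on the same flattened input.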
@tylera-nvidia this will need a docs page too
@tylera-nvidia can we close this one given #778?
@tylera-nvidia I saw you just committed another patch, but is this OBE given #778?
@cliffburdick I was doing some performance comparisons with @tmartin-gh to make sure things were roughly equivalent. I ran into some performance issues when the return tensor is size {1} versus size {0}: runtime goes from ~40us to >1ms, but we still get the right answer. My implementation did not exhibit that behavior, but it looks like it was due to my "dumb" pointer arithmetic in writing out the output.
After resolving that, performance seems roughly comparable, so I'm going to close this PR. It looks like caching is currently broken in main, and we have some unprotected conditions for weird outputs, but those should all be fixed with new branches off main, not based on my old implementation.