
mxp: Mixed precision extension

Open cmpfeil opened this issue 1 year ago • 12 comments

Mixed-precision extension to gtensor

Provides template functions mxp::adapt (and mxp::adapt_device), inspired by and extending gt::adapt. They return an mxp::mxp_span (derived from gt::gtensor_span), enabling mixed-precision computations in gtensor kernels, i.e., computations where the compute precision differs from the data/storage precision. From the user's perspective, simply replacing gt::adapt with mxp::adapt is sufficient.

Example

const std::vector<float> x(n, 1.0 / 8. / 1024. / 1024. / 3.); // = eps(float) / 3 
/* */ std::vector<float> y(n, 1.0);

auto gt_x = gt::adapt<1>(x.data(), n); 
auto gt_y = gt::adapt<1>(y.data(), n); 

gt_y = gt_y + gt_x + gt_x; // y remains unchanged, filled with 1.0 

// Still having the data in single precision, but doing computations in double instead
auto mxp_x = mxp::adapt<1, double>(x.data(), n); 
auto mxp_y = mxp::adapt<1, double>(y.data(), n); 

mxp_y = mxp_y + mxp_x + mxp_x; // y filled with 1.000000119

Features for this PR

  • [x] implicit kernels
  • [x] explicit kernels
  • [x] kernels on complex data
  • [x] implicit kernels using .view( ...placeholders... )
  • [x] low precision emulation
  • [x] GPU

Right now, only one test and minimal source code are added, mainly to open the PR and clarify the questions below.

Questions

  • [x] Preferred way to call: via mxp::adapt<...>(...), gt::mxp::adapt<...>(...), or gt::mxp_adapt<...>(...)?
  • [x] How to include: with #include <gtensor/gtensor.h>, but only if the flag GTENSOR_ENABLE_MXP is set (as for FP16)? Or via a separate explicit #include <gtensor/mxp.h>?
  • [x] Location & naming of source files (there will be two more): just inside include/gtensor, all starting with mxp_?

cmpfeil avatar Jan 23 '25 10:01 cmpfeil

  1. I prefer gt::mxp_adapt(...)
  2. I prefer #include <gtensor/mxp.h>. The difference from FP16 is that it uses a different adapt function as entry point, so a cmake/C define is not needed.
  3. I am fine with include/gtensor/mxp_*.h

@germasch what do you think?

bd4 avatar Jan 24 '25 19:01 bd4

  1. I prefer gt::mxp_adapt(...)

Yeah, I don't have strong feelings, but I think this is good.

  2. I prefer #include <gtensor/mxp.h>. The difference from FP16 is that it uses a different adapt function as entry point, so a cmake/C define is not needed.
  3. I am fine with include/gtensor/mxp_*.h

I concur with this, too.

I guess the one comment/question I have: is it necessary to have a config option to turn mxp on/off? It seems to me that it doesn't hurt to have it always available; it shouldn't make a difference as long as one doesn't actually use it.

germasch avatar Jan 29 '25 13:01 germasch

I'll see if I can reproduce the SYCL CI error with latest oneAPI toolkit. The non debug build does pass.

Going forward, it is possible to run small GPU code on a computer with a Gen9 integrated GPU, and, with the right env vars for double-precision emulation, on Gen11. This covers most Intel processors, including laptops.

bd4 avatar Feb 06 '25 17:02 bd4

@bd4 did you manage to reproduce the error?

Sorry for the delay; I am able to reproduce with the latest 2025 oneAPI. It looks like it's related to the kernel name in SYCL: somehow an extra '"' ends up in the name, and that fails to compile. I'll see if I can wrap my head around what to change to fix it.

I can't manage to build even vanilla gtensor with SYCL, as the

#include "level_zero/ze_api.h"
#include "level_zero/zes_api.h"

in include/gtensor/backend_sycl_compat.h are not found. Maybe I need to set some environment variables?

Once level zero and oneapi are installed, you need to do the following:

# Gen11 integrated gpu only
export OverrideDefaultFP64Settings=1
export IGC_EnableDPEmulation=1

# assuming default install path
source /opt/intel/oneapi/setvars.sh

bd4 avatar Feb 13 '25 19:02 bd4

Good find! I saw the quote there in the error for the template name, and it was not immediately obvious where it came from.

Then, I guess, the value 34, which corresponds to the character ", confuses the compiler when determining where the kernel name ends. Would you agree that this is a bug on the compiler side?

I agree it seems like a bug, but I have been wrong before on this front. I'll work on a minimal reproducer, should be pretty straightforward.

To check this assumption and to make the build pass for now, the last "debugging" commit 1da0506 simply skips the tests for template parameter values above 30. To avoid the problem properly, it should be sufficient to use, e.g., std::uint16_t for the template parameter instead.

Note that the pipeline still fails at the moment, but now due to some actual tests not passing with the SYCL build.

It looks like a lot of those test failures are a 1 != 1 issue, where the type conversion does not happen in the same way as on other platforms.

bd4 avatar Feb 14 '25 14:02 bd4