[FEA]: Optimize Complex FMA by exploiting lazy evaluation
Is this a duplicate?
- [X] I confirmed there appear to be no duplicate issues for this request and that I agree to the Code of Conduct
Area
libcu++
Is your feature request related to a problem? Please describe.
Given complex numbers a, b, and c, the expression a * b + c generates suboptimal code compared to a dedicated fma(a, b, c);
see the cuCfmaf implementation in cuComplex.h.
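To make the gap concrete, here is a minimal host-side sketch of the two code shapes. It assumes a fused complex multiply-add can be written as a chain of four scalar FMAs, similar in spirit to cuCfmaf. The names complexf, complex_fma, and mul_then_add are hypothetical and not part of cuComplex.h or libcu++.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical minimal complex type, for illustration only.
struct complexf {
    float re;
    float im;
};

// Hypothetical helper: a * b + c as a chain of four scalar FMAs.
inline complexf complex_fma(complexf a, complexf b, complexf c) {
    float re = std::fma(a.re, b.re, c.re);  // a.re * b.re + c.re
    float im = std::fma(a.re, b.im, c.im);  // a.re * b.im + c.im
    re = std::fma(-a.im, b.im, re);         // subtract a.im * b.im
    im = std::fma(a.im, b.re, im);          // add a.im * b.re
    return {re, im};
}

// Unfused shape: a full complex multiply, then a separate complex add.
// As written this is 4 multiplies and 4 additions/subtractions.
inline complexf mul_then_add(complexf a, complexf b, complexf c) {
    float re = a.re * b.re - a.im * b.im;
    float im = a.re * b.im + a.im * b.re;
    return {re + c.re, im + c.im};
}

// Small usage check: (1 + 2i) * (3 + 4i) + (5 + 6i) = 0 + 16i.
inline void check_complex_fma() {
    complexf r = complex_fma({1.0f, 2.0f}, {3.0f, 4.0f}, {5.0f, 6.0f});
    assert(r.re == 0.0f && r.im == 16.0f);
}
```

Both functions agree on inputs that are exactly representable. In general the fused version can differ in the last bits, because each FMA rounds once instead of twice.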
Describe the solution you'd like
a * b should not compute the result directly, but instead return a structure holding both operands to allow lazy evaluation.
This lets (a * b) be fused with + c and generate optimal FMA code.
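A rough sketch of the idea, written as plain host C++ rather than actual libcu++ code: operator* returns a proxy that stores the operands, and an operator+ overload on that proxy performs the fused computation. The names lazy_complex and mul_expr are hypothetical, and the FMA chain is one possible formulation, not a confirmed design.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical extension type; not cuda::std::complex and not an existing libcu++ type.
struct lazy_complex {
    float re;
    float im;
};

// Proxy returned by operator*: stores the operands and computes nothing yet.
struct mul_expr {
    lazy_complex a;
    lazy_complex b;

    // Evaluate as a plain product when no addition follows.
    operator lazy_complex() const {
        return {a.re * b.re - a.im * b.im, a.re * b.im + a.im * b.re};
    }
};

// Multiplication is deferred: it only captures the two operands.
inline mul_expr operator*(lazy_complex a, lazy_complex b) { return {a, b}; }

// (a * b) + c: the product is still unevaluated here, so it can be fused with the add.
inline lazy_complex operator+(mul_expr p, lazy_complex c) {
    float re = std::fma(p.a.re, p.b.re, c.re);
    float im = std::fma(p.a.re, p.b.im, c.im);
    re = std::fma(-p.a.im, p.b.im, re);
    im = std::fma(p.a.im, p.b.re, im);
    return {re, im};
}

// c + (a * b): same fusion with the operands swapped.
inline lazy_complex operator+(lazy_complex c, mul_expr p) { return p + c; }

// (a * b) + (c * d): evaluate one product, then fuse the other with it.
inline lazy_complex operator+(mul_expr p, mul_expr q) {
    return p + static_cast<lazy_complex>(q);
}

// Small usage check: (1 + 2i) * (3 + 4i) + (5 + 6i) = 0 + 16i.
inline void check_lazy_complex() {
    lazy_complex a{1.0f, 2.0f}, b{3.0f, 4.0f}, c{5.0f, 6.0f};
    lazy_complex r = a * b + c;  // selects the fused operator+(mul_expr, lazy_complex)
    assert(r.re == 0.0f && r.im == 16.0f);
}
```

One consequence of this design is that auto p = a * b; deduces the proxy type rather than a complex value, so operator* no longer behaves the way it does for a standard complex type.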
Describe alternatives you've considered
No response
Additional context
No response
@fbusato is this a request for cuda::std::complex? We wouldn't be able to deviate from the standard on the behavior of operator*, but we could always add an extension type like cuda::complex. Several independent reasons for a cuda::complex type have already come up, so this would be far from the first.
I am not fully following here. Are you asking us to implement expression templates for complex?
I fully understand this constraint. It would be nice to add a cuda::complex type if it is not too much effort.