Implement an inline macro to have maximum performance on MSVC
:wave:
While debugging the still bad codegen of MSVC, I realised that on reasonably big functions, it doesn't inline any of the xsimd functions even when marked as inline. The real workaround is to mark all functions with __forceinline.
Would it be possible to implement such a macro?
I think this also applies to the (lack of) use of noexcept, too; 'modern' MSVC has better codegen with it in non-inline/debug scenarios (ref)
I'm experimenting with noexcept everywhere in #645 . If possible I'd like to avoid always_inline or __forceinline.
@serge-sans-paille if you wish to avoid it you could make it opt-in by exposing it as a user-configurable macro, e.g. XSIMD_FORCED_INLINE, and simply have it default to inline?
I'll test this branch tonight with my benchmark.
@serge-sans-paille, #645 has no effect on my benchmark; MSVC still doesn't inline xsimd's methods, resulting in a 50% perf hit compared to __forceinline. (Using /Ob3 /O2 /Gv /Oi)
@amyspark #645 updated with always inline, can you check if that fixes your issue?
@amyspark #645 updated with always inline, can you check if that fixes your issue?
That does the trick! But the friend functions in xsimd::batch, e.g.
https://github.com/xtensor-stack/xsimd/blob/54aa8e72bc7cda47907879f5ad2a9c11b4c127e7/include/xsimd/types/xsimd_batch.hpp#L163-L171
still need to be marked as XSIMD_INLINE, they're the only ones missing and MSVC doesn't judge them as inlineable (even thought the calls inside themselves are inlined).
Thanks o/ I've updated the patch accordingly