xsimd Implement an inline macro to have maximum performance on MSVC

:wave:

While debugging the still bad codegen of MSVC, I realised that on reasonably big functions, it doesn't inline any of the xsimd functions even when marked as inline. The real workaround is to mark all functions with __forceinline.

Would it be possible to implement such a macro?

Nov 18 '21 20:11 amyspark

I think this also applies to the (lack of) use of noexcept, too; 'modern' MSVC has better codegen with it in non-inline/debug scenarios (ref)

Nov 19 '21 14:11 marzer

I'm experimenting with noexcept everywhere in #645 . If possible I'd like to avoid always_inline or __forceinline.

Nov 20 '21 22:11 serge-sans-paille

@serge-sans-paille if you wish to avoid it you could make it opt-in by exposing it as a user-configurable macro, e.g. XSIMD_FORCED_INLINE, and simply have it default to inline?

Nov 20 '21 22:11 marzer

I'll test this branch tonight with my benchmark.

Nov 20 '21 23:11 amyspark

@serge-sans-paille, #645 has no effect on my benchmark; MSVC still doesn't inline xsimd's methods, resulting in a 50% perf hit compared to __forceinline. (Using /Ob3 /O2 /Gv /Oi)

Nov 21 '21 00:11 amyspark

@amyspark #645 updated with always inline, can you check if that fixes your issue?

Nov 21 '21 12:11 serge-sans-paille

@amyspark #645 updated with always inline, can you check if that fixes your issue?

That does the trick! But the friend functions in xsimd::batch, e.g.

https://github.com/xtensor-stack/xsimd/blob/54aa8e72bc7cda47907879f5ad2a9c11b4c127e7/include/xsimd/types/xsimd_batch.hpp#L163-L171

still need to be marked as XSIMD_INLINE, they're the only ones missing and MSVC doesn't judge them as inlineable (even thought the calls inside themselves are inlined).

Nov 21 '21 13:11 amyspark

Thanks o/ I've updated the patch accordingly

Nov 21 '21 16:11 serge-sans-paille