MMseqs2 icon indicating copy to clipboard operation
MMseqs2 copied to clipboard

CMakeLists.txt: Properly handle cpu flags

Open Nowa-Ammerlaan opened this issue 3 years ago • 4 comments

Having multiple cpu flags enabled is valid, so we should use if instead of elsif. Otherwise only the first HAVE_* variable is actually respected.

e.g. before this change we get this on Gentoo:

/usr/bin/x86_64-pc-linux-gnu-g++ -march=native -O2 -pipe -frecord-gcc-switches -Wl,-O1 -Wl,--as-needed -Wl,--defsym=__gentoo_check_ldflags__=0   -rdynamic   -fsigned-char  -mavx2 -mcx16 -Wa,-q -mcx16 -std=c++1y -pedantic -Wall -Wextra -Wdisabled-optimization -fno-exceptions -fopenmp src/CMakeFiles/mmseqs.dir/mmseqs.cpp.o -o src/mmseqs  src/libmmseqs-framework.a  src/version/libversion.a  lib/tinyexpr/libtinyexpr.a  -lm  -Wl,-Bstatic  -lzstd  lib/microtar/libmicrotar.a  -Wl,-Bdynamic  -lz  -lbz2 && :

i.e. only avx2 is respected, sse2/4 is ignored.

After change we get:

/usr/bin/x86_64-pc-linux-gnu-g++ -march=native -O2 -pipe -frecord-gcc-switches -Wl,-O1 -Wl,--as-needed -Wl,--defsym=__gentoo_check_ldflags__=0   -rdynamic   -fsigned-char  -mavx2 -mcx16 -Wa,-q -msse4.1 -mcx16 -msse2 -std=c++1y -pedantic -Wall -Wextra -Wdisabled-optimization -fno-exceptions -fopenmp src/CMakeFiles/mmseqs.dir/mmseqs.cpp.o -o src/mmseqs  src/libmmseqs-framework.a  src/version/libversion.a  lib/tinyexpr/libtinyexpr.a  -lm  -Wl,-Bstatic  -lzstd  lib/microtar/libmicrotar.a  -Wl,-Bdynamic  -lz  -lbz2 && :

https://github.com/gentoo/sci/pull/1143

Signed-off-by: Andrew Ammerlaan [email protected]

Nowa-Ammerlaan avatar Feb 13 '22 15:02 Nowa-Ammerlaan

I thought that -mavx2 would imply (most) lower SSE levels. We also use an SSSE3 instruction in some important place (iirc), so should we also enable that explicitly? (Edit: we use _mm_shuffle_epi8 a lot.)

I don't really like it a lot that with this change -DHAVE_AVX2 -DHAVE_ARM8 would be a possible but nonsensical combination.

BTW Debian uses -DNATIVE_ARCH=0 (https://salsa.debian.org/med-team/mmseqs2/-/blob/master/debian/rules) and compiles multiple SIMD levels and adds a SIMD dispatcher (https://salsa.debian.org/med-team/mmseqs2/-/blob/master/debian/bin/simd-dispatch), if that is in any way useful for you.

milot-mirdita avatar Feb 13 '22 17:02 milot-mirdita

We also use an SSSE3 instruction in some important place (iirc), so should we also enable that explicitly?

Yes we should probably have a HAVE_SSE3 option for that as well.

I don't really like it a lot that with this change -DHAVE_AVX2 -DHAVE_ARM8 would be a possible but nonsensical combination.

Well yes, but if the user sets nonsensical flags it is not that big of a problem if the outcome is (expected) nonsense. Is there an easy way we can avoid this?

BTW Debian uses -DNATIVE_ARCH=0 (https://salsa.debian.org/med-team/mmseqs2/-/blob/master/debian/rules) and compiles multiple SIMD levels and adds a SIMD dispatcher (https://salsa.debian.org/med-team/mmseqs2/-/blob/master/debian/bin/simd-dispatch), if that is in any way useful for you.

In Gentoo this is controlled by CPU_FLAGS_X86 variable at build time. In general we don't use dispatchers for this because there isn't really a point if things are built exactly according to the users set preference. The problem here is that the user preference is not respected because the flags are treated as mutually exclusive.

Nowa-Ammerlaan avatar Feb 13 '22 17:02 Nowa-Ammerlaan

I don't really like it a lot that with this change -DHAVE_AVX2 -DHAVE_ARM8 would be a possible but nonsensical combination.

How about now? This should fix the problem but still keep the nonsensical combinations mutually exclusive.

Nowa-Ammerlaan avatar Feb 15 '22 07:02 Nowa-Ammerlaan

At least on g++ -mavx2 implies all the other flags. See https://stackoverflow.com/questions/77828641/does-mavx2-implies-mavx-and-msse4-2 Because this is true for the actual CPUs, I don't think other compilers would handle this differently.

benjamin-lieser avatar Mar 11 '24 12:03 benjamin-lieser