tcmalloc causes crash during throwing of OpenFHE exception
I have a bit of a strange situation. Google internally uses clang+tcmalloc by default for all its builds, and with OpenFHE v1.1.4 I've encountered a few crashes that occur whenever an OpenFHE exception is thrown, with a trace like this:
2216 third_party/tcmalloc/tcmalloc.cc:909] size check failed for 0x33c1bfc3e000: claimed 8, actual 1024, class 1
2216 third_party/tcmalloc/tcmalloc.cc:854] CHECK in do_free_with_size: CorrectSize(ptr, size, align) (false)
*** SIGABRT received by PID 2216 (TID 2216) on cpu 11 from PID 2216; stack trace: ***
PC: @ 0x7f86ec862347 (unknown) gsignal
@ 0x7f86cfed4735 2544 base/process_state.cc:1237 FailureSignalHandler()
@ 0x7f877193d1c0 1281657408 (unknown)
@ 0x7f86c0b8b314 912 third_party/tcmalloc/internal/logging.cc:233 tcmalloc::tcmalloc_internal::Crash()
@ 0x7f86c0b8ae1d 48 third_party/tcmalloc/internal/logging.cc:238 tcmalloc::tcmalloc_internal::CheckFailed()
@ 0x559c9eb7f962 688 ./third_party/tcmalloc/internal/logging.h:148 tcmalloc::tcmalloc_internal::CheckFailed<>()
@ 0x559c9eac6b1c 2640 third_party/tcmalloc/tcmalloc.cc:854 TCMallocInternalDeleteArraySized
@ 0x7f877128ba93 48 third_party/crosstool/v18/stable/toolchain/bin/../include/c++/v1/__memory/unique_ptr.h:73 std::__u::default_delete<>::operator()()
@ 0x7f877128b874 64 third_party/crosstool/v18/stable/toolchain/bin/../include/c++/v1/__memory/unique_ptr.h:262 std::__u::unique_ptr<>::~unique_ptr()
@ 0x7f877128b6c6 128 third_party/openfhe/src/core/lib/utils/demangle.cpp:42 demangle()
@ 0x7f877128beb6 4384 third_party/openfhe/src/core/lib/utils/get-call-stack.cpp:84 get_call_stack()
@ 0x7f87743dae0d 352 third_party/openfhe/src/core/include/utils/exception.h:179 lbcrypto::OpenFHEException::OpenFHEException()
@ 0x7f87743e6a74 592 third_party/openfhe/src/core/include/math/nbtheory-impl.h:191 lbcrypto::RootOfUnity<>()
@ 0x7f8773092189 544 third_party/openfhe/src/pke/lib/encoding/packedencoding.cpp:493 lbcrypto::PackedEncoding::SetParams_2n()
@ 0x7f8773090b17 912 third_party/openfhe/src/pke/lib/encoding/packedencoding.cpp:241 lbcrypto::PackedEncoding::SetParams()
@ 0x7f87730941a0 816 third_party/openfhe/src/pke/lib/encoding/packedencoding.cpp:329 lbcrypto::PackedEncoding::Pack<>()
@ 0x7f877308eb4f 6752 third_party/openfhe/src/pke/lib/encoding/packedencoding.cpp:117 lbcrypto::PackedEncoding::Encode()
@ 0x7f877442661d 624 ./third_party/openfhe/src/pke/include/encoding/plaintextfactory.h:100 lbcrypto::PlaintextFactory::MakePlaintext<>()
@ 0x7f8774425891 992 ./third_party/openfhe/src/pke/include/cryptocontext.h:246 lbcrypto::CryptoContextImpl<>::MakePlaintext()
@ 0x7f87743cb516 176 ./third_party/openfhe/src/pke/include/cryptocontext.h:1018 lbcrypto::CryptoContextImpl<>::MakePackedPlaintext()
The error comes from here: https://github.com/google/tcmalloc/blob/7d59e25cd84cdce95f137b79466dd4c4d56e6ff2/tcmalloc/tcmalloc.cc#L765
The exception itself is easy to trigger: for example, use a prime plaintext modulus that violates the divisibility condition m divides (q-1). See the patch below for an example:
diff --git a/src/pke/examples/simple-integers-bgvrns.cpp b/src/pke/examples/simple-integers-bgvrns.cpp
index aaeed9c..d3fd960 100644
--- a/src/pke/examples/simple-integers-bgvrns.cpp
+++ b/src/pke/examples/simple-integers-bgvrns.cpp
@@ -41,7 +41,7 @@ int main() {
// Sample Program: Step 1 - Set CryptoContext
CCParams<CryptoContextBGVRNS> parameters;
parameters.SetMultiplicativeDepth(2);
- parameters.SetPlaintextModulus(65537);
+ parameters.SetPlaintextModulus(131101); // a prime with bad divisibility
CryptoContext<DCRTPoly> cryptoContext = GenCryptoContext(parameters);
// Enable features that you wish to use
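To make the divisibility condition concrete, here is a minimal sketch that checks it for both moduli in the patch. The value of m is an assumption chosen for illustration (a power of two); the real value depends on the ring dimension picked by GenCryptoContext, which is not stated here. The helper names are hypothetical and not part of OpenFHE, and p in the code plays the role of q in the condition above.

```cpp
#include <cassert>
#include <cstdint>

// True when m divides (p - 1), where p is the plaintext modulus.
constexpr bool DividesPMinusOne(std::uint64_t p, std::uint64_t m) {
    return m != 0 && p != 0 && (p - 1) % m == 0;
}

// Largest power of two that divides (p - 1). Any power-of-two m larger
// than this value cannot satisfy the condition.
std::uint64_t LargestPowerOfTwoDividingPMinusOne(std::uint64_t p) {
    assert(p > 1);
    std::uint64_t n = p - 1;
    std::uint64_t pow2 = 1;
    while (n % 2 == 0) {
        n /= 2;
        pow2 *= 2;
    }
    return pow2;
}

// ASSUMPTION: m = 32768 is a hypothetical value, not read from OpenFHE.
constexpr std::uint64_t kAssumedM = 32768;

static_assert(DividesPMinusOne(65537, kAssumedM),
              "65537 - 1 = 65536 is divisible by the assumed m");
static_assert(!DividesPMinusOne(131101, kAssumedM),
              "131101 - 1 = 131100 is not divisible by the assumed m");
```

Since 131100 = 4 * 32775 and 32775 is odd, 131101 fails the condition for every power-of-two m of 8 or more, not only for the assumed value.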
But I am not able to reproduce the actual trace in the CMake build. My attempt (v1.1.4, commit 94fd76a1d965cfde13f2a540d78ce64146fc2700):
- Apply the patch above
- Configure with tcmalloc enabled
mkdir build && cd build
cmake .. -DWITH_TCM=ON -DBUILD_EXAMPLES=ON -DCMAKE_BUILD_TYPE=Debug
make tcm
make -j 25
- Run
bin/examples/pke/simple-integers-bgvrns
As with last time, I suspect the issue is in differing compiler flags. A few stand out: -fsized-deallocation and -fno-exceptions.
How would I test these compiler flags in the CMake config to see if I can reproduce this? Any idea what could be the root cause here?
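One hypothesis consistent with the top frames of the trace, offered as a sketch and not as a confirmed diagnosis: demangle() destroys a std::unique_ptr through std::default_delete, which ends up in TCMallocInternalDeleteArraySized. If that pointer holds the buffer returned by abi::__cxa_demangle (an assumption; demangle.cpp has not been checked here), there would be an allocator mismatch, because the Itanium C++ ABI specifies that this buffer is allocated with malloc and must be released with free. A sized delete[] on such a pointer is the kind of error tcmalloc's size check reports. The sketch below shows the matching pattern with a custom deleter; DemangleSketch and FreeDeleter are hypothetical names, not OpenFHE code.

```cpp
#include <cassert>
#include <cstdlib>
#include <cxxabi.h>
#include <memory>
#include <string>

// Deleter that releases malloc-allocated memory with std::free,
// matching how abi::__cxa_demangle allocates its result.
struct FreeDeleter {
    void operator()(char* p) const noexcept { std::free(p); }
};

// Hypothetical helper (NOT OpenFHE's demangle()): returns the demangled
// name, or the input unchanged if demangling fails.
std::string DemangleSketch(const char* mangled) {
    assert(mangled != nullptr);
    int status = 0;
    // Passing nullptr for the output buffer asks __cxa_demangle to
    // allocate one with malloc; ownership passes to the caller.
    std::unique_ptr<char, FreeDeleter> buf(
        abi::__cxa_demangle(mangled, nullptr, nullptr, &status));
    if (status == 0 && buf) {
        return std::string(buf.get());
    }
    return std::string(mangled);
}
```

If this hypothesis holds, the mismatch would exist in every build, and the differing flags would only change whether the allocator detects it.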