Segmentation fault under C/C++ API binding
I am using the C API (libfdb_c.so) in my C++ application:
- I created a singleton accessor for FDBCppBinding:
static std::shared_ptr<FDBCppBinding> GetInstance(FDBConfig config) {
    static std::shared_ptr<FDBCppBinding> instance(new FDBCppBinding(std::move(config)));
    TLOG(INFO, "FDBCppBinding instance returned");
    return instance;
}
- Here's how I initialize FDBCppBinding:
fdb_error_t error = fdb_select_api_version(FDB_API_VERSION);
if (error != 0) {
    TLOG(ERROR, "Failed to select FDB API version {}: {}", FDB_API_VERSION, error);
    return FDBError::kUnknownError;
}
TLOG(INFO, "Selected FDB API version: {}", FDB_API_VERSION);
setNetworkOptions();
error = fdb_setup_network();
if (error != 0) {
    TLOG(ERROR, "FDB Network setup failed: {}", error);
    return FDBError::kNetworkError;
}
LOG_INFO("FDB Network setup completed");
startNetworkThread();
- Inside startNetworkThread:
network_future_ = std::async(std::launch::async, [this]() {
    network_running_ = true;
    TLOG(INFO, "Starting global FDB network thread");
    auto error = fdb_run_network();
    network_running_.store(false);
    TLOG(INFO, "FDB network thread exited with error: {}", error);
});
The problem is that when the application's unit tests exit, there is a segmentation fault:
Missing separate debuginfos, use: dnf debuginfo-install foundationdb-clients-7.1.38-1.x86_64 glibc-2.38-29.tl4.x86_64 libgcc-12.3.1.5-2.tl4.x86_64 libstdc++-12.3.1.5-2.tl4.x86_64 numactl-devel-2.0.16-5.tl4.x86_64
(gdb) bt
#0 0x00000000047bb6c0 in ?? ()
#1 0x00007ffff69d576d in Transaction::createTrLogInfoProbabilistically(Database const&) () from /lib64/libfdb_c.so
#2 0x00007ffff69fbe86 in Transaction::Transaction(Database const&, Optional<Standalone<StringRef> > const&) ()
from /lib64/libfdb_c.so
#3 0x00007ffff6d8772c in ReadYourWritesTransaction::ReadYourWritesTransaction(Database const&, Optional<Standalone<StringRef> >) () from /lib64/libfdb_c.so
#4 0x00007ffff6fcdd78 in internal_thread_helper::DoOnMainThreadVoidActorState<ThreadSafeTransaction::ThreadSafeTransaction(DatabaseContext*, ISingleThreadTransaction::Type, Optional<Standalone<StringRef> >)::{lambda()#1}, internal_thread_helper::DoOnMainThreadVoidActor<ThreadSafeTransaction::ThreadSafeTransaction(DatabaseContext*, ISingleThreadTransaction::Type, Optional<Standalone<StringRef> >)::{lambda()#1}> >::a_body1cont1(Void const&, int) [clone .constprop.0] [clone .isra.0] () from /lib64/libfdb_c.so
#5 0x00007ffff6fcde68 in ActorCallback<internal_thread_helper::DoOnMainThreadVoidActor<ThreadSafeTransaction::ThreadSafeTransaction(DatabaseContext*, ISingleThreadTransaction::Type, Optional<Standalone<StringRef> >)::{lambda()#1}>, 0, Void>::fire(Void const&) () from /lib64/libfdb_c.so
#6 0x00007ffff715c3f8 in N2::Net2::run() () from /lib64/libfdb_c.so
#7 0x00007ffff69a12c2 in runNetwork() () from /lib64/libfdb_c.so
#8 0x00007ffff6fce1bf in ThreadSafeApi::runNetwork() () from /lib64/libfdb_c.so
#9 0x00007ffff6948f91 in MultiVersionApi::runNetwork() () from /lib64/libfdb_c.so
#10 0x00007ffff691e9fa in fdb_run_network () from /lib64/libfdb_c.so
#11 0x0000000001fccef0 in operator() (__closure=0x47a4a68)
at /data00/kuankuan/tcqa-table/src/ms/fdb/fdb_cpp_binding.cc:569
#12 std::__invoke_impl<void, tcqa::table::fdb::FDBCppBinding::startNetworkThread()::<lambda()> > (__f=...)
at /usr/include/c++/12/bits/invoke.h:61
#13 std::__invoke<tcqa::table::fdb::FDBCppBinding::startNetworkThread()::<lambda()> > (__fn=...)
FDB version: 7.1.38 (And I've tried 7.3.59)
Hi @royguo,
I did some research on this issue. My understanding is as follows.
The Root Cause
Looking at the crash location in NativeAPI.actor.cpp line 6784-6785:
Reference<TransactionLogInfo> Transaction::createTrLogInfoProbabilistically(const Database& cx) {
    if (!cx->isError()) {
        double sampleRate = cx->globalConfig->get<double>(...); // ← CRASH HERE
Frame #0 of the backtrace is at address 0x00000000047bb6c0 with no symbol. This suggests that cx (the Database object) or cx->globalConfig had already been destroyed while the network thread was still running and trying to access it.
The Critical Issues in Your Code
- Missing fdb_stop_network() call - You never signal the network thread to stop
- Database handles outlive the network thread - Your Database objects are destroyed while the network thread is still running
- Static destruction order problem - Your singleton's static instance is destroyed during program exit while the network thread is active
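To make the static destruction point concrete, here is a minimal standalone sketch. It does not use FDB at all: Binding is a hypothetical stand-in for FDBCppBinding, and destroyed_after_release is an illustrative helper. It shows that with a function-local static shared_ptr, dropping the caller's reference never runs the destructor, so the object is only torn down during static destruction after main() returns.

```cpp
#include <cassert>
#include <memory>

// Hypothetical stand-in for FDBCppBinding; counts destructor runs.
struct Binding {
    static int destroyed;
    ~Binding() { ++destroyed; }
};
int Binding::destroyed = 0;

// Same shape as the GetInstance() in the question: the static shared_ptr
// holds a reference until static destruction at program exit.
std::shared_ptr<Binding> GetInstance() {
    static std::shared_ptr<Binding> instance(new Binding());
    return instance;
}

// Drops the caller's reference and reports how many Binding objects
// have been destroyed so far.
int destroyed_after_release() {
    std::shared_ptr<Binding> local = GetInstance();
    assert(local.use_count() >= 2);  // one for the static, one for `local`
    local.reset();                   // the static still keeps the object alive
    return Binding::destroyed;
}
```

In the real binding this means the destructor, and any cleanup it performs, cannot run before main() returns unless the code triggers it explicitly.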
The Correct Cleanup Order
From analyzing FDB test code (unit_tests.cpp, ryw_benchmark.c, etc.), the proper sequence is:
// 1. Destroy all transactions and database handles FIRST
fdb_database_destroy(db);
// 2. Then stop the network
fdb_stop_network();
// 3. Wait for network thread to complete
network_thread.join();
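The three steps above can be tried without a cluster. The sketch below is only an illustration of the ordering: FakeNetwork is a hypothetical stand-in, not the libfdb_c API. Its run() blocks the way fdb_run_network() does, and stop() plays the role of fdb_stop_network().

```cpp
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <string>
#include <thread>
#include <vector>

// Hypothetical stand-in for the FDB client event loop (NOT the real API).
class FakeNetwork {
public:
    // Blocks until stop() has been called, like fdb_run_network().
    void run() {
        std::unique_lock<std::mutex> lock(mu_);
        cv_.wait(lock, [this] { return stop_requested_; });
    }
    // Signals the loop to exit, like fdb_stop_network().
    void stop() {
        {
            std::lock_guard<std::mutex> lock(mu_);
            stop_requested_ = true;
        }
        cv_.notify_all();
    }
private:
    std::mutex mu_;
    std::condition_variable cv_;
    bool stop_requested_ = false;
};

// Runs the three shutdown steps in order and records each one.
std::vector<std::string> shutdown_in_order() {
    std::vector<std::string> steps;
    FakeNetwork net;
    bool handle_open = true;  // stands in for an open FDBDatabase*
    std::thread network_thread([&net] { net.run(); });

    handle_open = false;      // 1. destroy handles first
    steps.push_back("destroy_handles");

    net.stop();               // 2. signal the loop to exit
    steps.push_back("stop_network");

    network_thread.join();    // 3. wait for run() to return
    steps.push_back("join_thread");

    assert(!handle_open && !network_thread.joinable());
    return steps;
}
```

Because run() waits on a predicate, the sketch also works if stop() happens to be called before the thread reaches run().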
The Complete Fix for Your Code
Here's what you need to implement in your FDBCppBinding class:
1. Add proper member variables to track resources
class FDBCppBinding {
private:
    FDBDatabase* db_ = nullptr; // Track database handle
    std::future<void> network_future_;
    std::atomic<bool> network_running_{false};
    std::atomic<bool> network_stopped_{false};
    // ... other members
};
2. Implement proper cleanup in the destructor
~FDBCppBinding() {
    TLOG(INFO, "FDBCppBinding destructor called");
    cleanup();
}
void cleanup() {
    // exchange() makes the check-and-set atomic, so only one caller runs the teardown
    if (network_stopped_.exchange(true)) {
        return; // Already cleaned up
    }
    // Step 1: Destroy all database handles FIRST
    if (db_ != nullptr) {
        TLOG(INFO, "Destroying FDB database handle");
        fdb_database_destroy(db_);
        db_ = nullptr;
    }
    // Step 2: Stop the network thread.
    // Check the future rather than network_running_: that flag is set by the
    // network thread itself, so it can still be false right after startup,
    // which would skip the stop call and leave wait() below blocked.
    if (network_future_.valid()) {
        TLOG(INFO, "Stopping FDB network");
        fdb_error_t error = fdb_stop_network();
        if (error != 0) {
            TLOG(ERROR, "Failed to stop FDB network: {}", error);
        }
    }
    // Step 3: Wait for the network thread to complete
    if (network_future_.valid()) {
        try {
            TLOG(INFO, "Waiting for FDB network thread to join");
            network_future_.wait();
            TLOG(INFO, "FDB network thread stopped successfully");
        } catch (const std::exception& e) {
            TLOG(ERROR, "Exception while waiting for network thread: {}", e.what());
        }
    }
    network_running_.store(false);
}
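One detail about the try/catch around network_future_.wait(): std::future::wait() only blocks until the task finishes and does not rethrow an exception thrown inside the task, whereas get() does. The standalone sketch below illustrates the difference. The observe helper and the thrown error are hypothetical and unrelated to FDB.

```cpp
#include <cassert>
#include <future>
#include <stdexcept>
#include <string>

// Returns what the caller observes: "none" if nothing was thrown,
// otherwise the message of the exception that reached the caller.
std::string observe(bool use_get) {
    std::future<void> fut = std::async(std::launch::async, [] {
        throw std::runtime_error("network loop failed");
    });
    assert(fut.valid());
    try {
        if (use_get) {
            fut.get();   // rethrows the exception stored by the task
        } else {
            fut.wait();  // only blocks until the task has finished
        }
    } catch (const std::exception& e) {
        return e.what();
    }
    return "none";
}
```

If you want failures inside the network lambda to show up in cleanup(), calling get() instead of wait() is one option. With wait(), the catch block will not see exceptions raised by the task.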
3. Fix the singleton pattern
**Option A: Use explicit cleanup** (unit tests can use this approach)
class FDBCppBinding {
private:
    static std::shared_ptr<FDBCppBinding> instance_;
    static std::mutex instance_mutex_;
public:
    static std::shared_ptr<FDBCppBinding> GetInstance(FDBConfig config) {
        std::lock_guard<std::mutex> lock(instance_mutex_);
        if (!instance_) {
            instance_.reset(new FDBCppBinding(std::move(config)));
            TLOG(INFO, "FDBCppBinding instance created");
        }
        return instance_;
    }
    // Call this explicitly in your unit test teardown
    static void DestroyInstance() {
        std::lock_guard<std::mutex> lock(instance_mutex_);
        if (instance_) {
            TLOG(INFO, "Explicitly destroying FDBCppBinding instance");
            instance_->cleanup();
            instance_.reset();
        }
    }
};
// In .cpp file
std::shared_ptr<FDBCppBinding> FDBCppBinding::instance_ = nullptr;
std::mutex FDBCppBinding::instance_mutex_;
4. Call it in your unit test teardown
// At the end of your unit tests or before program exit
FDBCppBinding::DestroyInstance();
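DestroyInstance() calls cleanup() before reset() for a reason: if some other part of the test still holds a shared_ptr to the instance, reset() alone would not run the destructor. The FDB-free sketch below shows this, using a hypothetical Binding class as a stand-in for FDBCppBinding, with a cleanup() that only sets a flag.

```cpp
#include <cassert>
#include <memory>
#include <mutex>

// Hypothetical stand-in for FDBCppBinding with the same singleton shape.
class Binding {
public:
    static std::shared_ptr<Binding> GetInstance() {
        std::lock_guard<std::mutex> lock(mutex_);
        if (!instance_) instance_.reset(new Binding());
        return instance_;
    }
    static void DestroyInstance() {
        std::lock_guard<std::mutex> lock(mutex_);
        if (instance_) {
            instance_->cleanup();  // runs even if other owners exist
            instance_.reset();     // drops only the singleton's reference
        }
    }
    bool cleaned_up() const { return cleaned_up_; }
private:
    Binding() = default;
    void cleanup() { cleaned_up_ = true; }  // real code would stop the network here
    bool cleaned_up_ = false;
    static std::shared_ptr<Binding> instance_;
    static std::mutex mutex_;
};
std::shared_ptr<Binding> Binding::instance_;
std::mutex Binding::mutex_;

// Returns true if a caller that still holds a reference sees the
// cleanup after DestroyInstance() has run.
bool cleanup_reaches_outstanding_holder() {
    std::shared_ptr<Binding> held = Binding::GetInstance();
    assert(held != nullptr);
    Binding::DestroyInstance();
    return held->cleaned_up();
}
```

Note that a later GetInstance() call creates a fresh instance. In the real binding that would matter, because the FDB network can only be set up once per process.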
I have also checked the fix against these points:
- Database handles destroyed first - No more access to destroyed objects
- Network thread properly signaled - fdb_stop_network() tells it to exit
- Synchronized shutdown - .wait() ensures the network thread completes before proceeding
- Explicit cleanup - No reliance on static destruction order
According to FDB Documentation
From api-c.rst line 203:
We must "wait for fdb_run_network() to return before allowing your program to exit, or else the behavior is undefined."
If you are comfortable with it, I can make this fix locally and open a PR for you to review.