
Double-destruction when pushing to a michael_scott_queue inside destructor with stamp_it reclaimer

Open captaincrutches opened this issue 8 months ago • 4 comments

This is kind of weird, and I'm not sure if it's a bug in xenium or something I should be doing differently, but I figure I'll bring it up anyway.

Basically, I have objects being stored as unique_ptrs in a vyukov_hash_map, and when one of them dies (by being removed from the map) I put something on a michael_scott_queue. I've run into an issue where, when the node is reclaimed and my destructor called, the act of pushing to the queue causes the reclaimer to run again, effectively recursing back into the destructor and causing Bad Things to happen.

I've managed to reproduce it with this example:

#include <xenium/policy.hpp>
#include <xenium/michael_scott_queue.hpp>
#include <xenium/vyukov_hash_map.hpp>
#include <xenium/reclamation/stamp_it.hpp>

#include <cassert>
#include <functional>
#include <iostream>
#include <memory>
#include <thread>
#include <vector>

xenium::michael_scott_queue<int, xenium::policy::reclaimer<xenium::reclamation::stamp_it>> queue;

struct foo
{
    int id;
    bool killed{false};

    ~foo() noexcept
    {
        std::cout << "[thread:" << std::this_thread::get_id() << "] foo destructor: " << id << "\n";
        assert(!killed);
        killed = true;
        queue.push(id);
    }
};

int main()
{
    xenium::vyukov_hash_map<int, std::unique_ptr<foo>, xenium::policy::reclaimer<xenium::reclamation::stamp_it>> map;

    int batch_size = 200;
    int batches = 8;
    std::vector<std::thread> threads;

    for (int batch = 0; batch < batches; ++batch)
    {
        threads.emplace_back([&map, batch, batch_size]()
        {
            int min = batch * batch_size;
            int max = (batch + 1) * batch_size;

            for (int i = min; i < max; ++i)
                map.get_or_emplace(i, std::make_unique<foo>(i));

            map.erase(min);
        });
    }

    for (auto&& thread : threads)
        thread.join();
}

It sometimes takes a few runs to make it happen (and it is more likely for bigger values of batch_size and batches), but I do eventually get something like

[thread:139690058348224] foo destructor: 0
[thread:139690033170112] foo destructor: 600
[thread:139690024244928] foo destructor: 800
[thread:139690024244928] foo destructor: 800
xenium-destructor: xenium-destructor.cpp:21: foo::~foo(): Assertion `!killed' failed.
Aborted

i.e. the destructor is called twice from the same thread.

What seems to be happening from looking at the call stack in my debugger is

  • Erasing an entry from the map orphans/retires the node, so the reclaimer can eat it
  • Pushing something else to the queue causes the reclaimer to run
  • Reclaimer identifies that node as OK to kill and begins to do so
  • We enter the destructor as part of cleaning up the node
  • In the destructor, we try to push something else to the queue
  • As part of the push, the reclaimer runs again
  • Reclaimer finds that same node again, because it's still alive, because we're in the middle of killing it
  • We end up recursing back into the destructor

Is this a bug in stamp_it (or something else)? Or is it not a bug but a quirk of stamp_it and I should just use a different reclaimer for this use case?

captaincrutches avatar May 19 '25 22:05 captaincrutches

Thanks for reporting, this is an interesting one!

The problem is the following: at some point the foo instance is reclaimed, which calls its destructor. There you push into the queue, which can again cause items to be reclaimed. We are then in the same thread that is already reclaiming local nodes, but at the moment the state is only updated after the nodes have been reclaimed. That's why, when the push also triggers reclamation, we reclaim the same nodes that the currently running reclamation is still processing.

The best advice I have right now is to use different reclamation schemes for the map and the queue. I might be able to improve the reclamation so that it first updates its state before calling the deleters, but it is perfectly possible that there are other subtle issues. You could try other reclamation schemes - I can't say right now whether they would run into the same problem, because TBH this is a use case I have not yet considered. :D

mpoeter avatar May 20 '25 14:05 mpoeter

Using a different reclamation scheme on the queue does seem to help in both the reproducing example and my actual project.

Thanks for the helpful response! I know it is a bit of an unusual-sounding use case.

captaincrutches avatar May 21 '25 03:05 captaincrutches

Another tactic that's helped me here is to "manually" kill my objects outside of reclamation.

If I have an object that lives inside a xenium structure (i.e. one that may die during reclamation), but whose death may itself modify a xenium structure in a way that causes reclamation: when that object's node is removed from its structure, I explicitly reset the shared_ptr or unique_ptr it lives in.

This way, the object itself dies right then and there, outside of the reclamation cycle, where it doesn't cause this recursion bug. Then when reclamation runs later and cleans up the node, all that gets killed is the now-empty pointer without any more side effects.

captaincrutches avatar May 29 '25 03:05 captaincrutches

If you can safely and deterministically release the resources yourself (manually), that is usually preferable. Most reclamation schemes have to defer reclamation to some later time when it is guaranteed to be safe, which means that you temporarily use more resources than strictly necessary. FWIW, I have a local fix that should avoid the described issue in stamp_it; I have not yet checked the other schemes though.

mpoeter avatar Jun 06 '25 07:06 mpoeter