devicemapper-rs

Return an error if udev is not running/available

Open • bmr-cymru opened this issue 2 years ago • 2 comments

Currently, if the systemd-udevd daemon is not running or is unable to respond to uevents, the udev synchronization code in devicemapper hangs indefinitely:

[2023-05-25T12:04:11Z DEBUG devicemapper::core::dm] Resuming device stratis-1-private-e00449c6a59b43e48a1187473064225d-physical-originsub
[2023-05-25T12:04:11Z DEBUG devicemapper::core::dm_udev_sync::sync_semaphore] Created UdevSync { cookie: 5114325, semid: 0 }
[2023-05-25T12:04:11Z TRACE devicemapper::core::dm_udev_sync::sync_semaphore] Waiting on UdevSync { cookie: 5114325, semid: Some(0) }
<< blocks here >>

It would be better to test the state of udev and to report an error like "The udev daemon is not running".

The libdevmapper library detects this state by calling the udev API:

static int _check_udev_is_running(void)
{
        struct udev *udev;
        struct udev_queue *udev_queue;
        int r;

        if (!(udev = udev_new()))
                goto_bad;

        if (!(udev_queue = udev_queue_new(udev))) {
                udev_unref(udev);
                goto_bad;
        }

        if (!(r = udev_queue_get_udev_is_active(udev_queue)))
                log_debug_activation("Udev is not running. "
                                     "Not using udev synchronisation code.");

        udev_queue_unref(udev_queue);
        udev_unref(udev);

        return r;

bad:
        log_error("Could not get udev state. Assuming udev is not running.");
        return 0;
}

Rust bindings for libudev are available, so we can potentially implement the same check in devicemapper.

bmr-cymru • May 25 '23 18:05

libudev-rs does not seem to implement the queue API that the check requires, so that may be a dead end.

bmr-cymru • May 25 '23 18:05

In the libudev sources, the check that libdevmapper performs actually amounts to:

_public_ int udev_queue_get_udev_is_active(struct udev_queue *udev_queue) {
        return access("/run/udev/control", F_OK) >= 0;
}

And indeed, when we hit the problem in emergency.target, that socket is absent.
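
Since the check reduces to testing for the control socket, it could be done in Rust without any libudev bindings. The following is only a rough sketch of that idea, not existing devicemapper-rs code: the `UdevNotRunning` type and both function names are hypothetical, and a real change would presumably map the failure onto the crate's own error types before waiting on the semaphore.

```rust
use std::fmt;
use std::path::Path;

/// Path that libudev probes in udev_queue_get_udev_is_active().
const UDEV_CONTROL_PATH: &str = "/run/udev/control";

/// Hypothetical error type, used here for illustration only.
#[derive(Debug, PartialEq, Eq)]
pub struct UdevNotRunning;

impl fmt::Display for UdevNotRunning {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "The udev daemon is not running")
    }
}

impl std::error::Error for UdevNotRunning {}

/// Returns Ok(()) if `control` exists. Like access(path, F_OK), this
/// follows symlinks and treats any lookup failure as "absent".
pub fn check_udev_at(control: &Path) -> Result<(), UdevNotRunning> {
    if control.exists() {
        Ok(())
    } else {
        Err(UdevNotRunning)
    }
}

/// Checks the default control socket location.
pub fn check_udev_is_running() -> Result<(), UdevNotRunning> {
    check_udev_at(Path::new(UDEV_CONTROL_PATH))
}
```

Calling `check_udev_is_running()?` before creating the UdevSync cookie would turn the indefinite hang into an immediate error. Taking the path as a parameter in `check_udev_at` is just a convenience so the logic can be exercised without a running udev.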

bmr-cymru • May 26 '23 14:05