Sudo high CPU usage after terminal is closed

Describe the bug

Frequently the sudo process is left running at 100% on one CPU core after the terminal has already been closed. When I hear the CPU fans spin up, I know there is at least one sudo process I need to kill manually.
To Reproduce

Steps to reproduce the behavior:

- In a terminal, for example Alacritty, run `sudo su`
- With nix, temporarily download an app: `nix shell nixpkgs#sysbench`
- Kill the terminal (I close it with a Hyprland hotkey)
- The sudo process now starts to run at 100% on one core
Expected behavior

The sudo process doesn't torture my CPU.
Environment (please complete the following information):
- Linux distribution: NixOS 24.05.20240329.d8fe5e6 (Uakari) x86_64
- sudo-rs 0.2.2
Additional context

I use the Hyprland window manager; maybe it has something to do with how it closes apps? Or maybe `nix shell` causes it.
@blazp7 does this happen with OG sudo as well?
Edit: I can't reproduce this while downloading a large file and killing Alacritty on GNOME.
I remember we fixed this related issue during the Berlin Meeting: https://github.com/memorysafety/sudo-rs/issues/263. We could also have a look at the relevant code where the potential loop might be.
Of course, @pvdrz, if we fix this we are getting really close to your favorite scenario
Note: the example you cite is `sudo su`; is that our `su` implementation or the "regular" version? Does it also occur with, say, something like `sudo sleep 600` or `sudo yes > /dev/null`?
@squell the only place where something similar could happen would be here: https://github.com/memorysafety/sudo-rs/blob/9e7a4c4fa48ec6e411e5cb3b2ae0a4258227a856/src/exec/event.rs#L222-L243
But that would mean that there aren't any file descriptors being polled or that the polling is failing consistently, which I find very unlikely as all the signal handling is done via sockets that are polled there.
It seems that just exiting a terminal where `sudo su` is active causes this issue. I have tried multiple terminal emulators: same issue. I have tried to exit the terminal app with `killall` instead of a hotkey: same issue. So it probably isn't Hyprland or Alacritty at fault. If you guys can't reproduce this problem, I would say it has something to do with how NixOS packages the binary and wraps it in a module. Or just how NixOS is.
I don't seem to be able to reproduce it on my system:
> Note: the example you cite is `sudo su`; is that our `su` implementation or the "regular" version? Does it also occur with, say, something like `sudo sleep 600` or `sudo yes > /dev/null`?
Only sudo-rs, not the regular version. And only `sudo su` followed by killing the terminal; I tried the other commands you posted. Edit: NixOS uses this `su` implementation from here
Might or might not be relevant, and I don't know if this is the default behaviour, but someone from the NixOS Matrix chat said:

> Don't do `sudo su`. It spawns a sudo process, which then spawns a su process as root, which then spawns a shell.

(Yes, `sudo -s` or `sudo -i` is generally what you should do instead of `sudo su`, but that shouldn't cause this issue.)
Does it happen with `sudo sudo -s` as well?
> Does it happen with `sudo sudo -s` as well?
No. After I close the terminal that has an active `sudo sudo -s` or `sudo bash`, every sudo process is properly closed as expected. This points to `su` being the culprit? However, when I do `su myuser` there are no issues.
I'm suspecting an interaction between ~util-linux~ NixOS's `su` and sudo-rs based on your answers; even though `su` seems to trigger it, I'd say the problem lies with sudo-rs (since the child process shouldn't be able to influence it in this way).
Alright, well, I learned that I should use `sudo -s` and `sudo -i`, but if you're going to troubleshoot this interaction, just remember that NixOS uses this `su` implementation -> https://github.com/shadow-maint/shadow/blob/master/src/su.c
That's very useful info (I didn't know shadow-utils also packaged a su implementation). @pvdrz
I have commands showing the same symptoms. They look like:

```
sudo -- su root -c ip netns exec SOME_NS su someuser -c "'mate-terminal' '--disable-factory'"
```

Self-built sudo-rs 0.2.2 on Arch. `su` from util-linux.
strace shows these two lines repeating:

```
poll([{fd=4, events=POLLIN}, {fd=5, events=POLLIN}, {fd=7, events=POLLIN}, {fd=6, events=POLLIN}], 4, -1) = 1 ([{fd=4, revents=POLLIN|POLLERR|POLLHUP}])
read(4, "", 8192) = 0
```
Never seen this behavior before installing sudo-rs. But it's worth noting that I updated many system packages at the same time, so other variables could be relevant; I highly doubt it, though.
Is that the same issue?
I know Rust, if you think fiddling with/debugging a specific part of the code would help.
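The repeating poll/read pair is the classic closed-peer pattern: once the other end of a connection is gone, the surviving descriptor is permanently readable and every read returns 0 bytes. A minimal std-only sketch (no sudo-rs code involved) showing the same symptom:

```rust
use std::io::Read;
use std::os::unix::net::UnixStream;

fn main() -> std::io::Result<()> {
    // A connected socket pair, like the ones sudo-rs polls internally.
    let (mut ours, peer) = UnixStream::pair()?;
    drop(peer); // peer side closed, as when the terminal goes away

    let mut buf = [0u8; 8192];
    // Every read now reports EOF (0 bytes) immediately and forever,
    // matching the repeating `read(4, "", 8192) = 0` in the strace.
    for _ in 0..3 {
        let n = ours.read(&mut buf)?;
        assert_eq!(n, 0);
    }
    Ok(())
}
```

A poll loop that keeps such a descriptor registered gets woken immediately on every iteration without ever blocking, which would pin one core at 100%.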
@AppleSheeple thanks for the report!
It seems one of the sockets we're using is being reported as ready constantly (file descriptor 4 in your strace output) when being polled.
I have a hunch this has to do with a socket on the other "side" being closed (we always use socketpair here) and polling socket number 4 returns immediately because it is actually returning an error.
We may be handling this error improperly. I'll try to take a look at this next week.
I had some time for a proper initial investigation (didn't get to the code yet).
What's happening is that I run this stuff from a script in a terminal, but I background and detach the run. So:

`./script_with_a_sudo_call_inside.sh &!`
fd 4 is not actually a socket; it's actually `/dev/tty`.
Now, if I don't close the terminal from where this script is running, the issue is not triggered.
But once I do (the `/dev/pts/<n>` associated with stdin/stdout/stderr is deleted), the issue is triggered.
So, in dev logs, you get endless lines of:

```
src/exec/use_pty/parent.rs:353:24: event Tty(Readable) is ready
```
I gave the code a look, and confirmed that polling here does NOT return an error: https://github.com/memorysafety/sudo-rs/blob/90e2385b866dde529837da9e059ec0f1e9cd83c6/src/exec/event.rs#L182
So, even if that error wasn't ignored, the issue would still be triggered.
I don't think I will have the time to chase this further.
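For reference, this is the kind of EOF handling the read callback needs; the `EventHandle` type and `on_readable` function below are hypothetical stand-ins, not sudo-rs's real API. Since a hung-up tty shows up as a successful 0-byte read rather than an `io::Error`, the read path itself has to treat `Ok(0)` as "peer gone" and deregister the descriptor:

```rust
use std::io::Read;

/// Toy stand-in for an event-registry handle (hypothetical).
struct EventHandle {
    registered: bool,
}

impl EventHandle {
    fn ignore(&mut self) {
        self.registered = false;
    }
}

/// Read callback: a 0-byte read is EOF, not an error, so the event
/// source must be deregistered or the poll loop spins forever.
fn on_readable<R: Read>(source: &mut R, handle: &mut EventHandle) -> std::io::Result<()> {
    let mut buf = [0u8; 4096];
    match source.read(&mut buf)? {
        // EOF: the other side (here, the tty) is gone; stop polling
        // this fd so it is no longer reported as perpetually readable.
        0 => handle.ignore(),
        _n => { /* forward the data as usual */ }
    }
    Ok(())
}

fn main() -> std::io::Result<()> {
    let (mut ours, peer) = std::os::unix::net::UnixStream::pair()?;
    drop(peer); // simulate the vanished tty side
    let mut handle = EventHandle { registered: true };
    on_readable(&mut ours, &mut handle)?;
    assert!(!handle.registered); // EOF was turned into a deregistration
    Ok(())
}
```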
Alright. Had some time to look at this again.
This change fixes the issue for me:
```diff
diff --git a/src/exec/use_pty/pipe/mod.rs b/src/exec/use_pty/pipe/mod.rs
index 7f02f06..b6046c7 100644
--- a/src/exec/use_pty/pipe/mod.rs
+++ b/src/exec/use_pty/pipe/mod.rs
@@ -7,6 +7,7 @@ use std::{
 };

 use crate::exec::event::{EventHandle, EventRegistry, PollEvent, Process};
+use crate::log::dev_warn;

 use self::ring_buffer::RingBuffer;

@@ -58,6 +59,19 @@ impl<L: Read + Write + AsRawFd, R: Read + Write + AsRawFd> Pipe<L, R> {
         &self.right
     }

+    pub(super) fn tty_gone<T: Process>(&mut self, registry: &mut EventRegistry<T>) -> bool {
+        let gone = unsafe {
+            // Check if tty which existed is now gone.
+            // NOTE: errno set is EIO which is not a documented possibility.
+            libc::tcgetsid(self.left.as_raw_fd()) == -1
+        };
+        if gone {
+            dev_warn!("tty gone (closed/detached), ignoring future events");
+            self.ignore_events(registry);
+        }
+        gone
+    }
+
     /// Stop the poll events of this pipe.
     pub(super) fn ignore_events<T: Process>(&mut self, registry: &mut EventRegistry<T>) {
         self.buffer_lr.read_handle.ignore(registry);
@@ -80,6 +94,9 @@ impl<L: Read + Write + AsRawFd, R: Read + Write + AsRawFd> Pipe<L, R> {
         poll_event: PollEvent,
         registry: &mut EventRegistry<T>,
     ) -> io::Result<()> {
+        if self.tty_gone(registry) {
+            return Ok(());
+        }
         match poll_event {
             PollEvent::Readable => self.buffer_lr.read(&mut self.left, registry),
             PollEvent::Writable => self.buffer_rl.write(&mut self.left, registry),
```
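For reference, the patch probes the descriptor with `libc::tcgetsid`, which starts failing once the terminal is gone. A weaker probe of the same flavour that needs only the standard library is `IsTerminal` (`isatty(3)` under the hood); a sketch, using `/dev/null` as a stand-in for a descriptor that does not refer to a terminal:

```rust
// Std-only sketch of a tty probe. This is weaker than the tcgetsid()
// check in the patch, but the same idea: ask the kernel whether the fd
// still refers to a terminal.
use std::fs::File;
use std::io::IsTerminal;

fn main() -> std::io::Result<()> {
    // /dev/null is never a terminal, so the probe reports false; a pty
    // fd whose terminal vanished would likewise start failing the
    // kernel-side query.
    let f = File::open("/dev/null")?;
    assert!(!f.is_terminal());
    Ok(())
}
```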
Just ran into this on another device where I forgot to install a patched version. Quite annoying.