Sudo high CPU usage after terminal is closed

Describe the bug

Frequently the sudo process is left running at 100% on one CPU core after the terminal has already been closed. When I hear the CPU fans spin up, I know there is at least one sudo process I need to kill manually.
To Reproduce

Steps to reproduce the behavior:

- In a terminal, for example Alacritty, run `sudo su`
- With nix, temporarily download an app: `nix shell nixpkgs#sysbench`
- Kill the terminal (I close it with a Hyprland hotkey)
- The sudo process now starts to run at 100% on one core
Expected behavior

The sudo process doesn't torture my CPU.
Environment (please complete the following information):
- Linux distribution: NixOS 24.05.20240329.d8fe5e6 (Uakari) x86_64
- sudo-rs 0.2.2
Additional context

I use the Hyprland window manager; maybe it has something to do with how it closes apps? Or maybe `nix shell` causes it.
@blazp7 does this happen with OG sudo as well?
Edit: I can't reproduce this while downloading a large file and killing Alacritty on GNOME.
I remember we fixed this related issue during the Berlin Meeting: https://github.com/memorysafety/sudo-rs/issues/263. We could also have a look at the relevant code where the potential loop might be.
Of course, @pvdrz, if we fix this we are getting really close to your favorite scenario
Note: the example you cite is `sudo su`; is that our `su` implementation or the "regular" version? Does it also occur with, say, something like `sudo sleep 600` or `sudo yes > /dev/null`?
@squell the only place where something similar could happen would be here: https://github.com/memorysafety/sudo-rs/blob/9e7a4c4fa48ec6e411e5cb3b2ae0a4258227a856/src/exec/event.rs#L222-L243
But that would mean that there aren't any file descriptors being polled or that the polling is failing consistently, which I find very unlikely as all the signal handling is done via sockets that are polled there.
It seems that just exiting a terminal where `sudo su` is active causes this issue. I have tried multiple terminal emulators: same issue. I have tried to exit the terminal app with `killall` instead of a hotkey: same issue. So it probably isn't Hyprland or Alacritty at fault. If you guys can't reproduce this problem, I would say it has something to do with how NixOS packages the binary and wraps it in a module. Or just how NixOS is.
I don't seem to be able to reproduce it on my system:
> Note: the example you cite is `sudo su`; is that our `su` implementation or the "regular" version? Does it also occur with, say, something like `sudo sleep 600` or `sudo yes > /dev/null`?
Only sudo-rs, not the regular version. And only `sudo su` followed by killing the terminal; I tried the other commands you posted. Edit: NixOS uses this `su` implementation from here
Might or might not be relevant, and I don't know if this is the default behaviour, but someone from the NixOS Matrix chat said:

> Don't do `sudo su`. It spawns a sudo process, which then spawns a su process as root, which then spawns a shell.

(Yes, `sudo -s` or `sudo -i` is generally what you should do instead of `sudo su`, but that shouldn't cause this issue.)
Does it happen with `sudo sudo -s` as well?
> Does it happen with `sudo sudo -s` as well?
No. After I close the terminal that has an active `sudo sudo -s` or `sudo bash`, every sudo process is properly closed as expected. This points to `su` being the culprit? However, when I do `su myuser` there are no issues.
I'm suspecting an interaction between ~util-linux~ NixOS's `su` and sudo-rs based on your answers; even though `su` seems to trigger it, I'd say the problem lies with sudo-rs (since the child process shouldn't be able to influence it in this way).
Alright, well, I learned that I should use `sudo -s` and `sudo -i`, but if you're going to troubleshoot this interaction, just remember that NixOS uses this `su` implementation -> https://github.com/shadow-maint/shadow/blob/master/src/su.c
That's very useful info (I didn't know shadow-utils also packaged a su implementation). @pvdrz
I have commands showing the same symptoms. They look like:

```
sudo -- su root -c ip netns exec SOME_NS su someuser -c "'mate-terminal' '--disable-factory'"
```

Self-built sudo-rs 0.2.2 on Arch. `su` from util-linux.
strace shows these two lines repeating:

```
poll([{fd=4, events=POLLIN}, {fd=5, events=POLLIN}, {fd=7, events=POLLIN}, {fd=6, events=POLLIN}], 4, -1) = 1 ([{fd=4, revents=POLLIN|POLLERR|POLLHUP}])
read(4, "", 8192) = 0
```
Never seen this behavior before installing sudo-rs. But it's worth noting that I updated many system packages at the same time, so other variables could be relevant; I highly doubt it, though.
Is that the same issue?
I know Rust, if you think fiddling with/debugging a specific part of the code would help.
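The repeating poll/read pair is the classic closed-peer pattern: once the other end of a connection is gone, the surviving descriptor is permanently readable and every read returns 0 bytes. A minimal std-only sketch (no sudo-rs code involved) showing the same symptom:

```rust
use std::io::Read;
use std::os::unix::net::UnixStream;

fn main() -> std::io::Result<()> {
    // A connected socket pair, like the ones sudo-rs polls internally.
    let (mut ours, peer) = UnixStream::pair()?;
    drop(peer); // peer side closed, as when the terminal goes away

    let mut buf = [0u8; 8192];
    // Every read now reports EOF (0 bytes) immediately and forever,
    // matching the repeating `read(4, "", 8192) = 0` in the strace.
    for _ in 0..3 {
        let n = ours.read(&mut buf)?;
        assert_eq!(n, 0);
    }
    Ok(())
}
```

A poll loop that keeps such a descriptor registered gets woken immediately on every iteration without ever blocking, which would pin one core at 100%.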
@AppleSheeple thanks for the report!
It seems one of the sockets we're using is being reported as ready constantly (file descriptor 4 in your strace output) when being polled.
I have a hunch this has to do with a socket on the other "side" being closed (we always use socketpair here) and polling socket number 4 returns immediately because it is actually returning an error.
We may be handling this error improperly. I'll try to take a look at this next week.
I had some time for a proper initial investigation (didn't get to the code yet).
What's happening is that I run this stuff from a script in a terminal, but I background and detach the run. So:

`./script_with_a_sudo_call_inside.sh &!`
fd 4 is not actually a socket; it's actually `/dev/tty`.
Now, if I don't close the terminal from where this script is running, the issue is not triggered.
But once I do (the `/dev/pts/<n>` associated with stdin/stdout/stderr is deleted), the issue is triggered.
So, in dev logs, you get endless lines of:

```
src/exec/use_pty/parent.rs:353:24: event Tty(Readable) is ready
```
I gave the code a look, and confirmed that polling here does NOT return an error: https://github.com/memorysafety/sudo-rs/blob/90e2385b866dde529837da9e059ec0f1e9cd83c6/src/exec/event.rs#L182
So, even if that error wasn't ignored, the issue would still be triggered.
I don't think I will have the time to chase this further.
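For reference, this is the kind of EOF handling the read callback needs; the `EventHandle` type and `on_readable` function below are hypothetical stand-ins, not sudo-rs's real API. Since a hung-up tty shows up as a successful 0-byte read rather than an `io::Error`, the read path itself has to treat `Ok(0)` as "peer gone" and deregister the descriptor:

```rust
use std::io::Read;

/// Toy stand-in for an event-registry handle (hypothetical).
struct EventHandle {
    registered: bool,
}

impl EventHandle {
    fn ignore(&mut self) {
        self.registered = false;
    }
}

/// Read callback: a 0-byte read is EOF, not an error, so the event
/// source must be deregistered or the poll loop spins forever.
fn on_readable<R: Read>(source: &mut R, handle: &mut EventHandle) -> std::io::Result<()> {
    let mut buf = [0u8; 4096];
    match source.read(&mut buf)? {
        // EOF: the other side (here, the tty) is gone; stop polling
        // this fd so it is no longer reported as perpetually readable.
        0 => handle.ignore(),
        _n => { /* forward the data as usual */ }
    }
    Ok(())
}

fn main() -> std::io::Result<()> {
    let (mut ours, peer) = std::os::unix::net::UnixStream::pair()?;
    drop(peer); // simulate the vanished tty side
    let mut handle = EventHandle { registered: true };
    on_readable(&mut ours, &mut handle)?;
    assert!(!handle.registered); // EOF was turned into a deregistration
    Ok(())
}
```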
Alright. Had some time to look at this again.
This change fixes the issue for me:
```diff
diff --git a/src/exec/use_pty/pipe/mod.rs b/src/exec/use_pty/pipe/mod.rs
index 7f02f06..b6046c7 100644
--- a/src/exec/use_pty/pipe/mod.rs
+++ b/src/exec/use_pty/pipe/mod.rs
@@ -7,6 +7,7 @@ use std::{
 };

 use crate::exec::event::{EventHandle, EventRegistry, PollEvent, Process};
+use crate::log::dev_warn;

 use self::ring_buffer::RingBuffer;

@@ -58,6 +59,19 @@ impl<L: Read + Write + AsRawFd, R: Read + Write + AsRawFd> Pipe<L, R> {
         &self.right
     }

+    pub(super) fn tty_gone<T: Process>(&mut self, registry: &mut EventRegistry<T>) -> bool {
+        let gone = unsafe {
+            // Check if tty which existed is now gone.
+            // NOTE: errno set is EIO which is not a documented possibility.
+            libc::tcgetsid(self.left.as_raw_fd()) == -1
+        };
+        if gone {
+            dev_warn!("tty gone (closed/detached), ignoring future events");
+            self.ignore_events(registry);
+        }
+        gone
+    }
+
     /// Stop the poll events of this pipe.
     pub(super) fn ignore_events<T: Process>(&mut self, registry: &mut EventRegistry<T>) {
         self.buffer_lr.read_handle.ignore(registry);
@@ -80,6 +94,9 @@ impl<L: Read + Write + AsRawFd, R: Read + Write + AsRawFd> Pipe<L, R> {
         poll_event: PollEvent,
         registry: &mut EventRegistry<T>,
     ) -> io::Result<()> {
+        if self.tty_gone(registry) {
+            return Ok(());
+        }
         match poll_event {
             PollEvent::Readable => self.buffer_lr.read(&mut self.left, registry),
             PollEvent::Writable => self.buffer_rl.write(&mut self.left, registry),
```
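For reference, the patch probes the descriptor with `libc::tcgetsid`, which starts failing once the terminal is gone. A weaker probe of the same flavour that needs only the standard library is `IsTerminal` (`isatty(3)` under the hood); a sketch, using `/dev/null` as a stand-in for a descriptor that does not refer to a terminal:

```rust
// Std-only sketch of a tty probe. This is weaker than the tcgetsid()
// check in the patch, but the same idea: ask the kernel whether the fd
// still refers to a terminal.
use std::fs::File;
use std::io::IsTerminal;

fn main() -> std::io::Result<()> {
    // /dev/null is never a terminal, so the probe reports false; a pty
    // fd whose terminal vanished would likewise start failing the
    // kernel-side query.
    let f = File::open("/dev/null")?;
    assert!(!f.is_terminal());
    Ok(())
}
```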
Just ran into this on another device where I forgot to install a patched version. Quite annoying.