keybase client causing `/` mount issues under Ubuntu 22.04, Fedora 34 and newer

Open gene1wood opened this issue 4 years ago • 60 comments

The new version of keybase (5.9.0-20211217212642.29bfd9d39f) that @heronhaye released last week is causing a very worrisome issue (which I initially thought was a hard drive problem).

I've observed it now on two different unrelated Ubuntu 18.04 systems. Initially I didn't realize it was keybase until it happened on the second system and I thought it might be mount/fuse related.

Symptoms

Intermittently the / mount in the OS stops working. Running ls / hangs and doesn't return. Launching nautilus hangs and doesn't launch. Other applications that try to read from / (in my case a Java app called SmartGit) also hang. Other mounts appear ok, for example ls /boot works fine. Apps which don't do anything in the / mount work fine (for example my browser, which I guess works entirely in my home directory, and home is a distinct mount).
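
One way to check for this symptom without tying up a terminal is to wrap the filesystem call in a timeout. This is only a minimal sketch and not part of any Keybase tooling: probe_path is a hypothetical helper name, it assumes GNU coreutils timeout and stat are available, and it assumes the blocked call can still be interrupted by a signal (a process stuck in uninterruptible sleep would not be).

```shell
# probe_path PATH [SECONDS]
# Prints "ok" if stat on PATH returns in time, "hung" if it had to be
# timed out, and "error" for any other failure (e.g. PATH does not exist).
probe_path() {
    timeout "${2:-5}" stat -- "$1" >/dev/null 2>&1
    case $? in
        0)   echo "ok" ;;
        124) echo "hung" ;;   # 124 is the exit status timeout uses on expiry
        *)   echo "error" ;;
    esac
}
```

For example, probe_path /keybase 5 would print hung if the stat call did not return within 5 seconds.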

Possible Solution

Some users have seen this problem go away in the v5.9.3 release, which you can update to via the normal means: https://keybase.io/docs/the_app/install_linux

Others are still seeing this same problem. I migrated away from using keybase entirely (now using Syncthing) due to this problem so I can't confirm if it's been fixed or not.

Workaround

When this happened on the second Ubuntu machine and I realized it might be keybase, I ran keybase ctl stop and immediately / became available, nautilus would launch, everything was fixed.

I was able to discover the previous version number at https://prerelease.keybase.io/linux_binaries/deb/index.html

Then either

sudo apt install keybase=5.8.1-20210930160723.fefa22edc1

or fetch the deb package listed on https://prerelease.keybase.io/linux_binaries/deb/index.html and install it

wget https://s3.amazonaws.com/prerelease.keybase.io/linux_binaries/deb/keybase_5.8.1-20210930160723.fefa22edc1_amd64.deb
sudo dpkg -i keybase_5.8.1-20210930160723.fefa22edc1_amd64.deb

Then run this to tell apt to hold and not upgrade keybase to 5.9.0

sudo apt-mark hold keybase

Once this issue is fixed you can unhold keybase with

sudo apt-mark unhold keybase

MAKE SURE NOT TO FORGET THAT YOU PUT KEYBASE ON HOLD AS OTHERWISE IT WILL NEVER GET UPDATES AGAIN
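
To make the hold harder to forget, something like the following could be run from a shell profile or a periodic job. It is a sketch under assumptions: is_on_hold is a hypothetical helper name, and it only relies on apt-mark showhold printing one held package name per line.

```shell
# is_on_hold PACKAGE
# Reads a hold list on stdin (one package name per line) and succeeds
# if PACKAGE appears in it as an exact, whole-line match.
is_on_hold() {
    grep -Fxq -- "$1"
}

# Usage (not executed here):
#   apt-mark showhold | is_on_hold keybase && echo "reminder: keybase is still on hold"
```

Keeping the check as a filter on stdin means it can be tried out with sample input on a machine where apt is not installed.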

gene1wood avatar Dec 28 '21 06:12 gene1wood

Looks like a duplicate of #24364.

orgcontrib avatar Jan 03 '22 07:01 orgcontrib

@orgcontrib #24364 is a report from 2020 on Keybase 5.5.2 (not a new issue that began in 5.9.0 and which can be worked around by downgrading to 5.8.1).

Also #24364 refers to an inability to use keybase mounts, not symptoms that involve hanging any process that tries to take actions on the / mount (not the keybase mount)

gene1wood avatar Jan 03 '22 20:01 gene1wood

I have the same issue on Fedora 34 with package keybase-5.9.0.20211217212642.29bfd9d39f-1.x86_64

ncharles avatar Jan 06 '22 12:01 ncharles

I've now observed this on a third Ubuntu 18.04 machine. In this case it manifested during an unattended upgrade: when grub ran to rebuild the menu after a kernel upgrade, it hung. I had to follow the steps above to stop keybase, downgrade it and hold it at 5.8.1.

gene1wood avatar Jan 07 '22 16:01 gene1wood

I just ran into the same problem on an Ubuntu Server 20.04 LTS server after a routine upgrade. /keybase is maybe-sorta-kinda mounted though it should not be (I use a private mount under ~/.config/keybase/kbfs/ instead). It's hanging up df, lsof, and pretty much anything else that queries FS state. Also ls /keybase (separated from the previous sentence to clarify).

Can confirm that running keybase ctl stop as non-root worked.

Running the given sudo apt command to install v5.8.1-20210930160723.fefa22edc1 did not work. "E: Version '5.8.1-20210930160723.fefa22edc1' for 'keybase' was not found"

It would be nice if we could browse the S3 bucket, but whatever. Downloading the .deb worked, but installing it did not, or at least it seems like it didn't work:

root@exocortex:~# dpkg -i ~drwho/keybase_5.8.1-20210930160723.fefa22edc1_amd64.deb
dpkg: warning: downgrading keybase from 5.9.0-20211217212642.29bfd9d39f to 5.8.1-20210930160723.fefa22edc1
(Reading database ... 179816 files and directories currently installed.)
Preparing to unpack .../keybase_5.8.1-20210930160723.fefa22edc1_amd64.deb ...
Unpacking keybase (5.8.1-20210930160723.fefa22edc1) over (5.9.0-20211217212642.29bfd9d39f) ...
Setting up keybase (5.8.1-20210930160723.fefa22edc1) ...
mkdir: cannot create directory ‘/keybase’: File exists
chown: cannot access '/keybase': Transport endpoint is not connected
chmod: cannot access '/keybase': Transport endpoint is not connected
dpkg: error processing package keybase (--install):
 installed keybase package post-installation script subprocess returned error exit status 1
Processing triggers for mime-support (3.64ubuntu1) ...
Processing triggers for hicolor-icon-theme (0.17-2) ...
Processing triggers for shared-mime-info (1.15-1) ...
Errors were encountered while processing:
 keybase

Mountpoint /keybase did not exist. Attempting the installation again resulted in its creation and "Device or resource busy" when trying to remove it. keybase.mount is showing up as loaded, active, and mounted in the output of systemctl, however. After some fiddling, I got systemctl stop keybase.mount to work and then I could manually install the .deb package. Marked as "held."

Starting my isolated Keybase processes with systemctl --user worked.

virtadpt avatar Jan 09 '22 19:01 virtadpt

Can confirm that running keybase ctl stop as non-root worked.

Yes, I believe it did

Running the given sudo apt command to install v5.8.1-20210930160723.fefa22edc1 did not work

Yes, this also didn't work for me on subsequent machines. I'll update the description on the issue to capture that

Downloading the .deb worked, but installing it did not, or at least it seems like it didn't work:

I got the same output that you see there @virtadpt but it did resolve the problem for me and Keybase now works and is running v5.8.1

Are you saying that after doing the dpkg -i ~drwho/keybase_5.8.1-20210930160723.fefa22edc1_amd64.deb Keybase wasn't working and you had to do further steps to get it working? I ask because on the 3 systems I've done this on, merely running the downgrade was all that was needed, and Keybase and KBFS were working after that (despite the error messages shown during the downgrade that you mention).

gene1wood avatar Jan 09 '22 21:01 gene1wood

Can someone experiencing this issue please send a log (keybase log send) when experiencing this issue on the latest version? Also, does it happen after a system restart?

heronhaye avatar Jan 10 '22 17:01 heronhaye

@gene1wood Yes, I had to do it several times before it worked. The key seems to have been sudo systemctl stop keybase.mount several times a few minutes apart before 0) the process trying to mount /keybase finally stopped and 1) dpkg terminated and unlocked the package database. Once I got it successfully installed it worked.
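
The "try it several times, a few minutes apart" step can be scripted with a small retry loop. This is a generic sketch rather than anything Keybase ships: retry is a hypothetical helper, and the attempt count and delay in the usage line are guesses, not values confirmed in this thread.

```shell
# retry ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
# Runs COMMAND until it succeeds or ATTEMPTS runs have failed,
# sleeping DELAY_SECONDS between attempts. Returns 0 on success, 1 otherwise.
retry() {
    attempts=$1
    delay=$2
    shift 2
    i=1
    while [ "$i" -le "$attempts" ]; do
        if "$@"; then
            return 0
        fi
        # Only sleep when another attempt will follow.
        if [ "$i" -lt "$attempts" ]; then
            sleep "$delay"
        fi
        i=$((i + 1))
    done
    return 1
}

# Usage (not executed here):
#   retry 5 120 sudo systemctl stop keybase.mount
```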

virtadpt avatar Jan 10 '22 19:01 virtadpt

@heronhaye I can send one right now if you like. Things seem to be doing pretty well right now on my box.

Once I got it successfully installed, it did not happen when I restarted the system.

virtadpt avatar Jan 10 '22 19:01 virtadpt

@virtadpt Please do, thanks. Post the log id here.

heronhaye avatar Jan 11 '22 15:01 heronhaye

For anyone in this situation, does the following help?

sudo fusermount -uz /keybase
run_keybase

If so, I believe we have located the issue and should be able to push out a fix shortly.

heronhaye avatar Jan 13 '22 21:01 heronhaye

@heronhaye Keybase log ID: 6f9b6814a3d454639a8ffc1c

virtadpt avatar Jan 15 '22 21:01 virtadpt

Can someone experiencing this issue please send a log (keybase log send) when experiencing this issue on the latest version? Also, does it happen after a system restart?

@heronhaye You got it, chief. Log id: cb9e9c66959430fcdfc6b21c. Yes, it occurs after a system restart and also when just logging out and back in again.

As to the fusermount command fixing the issue, I'm afraid it didn't, however issuing systemctl --user stop keybase-redirector.service && sleep 3; systemctl --user start keybase-redirector.service appears to get everything working okay. Oddly just issuing a restart to systemd for the service doesn't, perhaps because it needs a longer interval before re-execution? This one's puzzling, to be sure.

RogueScholar avatar Jan 17 '22 08:01 RogueScholar

@RogueScholar @virtadpt Would you mind trying a test build to see if it fixes the redirector issue? It may require either sudo fusermount -uz /keybase or a computer reboot to release the mountpoint before restarting Keybase after installation. Thanks.

.deb: https://s3.amazonaws.com/tests.keybase.io/linux_binaries/deb/index.html
.rpm: https://s3.amazonaws.com/tests.keybase.io/linux_binaries/rpm/index.html

Please download the installer with the git commit 95a3939b3a. You can verify the signature from the sig file on the binary with the public key from https://book.keybase.io/docs/server/our-code-signing-key. Thanks. Let me know if you run into any issues, and if this solves the problem.

heronhaye avatar Jan 18 '22 19:01 heronhaye

Hi, we believe we have fixed this issue. Please try upgrading Keybase. Thank you.

You will likely have to reboot your computer after the upgrade.

heronhaye avatar Jan 20 '22 18:01 heronhaye

@heronhaye Thank you I'll upgrade and try it out. Is the fix for this problem found in #24789 ?

gene1wood avatar Jan 20 '22 19:01 gene1wood

Yes. We did not realize that setreuid modified the suid bit; however, in previous Go versions, the calling goroutine took place in a different OS thread, so the bug did not manifest. In the latest Go version, the goroutine was scheduled on the same OS thread so the modified suid bit persisted.

heronhaye avatar Jan 20 '22 19:01 heronhaye

I'll give it a try tomorrow. Probably going to be working another late nighter tonight.

virtadpt avatar Jan 20 '22 19:01 virtadpt

So on Ubuntu 18.04 I upgraded to the new version, and the problem manifested during the upgrade and apt hung

Preparing to unpack .../06-keybase_5.9.0-20220120174718.95a3939b3a_amd64.deb ...
Unpacking keybase (5.9.0-20220120174718.95a3939b3a) over (5.8.1-20210930160723.fefa22edc1) ...

...


Setting up keybase (5.9.0-20220120174718.95a3939b3a) ...
Autorestarting Keybase via systemd for gene.
Restarted existing root redirector via systemd.


Progress: [ 88%] [###########################################.................] 

I can confirm the problem is happening: in another terminal, if I do an ls / it hangs. I can unhang things by running sudo fusermount -uz /keybase as suggested, though it looks like this doesn't unhang the apt update.

I'm not saying that this means the problem persists into the new version 5.9.0-20220120174718.95a3939b3a, just that the upgrade process hangs.

Looking at the process tree the step in the post_install.sh where it hangs is when it does chown root:root /keybase

root     20212  0.0  0.3 166164 103484 pts/0   S+   11:03   0:06  |   |       \_ apt -y upgrade
root     28831  0.0  0.0  27868 12132 pts/1    Ss+  11:09   0:00  |   |           \_ /usr/bin/dpkg --status-fd 151 --configure --pending
root     17117  0.0  0.0  12888  3152 pts/1    S+   11:13   0:00  |   |               \_ /bin/bash /var/lib/dpkg/info/keybase.postinst configure 5.8.1-20210930160723.fefa22edc1
root     17119  0.0  0.0  13020  3368 pts/1    S+   11:13   0:00  |   |                   \_ bash /opt/keybase/post_install.sh
root     20389  0.0  0.0  15960  2440 pts/1    S+   11:13   0:00  |   |                       \_ chown root:root /keybase

I killed the chown process and the apt update completed with

/opt/keybase/post_install.sh: line 26: 20389 Terminated              chown root:root "$rootmount"

...

W: Operation was interrupted before it could finish

I then quit keybase in the GUI, confirmed it was not running in the process list and then ran

sudo apt install --reinstall keybase

which completed successfully. After that I had two new icons in the Ubuntu status bar which I've never seen before

[Image: new icons in ubuntu status bar]

I did another apt upgrade thinking maybe these new icons relate to an update issue but apt didn't identify any packages that needed updating (which makes sense).

I rebooted and after reboot the problem is manifesting immediately. If I do an ls / it hangs. I checked what version of deb package is installed and it is the new one 5.9.0-20220120174718.95a3939b3a

The status bar icons above are however gone after reboot.

I run sudo fusermount -uz /keybase and the system unhangs.

@heronhaye I'm unsure if the new version fixes the issue

I've run keybase log send

my log id: 07ea015caf5b1f5553593a1c

gene1wood avatar Jan 20 '22 21:01 gene1wood

@heronhaye Ya, I can confirm this new version doesn't seem to fix the issue. On version 5.9.0-20220120174718.95a3939b3a on Ubuntu 18.04, immediately on boot doing an ls / hangs. Running sudo fusermount -uz /keybase unhangs it.

@heronhaye Should we open a new issue as this and #24749 are currently closed?

gene1wood avatar Jan 22 '22 16:01 gene1wood

Sorry folks, continuing to investigate.

heronhaye avatar Jan 24 '22 16:01 heronhaye

@gene1wood Unfortunately I'm having trouble reproducing this bug on either Ubuntu 18.04 or 21.10.

First question, what are the permissions on /keybase from ls -l /?

Second, if you are able to, could you first try to disable the redirector with sudo keybase --use-root-config-file ctl redirector --disable? After this and rebooting, are your system and Keybase functional again, with the exception of the /keybase mountpoint, which should alternatively be accessible from $XDG_RUNTIME_DIR/keybase/kbfs?

Thanks for your detailed reports so far.

heronhaye avatar Jan 24 '22 18:01 heronhaye

First question, what are the permissions on /keybase from ls -l /?

So when it's working and not exhibiting the problem it shows as this

dr-xr-xr-x   1 root root          0 Jan 24 07:32 keybase

When the problem occurs I can't do an ls of /. After I run sudo fusermount -uz /keybase the permissions show differently, something with a lot of ? instead of rwx.
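
A listing like that can be detected mechanically. The sketch below is an assumption-heavy illustration: looks_stale is a hypothetical helper, and it assumes the listing has the form ls usually prints when it cannot stat an entry, i.e. a type character followed by nine question marks in place of the permission bits. The exact line from my system isn't reproduced here.

```shell
# looks_stale
# Reads `ls -l` style lines on stdin and succeeds if any line has a mode
# field whose nine permission characters are all '?'.
looks_stale() {
    grep -Eq '^[dl?-]\?{9}'
}

# Usage (not executed here):
#   ls -l / 2>/dev/null | looks_stale && echo "an entry in / could not be stat'ed"
```

Listing the parent directory rather than /keybase itself matters here, because ls -ld on a path it cannot stat prints only an error and no listing line.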

could you first try to disable the redirector

Sure, I'll reboot without running that, confirm that the problem occurs, then run it and reboot and see if it still occurs.

For example when I booted today running 5.9.0-20220120174718.95a3939b3a I haven't had a problem. I'm unsure what the conditions are that trigger the issue.

After rebooting, I was hoping I could reproduce it, but it hasn't happened yet.

I've produced a set of logs with ID 9e4020aadb104411b5d20d1c for keybase where the problem isn't occurring in hopes that when it begins occurring I can get another set of logs and the comparison may reveal something.

gene1wood avatar Jan 24 '22 18:01 gene1wood

Strange. Let me know if you manage to run into it again. Thanks again for the detailed reports. I'll post in the other thread as well.

heronhaye avatar Jan 25 '22 15:01 heronhaye

I'm on Fedora 35, with keybase-5.9.0.20220120174718.95a3939b3a-1.x86_64. I tried the procedure

sudo fusermount -uz /keybase
run_keybase

and the run_keybase command never finished

Journalctl tells me

janv. 25 22:57:17 mymachine systemd[2283]: Starting Mark boot as successful...
janv. 25 22:57:17 mymachine systemd[2283]: Finished Mark boot as successful.
janv. 25 22:57:22 mymachine Keybase[3312]: Quit through before-quit
janv. 25 22:57:22 mymachine Keybase[3312]: Quit the app
janv. 25 22:57:22 mymachine Keybase[3312]: Exec (null) not available for platform: darwin != linux
janv. 25 22:57:22 mymachine Keybase[3312]: Done with ctlstop
janv. 25 22:57:22 mymachine Keybase[3312]: exiting app
janv. 25 22:57:22 mymachine Keybase[3312]: browser window killed
janv. 25 22:57:22 mymachine Keybase[3312]: browser window killed
janv. 25 22:57:22 mymachine Keybase[3312]: [3312:0125/225722.033102:FATAL:gpu_data_manager_impl_private.cc(445)] GPU process isn't usable. Goodbye.
janv. 25 22:57:22 mymachine kernel: show_signal: 107 callbacks suppressed
janv. 25 22:57:22 mymachine kernel: traps: Chrome_IOThread[3466] trap int3 ip:557a1a4b9649 sp:7f8a64b8aba0 error:0 in Keybase[557a17e56000+6068000]
janv. 25 22:57:22 mymachine audit: BPF prog-id=63 op=LOAD
janv. 25 22:57:22 mymachine audit: BPF prog-id=64 op=LOAD
janv. 25 22:57:22 mymachine audit: BPF prog-id=65 op=LOAD
janv. 25 22:57:22 mymachine systemd[1]: Started Process Core Dump (PID 5821/UID 0).
janv. 25 22:57:22 mymachine audit[1]: SERVICE_START pid=1 uid=0 auid=4294967295 ses=4294967295 subj=system_u:system_r:init_t:s0 msg='unit=systemd-coredump@1-582>
janv. 25 22:57:22 mymachine systemd[1]: run-user-1000-keybase-kbfs.mount: Deactivated successfully.
janv. 25 22:57:22 mymachine sh[5835]: fusermount: entry for /run/user/1000/keybase/kbfs not found in /etc/mtab
janv. 25 22:57:22 mymachine systemd[2283]: Reloading.
janv. 25 22:57:22 mymachine systemd[2283]: Stopping Keybase core service...
janv. 25 22:57:22 mymachine systemd-coredump[5825]: [🡕] Process 3312 (Keybase) of user 1000 dumped core.
                                               
                                               Found module linux-vdso.so.1 with build-id: 5adf6bc52f5c51e10c5d1229124d27eebdae3b02
                                               Found module libdbusmenu-glib.so.4 with build-id: 6bc3a1183dab693921aaa6b6c6eb265c59e02bd5
                                               Found module libdbusmenu-gtk3.so.4 with build-id: c7b725aae4ca29ca64de268a33435a6cc7704650
                                               Found module libindicator3.so.7 with build-id: 4b6c12c50f3b52923760df92f3be559a69c0e6d2
                                               Found module libappindicator3.so.1 with build-id: 52a34c2096ee72837f18aabefff6c34e759d46d9
                                               Found module libudev.so.1 with build-id: 4a00048d488bd5a48980577f23fcaced551223c3
                                               Found module libdconfsettings.so with build-id: f07853974014f14e04029f524a32d4aea9558c1d
(plenty of stack trace omitted)
janv. 25 22:57:22 mymachine systemd[2283]: keybase.gui.service: Main process exited, code=killed, status=5/TRAP
janv. 25 22:57:22 mymachine systemd[2283]: keybase.gui.service: Failed with result 'signal'.
janv. 25 22:57:22 mymachine systemd[2283]: keybase.gui.service: Unit process 5808 (Keybase) remains running after unit stopped.
janv. 25 22:57:22 mymachine systemd[2283]: keybase.gui.service: Unit process 5816 (Keybase) remains running after unit stopped.
janv. 25 22:57:22 mymachine systemd[2283]: keybase.gui.service: Unit process 5817 (Keybase) remains running after unit stopped.
janv. 25 22:57:22 mymachine systemd[2283]: keybase.gui.service: Consumed 5.227s CPU time.
janv. 25 22:57:22 mymachine systemd[1]: [email protected]: Deactivated successfully.

ncharles avatar Jan 25 '22 22:01 ncharles

@heronhaye Ok, just had to wait and the problem recurred.

On this boot the problem manifested; either it didn't start until 25 minutes into using the system or it started at boot and I didn't notice.

I produced a set of logs and sent them while the problem was occurring. Log ID: d1c8a8d206f5e43d5cb8761c

When I run sudo keybase --use-root-config-file ctl redirector --disable I get this :

$ sudo keybase --use-root-config-file ctl redirector --disable
Redirector configuration updated.
▶ ERROR Failed to delete mountpoint at /keybase: remove /keybase: device or resource busy
▶ ERROR If KBFS is not being used, run `# pkill -f keybase-redirector`, delete /keybase and try again.
▶ ERROR remove /keybase: device or resource busy

I then ran sudo fusermount -uz /keybase to unhang my system

Running sudo keybase --use-root-config-file ctl redirector --disable a second time, I now get

$ sudo keybase --use-root-config-file ctl redirector --disable
Redirector configuration updated.
Redirector mount deletion successful.
Please run `# pkill -f keybase-redirector` to stop the redirector for all users.

gene1wood avatar Jan 26 '22 00:01 gene1wood

After a few days, I haven't seen the problem (but I also don't have the /keybase/ mount). I've re-enabled it just now (so I can access KBFS) by running

$ sudo keybase --use-root-config-file ctl redirector --enable
Redirector configuration updated.
Redirector mount created.
Please run `run_keybase` to start the redirector for each user using KBFS.

and I'll report back here if the problem begins occurring again

gene1wood avatar Jan 27 '22 16:01 gene1wood

@heronhaye And the symptom manifested again (on version 5.9.0-20220120174718.95a3939b3a), hanging processes that do anything with /.

I stopped keybase and it unhung everything.

Is there something I can do when this next occurs to get you the details you need to determine the cause of this? When I next see things hang because of keybase, what should I run to help?

gene1wood avatar Jan 31 '22 19:01 gene1wood

I'd just like to confirm that I get the same "upgrade hangs" (and also "reinstall hangs") issue on Ubuntu 20.04. However, so far I have not noticed any of the mounting issues.

dietmarw avatar Feb 02 '22 07:02 dietmarw

Seems to have been fixed for me with the latest v5.9.1 release.

dietmarw avatar Feb 07 '22 09:02 dietmarw