FUSE deadlock with envfs
I was using envfs in my main config when I managed to trigger a FUSE deadlock that requires a restart to clear. I don't have a minimal reproduction, but it can be reliably reproduced on my machine with envfs enabled in the NixOS config. Best to try this with nixos-rebuild build-vm:
git clone https://github.com/ocaml/dune.git
cd dune
nix develop -c dune test test/blackbox-tests/test-cases/ocaml-index/project-indexation.t/
This unfortunately takes a little while, since the flake has to build some things, but once dune has finished it will look like it's stuck on the last job. If you then inspect with ps, you will find ocamlopt (the OCaml compiler) stuck in an uninterruptible wait, and the FUSE daemon fails to respond to other requests.
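The inspection step can be done with ps and procfs; a sketch (the PID in the comment is of course hypothetical):

```shell
# Find tasks in uninterruptible sleep (STAT starts with D); these are the
# processes wedged inside the stuck FUSE request.
ps -eo pid,stat,comm | awk '$2 ~ /^D/'

# With root, the kernel-side call chain of a stuck task is readable from
# procfs, e.g. for a hypothetical PID 1234:
# cat /proc/1234/stack
```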
I am not certain about the underlying cause, but it is definitely envfs triggering this. I suspect a race condition on access to certain files; for instance, the OCaml compiler and ocaml-index access the same .cmi files in quick succession. My hunch is that this races and ends in a deadlock.
If you can reproduce this in a VM with the instructions above, I would be interested to know how you go about debugging what envfs is doing to cause the issue.
Here is the /proc stack trace of the hung ocamlopt:
Here is the stack trace for envfs:
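As a stopgap, a wedged FUSE connection can be force-aborted from sysfs instead of rebooting the machine; a sketch (the connection number 42 is hypothetical, look it up on your system):

```shell
# Each live FUSE connection shows up as a numbered directory under
# /sys/fs/fuse/connections (the number is the device ID of the mount).
ls /sys/fs/fuse/connections 2>/dev/null || true

# Writing 1 to the abort file fails all in-flight requests on that
# connection, which unsticks D-state waiters without a reboot (needs root).
# Assuming connection 42 belongs to the envfs mount:
# echo 1 | sudo tee /sys/fs/fuse/connections/42/abort
```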
Can confirm. For me it happens with the Zed editor when used from inside nix-shell:
$ nix-shell
$ zeditor .
$ ps -ax -o pid,stat,comm | grep ' D '
292 D kworker/u64:3+events_unbound
$ uname
╭──────────────────┬───────────────────────────────────────────────────────────╮
│ kernel-name │ Linux │
│ nodename │ │
│ kernel-release │ 6.12.40 │
│ kernel-version │ #1-NixOS SMP PREEMPT_DYNAMIC Thu Jul 24 06:56:38 UTC 2025 │
│ machine │ x86_64 │
│ operating-system │ GNU/Linux │
╰──────────────────┴───────────────────────────────────────────────────────────╯
I've seen something else trigger this too, I just didn't catch it at the time, so it's likely envfs rather than Zed.
This has been biting my systems too; with my workflows it happens all the time. I've loved using envfs, but I also need my computer to work, so I will have to disable it for now. In the hope that it's useful, here is the dmesg trace during the hang:
[ 369.885128] INFO: task mount.envfs:1116 blocked for more than 122 seconds.
[ 369.885151] Not tainted 6.17.2 #1-NixOS
[ 369.885155] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 369.885158] task:mount.envfs state:D stack:0 pid:1116 tgid:1115 ppid:1 task_flags:0x400140 flags:0x00004002
[ 369.885169] Call Trace:
[ 369.885175] <TASK>
[ 369.885188] __schedule+0x474/0x12f0
[ 369.885208] ? __d_alloc+0x46/0x2a0
[ 369.885216] schedule+0x27/0xd0
[ 369.885220] d_alloc_parallel+0x393/0x470
[ 369.885227] ? __pfx_default_wake_function+0x10/0x10
[ 369.885234] __lookup_slow+0x5f/0x130
[ 369.885243] walk_component+0xdb/0x150
[ 369.885249] path_lookupat+0x55/0x180
[ 369.885255] filename_lookup+0xf4/0x200
[ 369.885262] ? __pfx_page_put_link+0x10/0x10
[ 369.885269] user_path_at+0x56/0x90
[ 369.885275] do_faccessat+0xff/0x2e0
[ 369.885279] __x64_sys_access+0x1c/0x30
[ 369.885283] do_syscall_64+0xb7/0x3a0
[ 369.885290] entry_SYSCALL_64_after_hwframe+0x77/0x7f
[ 369.885296] RIP: 0033:0x7f0d3350f76b
[ 369.885374] RSP: 002b:00007f0d333fe228 EFLAGS: 00000297 ORIG_RAX: 0000000000000015
[ 369.885384] RAX: ffffffffffffffda RBX: 00005641fa50fb30 RCX: 00007f0d3350f76b
[ 369.885389] RDX: 0000000000000055 RSI: 0000000000000001 RDI: 00007f0d333fe238
[ 369.885393] RBP: 00007f0d333fe6a0 R08: 8080808080808080 R09: 0101010101010100
[ 369.885397] R10: fefff8fcfff0fffe R11: 0000000000000297 R12: 00007f0d333fe238
[ 369.885401] R13: 0000000000000000 R14: 00007f0d2c026490 R15: 0000000000000055
[ 369.885415] </TASK>
[ 492.764829] INFO: task mount.envfs:1116 blocked for more than 245 seconds.
[ 492.764846] Not tainted 6.17.2 #1-NixOS
[ 492.764850] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 492.764854] task:mount.envfs state:D stack:0 pid:1116 tgid:1115 ppid:1 task_flags:0x400140 flags:0x00004002
[ 492.764864] Call Trace:
[ 492.764868] <TASK>
[ 492.764877] __schedule+0x474/0x12f0
[ 492.764890] ? __d_alloc+0x46/0x2a0
[ 492.764897] schedule+0x27/0xd0
[ 492.764903] d_alloc_parallel+0x393/0x470
[ 492.764910] ? __pfx_default_wake_function+0x10/0x10
[ 492.764916] __lookup_slow+0x5f/0x130
[ 492.764924] walk_component+0xdb/0x150
[ 492.764930] path_lookupat+0x55/0x180
[ 492.764938] filename_lookup+0xf4/0x200
[ 492.764946] ? __pfx_page_put_link+0x10/0x10
[ 492.764954] user_path_at+0x56/0x90
[ 492.764961] do_faccessat+0xff/0x2e0
[ 492.764967] __x64_sys_access+0x1c/0x30
[ 492.764972] do_syscall_64+0xb7/0x3a0
[ 492.764979] entry_SYSCALL_64_after_hwframe+0x77/0x7f
[ 492.764985] RIP: 0033:0x7f0d3350f76b
[ 492.765037] RSP: 002b:00007f0d333fe228 EFLAGS: 00000297 ORIG_RAX: 0000000000000015
[ 492.765043] RAX: ffffffffffffffda RBX: 00005641fa50fb30 RCX: 00007f0d3350f76b
[ 492.765047] RDX: 0000000000000055 RSI: 0000000000000001 RDI: 00007f0d333fe238
[ 492.765050] RBP: 00007f0d333fe6a0 R08: 8080808080808080 R09: 0101010101010100
[ 492.765053] R10: fefff8fcfff0fffe R11: 0000000000000297 R12: 00007f0d333fe238
[ 492.765056] R13: 0000000000000000 R14: 00007f0d2c026490 R15: 0000000000000055
[ 492.765065] </TASK>
....
I can't reproduce either the dune or the zeditor freeze, but a similar effect seems to arise when PATH contains a symlink (possibly indirect) into envfs with the same name as the file being looked up. (This originally surfaced in a weirdly-configured Python venv.)
mkdir /tmp/example
cd /tmp/example
ln -s /usr/bin/something problem-test
PATH=/tmp/example:$PATH
/usr/bin/problem-test
Tracing it down led to the access call in fs::_which:
https://github.com/Mic92/envfs/blob/09fdf94b2a9570c9abf0419fa4863a8e6cc1ce85/src/fs.rs#L278
This follows the symlink /tmp/example/problem-test -> /usr/bin/something, which issues an access query against envfs while the original /usr/bin/problem-test is still being resolved, creating a deadlock. (Even if nested queries were somehow supported, this would still be a problem when the link and the original file share a name, since resolution would loop.)
The first fix that comes to mind is manually unrolling symlinks (up to the default symlink limit) and checking whether the path ever ends up back inside an envfs mount. That seems a bit clunky, though, and only covers envfs -> envfs; it doesn't help if some other similar tool redirects (envfs -> x -> envfs). I don't know whether there is a more generic way to prevent such a deadlock. (Setting a timeout could also resolve it, I guess.)
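The symlink-unrolling idea could look roughly like this (a sketch in shell with hypothetical names; a real fix would live in fs.rs and resolve against the actual envfs mountpoint):

```shell
# Hypothetical guard: follow a PATH candidate's symlink chain step by step,
# bounded like the kernel's 40-link limit, and report whether it ever
# lands under the given mountpoint. If it does, envfs should skip the
# access() call instead of deadlocking on itself.
resolve_hits_mount() {
  path=$1 mount=$2 hops=0
  while [ -L "$path" ] && [ "$hops" -lt 40 ]; do
    target=$(readlink "$path")
    case $target in
      /*) path=$target ;;                      # absolute target
      *)  path=$(dirname "$path")/$target ;;   # relative target
    esac
    hops=$((hops + 1))
  done
  case $path in
    "$mount"/*) return 0 ;;  # chain ends inside the mount: would deadlock
    *)          return 1 ;;
  esac
}
```

With the repro above, `resolve_hits_mount /tmp/example/problem-test /usr/bin` should report a hit. As noted, this only catches direct envfs -> envfs chains, not redirection through another tool.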