rust icon indicating copy to clipboard operation
rust copied to clipboard

change std::process to drop supplementary groups based on CAP_SETGID

Open GrigorenkoPV opened this issue 1 year ago • 5 comments

A trivial rebase of #95982

Should fix #39186 (from what I can tell)

Original description:

Fixes #88716

  • Before this change, when a process was given a uid via std::os::unix::process::CommandExt.uid, there would be a setgroups call (when the process runs) to clear supplementary groups for the child if the parent was root (to remove potentially unwanted permissions).
  • After this change, supplementary groups are cleared if we have permission to do so, that is, if we have the CAP_SETGID capability.

This new behavior was agreed upon in #88716 but there was a bit of uncertainty from @Amanieu here: #88716 (comment)

I agree with this change, but is it really necessary to ignore an EPERM from setgroups? If you have permissions to change UID then you should also have permissions to change groups. I would feel more comfortable if we documented set_uid as requiring both UID and GID changing permissions.

The way I've currently written it, we ignore an EPERM as that's what #88716 originally suggested. I'm not at all an expert in any of this so I'd appreciate feedback on whether that was the right way to go.

GrigorenkoPV avatar Feb 26 '24 20:02 GrigorenkoPV

r? @ChrisDenton

rustbot has assigned @ChrisDenton. They will have a look at your PR within the next two weeks and either review your PR or reassign to another reviewer.

Use r? to explicitly pick a reviewer

rustbot avatar Feb 26 '24 20:02 rustbot

The change was fcp'd and I'd be happy to r+ it on that basis but cc @Amanieu as you expressed specific implementation and documentation concerns in https://github.com/rust-lang/rust/issues/88716#issuecomment-973366600

ChrisDenton avatar Feb 28 '24 18:02 ChrisDenton

The job x86_64-gnu-llvm-16 failed! Check out the build log: (web) (plain)

Click to see the possible cause of the failure (guessed by this bot)
GITHUB_ACTION=__run_7
GITHUB_ACTIONS=true
GITHUB_ACTION_REF=
GITHUB_ACTION_REPOSITORY=
GITHUB_ACTOR=GrigorenkoPV
GITHUB_API_URL=https://api.github.com
GITHUB_BASE_REF=master
GITHUB_ENV=/home/runner/work/_temp/_runner_file_commands/set_env_626cd173-7eed-425b-8357-6f29a19ccffd
GITHUB_EVENT_NAME=pull_request
GITHUB_EVENT_NAME=pull_request
GITHUB_EVENT_PATH=/home/runner/work/_temp/_github_workflow/event.json
GITHUB_GRAPHQL_URL=https://api.github.com/graphql
GITHUB_HEAD_REF=cap_setgid
GITHUB_JOB=pr
GITHUB_PATH=/home/runner/work/_temp/_runner_file_commands/add_path_626cd173-7eed-425b-8357-6f29a19ccffd
GITHUB_REF=refs/pull/121650/merge
GITHUB_REF_NAME=121650/merge
GITHUB_REF_PROTECTED=false
---
GITHUB_SERVER_URL=https://github.com
GITHUB_SHA=b597a7fe0189b9599776df89a3732cd200ff9c24
GITHUB_STATE=/home/runner/work/_temp/_runner_file_commands/save_state_626cd173-7eed-425b-8357-6f29a19ccffd
GITHUB_STEP_SUMMARY=/home/runner/work/_temp/_runner_file_commands/step_summary_626cd173-7eed-425b-8357-6f29a19ccffd
GITHUB_TRIGGERING_ACTOR=GrigorenkoPV
GITHUB_WORKFLOW_REF=rust-lang/rust/.github/workflows/ci.yml@refs/pull/121650/merge
GITHUB_WORKFLOW_SHA=b597a7fe0189b9599776df89a3732cd200ff9c24
GITHUB_WORKSPACE=/home/runner/work/rust/rust
GOROOT_1_19_X64=/opt/hostedtoolcache/go/1.19.13/x64
---
#12 writing image sha256:eb886be320d7d9b32c03e080a9a472b50243a691795971e655e0c5318196365e done
#12 naming to docker.io/library/rust-ci done
#12 DONE 10.1s
##[endgroup]
Setting extra environment values for docker:  --env ENABLE_GCC_CODEGEN=1 --env GCC_EXEC_PREFIX=/usr/lib/gcc/
[CI_JOB_NAME=x86_64-gnu-llvm-16]
##[group]Clock drift check
  local time: Thu Feb 29 14:21:20 UTC 2024
  network time: Thu, 29 Feb 2024 14:21:20 GMT
  network time: Thu, 29 Feb 2024 14:21:20 GMT
##[endgroup]
sccache: Starting the server...
##[group]Configure the build
configure: processing command line
configure: 
configure: build.configure-args := ['--build=x86_64-unknown-linux-gnu', '--llvm-root=/usr/lib/llvm-16', '--enable-llvm-link-shared', '--set', 'rust.thin-lto-import-instr-limit=10', '--set', 'change-id=99999999', '--enable-verbose-configure', '--enable-sccache', '--disable-manage-submodules', '--enable-locked-deps', '--enable-cargo-native-static', '--set', 'rust.codegen-units-std=1', '--set', 'dist.compression-profile=balanced', '--dist-compression-formats=xz', '--set', 'build.optimized-compiler-builtins', '--disable-dist-src', '--release-channel=nightly', '--enable-debug-assertions', '--enable-overflow-checks', '--enable-llvm-assertions', '--set', 'rust.verify-llvm-ir', '--set', 'rust.codegen-backends=llvm,cranelift,gcc', '--set', 'llvm.static-libstdcpp', '--enable-new-symbol-mangling']
configure: target.x86_64-unknown-linux-gnu.llvm-config := /usr/lib/llvm-16/bin/llvm-config
configure: llvm.link-shared     := True
configure: rust.thin-lto-import-instr-limit := 10
configure: change-id            := 99999999
---
---- [ui] tests/ui/command/command-uid-gid.rs stdout ----

error: test run failed!
status: exit status: 101
command: cd "/checkout/obj/build/x86_64-unknown-linux-gnu/test/ui/command/command-uid-gid" && RUST_TEST_THREADS="16" "/checkout/obj/build/x86_64-unknown-linux-gnu/test/ui/command/command-uid-gid/a"
--- stderr -------------------------------
thread 'main' panicked at /checkout/tests/ui/command/command-uid-gid.rs:24:18:
thread 'main' panicked at /checkout/tests/ui/command/command-uid-gid.rs:24:18:
called `Result::unwrap()` on an `Err` value: Os { code: 1, kind: PermissionDenied, message: "Operation not permitted" }
------------------------------------------



rust-log-analyzer avatar Feb 29 '24 14:02 rust-log-analyzer

I've traced the history of the failing test: it dates all the way back to when std::process was added to the standard library.

The documentation for the uid method only says that this maps to a setuid call, which will succeed if you pass it the current uid. It doesn't say anything about groups.

However this PR changes uid to also drop supplementary groups using setgroups, which is important to avoid "leaking" privileged groups to an unprivileged process. This breaks the test because calling setgroups always requires elevated privileges and will fail with EPERM otherwise.

The expected impact in practice is small: with a github code search I could only find examples of code that is already running as root dropping down to non-root privileges.

@rust-lang/libs-api How do we feel about making this (possibly breaking) change?

The previous PR ignored any EPERM errors from setgroups, but I argued against it because ignoring errors can be very error-prone for something as sensitive as dropping privileges.

Amanieu avatar Mar 12 '24 11:03 Amanieu

If you have permissions to change UID then you should also have permissions to change groups.

So, there's a weird interactions with containers here, where if we fail on EPERM, that would potentially break users of Linux containers / user namespaces.

In theory, with standard UNIX permissions, it's possible to use supplementary groups as a "negative permission" for something. For instance, in theory you could do this:

chown root:noxyz /usr/bin/xyz
chmod 0705 /usr/bin/xyz

And then users who don't have the noxyz group can run xyz, but users who do have the group noxyz cannot. And in the standard UNIX permission model, users don't normally have permissions to call setgroups to drop supplementary groups themselves.

This is not a very reasonable use of permissions, and almost nobody does this. There are a few random anecdotes of universities having a nogames group. Nonetheless, there were a non-zero number of users, and Linux treated this as a security matter.

Linux supports unprivileged creation of user namespaces, and within a user namespace, users can normally call setuid/setgid, and used to be able to call setgroups as well, until this arose as a security concern, at which point Linux created a mechanism to restrict setgroups.

Now, if you're setting up a user namespace, if you don't set up a gid_map for your container, you won't be permitted to call setgroups at all. If you do want to set up a gid_map, you either have to have CAP_SETGID in the parent namespace, or you have to write deny to /proc/${pid}/setgroups first, denying processes in the container from calling setgroups.

See man 7 user_namespaces for more information.

The reason I have the context on this:

  • Years ago I attempted to add a "supplementary UIDs" mechanism to Linux (e.g. setusers analogous to setgroups, so that users could have a set of UIDs that they could use for e.g. setting up unprivileged containers).
  • Someone pointed out the theoretical possibility of someone using ---r-xr-x permissions to deny a specific user permissions to something.
  • In attempting to argue that this shouldn't be treated as a fatal problem, I observed that the same was already true of setgroups, which at that time could be dropped via user namespaces.
  • This was promptly treated as a security bug in user namespaces, resulting in this redesign of the handling of setgroups within user namespaces (and the rejection of setusers).

A quick check on my system shows, for instance, that Firefox creates containers that have setgroups set to deny.

As a result of this, if an unprivileged user sets up a user namespace (as happens regularly to sandbox code), and code running within that namespace wants to drop permissions (e.g. going from container-root to container-unprivileged-user), they can call setuid/setgid to change those permissions, but they can't call setgroups. Which means, if we fail on EPERM from setgroups, calling .uid(...)here will cause process launch to fail for non-obvious and non-documented reasons, and users won't have an alternative other than manually callingsetuid/setgidthemselves inpre_exec` (which seems more unsafe and more error-prone).

I would propose that if you're getting EPERM from setgroups, ignoring that error would not result in unexpectedly retaining privileges, because in the common scenario of being system-root you'll have the requisite permissions, and in the container-root scenario that produces EPERM, it's expected that a process in a container may not be able to drop group permissions, and I think it'd be surprising and unexpected if users have to go out of their way to do something unusual (e.g. call something other than uid or call a different function before/after uid) in order to work inside a container.

Thus, I'd propose that we ignore EPERM from the call to setgroups caused by uid.

(Separately, we should figure out what to do with the call caused by groups, but that's still a nightly-only API.)

I also think we should document very clearly what uid does.

Summary:

  • Containers created by unprivileged users are a common scenario where you have permission to call setuid and setgid but not setgroups.
  • We should document what uid does, why it calls setgroups, and how it handles EPERM, with a link to https://www.man7.org/linux/man-pages/man7/user_namespaces.7.html.
  • I think we should ignore EPERM errors from the setgroups call that uid causes, so that software works in containers by default without having to add special handling for the container case.
  • We should add an unresolved question to the tracking issue for the groups nightly API to decide how to handle EPERM there.
  • If we end up not ignoring EPERM errors, we need to provide some reasonable way for users to work as expected in containers. I think that wouldn't be the ideal solution, though.

joshtriplett avatar Mar 12 '24 18:03 joshtriplett

  • I think we should ignore EPERM errors from the setgroups call that uid causes, so that software works in containers by default without having to add special handling for the container case.

Added a commit reverting the change. Not squashing to reflect the decision-making process in the commit history. But if you would like me to, I can just reset the branch back to the initial 3a6af84fca1b41614887748e9fdb0e2b2a13fb96.

GrigorenkoPV avatar Mar 13 '24 15:03 GrigorenkoPV

I agree with Josh's summary, let's just revert back to the original version that ignores EPERM errors.

r=me once the revert is squashed.

Amanieu avatar Mar 13 '24 20:03 Amanieu

I thought GitHub wouldn't re-run the CI for a commit that was already tested before, but whataver.

The PR is now "squashed" (actually just reverted back to the first commit).

GrigorenkoPV avatar Mar 13 '24 20:03 GrigorenkoPV

@bors r+

Amanieu avatar Mar 13 '24 22:03 Amanieu

:pushpin: Commit 3a6af84fca1b41614887748e9fdb0e2b2a13fb96 has been approved by Amanieu

It is now in the queue for this repository.

bors avatar Mar 13 '24 22:03 bors