`tedge connect` does not detect mapper/agent via symlinks
Discussed in https://github.com/thin-edge/thin-edge.io/discussions/880
Originally posted by toewsar February 15, 2022
When using tedge connect c8y, it checks whether the mapper and agent are installed. This is done with the Rust which() function.
In my case the binaries aren't installed at /usr/bin or similar. Therefore the tedge cli does not find the installed mapper and agent.
I tried to create symlinks to /usr/bin to tell tedge that the mapper and agent are installed, but it still failed. The shell command which found the binaries.
$ sudo which tedge_mapper
/usr/bin/tedge_mapper
$ sudo tedge connect c8y
Checking if sysv is available.
Checking if configuration for requested bridge already exists.
Validating the bridge certificates.
Create the device.
Saving configuration for requested bridge.
Restarting mosquitto service.
Awaiting mosquitto to start. This may take up to 5 seconds.
Enabling mosquitto service on reboots.
Successfully created bridge connection!
Sending packets to check connection. This may take up to 2 seconds.
Connection check is successfull.
Checking if tedge-mapper is installed.
Warning: tedge_mapper is not installed.
Enabling software management.
Checking if tedge-agent is installed.
Info: Software management is not installed. So, skipping enabling related components.
It would be nice to also accept symlinks.
The issue has been observed by @toewsar.
The first point to note is that the mapper is not detected through the symlink while things are okay for the agent where there is no indirection.
$ ls -lh /usr/bin/tedge*
-rwxr-xr-x 1 root root 7.0M Feb 16 10:31 /usr/bin/tedge
-rwxr-xr-x 1 root root 7.0M Feb 16 10:31 /usr/bin/tedge_agent
lrwxrwxrwx 1 root root 40 Feb 16 10:36 /usr/bin/tedge_mapper -> /mnt/data/hmi/tedge/usr/bin/tedge_mapper
$ sudo tedge connect c8y
...
Checking if tedge-mapper is installed.
Warning: tedge_mapper is not installed.
...
Checking if tedge-agent is installed.
Starting tedge-agent service.
...
I've written a small Rust program which only calls which:
use std::env;
use which::which;
fn main() {
    let args: Vec<String> = env::args().collect();
    if args.len() < 2 {
        println!("executable as parameter needed");
    } else {
        if which(&args[1]).is_err() {
            println!("Nope: `{}` is not found by `which`.", &args[1]);
        } else {
            println!("Yes: `{}` is found by `which`.", &args[1]);
        }
    }
}
With both which 4.2.2 and 4.2.4, it behaves correctly:
$ ls -lh /usr/bin/tedge*
-rwxr-xr-x 1 ... /usr/bin/tedge
-rwxr-xr-x 1 ... /usr/bin/tedge_agent
lrwxrwxrwx 1 ... /usr/bin/tedge_mapper -> /mnt/data/hmi/tedge/usr/bin/tedge_mapper # working link
lrwxrwxrwx 1 ... /usr/bin/tedge_mapper2 -> /mnt/data/hmi/tedge/usr/bin/tedge_mapper2 # non-existing target
Output:
admin@HMI-1723:~$ ./whichtest tedge_agent
Yes: `tedge_agent` is found by `which`.
admin@HMI-1723:~$ ./whichtest tedge_mapper
Yes: `tedge_mapper` is found by `which`.
admin@HMI-1723:~$ ./whichtest tedge_mapper2
Nope: `tedge_mapper2` is not found by `which`.
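The result above (a working link is found, a dangling link is not) is what one would expect from a PATH lookup that follows symlinks. Here is a minimal std-only sketch of such a lookup. Note that find_in_path is a hypothetical helper written for this illustration and is not the which crate's actual implementation.

```rust
use std::env;
use std::ffi::OsStr;
use std::fs;
use std::os::unix::fs::PermissionsExt;
use std::path::PathBuf;

/// Hypothetical helper: search a PATH-style list for an executable `name`.
/// `fs::metadata` follows symlinks, so a link to an executable file matches
/// while a dangling link does not. Any metadata error, including
/// "permission denied", is treated here as "not found".
fn find_in_path(name: &str, path_var: &OsStr) -> Option<PathBuf> {
    env::split_paths(path_var)
        .map(|dir| dir.join(name))
        .find(|candidate| match fs::metadata(candidate) {
            Ok(meta) => meta.is_file() && meta.permissions().mode() & 0o111 != 0,
            Err(_) => false,
        })
}

fn main() {
    let name = env::args().nth(1).unwrap_or_else(|| "sh".to_string());
    let path_var = env::var_os("PATH").unwrap_or_default();
    match find_in_path(&name, &path_var) {
        Some(path) => println!("Yes: `{}` is found at {}", name, path.display()),
        None => println!("Nope: `{}` is not found", name),
    }
}
```

Because every error is collapsed into "not found", a lookup written this way cannot tell a missing binary from one that the current user is not allowed to reach.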
$PATH has nothing to do with this: I added a hardcoded check for /usr/bin/tedge_mapper, and tedge does not find that either.
After some further analysis, we found that the symlink alone is not the problem. It has to do with permissions.
We're running tedge as root, confirmed by:
use users::get_current_username;
let uname = get_current_username().unwrap();
println!("Running as user {:?}", &uname);
The link is defined as follows:
$ ls -lh /usr/bin/tedge_mapper
lrwxrwxrwx 1 root root 22 Feb 16 14:38 /usr/bin/tedge_mapper -> /usr/blub/tedge_mapper
The directory, in which the actual tedge_mapper resides, has these permissions:
$ ls -lha /usr/blub/
drwxr-xr-- 2 root data 4.0K Feb 16 14:37 .
drwxr-xr-x 11 root root 4.0K Feb 16 14:37 ..
-rwxr-xr-x 1 root root 7.5M Feb 16 14:37 tedge_mapper
We're root, owner is root with execute permissions, things should be fine. But they are not!
root is not in group data, but on its own in a shell, this is not a problem.
When I run this debug code on its own, it works!
use std::env;
use std::fs;
use which::which;
fn main() {
    let args: Vec<String> = env::args().collect();
    if args.len() < 2 {
        println!("executable as parameter needed");
    } else {
        match which(&args[1]) {
            Err(e) => println!("🏀Nope: `{}` is not found by `which`. {}", &args[1], e),
            Ok(p) => {
                println!("🏀Yes: `{}` is found by `which`. {:?}", &args[1], p);
                match fs::metadata(&p) {
                    Ok(m) => println!("🏀Trying {:?} filetype {:?} symlink {:?}", &p, &m.file_type(), &m.is_symlink()),
                    Err(e) => println!("🏀Failed {:?} {:?}", &p, e),
                }
            }
        }
    }
}
According to the output, which finds tedge_mapper. The symlink attribute is false because metadata follows symlinks and so reports on the target file, not on the link (I can't see the sense behind this, but anyway)!
$ ./whichtest tedge_mapper
🏀Yes: `tedge_mapper` is found by `which`. "/usr/bin/tedge_mapper"
🏀Trying "/usr/bin/tedge_mapper" filetype FileType(FileType { mode: 33261 }) symlink false
$ sudo ./whichtest tedge_mapper
🏀Yes: `tedge_mapper` is found by `which`. "/usr/bin/tedge_mapper"
🏀Trying "/usr/bin/tedge_mapper" filetype FileType(FileType { mode: 33261 }) symlink false
So which is not the origin of the problem; std::fs::metadata is.
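For reference, the "symlink false" in the output above comes from std::fs::metadata following the link. A small sketch of the difference between std::fs::metadata and std::fs::symlink_metadata, using a throwaway link in the temp directory (the directory and file names are made up for this example):

```rust
use std::fs;
use std::io;
use std::os::unix::fs::symlink;
use std::path::Path;

/// Returns (is_symlink according to fs::metadata,
///          is_symlink according to fs::symlink_metadata).
fn symlink_flags(path: &Path) -> io::Result<(bool, bool)> {
    let followed = fs::metadata(path)?; // describes the link target
    let not_followed = fs::symlink_metadata(path)?; // describes the link itself
    Ok((
        followed.file_type().is_symlink(),
        not_followed.file_type().is_symlink(),
    ))
}

fn main() -> io::Result<()> {
    // Hypothetical scratch directory, removed again at the end.
    let dir = std::env::temp_dir().join(format!("symlink_demo_{}", std::process::id()));
    let _ = fs::remove_dir_all(&dir);
    fs::create_dir_all(&dir)?;
    let target = dir.join("target");
    let link = dir.join("link");
    fs::write(&target, b"data")?;
    symlink(&target, &link)?;
    println!("{:?}", symlink_flags(&link)?); // prints (false, true)
    fs::remove_dir_all(&dir)
}
```

Following the link also means that every directory on the way to the target must be searchable by the calling process, which is where the permission error comes from.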
I've added the following to ConnectCommand::Command (file command.rs), after the line
println!("Checking if tedge-mapper is installed.\n");
match fs::metadata("/usr/blub/tedge_mapper") {
    Ok(m) => println!("Trying /usr/blub/tedge_mapper :{:?} 🍌", m.file_type()),
    Err(e) => println!("Failed /usr/blub/tedge_mapper :{:?} 🍌", e),
}
When run via sudo tedge connect c8y, I get:
Failed /usr/blub/tedge_mapper :Os { code: 13, kind: PermissionDenied, message: "Permission denied" } 🍌
What is different when running this code on its own compared to as part of tedge❓
I think I've found the reason. It's in
let _user_guard = user_manager.become_user(tedge_users::TEDGE_USER)?;
This boils down to
pub fn switch_user_group(uid: uid_t, gid: gid_t) -> io::Result<SwitchUserGuard> {
    ...
    set_effective_gid(gid)?;
    set_effective_uid(uid)?;
In other words, the euid of root is replaced by tedge's uid, the egid is replaced by tedge's gid.
Other supplementary groups of root are left untouched and stay.
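One way to check this on a Linux device is to look at the Uid, Gid and Groups lines of /proc/self/status from inside the process. The sketch below is only a diagnostic aid and assumes Linux; parse_ids and the Ids struct are hypothetical names for this example and are not part of tedge.

```rust
use std::fs;

/// Effective ids and supplementary groups of a process.
#[derive(Debug, PartialEq)]
struct Ids {
    euid: Option<u32>,
    egid: Option<u32>,
    groups: Vec<u32>,
}

/// Parse the text of /proc/<pid>/status (Linux only).
fn parse_ids(status: &str) -> Ids {
    let mut ids = Ids { euid: None, egid: None, groups: Vec::new() };
    for line in status.lines() {
        let mut fields = line.split_whitespace();
        match fields.next() {
            // The Uid:/Gid: columns are real, effective, saved, filesystem;
            // after the label is consumed, index 1 is the effective id.
            Some("Uid:") => ids.euid = fields.nth(1).and_then(|v| v.parse().ok()),
            Some("Gid:") => ids.egid = fields.nth(1).and_then(|v| v.parse().ok()),
            Some("Groups:") => ids.groups = fields.filter_map(|v| v.parse().ok()).collect(),
            _ => {}
        }
    }
    ids
}

fn main() {
    match fs::read_to_string("/proc/self/status") {
        Ok(status) => println!("{:?}", parse_ids(&status)),
        Err(e) => eprintln!("cannot read /proc/self/status: {}", e),
    }
}
```

Calling something like this right after the user switch should show tedge's ids as euid and egid, with the Groups line still listing root's groups.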
Again, we're accessing /mnt/data, which has these permissions:
$ ls -lhd /mnt/data/
drwxrwsr-- 8 root data 4.0K Feb 17 07:49 /mnt/data/
root as the owner is allowed to access this directory, also the group data.
root is not a member of data (no need for that), while user tedge is member of data.
But in switch_user_group, only the egid tedge is set, not the user tedge's other groups like data.
Thus the process runs as tedge with egid tedge, but without the user tedge's supplementary groups, and is not allowed to access /mnt/data.
For testing purposes, I've put root into the group data. Then things work flawlessly.
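To make the reasoning explicit, here is a simplified model of the classic Unix permission check for traversing a directory. It ignores ACLs and fine-grained capabilities, and the numeric ids are hypothetical values chosen for the example, so treat it as an illustration of the logic and not as what the kernel does in full.

```rust
/// Identity of a process as seen by the permission check.
struct Process {
    euid: u32,
    egid: u32,
    supplementary: Vec<u32>,
}

/// Ownership and mode bits of a directory.
struct Dir {
    mode: u32,
    owner: u32,
    group: u32,
}

/// Simplified classic Unix check for the search (`x`) bit of a directory.
/// ACLs and fine-grained capabilities are deliberately ignored.
fn can_traverse(p: &Process, d: &Dir) -> bool {
    if p.euid == 0 {
        return true; // an effective root bypasses the mode bits
    }
    // Exactly one permission class applies: owner, else group, else other.
    let shift = if p.euid == d.owner {
        6
    } else if p.egid == d.group || p.supplementary.contains(&d.group) {
        3
    } else {
        0
    };
    (d.mode >> shift) & 0o1 != 0
}

// Hypothetical numeric ids, chosen only for this example.
const ROOT: u32 = 0;
const TEDGE: u32 = 999;
const DATA: u32 = 1001;

fn main() {
    // drwxrwsr-- root data /mnt/data/
    let mnt_data = Dir { mode: 0o2774, owner: ROOT, group: DATA };

    let plain_root = Process { euid: ROOT, egid: ROOT, supplementary: vec![ROOT] };
    // After switch_user_group: euid/egid are tedge's, root's groups remain.
    let switched = Process { euid: TEDGE, egid: TEDGE, supplementary: vec![ROOT] };
    // Test setup from this thread: root is also a member of data.
    let root_in_data = Process { euid: TEDGE, egid: TEDGE, supplementary: vec![ROOT, DATA] };

    println!("root:                       {}", can_traverse(&plain_root, &mnt_data));
    println!("switched to tedge:          {}", can_traverse(&switched, &mnt_data));
    println!("switched, root in data:     {}", can_traverse(&root_in_data, &mnt_data));
}
```

In this model the switched process falls into the "other" class, which has no x bit on /mnt/data, unless data happens to be among the groups inherited from root.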
🎯❓ So, how can we not only set the egid, but also the target user's supplementary groups?
Thank you for this detailed analysis. However, I would like to understand the motivation for putting the tedge_mapper binary in a directory with restricted access.
I'm not keen on the idea of adding this complexity to the tedge command line tool. There is already too much in it! Rather than adding group handling features, I would prefer to remove the hardcoded user handling features. I mean having tedge connect prepare configuration files used by external tools, which take responsibility for running the services using the locally defined settings.
Do I understand your reduction wish correctly if I rephrase it as follows?
tedge should write its own configuration files and those of the various cloud connectors and mappers, but not start other processes.
You've mentioned "external tools". Are you thinking of systemd and the like, or of helper scripts that continue what tedge no longer does?
Regarding the user and group handling:
To my eyes, the current way is both too lax and too restrictive at the same time.
Too lax, because root doesn't drop all capabilities (supplementary groups won't be dropped) and thus leaks permissions.
Too restrictive, because the user tedge's supplementary groups are not added.
But yes, not messing with effective users and leaving that to "the outside" sounds plausible (as long as "the outside" knows, what to do ...)
Hi @hansdaniels,
I would like to understand why you moved the executables to '/mnt/data/hmi/tedge/usr/bin/'. That would help us agree on a good solution. Can you explain the motivation behind that?
To unblock you for the moment, a workaround might be to add user root to group data. Since switch_user_group() seems not to drop supplementary groups, group data should remain.
@hansdaniels What is the status of this issue on your side?
With no response within 5 days, I will close this issue.
Hi @didier-wenzek,
did you change anything in this direction? If not, we still have the problem.
@cstoidner: The reason is that our device has a base system (/usr/ belongs to the base system) which should not be modified.
Customer apps have to be installed in /mnt/data.
Your quick reaction makes things clear: this is important for you!
Since this ticket was opened, many things have changed, but the issue is still here. In summary:
- If the tedge user can run the tedge_mapper command only thanks to membership in some group (here the tedge user has been added to the data group to be able to access the directory containing tedge_mapper);
- Then the sudo tedge connect c8y command correctly connects the device to Cumulocity, but doesn't restart the mapper due to a failing check (tedge-mapper not being accessible is erroneously interpreted as tedge-mapper not being installed).
- The root cause is that tedge connect changes the effective group id, losing any access granted to the tedge user via groups.
- There are several workarounds.
  - Granting traversal access of /mnt/data/ to all users.
  - Adding root to the data group (the fact that this works highlights the correctness of one of the comments: "This is too lax because root doesn't drop all capabilities").
  - Manually restarting the mapper on reconnect.
- There are some quick but unconvincing fixes.
  - Do the which check as root and not as tedge.
  - Stop changing the group.
- Long-term fixes are more involved.
  - Remove the need to restart the mapper on connect. This is my preferred solution, and work in that direction is ongoing: see https://github.com/thin-edge/thin-edge.io/issues/1201. A follow-up task will be to remove the culprit mapper restart call in the tedge command.
  - Stop playing with the effective user and group in the tedge command. As highlighted in this thread, this is not really effective from a user perspective. The main idea was to create resources for different users (tedge, mosquitto, ... and others that have been deprecated in the meantime). The simpler approach would be to create these resources with an explicit user, group and mode: see https://github.com/thin-edge/thin-edge.io/issues/1207.
  - Stop having a tedge command that tries to do things from the inside (restarting mosquitto and the mapper), with extra complexity to configure the init system. It would be much simpler to script this from the outside. However, this needs to be done after gaining a better understanding of the base commands that thin-edge must provide.
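Regarding the failing check in the summary above (tedge-mapper not being accessible interpreted as not installed): the two cases can be told apart from the error kind returned by std::fs::metadata. A minimal sketch, assuming a hypothetical probe helper and Probe enum that are not part of the tedge code base:

```rust
use std::fs;
use std::io::ErrorKind;
use std::path::Path;

/// Outcome of looking for a binary at a known path.
#[derive(Debug, PartialEq)]
enum Probe {
    Installed,
    NotInstalled,
    /// The path may exist, but the current user cannot reach it.
    Inaccessible,
}

/// Hypothetical helper: classify a path instead of collapsing every
/// error into "not installed".
fn probe(path: &Path) -> Probe {
    match fs::metadata(path) {
        Ok(meta) if meta.is_file() => Probe::Installed,
        Ok(_) => Probe::NotInstalled, // e.g. a directory with that name
        Err(e) if e.kind() == ErrorKind::PermissionDenied => Probe::Inaccessible,
        Err(_) => Probe::NotInstalled,
    }
}

fn main() {
    let path = std::env::args()
        .nth(1)
        .unwrap_or_else(|| "/usr/bin/tedge_mapper".to_string());
    println!("{}: {:?}", path, probe(Path::new(&path)));
}
```

This would not remove the root cause, but a warning based on Inaccessible would at least point at permissions instead of at a missing installation.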
In conclusion, we will address this issue, but with no rush: 1) by removing the need to restart the mapper on connect and 2) by removing the dance with effective users and groups.
@toewsar We have finally implemented one of the 3 fixes: the second one, labelled ii). The tedge command no longer switches back and forth between root and the tedge user. Your main issue should be fixed. Can you double-check that? Thank you.
@didier-wenzek I checked version 0.7.3 and it worked. Thanks.