
Setns vs Clone(3)

Open krisnova opened this issue 3 years ago • 8 comments

All Aurae cells should set up their namespaces by scheduling a process immediately. We believe this process will be the nested auraed.

Once the new namespaces have been "cloned" we can track their IDs.

All executables should use the setns() system call rather than calling clone3() themselves. We should be entering already existing namespaces, so that all executables in a cell share the same namespaces and namespace IDs.

krisnova, Dec 27 '22

if an executable is started directly without starting a Cell, does it still go into a namespace or is it kept in the same namespace as the root auraed?

dmah42, Dec 27 '22

Some context based on #194

The aurae-executables crate now has an Executable and a NestedAurae struct.

NestedAurae uses clone3 with the proper flags for unsharing the namespaces. In the child, we then run a std Command using its exec function (which calls execvp under the hood), which does not fork at all.

Executable is back to being a wrapper around std's Command and calls spawn. spawn makes one of three calls (in priority order): posix_spawn, clone3, fork. My understanding is that nothing is unshared in any of those cases, so I think the child should share the same namespaces as the calling process. The child then calls execvp.

In both cases, we can run a pre_exec hook before execvp, and Command wires up stdio for us, which is nice.

future-highway, Dec 28 '22

if an executable is started directly without starting a Cell

I don't believe this is possible. Maybe it is now with #194? I think a cell must be referenced from within the executable request. I don't think you can pass an empty string or nil value for cell name?

krisnova, Dec 28 '22

So given two auraed instances:

  1. A root auraed running as the true pid 1
  2. A nested auraed running as pid N in the parent's PID namespace, and as pid 1 from the perspective of its new PID namespace

When we schedule a nested auraed, we should call one of the clone-like functions, which creates the new namespaces. Their IDs can be found under /proc/[pid]/ns/[ns], for example using bash:

readlink /proc/self/ns/*
readlink /proc/self/ns/cgroup
readlink /proc/self/ns/ipc
readlink /proc/self/ns/mnt
readlink /proc/self/ns/net
readlink /proc/self/ns/pid
readlink /proc/self/ns/time
readlink /proc/self/ns/user
readlink /proc/self/ns/uts

readlink /proc/self/ns/pid_for_children
readlink /proc/self/ns/time_for_children

As we run executables with the nested auraed, they should all share the same namespaces. This is done by opening the namespace file (e.g. /proc/[pid]/ns/net, whose link reads like net:[4026531835]) and passing the resulting file descriptor to the setns() system call.
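The number inside a link such as net:[4026531835] is the inode number of the namespace file, the same number the ls -Li output later in this thread prints. setns() itself is handed a file descriptor opened on /proc/[pid]/ns/[ns]; util-linux nsenter does that for you, e.g. nsenter --net=/proc/PID/ns/net (root required). A small bash sketch, assuming Linux with /proc mounted and GNU coreutils stat; ns_id is a hypothetical helper, not part of aurae:

```shell
#!/usr/bin/env bash
# Sketch (assumes Linux, /proc mounted, GNU coreutils stat).
# ns_id is a hypothetical helper, not part of aurae.

# ns_id PID NS: print the numeric id of namespace NS of process PID,
# e.g. "net:[4026531835]" -> "4026531835".
ns_id() {
  local link
  link=$(readlink "/proc/$1/ns/$2") || return 1
  link=${link#*\[}      # drop the "net:[" prefix
  echo "${link%\]}"     # drop the trailing "]"
}

# The id embedded in the link is the inode of the namespace file,
# so stat -L (which follows the link) reports the same number.
id_from_link=$(ns_id $$ net)
id_from_stat=$(stat -L -c %i "/proc/$$/ns/net")
echo "link: $id_from_link  inode: $id_from_stat"
```

Two processes are in the same namespace exactly when these numbers (together with the device ID) match, which is why comparing ls -Li listings works as a check.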

The easiest way to do that is to just run a plain old executable from the nested auraed, which will inherit whatever namespaces the nested auraed has. That raises the question @dmah42 asked about scheduling an executable without a cell. It also raises the question of doubly nested namespaces, nested recursively, and so on.

krisnova, Dec 28 '22

if an executable is started directly without starting a Cell

I don't believe this is possible. Maybe it is now with #194? I think a cell must be referenced from within the executable request. I don't think you can pass an empty string or nil value for cell name?

will we not have an ExecutableService that "just runs" an executable? i thought we planned that for the API alongside pods and cells.

dmah42, Dec 28 '22

Some context based on #194

The aurae-executables crate now has an Executable and a NestedAurae struct.

i don't understand why these are coupled into the same crate. they should be used in different circumstances, as i understand it, and those different contexts would map to (i think?) services at the high level.

dmah42, Dec 28 '22

The aurae-executables crate now has an Executable and a NestedAurae struct.

i don't understand why these are coupled into the same crate. they should be used in different circumstances, as i understand it, and those different contexts would map to (i think?) services at the high level.

With the latest changes, they are coupled in the same module, so I think your question still stands.

At this point, Executable is a custom wrapper around std's Command and, once started, a Process. Process is also defined in the module; it is based on std's Process, which is private, so we can't use that directly.

NestedAuraed is more careful about how it uses Command, so that we can control the clone behavior, and it also holds a Process once started.

So, they are both just custom ways to start and control a Process.

future-highway, Dec 28 '22

I think it is appropriate to close this @krisnova. Our current setup is:

  1. Use clone3 to start a nested auraed process with the appropriate namespaces unshared
  2. Create a cgroup
  3. Move the nested auraed process into the cgroup
  4. Wait for a start request
  5. On start request, proxy the request to the nested auraed process
  6. In the nested auraed process, use tokio's process::Command (which wraps std's process::Command) to spawn an executable. Std's implementation can take several approaches, but none of them unshare a namespace, so the executable's namespaces match those of the nested auraed process.

Steps to confirm:

Terminal 1:

make auraed auraed-start # install and start auraed

Terminal 2:

aer cell allocate ae-1 --cell-isolate-process --cell-isolate-network # allocate an isolated nested auraed
aer cell start ae-1 sleep-100 -c "sleep 100" # start an exe in the cell

Terminal 3:

sudo -E nsenter -a -t 477065 # enter all the namespaces of the nested auraed. The pid is taken from the output of auraed in terminal 1
ls -Li /proc/self/ns/* # list the namespaces of the current process, which is in the namespaces of the nested auraed
nsenter -a -t 27 # enter all the namespaces of the exe. The pid is taken from the response in terminal 2
ls -Li /proc/self/ns/* # list the namespaces of the current process, which is in the namespaces of the exe

Compare the outputs from the two ls -Li commands. Example:

root@ubuntu-linux-22-04-desktop:/# nsenter -a -t 477065
root@ae-1:/# ls -Li /proc/self/ns/*
4026532392 /proc/self/ns/cgroup  4026532391 /proc/self/ns/pid                4026531837 /proc/self/ns/user
4026532390 /proc/self/ns/ipc     4026532391 /proc/self/ns/pid_for_children   4026532389 /proc/self/ns/uts
4026532388 /proc/self/ns/mnt     4026531834 /proc/self/ns/time
4026532393 /proc/self/ns/net     4026531834 /proc/self/ns/time_for_children
root@ae-1:/# nsenter -a -t 27
root@ae-1:/# ls -Li /proc/self/ns/*
4026532392 /proc/self/ns/cgroup  4026532391 /proc/self/ns/pid                4026531837 /proc/self/ns/user
4026532390 /proc/self/ns/ipc     4026532391 /proc/self/ns/pid_for_children   4026532389 /proc/self/ns/uts
4026532388 /proc/self/ns/mnt     4026531834 /proc/self/ns/time
4026532393 /proc/self/ns/net     4026531834 /proc/self/ns/time_for_children
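Instead of comparing the two listings by eye, the check can be scripted. A bash sketch, assuming a Linux host and root privileges when inspecting auraed's processes; same_namespaces is a hypothetical helper, not an aurae or aer command. Both PIDs must be as seen from the shell that runs it, so the 27 above (a PID inside the nested PID namespace) cannot be passed from the host as-is.

```shell
#!/usr/bin/env bash
# Sketch: same_namespaces is a hypothetical helper, not part of aurae.
# same_namespaces PID_A PID_B: return 0 if every namespace of PID_A
# has the same inode number for PID_B; print any that differ.
same_namespaces() {
  local link ns rc=0
  for link in /proc/"$1"/ns/*; do
    ns=${link##*/}
    if [ "$(stat -L -c %i "$link")" != "$(stat -L -c %i "/proc/$2/ns/$ns")" ]; then
      echo "differs: $ns"
      rc=1
    fi
  done
  return "$rc"
}

# Self-check: a child started with a plain fork/exec (no unshare flags)
# should share every namespace with the shell that started it.
sleep 30 &
child=$!
if same_namespaces $$ "$child"; then shared=yes; else shared=no; fi
kill "$child"
wait "$child" 2>/dev/null || true
echo "plain child shares all namespaces: $shared"
```

The trailing self-check doubles as a small demonstration of the point made earlier in the thread: a spawn without unshare flags inherits the parent's namespaces.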

future-highway, Feb 04 '23

works for me :)

dmah42, Jun 21 '24