runc Why Runc do bind mount config.rootfs to config.rootfs while it create container ?

In rootfs_linux.go prepareRoot(), Runc bind mount config.Rootfs to config.Rootfs . Why this step is necessary? In my centos 8.2, while graphdriver is overlayfs2, it seem that this step can be skipped without any consequences.Can we optimize this step? https://github.com/opencontainers/runc/blob/51e607f2cdd8ccf7f39f29aab531d674e487b311/libcontainer/rootfs_linux.go#L785 Here is the code: func prepareRoot(config *configs.Config) error { flag := unix.MS_SLAVE | unix.MS_REC if config.RootPropagation != 0 { flag = config.RootPropagation } if err := unix.Mount("", "/", "", uintptr(flag), ""); err != nil { return err } if err := rootfsParentMountPrivate(config.Rootfs); err != nil { return err } return unix.Mount(config.Rootfs, config.Rootfs, "bind", unix.MS_BIND|unix.MS_REC, "") }

Mar 15 '22 11:03 Marshalzxy

I am not quite sure but maybe this mount is not needed if config.Rootfs is already a mount point.

If the above is true, we can make it optional. Problem is, detection of whether the path is a mount point is a costly operation itself, for kernels that lack openat2(2) support (that is, Linux < 5.6) detecting a bind mount requires parsing /proc/mountinfo.

What is your use case, and what is the benefit from making this mount optional?

Mar 16 '22 22:03 kolyshkin

Problem is, detection of whether the path is a mount point is a costly operation itself, for kernels that lack openat2(2) support (that is, Linux < 5.6) detecting a bind mount requires parsing /proc/mountinfo.

OTOH we already parse mountinfo anyway in rootfsParentMountPrivate...

Mar 16 '22 23:03 kolyshkin

The following patch can prevent the bind mount if it's already a mount point:

diff --git a/libcontainer/rootfs_linux.go b/libcontainer/rootfs_linux.go
index 5290a45e..c123547e 100644
--- a/libcontainer/rootfs_linux.go
+++ b/libcontainer/rootfs_linux.go
@@ -738,15 +738,18 @@ func getParentMount(rootfs string) (string, string, error) {
 	return mi[idx].Mountpoint, mi[idx].Optional, nil
 }
 
-// Make parent mount private if it was shared
-func rootfsParentMountPrivate(rootfs string) error {
+// Make parent mount private if it was shared. This also returns
+// a flag whether rootfs itself is a mount point.
+func rootfsParentMountPrivate(rootfs string) (bool, error) {
 	sharedMount := false
 
 	parentMount, optionalOpts, err := getParentMount(rootfs)
 	if err != nil {
-		return err
+		return false, err
 	}
 
+	isMount := parentMount == rootfs
+
 	optsSplit := strings.Split(optionalOpts, " ")
 	for _, opt := range optsSplit {
 		if strings.HasPrefix(opt, "shared:") {
@@ -760,10 +763,10 @@ func rootfsParentMountPrivate(rootfs string) error {
 	// shared. Secondly when we bind mount rootfs it will propagate to
 	// parent namespace and we don't want that to happen.
 	if sharedMount {
-		return mount("", parentMount, "", "", unix.MS_PRIVATE, "")
+		return isMount, mount("", parentMount, "", "", unix.MS_PRIVATE, "")
 	}
 
-	return nil
+	return isMount, nil
 }
 
 func prepareRoot(config *configs.Config) error {
@@ -778,10 +781,16 @@ func prepareRoot(config *configs.Config) error {
 	// Make parent mount private to make sure following bind mount does
 	// not propagate in other namespaces. Also it will help with kernel
 	// check pass in pivot_root. (IS_SHARED(new_mnt->mnt_parent))
-	if err := rootfsParentMountPrivate(config.Rootfs); err != nil {
+	isMount, err := rootfsParentMountPrivate(config.Rootfs)
+	if err != nil {
 		return err
 	}
 
+	if isMount {
+		// config.Rootfs is already a mount point.
+		return nil
+	}
+
 	return mount(config.Rootfs, config.Rootfs, "", "bind", unix.MS_BIND|unix.MS_REC, "")
 }

Feel free to test it and report your findings here.

Mar 16 '22 23:03 kolyshkin

I am not quite sure but maybe this mount is not needed if config.Rootfs is already a mount point.

If the above is true, we can make it optional. Problem is, detection of whether the path is a mount point is a costly operation itself, for kernels that lack openat2(2) support (that is, Linux < 5.6) detecting a bind mount requires parsing /proc/mountinfo.

What is your use case, and what is the benefit from making this mount optional?

Thanks to reply my issue, it is not my special use case .I read these codes of Runc preparing rootfs, and this duplicated bind mounted made me confused. So i think maybe we can do some optimize this step. @kolyshkin

Mar 17 '22 01:03 Marshalzxy

i think maybe we can do some optimize this step.

This is what the above patch does. I am not convinced though whether we should have it or not. Surely, we can omit this mount, and thus save a mount(2) syscall. OTOH this makes the code more complicated, and I am not quite sure it is worth the savings.

What do you think @Marshalzxy ?

Mar 30 '22 20:03 kolyshkin

i think maybe we can do some optimize this step.

This is what the above patch does. I am not convinced though whether we should have it or not. Surely, we can omit this mount, and thus save a mount(2) syscall. OTOH this makes the code more complicated, and I am not quite sure it is worth the savings.

What do you think @Marshalzxy ?

It's indeed more complicated. At lease we should add some comments above return mount(config.Rootfs, config.Rootfs, "", "bind", unix.MS_BIND|unix.MS_REC, "") To explaint under some circumstances this mount is redundant.

Mar 31 '22 06:03 Marshalzxy

It's indeed more complicated. At lease we should add some comments above return mount(config.Rootfs, config.Rootfs, "", "bind", unix.MS_BIND|unix.MS_REC, "") To explaint under some circumstances this mount is redundant.

I think I do have a comment, right before the return, which explains the case.

My question was different. Do you think that adding this (thus making the code code complicated) is worth the savings we get in return (skipped bind mount)? I am genuinely not sure about this, and that is why I'm asking.

Apr 01 '22 23:04 kolyshkin

It's indeed more complicated. At lease we should add some comments above return mount(config.Rootfs, config.Rootfs, "", "bind", unix.MS_BIND|unix.MS_REC, "") To explaint under some circumstances this mount is redundant.

I think I do have a comment, right before the return, which explains the case.

My question was different. Do you think that adding this (thus making the code code complicated) is worth the savings we get in return (skipped bind mount)? I am genuinely not sure about this, and that is why I'm asking.

After your patch, these codes are little more complicated.However it indeed saves hundreds of kernel instuctions and context switch when system call is called. So I do think we could accecpt your patch.

Apr 06 '22 02:04 Marshalzxy

I think setting slave mode here instead of using bind mode is not wanting to affect the host rootfs

Sep 06 '22 07:09 kamizjw

i have the same question.... it's confused

Feb 10 '23 10:02 anurnomeru

https://man7.org/linux/man-pages/man2/mount.2.html First it mounts / as slave with MS_REC, the rootfs mount becomes slave.
Second it bind mounts rootfs with MS_REC, all submounts under the rootfs subtree are also bind mounted. Maybe it help to ignore unbindable mountpoint?

Aug 10 '23 06:08 chenk008