Skip to content

Commit

Permalink
[1.1] runc run: fix mount leak
Browse files Browse the repository at this point in the history
When preparing to mount container root, we need to make its parent mount
private (i.e. disable propagation), otherwise the new in-container
mounts are leaked to the host.

To find a parent mount, we use to read mountinfo and find the longest
entry which can be a parent of the container root directory.

Unfortunately, due to kernel bug in all Linux kernels older than v5.8
(see [1], [2]), sometimes mountinfo can't be read in its entirety. In
this case, getParentMount may occasionally return a wrong parent mount.

As a result, we do not change the mount propagation to private, and
container mounts are leaked.

Alas, we can not fix the kernel, and reading mountinfo a few times to
ensure its consistency (like it's done in, say, Kubernetes) does not
look like a good solution for performance reasons.

Fortunately, we don't need mountinfo. Let's just traverse the directory
tree, trying to remount it private until we find a mount point (any
error other than EINVAL means we just found it).

Fixes issue 2404.

[1]: https://github.com/kolyshkin/procfs-test
[2]: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=9f6c61f96f2d97cbb5f
Signed-off-by: Kir Kolyshkin <[email protected]>
(cherry picked from commit 13a6f56)
Signed-off-by: Kir Kolyshkin <[email protected]>
  • Loading branch information
kolyshkin committed Oct 3, 2024
1 parent a4cebd3 commit 65aa700
Show file tree
Hide file tree
Showing 2 changed files with 28 additions and 46 deletions.
6 changes: 6 additions & 0 deletions CHANGELOG.md
Original file line number Diff line number Diff line change
Expand Up @@ -6,6 +6,12 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0

## [Unreleased 1.1.z]

### Fixed

* On a system with older kernel, reading `/proc/self/mountinfo` may skip some
entries, as a consequence runc may not properly set mount propagation,
causing container mounts leak onto the host mount namespace. (#2404, #4425)

## [1.1.14] - 2024-09-03

> 年を取っていいことは、驚かなくなることね。
Expand Down
68 changes: 22 additions & 46 deletions libcontainer/rootfs_linux.go
Original file line number Diff line number Diff line change
Expand Up @@ -801,54 +801,33 @@ func mknodDevice(dest string, node *devices.Device) error {
return os.Chown(dest, int(node.Uid), int(node.Gid))
}

// Get the parent mount point of directory passed in as argument. Also return
// optional fields.
func getParentMount(rootfs string) (string, string, error) {
mi, err := mountinfo.GetMounts(mountinfo.ParentsFilter(rootfs))
if err != nil {
return "", "", err
}
if len(mi) < 1 {
return "", "", fmt.Errorf("could not find parent mount of %s", rootfs)
}

// find the longest mount point
var idx, maxlen int
for i := range mi {
if len(mi[i].Mountpoint) > maxlen {
maxlen = len(mi[i].Mountpoint)
idx = i
// rootfsParentMountPrivate ensures rootfs parent mount is private.
// This is needed for two reasons:
// - pivot_root() will fail if parent mount is shared;
// - when we bind mount rootfs, if its parent is not private, the new mount
// will propagate (leak!) to parent namespace and we don't want that.
func rootfsParentMountPrivate(path string) error {
var err error
// Assuming path is absolute and clean (this is checked in
// libcontainer/validate). Any error other than EINVAL means we failed,
// and EINVAL means this is not a mount point, so traverse up until we
// find one.
for {
err = unix.Mount("", path, "", unix.MS_PRIVATE, "")
if err == nil {
return nil
}
}
return mi[idx].Mountpoint, mi[idx].Optional, nil
}

// Make parent mount private if it was shared
func rootfsParentMountPrivate(rootfs string) error {
sharedMount := false

parentMount, optionalOpts, err := getParentMount(rootfs)
if err != nil {
return err
}

optsSplit := strings.Split(optionalOpts, " ")
for _, opt := range optsSplit {
if strings.HasPrefix(opt, "shared:") {
sharedMount = true
if err != unix.EINVAL || path == "/" { //nolint:errorlint // unix errors are bare
break
}
path = filepath.Dir(path)
}

// Make parent mount PRIVATE if it was shared. It is needed for two
// reasons. First of all pivot_root() will fail if parent mount is
// shared. Secondly when we bind mount rootfs it will propagate to
// parent namespace and we don't want that to happen.
if sharedMount {
return mount("", parentMount, "", "", unix.MS_PRIVATE, "")
return &mountError{
op: "remount-private",
target: path,
flags: unix.MS_PRIVATE,
err: err,
}

return nil
}

func prepareRoot(config *configs.Config) error {
Expand All @@ -860,9 +839,6 @@ func prepareRoot(config *configs.Config) error {
return err
}

// Make parent mount private to make sure following bind mount does
// not propagate in other namespaces. Also it will help with kernel
// check pass in pivot_root. (IS_SHARED(new_mnt->mnt_parent))
if err := rootfsParentMountPrivate(config.Rootfs); err != nil {
return err
}
Expand Down

0 comments on commit 65aa700

Please sign in to comment.