Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Mount volumes before copying into a container #24655

Open
wants to merge 2 commits into
base: main
Choose a base branch
from

Conversation

mheon
Copy link
Member

@mheon mheon commented Nov 22, 2024

This solves several problems with copying into volumes on a container that is not running.

The first, and most obvious, is that we were previously entirely unable to copy into a volume that required mounting - like image volumes, volume plugins, and volumes that specified mount options.

The second is that this fixed several permissions and content issues with a fresh volume and a container that has not been run before. A copy-up will not have occurred, so permissions on the volume root will not have been set and content will not have been copied into the volume.

If the container is running, this is very low cost - we maintain a mount counter for named volumes, so it's just an increment in the DB if the volume actually needs mounting, and a no-op if it doesn't.

Does this PR introduce a user-facing change?

Fixed a bug where volumes would have the wrong permissions if `podman cp` was used to copy into a fresh volume in a container that had never been started.
Fixed a bug where using `podman cp` to copy into a named volume requiring a mount (image volumes, volumes backed by a volume plugin, or other volumes with options) would fail on stopped containers.

@openshift-ci openshift-ci bot added release-note approved Indicates a PR has been approved by an approver from all required OWNERS files. labels Nov 22, 2024
@mheon
Copy link
Member Author

mheon commented Nov 22, 2024

Still needs a test, putting this up so I can have some folks look at it.

Copy link
Member

@Luap99 Luap99 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a linked Jira or github issue or why are you looking into this?

Comment on lines 68 to 74
defer func() {
vol.lock.Lock()
if err := vol.unmount(false); err != nil {
logrus.Errorf("Unmounting volume %s after container %s copy: %v", vol.Name(), c.ID(), err)
}
vol.lock.Unlock()
}()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This does not look right, this function here isn't doing any work it is returning another function so defer runs before the copy, you must add that to unmount() in the if else case above.

@mheon
Copy link
Member Author

mheon commented Nov 22, 2024

/hold

Requires extensive rework, DO NOT MERGE until I repush

@openshift-ci openshift-ci bot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Nov 22, 2024
@mheon mheon force-pushed the fix_volume_perms_cp branch from 39a7445 to a34abed Compare November 22, 2024 20:35
@mheon
Copy link
Member Author

mheon commented Nov 22, 2024

/hold cancel
OK, this now works. I hate how gross it is, but it does work.
Still needs a test, will add Monday

@openshift-ci openshift-ci bot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Nov 22, 2024
Copy link

Cockpit tests failed for commit a34abed. @martinpitt, @jelly, @mvollmer please check.

Copy link
Member

@Luap99 Luap99 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This entire thing looks extremely racy, during the copy the container is not locked so it is possible for us to unmount the rootfs while to copy happens I think and do other bad things. The volume logic also strikes me as of. But I guess most of it are pre existing issues so not really relevant to your permissions/volume mounting fixes.

Comment on lines 121 to 190
// This must be the first cleanup function so it fires before volume unmounts happen.
cleanupFuncs = append([]func(){func() {
// This is a gross hack to ensure correct permissions
// on a volume that was copied into that needed, but did
// not receive, a copy-up.
volume.lock.Lock()

if err := volume.update(); err != nil {
logrus.Errorf("Unable to update volume %s status: %v", volume.Name(), err)
volume.lock.Unlock()
return
}

if volume.state.NeedsCopyUp && volume.state.NeedsChown {
volume.state.NeedsCopyUp = false
volume.state.CopiedUp = true
if err := volume.save(); err != nil {
logrus.Errorf("Unable to save volume %s state: %v", volume.Name(), err)
volume.lock.Unlock()
return
}

volume.lock.Unlock()

for _, namedVol := range c.config.NamedVolumes {
if namedVol.Name == volume.Name() {
if err := c.fixVolumePermissions(namedVol); err != nil {
logrus.Errorf("Unable to fix volume %s permissions: %v", volume.Name(), err)
}
return
}
}
}
}}, cleanupFuncs...)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't understand this part at all. The comment is not clear to me.

I really dislike how this locks/unlocks twice the same volume for no apparent reason other than fixVolumePermissions() also locks

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Basically: fixVolumePermissions chowns the base of the volume according to where it is mounted. However, to match Docker's somewhat convoluted copy-up behavior, it only does this when the volume is not empty or this is the first time we populated the volume - a copy-up. Copy-up overrides the checks that halt the chown if a volume has contents. This is basically treating the podman cp into the volume as a copy-up event if one didn't happen to ensure we chown the volume despite it now being populated. It has the convenient side effect of avoiding any subsequent attempts to copy-up into the volume if it's mounted again.

I'll put this in a comment in the code but leaving it here so I don't forget, the logic here is somewhat torturous.

Comment on lines 97 to 99
// If we are not running, generate an OCI spec.
// Then use that to fix permissions on all the named volumes.
// Has to be *after* everything is mounted.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hire it is also not clear how running generateSpec() is fixing the permissions . Personally I would really much rather split out exactly what is needed here into a separate function from generateSpec() to not do more work than needed and then it should also be very clear from looking at such function which part is relevant.


cleanupFuncs = append(cleanupFuncs, func() {
vol.lock.Lock()
if err := vol.unmount(false); err != nil {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The issue here is if anyone kills the copy process before we do cleanup we leak all volume mounts forever as the reference count (v.state.MountCount) will not be decremented.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't really know how we fix that. I could inhibit shutdown during the cleanup functions running, but that doesn't help us if someone kills the copy before then. I could inhibit shutdown for the entirety of the copy, but that would really violate user expectations.

We almost need a cleanup process for the volume mounts, but how would we signal it that it should run?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah I really don't know here... If it is killed there is no way to run such process. The only point would be it would allow user to manual recover but then again how would they know how often to run this command. They could run it twice and decrement the counter more than it was incremented which would also cause other issues.

I guess once option would be to register a new shutdown handler that runs the cleanup for SIGINT/TERM which I think would help already for the majority of cases. That is not straight forward either as we would remove the handler again once we are done.

Copy link

Cockpit tests failed for commit 31d6526. @martinpitt, @jelly, @mvollmer please check.

Comment on lines 71 to 74
if err := c.syncContainer(); err != nil {
logrus.Errorf("Unable to sync container %s state: %v", c.ID(), err)
return
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

shouldn't the sync happen directly after the lock/unlock so we know the state is sane for cleanupFunc() and c.unmount()?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unmount doesn't rely on it but the cleanup functions might, I'll move it

Comment on lines 117 to 132
cleanupFuncs = append(cleanupFuncs, func() {
_ = shutdown.Unregister(fmt.Sprintf("volume unmount %s", vol.Name()))

if err := volUnmountFunc(); err != nil {
logrus.Errorf("Unmounting container %s volume %s: %v", c.ID(), vol.Name(), err)
}
})

if err := shutdown.Register(fmt.Sprintf("volume unmount %s", vol.Name()), func(_ os.Signal) error {
return volUnmountFunc()
}); err != nil && !errors.Is(err, shutdown.ErrHandlerExists) {
return nil, fmt.Errorf("adding shutdown handler for volume %s unmount: %w", vol.Name(), err)
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

split out fmt.Sprintf("volume unmount %s", vol.Name()) in the outer scope and the use the var in both callers. That way we do no not have to worry about typos or chnages in only one place.

Second do we need to some random uid in the handler name as well? It does not matter for a local podman as there can only be one command running at the same time but with the service you could in theory get two copies into the same voluem at the same time and the if the service gets a SIGINT/TERM we would need to unmount twice to keep the mount count correct

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Random ID probably a good idea, I'll add it.

ctrCreate.WaitWithDefaultTimeout()
Expect(ctrCreate).To(ExitCleanly())

cp := podmanTest.Podman([]string{"cp", "/etc/hosts", fmt.Sprintf("%s:%s", ctrName, ctrVolPath)})
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should not assume the hosts /etc/hosts exists or that it will be owned by root. Just create your own dummy file on the host in the test.
Also doesn't this cause a inconsistent behavior for rootless as uid 0 on the host is not mapped in the userns

Copy link

Cockpit tests failed for commit 0fe846e. @martinpitt, @jelly, @mvollmer please check.

@mheon mheon force-pushed the fix_volume_perms_cp branch 2 times, most recently from 7acd44e to 638e629 Compare November 26, 2024 16:15
Copy link

Cockpit tests failed for commit 36b7462. @martinpitt, @jelly, @mvollmer please check.

Copy link

Cockpit tests failed for commit 36f4765. @martinpitt, @jelly, @mvollmer please check.

@martinpitt
Copy link
Contributor

Wrt. failed cockpit rawhide test: I reported the criu/kernel regression to https://bugzilla.redhat.com/show_bug.cgi?id=2328985 and silenced it in our tests in cockpit-project/bots#7146 . From now on, this failure will be ignored, i.e. tests should go back to green. Please either retry or re-push.

@mheon mheon force-pushed the fix_volume_perms_cp branch 2 times, most recently from 8eba026 to 7de9b6d Compare November 26, 2024 20:18
@mheon
Copy link
Member Author

mheon commented Nov 26, 2024

Realization: the unmount-on-shutdown code here is inadequate. The copy will likely be ongoing, so the unmount will fail because the mount is busy. Adding a context to the copy itself could solve this but I don't know if that's worth it.

@Luap99
Copy link
Member

Luap99 commented Nov 27, 2024

Realization: the unmount-on-shutdown code here is inadequate. The copy will likely be ongoing, so the unmount will fail because the mount is busy. Adding a context to the copy itself could solve this but I don't know if that's worth it.

I think for API usage it would make a lot of sense to cancel the copy. But given buildah copier.Put/Get which are used by that code to copy do not have a context that would require changes there first which is largly out of scope for this fix of course.

@Luap99
Copy link
Member

Luap99 commented Nov 27, 2024

Also you have to rebase to fix CI. (As always please rebase on each push to avoid merging on old base that might cause conflicts)

mheon and others added 2 commits November 27, 2024 08:09
This reverts commit 5de7b7c.

We now require the Unregister shutdown handler function for
handling unmounting named volumes after `podman cp` into a
stopped container.

Signed-off-by: Matt Heon <[email protected]>
This solves several problems with copying into volumes on a
container that is not running.

The first, and most obvious, is that we were previously entirely
unable to copy into a volume that required mounting - like
image volumes, volume plugins, and volumes that specified mount
options.

The second is that this fixed several permissions and content
issues with a fresh volume and a container that has not been run
before. A copy-up will not have occurred, so permissions on the
volume root will not have been set and content will not have been
copied into the volume.

If the container is running, this is very low cost - we maintain
a mount counter for named volumes, so it's just an increment in
the DB if the volume actually needs mounting, and a no-op if it
doesn't.

Unfortunately, we also have to fix permissions, and that is
rather more complicated. This involves an ugly set of manual
edits to the volume state to ensure that the permissions fixes
actually worked, as the code was never meant to be used in this
way. It's really ugly, but necessary to reach full Docker
compatibility.

Fixes containers#24405

Signed-off-by: Matthew Heon <[email protected]>
@mheon mheon force-pushed the fix_volume_perms_cp branch from 7de9b6d to e66b788 Compare November 27, 2024 13:10
@mheon
Copy link
Member Author

mheon commented Dec 10, 2024

I think this is ready. We can handle context work (which seems rather substantial) in a followon PR.

Copy link
Member

@Luap99 Luap99 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Copy link
Contributor

openshift-ci bot commented Dec 10, 2024

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: Luap99, mheon

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. release-note
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants