Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use newer tarfile to handle bundles more safely #197

Draft
wants to merge 14 commits into
base: master
Choose a base branch
from

Conversation

ekoyle
Copy link

@ekoyle ekoyle commented Jul 22, 2024

Use tarfile from python 3.12 to handle bundles in a safer way.

Ignore user permissions where possible, and don't copy unsafe permissions/files.

tarfile_fallback.py was copied from the latest python 3.8 (backported from python 3.12) -- if we aren't on a new enough version of python, use that instead of the built-in one.

Fixes #196

Copy link
Contributor

@bbannier bbannier left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the patch @ekoyle!

It would be great if you could add a new test validating the fix. This project uses BTest and tests are in testing/tests.

CI is currently failing due to lots of style issues checked in pre-commit hooks. To run this locally you would need to install pre-commit, e.g., via pipx install pre-commit; you can then run checks identical to CI with pre-commit run -a. I suggest below to not fork python-3.12's tarfile and instead outsource this to a dependency, so you shouldn't need to worry about style issues in zeekpkg/tarfile_fallback.py.

Copy link
Contributor

@bbannier bbannier Jul 23, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's not fork this into our codebase -- could we instead use a dependency like e.g., backports.tarfile (ideally: with a comment in pyproject.toml that we need it to fix XYZ until we bump to python-N.M.K)? If you could add a test for the issue this fixes we could validate that it works for us.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It doesn't look like ci is installing python dependencies anywhere.

This brings up an issue with this approach: up until now, all of the dependencies needed to run zkg have been available from the OS package manager. This dependency won't be available for the zeek-zkg package.

Using a library from pypi may require users to find a way to manage a python virtualenv for zkg (recent versions of pip refuse to install packages globally, so users would need to use virtualenv or something like pipx, pipenv, etc).

@@ -69,37 +86,83 @@ def make_symlink(target_path, link_path, force=True):
raise error


def safe_tarfile_extractall(tfile, destdir):
"""Wrapper to tarfile.extractall(), checking for path traversal.
def zkg_tarfile_create(basedir):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you add PEP257 docstrings for the functions you are adding?

@ekoyle
Copy link
Author

ekoyle commented Jul 23, 2024

I'd like to double-check a few things before adding finishing touches on this:

  • What is the minimum python version we need to target?
    • tarfile was only backported as far as python 3.8
  • Is the intended permissions logic reasonable (still working on this)?
    • all files within the bundle tarball should be either mode 0o755 (rwxr-xr-x) or 0o644 (rw-r--r--)
    • owners within tarball should be root/root
    • permissions within the tarball should be largely ignored while unbundling, resulting in similar mode to above and owner/group of the user running zkg

@bbannier
Copy link
Contributor

bbannier commented Jul 23, 2024

I'd like to double-check a few things before adding finishing touches on this:

  • What is the minimum python version we need to target?
    • tarfile was only backported as far as python 3.8

We currently require >=python-3.9 (python-3.8 will EOL in 11/2024)

  • Is the intended permissions logic reasonable (still working on this)?
    • all files within the bundle tarball should be either mode 0o755 (rwxr-xr-x) or 0o644 (rw-r--r--)

AFAICT zkg should in principle support installing executable scripts so I'd think we'd want to preserve executable bits if present (but not force it for all installed files). Write we'd only need by the owner. I currently do not see that we would need to preserve setuid bits.

  • owners within tarball should be root/root

Owner should be the user installing the package. This is to support installing into a user path, e.g., I use a zkg installed out of band and have

$ zkg config
...
[paths]
state_dir = /Users/bbannier/.zkg
script_dir = /Users/bbannier/.zkg/script_dir
plugin_dir = /Users/bbannier/.zkg/plugin_dir
bin_dir = /Users/bbannier/.zkg/bin
...
  • permissions within the tarball should be largely ignored while unbundling, resulting in similar mode to above and owner/group of the user running zkg

Since we'd want to preserve at least some permission bits (see above) we should make sure that both install and subsequent bundle preserve these. Ideally we'd have a test validating that roundtrip.

@ekoyle
Copy link
Author

ekoyle commented Jul 24, 2024

Although I was able to get the permissions the way I intended during the extraction process, there can still be issues due to the umask when we build a package.

Would it be reasonable to have zkg emit an error or a warning when run with a umask more restrictive than 022, or maybe even set the umask before building? It seems like a package manager should be allowed to ignore the umask if it is going to cause problems, but I'm not sure if anyone would be bothered by such a change.

@bbannier
Copy link
Contributor

Would it be reasonable to have zkg emit an error or a warning when run with a umask more restrictive than 022, or maybe even set the umask before building? It seems like a package manager should be allowed to ignore the umask if it is going to cause problems, but I'm not sure if anyone would be bothered by such a change.

I am not exactly following 😅. I think the issue is exactly that zkg's active umask leaks into installed, bundled and unbundled files. Couldn't we preserve permissions on installed files, and pick a reasonable default for generated ones?

@ekoyle
Copy link
Author

ekoyle commented Jul 24, 2024

I am not exactly following 😅. I think the issue is exactly that zkg's active umask leaks into installed, bundled and unbundled files. Couldn't we preserve permissions on installed files, and pick a reasonable default for generated ones?

This PR resolves some issues with permissions leaking into and out of the tarball, however zkg is still inheriting the umask from the environment where it runs. If this umask is restrictive, then when zkg (or a subprocess) copies or creates files, they are created with restrictive permissions. This seems to be a major problem if there are compiled extensions in the bundle, as the build is going to copy and create files.

The simplest fix for this would be to call os.umask(0o022) early on in zkg, however I'm not sure whether that would deserve a separate PR or whether such a PR would be accepted.

Use the new `filter` option in tarfile's `extractall()` to mitigate more
issues that could arise from loading a malicious tar file.

Try to normalize permissions when creating the tar and only pay
attention to whether the user `x` bit is set (permissions should be
`0o755` if executable and `0o644` otherwise).

Omit the original username in the tarball and replace with root:root.

When unpacking, the same rules are applied on permission mode, and the
user is left as the user executing zkg.

Use [tarfile.py](https://github.com/python/cpython/raw/v3.8.19/Lib/tarfile.py)
from python 3.8.19 and rename to tarfile_fallback.py so that we can use
the newer features. Once our minimum python versions are greater than
3.12, 3.11.4, 3.10.2, 3.9.17, or 3.8.17, we can safely remove this file
as well as the conditional import in _util.py.
ekoyle and others added 5 commits July 24, 2024 14:05
Add a btest for the new permission handling. This is using output of ls
and tar. I'm not positive this will work with variants other than GNU.
If this doesn't work, I'm not sure how to accomplish this portably short
of writing scripts to print out the desired information.
@bbannier
Copy link
Contributor

bbannier commented Jul 25, 2024

@ekoyle I pushed added some fixups to make your test pass both with a GNU and a BSD stack and a minor vendoring cleanup. Please feel free to rebase aggressively.

zeekpkg/_util.py Outdated Show resolved Hide resolved
import types

import git
import semantic_version as semver

from .vendor import tarfile
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should continue to use tarfile from the Python distribution if it's modern enough. If anything changes in the distribution version, at least it would be captured.

Something like that might work.

import tarfile as builtin_tarfile
if hasattr(builtin_tarfile, "data_filter"):
   tarfile = builtin_tarfile
else
   from .vendor import tarfile

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure I like this vendoring all that much, but after staring a bit at the tarfile difference, think it's okay.

zeekpkg/_util.py Outdated Show resolved Hide resolved


def zkg_update_perms(new_attrs, member, extract):
# we are doing our own thing with `mode` here
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is just the inverse of what was discussed in the ticket:

If a user creates a tar by hand or with custom tooling that contains a 0600 file and extracts it as zeek:zeek user on the target system, today it'll be a somewhat expected 0600. With this change the file will be world-readable on the target system.

Having a 0600 file with zkg/git may be practically difficult to achieve and I'm not sure there's an actual use case, but not sure putting a world-readable label on everything is great.

If we'd warn users about strict umask settings and the consequences and a possible opt-in to set laxer permissions within the bundle, maybe that would suffice, too?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think unless there is a really strong use-case against it, we should favor the solution that will work for everyone. I feel like the package manager should be in charge of the permissions rather than carrying over (often accidental) permissions from the system on which the bundle is created (more so if they are creating them by hand) -- especially since there are also unexpected interactions depending on the umask zkg runs with (the permissions of the unpacked files don't have any effect on compiled code, even with this PR). A package should be portable, and if the end-user needs to worry about permissions on both the bundling system and the unbundling system, I don't think the package is very portable.

What is the best practice for running zkg? It seems like it would be best for zkg to run as a user that is different from the user running zeek, as well as a non-root user. It would be great if we could mask permissions with say 0o027, but that would require zkg to know a group that is shared between the user unpacking the bundle and the user running zeek (or at least for end users to know to set the zkg users primary group to something the zeek user is a member of). Unless zkg is going to be opinionated on users and groups (which I don't think it should), I think it would be better to default to world-readable permissions.

If we want to provide a way for users to control permissions of files written by zkg, maybe it would be better to provide a cli/config option for umask, and default it to 0o022? Then, we could set that as the umask for the zkg process early on, and also mask the permissions we are calculating.

The only use-case I have come up with for wanting some files to have more restrictive permissions than others is if a user is including sensitive files in a bundle, which I hope they are not -- however even this use-case could be handled by using a more restrictive umask for all files.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The only use-case I have come up with for wanting some files to have more restrictive permissions than others is if a user is including sensitive files in a bundle, which I hope they are not

Yeah, that was the only thought, and can also see that it's a "don't do that".

I think I'm still leaning on the side of putting the portable/appropriate permission bits into the bundle/tar, but during extraction at least not make them more permissible. Maybe I'm too timid and someone else should chime in.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it be better to separate that piece into another ticket/PR?

Also, since the process umask is honored while building (for files created during the build step), should we also honor the process umask while unbundling to be more consistent? I'm not really sure what the best approach is there, but I really don't like the inconsistency.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it be better to separate that piece into another ticket/PR?

I'd wait and see if @ckreibich or others opine. As said, maybe I'm just too timid and making everything 0644/0755 is okay.

should we also honor the process umask while unbundling to be more consistent?

Hmm, with the current state of the PR, I'd even say you could open-up/force the umask to 022 before building so that compiled files are world readable. In your test, the .so file produced has 0750, which wouldn't be usable for non-root or non-zkg-user.

-rwxr-x--- root/root     31448 2024-07-25 16:44 Demo_Rot13/lib/Demo-Rot13.linux-x86_64.so

And even with the stricter permission handling, maybe a warning is appropriate if umask is found to be too strict before building.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The latest commit (1ff7c78) should set the umask on build based on a new zkg_umask attribute on the Manager class, as well as applying that umask to files as they are extracted from a bundle. If we want to keep it, I think we could make zkg_umask user-configurable. It is currently hardcoded to 0o022, but it seems to behave as I expected when changing that mask.

ekoyle and others added 3 commits July 25, 2024 09:29
Remove the default for dest_path is it is not optional.

Co-authored-by: Arne Welzel <[email protected]>
Add docstrings and update zkg_update_perms to return a dict rather than
accept and modify a dict as the first argument.
Add zkg_umask to Manager and apply that mask when unbundling or building.
@awelzel
Copy link
Contributor

awelzel commented Jul 30, 2024

Maybe I'm too timid and someone else should chime in.

@rsmmr - do you have opinions on umask and permissions of files within bundles and those created during the build step and zkg's role?

There's much comments here, but in short: Any strong feelings/reactions about making files/directories 0644 or 0755 regardless of their initial permission bits during a zkg bundle / zkg unbundle command?

@rsmmr
Copy link
Member

rsmmr commented Jul 30, 2024

I haven't followed the discussion in detail but generally I agree that a packager manager should be able to control its permissions and set them the way it wants things to work. That said, I think it would be good if a bundle/unbundle cycle created the same layout as corresponding zkg install runs, and iirc we don't change permissions there either. So my gut feeling would be to decide which permissions we want, and then enforce them consistently. Plus, we could give zkg an option to override if somebody needs something else (like a custom umask setting or such; and maybe we then simply get our defaults by presetting that to the desired works-for-almost-everybody value).

Set the umask during `_stage()` in the install process to hopefully give
similar results to bundling and unbundling. Needs btests to verify. We
may need a custom copy function if this doesn't result in the same
behavior as bundling/unbundling.

Looking at this, it might be better to make changes to Manager._stage()
to enforce more consistent permissions when copying.
@ekoyle
Copy link
Author

ekoyle commented Jul 31, 2024

I made some changes to try to keep the permissions similar during install, but I think we may want to replace shutil.copytree() in _util.copy_over_tree() with something that enforces the same permissions scheme as the bundle/unbundle.

Still needs a btest for the install as well.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

umask affects permissions in zkg bundles
4 participants