Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Declarative function storage mounts are ignored #1151

Open
ashlineldridge opened this issue Oct 26, 2020 · 10 comments
Open

Declarative function storage mounts are ignored #1151

ashlineldridge opened this issue Oct 26, 2020 · 10 comments
Labels
area/hydrate p1 triaged Issue has been triaged by adding an `area/` label

Comments

@ashlineldridge
Copy link
Contributor

Re-raising kubernetes-sigs/kustomize#3141 here as suggested.

It seems like a decision has been made to make KRM function storage mounts imperative-only because as mentioned by @Shell32-Natsu:

That's intended.In most of the cases the KRM function config file author is different from the user. It makes no sense to add a path in user's system when the author is creating a function config. So we have decided to make the mount purely imperative.

In my experience, one of the main benefits of KRM functions is the ability for a package consumer to compose functions and apply them to pulled packages. Kptfiles allow you to declaratively specify functions that get applied during a sync operation (here) which makes it possible to apply a series of transformation and validation functions to packages as they are synced from upstream. Making certain properties of KRM functions imperative-only feels like it goes against the direction of the recent declarative setters proposal.

@Shell32-Natsu
Copy link
Contributor

@frankfarzan

@mortent mortent added area/hydrate p1 triaged Issue has been triaged by adding an `area/` label labels Nov 6, 2020
@aodinokov
Copy link

aodinokov commented Nov 26, 2020

I agree with that:

That's intended.In most of the cases the KRM function config file author is different from the user. It makes no sense to add a path in user's system when the author is creating a function config. So we have decided to make the mount purely imperative.

I would also explicitly mention the 'security-related' reasoning: we don't want the author of the function to set the absolute path in the user system that would be mounted to the function container. Just imagine that src is set to '/' - that will allow function that works as a Trojan horse to steal some data from the user machine. User isn't supposed to check all function declaration before running to make sure it's 'safe'.

BUT
here is an idea of a 'common-ground':
If the function declaration meets the certain condition - still it may be worth to allow using mounts inside fn declaration, something like :

apiVersion: kpt.example.com/v1alpha1
kind: FunctionConfig
metadata:
  name: example-function
  annotations:
    config.kubernetes.io/function: |
      container:
        image: example:latest
        mounts:
        - type: bind
          src: . # <=== allow only relative paths!!!!
          dst: /src

The condition for this is: if src is relative.
It can be relative to some directory (different possible options):

  1. current working dir (from which kpt fn run is ran)
  2. the dir that is used as a path in kpt fn run <path> (may not work for kpt fn run --fn-path <path>)
  3. introduce the additional parameter something like --mountSrcRoot <path>, that will be added to the beginning of src. Also need to make sure that the resulting src is inside the mountSrcRoot (for safety reasons).

Why it may be useful:
it may be used for certain number of generator functions that use some set of files as a source for the generated yamls:
e.g. helm-tempater, kpt fn source, kpt fn sink, kustomize build and similar tools (I know that kustomize may be eventually integrated with kpt, but there may be lots of other external tools).

This would allow to use mount in the kpt pipeline only for the functions that really need mounts rather than to set a 'common' mount-list for all functions in the pipeline, even though only 1 really uses it.

I think kpt sync triggered functions concern mentioned in kubernetes-sigs/kustomize#3141 (comment) may also be resolved potentially if we select current working directory, or set relative to Kptfile mountSrcRoot.

cc: @Shell32-Natsu

@aodinokov
Copy link

aodinokov commented Dec 8, 2020

Just as an example of the previous comment I've drafted the initial commit representing how it should work: kubernetes-sigs/kustomize#3336

and the more I'm thinking about that the more I like that relative path idea:
that for instance will allow krm-functions and kustomize plugins to get access to the local yaml files that may be mounted inside the fn configuration, that basically may allow to specify in kustomize plugins paths to that file and don't put everything into the fn configuration.

@mikebz
Copy link
Contributor

mikebz commented Dec 10, 2020

@aodinokov question about your use case - are you trying to do in place transformation of the yaml files in the package? Are you trying to read/write some other files that happen to be in the same directory. I am curious what the higher order goal is beyond having access to the package directory and files within.

@aodinokov
Copy link

@mikebz my use-case was "how to provide krm-function I use as generator with some additional files (not necessarily yamls) in the same directory or subdirectory". e.g. helm with some values.yaml to override values.

Here is the example. And here is its config.
This example works with the proposed. To be clear it is not about kustomize/kpt fn run. it's about fn-based plugins in kustomize.

The part about kpt source and kpt sync was more like a logical consequence of the feature implementation, but I don't have any usecases for that.
But part about kpt source reminded me about kubernetes-sigs/kustomize#3204 :

The resources: field acts like a generator. It "generates" resource YAML for
inclusion and possible transformation by reading file system paths or URLs.
If the path is a plain file, its contents are include in the build input stream. If the path leads to
a directory containing a kustomization.yaml file, a recursive kustomize build is performed,
and the result is included in the build input stream.

One can easily write a generator plugin that took a list of file paths as its configuration to
replace the behavior of the builtin resources: field. E.g a one-line bash script

cat a*.yaml
would almost work (it's missing triple hyphen doc delimiters) as a generator including all
yaml files starting with the letter a.

With the proposal from kubernetes-sigs/kustomize#3336 the part with yaml files as resources becomes possible even though it's a bit artificial.

If I try to generalize this - generator-like krm functions abilities will be extended by this feature since in addition to setting the values in their config it will be possible to link some external files.

@frankfarzan
Copy link
Contributor

frankfarzan commented Dec 11, 2020

As pointed out, there are security concerns, and to lesser extend portability concerns here. It’s desirable to be able to run functions hermetically without any privileges; preventing access to the host filesystem and networking (Talking to network also introduces non-determinism which is a separate discussion) . KRM functions are packaged and executed as container containers. By default, functions run as user nobody without any privileges such as access to host network or host volume.
The security consideration is especially relevant when running a pipeline of functions in the transitive closure of package. In the general case, the end-user did not author the packages or the functions, so they cannot be trusted. On the other hand, when running a function imperatively, the end-user explicitly chooses to run that program.

In the ideal world, any file required by the function is either:

  • Baked into the container image (This can be any opaque file format).
  • Passed in as KRM resources as either input or function config (ResourceList.items and ResourceList.functionConfig respectively)

We're still exploring and gathering concrete use cases for functions that require a non-KRM resource from the host file system, and whether they can be formulated in a better way. There are security mitigations that can be done (file being local to the directory containing the function config is one such option). A concrete use case where this comes up is a Helm inflator function where the function depends on a Helm chart which is not a KRM resource. There are different ways to formulate this:

  1. Hermetic docker images containing the Helm chart. e.g. gcr.io/kpt-functions/redis-chart:v1. This has pros (self-contained image with no end-user configuration) and cons (Need to build the images for all Helm charts needed).
  2. Function downloads a chart from network
  3. Function mounts a local chart from the host.

@mgoltzsche
Copy link

mgoltzsche commented May 28, 2021

A concrete use case for this is khelm.
It requires to provide a local Helm chart (including Helm templates that have the .yaml extension but don't contain valid yaml) as input into the function container. This feature becomes even more important since function pipelines can be declared within a Kptfile.
(Also a khelm function requires network connectivity when users use remote charts.)

Ideally as a user I don't want to bother with mounts at all but just be able to specify local file paths as function parameters, relative to the Kptfile or function.yaml (also see #1531).
As described in #902 I think this could be achieved by

  • wrapping each non-yaml source file in a resource of a special kind (that needs to be defined for this purpose)
  • or mounting the source directory to a fixed location within function containers.

While I prefer the latter it comes with the problem that the Helm chart directory usually needs to be maintained outside of the kpt directory since the invalid yaml within the templates makes kpt fail to read the function input otherwise.
For this to work in a fully declarative and secure way the function's file system access could be restricted to the git repository. Maybe mount the git repository contents into each function and provide the path of the Kptfile/function.yaml (that declares the function) to the function so that it can resolve relative local file paths safely?

@frankfarzan
Copy link
Contributor

frankfarzan commented Jul 9, 2021

This is considered part of privileged execution which is only allowed when running functions imperatively in kpt 1.0. This topic is discussed here:

https://kpt.dev/book/04-using-functions/02-imperative-function-execution?id=privileged-execution

We want to avoid mounting of any kind for declarative execution. Most use cases we've seen can be formulated as running a priviledged function once instead of repeatedly as part of the pipeline. e.g. Expand a Helm chart into a directory with eval whenever you want to update to a new version of the chart, instead of on every invocation of render.

Your suggestion of wrapping non-yaml in a KRM (e.g. a FileBlob resource kind) automatically is interesting and worth exploring more.

Current proposal for supporting non-KRM files is for functions to define such wrappers and having a pre-processor function to inject file content into the wrapper. See #2350
This is currently not targeting binaries blobs but strings such as Rego or Starlark code. We can optionally support binary blobs by base64 encoding the file.

@mgoltzsche
Copy link

mgoltzsche commented Jul 11, 2021

Most use cases we've seen can be formulated as running a priviledged function once instead of repeatedly as part of the pipeline.

That's true and still allows declarative usage of such a kpt package as a dependency for others. However unfortunately that means that parts of the kpt workflow still have to be done imperatively although I'd like to declare them as well.
On the other hand, while playing around with kpt, khelm and kustomize (as a kpt function), I noticed that turnarounds can take longer if you have to run the whole pipeline after every change. However an alternative would be to split it into several imperative commands that would be called explicitly depending on the changes the developer just did - this is also not great though.
Having kpt pipelines cache pipeline runs depending on the input file changes could improve this - but may not be worth implementing it for the few edgy experimental use cases I had this problem with.
To speed up pipeline runs (at least during local development) caching can be enabled in khelm by mounting a host dir as helm home/cache dir - however currently this also only works with the imperative approach unfortunately.

We want to avoid mounting of any kind for declarative execution.

While I understand the security concern behind that decision mounting files would be more efficient than copying non-KRM files. The latter could result in poor performance (impacting UX) for pipelines that rely on many/large non-KRM files.

Current proposal for supporting non-KRM files is for functions to define such wrappers and having a pre-processor function to inject file content into the wrapper. See #2350

That's interesting. Supporting a FileBlob resource kind and an include comment directive could help solving some of the problems I described earlier. However I see some problems with it for my use cases:

  • Manually creating a KRM file for every single non-KRM file would be very cumbersome for developers / bad UX in case of large Helm charts. It would be better if there'd be a way to declare injection of a whole directory (including sub directories) somehow (within Kptfile?!) as wrapped KRM resources.
  • As far as I am aware Helm templates still cannot be managed within a directory that is used as kpt function input since kpt fails to parse them as yaml. Therefore one would need to be able to let the file include directive refer to files outside the function source directory (which should be avoided due to security reasons) or prevent kpt from parsing yaml files that are included as wrapped KRM resources.

To solve both problems the Kptfile could support the declaration of a list of directories/files that should be wrapped into KRM/FileBlob resources. Kpt could ignore those file paths when traversing through the source directory and parsing all yaml files, avoiding yaml parser errors on Helm templates.

Secrets like git credentials or a Helm registries.yaml from the host still cannot be injected directly this way - using sealed secrets that could work as well though.

This is currently not targeting binaries blobs but strings such as Rego or Starlark code. We can optionally support binary blobs by base64 encoding the file.

For binary blobs my only use case so far is to inject the helm cache into a function run. However for that to work the function would also need to be able write cached chart files and repo index yamls back to disk on the host. Though I guess khelm could as well do so by returning FileBlob resource(s) with base64-encoded cache contents in order to let kpt write them to disk (ignored in .gitignore) and provide them as input next time the function is run. This wouldn't even require changes in kpt and could be implemented in khelm directly. However performance-wise this would be pretty inefficient - not sure how much that would impact developer UX.

@howardjohn
Copy link
Contributor

We want to avoid mounting of any kind for declarative execution. Most use cases we've seen can be formulated as running a priviledged function once instead of repeatedly as part of the pipeline. e.g. Expand a Helm chart into a directory with eval whenever you want to update to a new version of the chart, instead of on every invocation of render.

IMO just because something is one-off instead of typical render doesn't make declarative API not useful (see prior art of Kubernetes having actions like TokenReview via standard KRM). The end result just means that we need to (poorly) recreate the kpt pipeline in a makefile or similar.

Maybe it would be possible to have alternative pipelines, so the same syntax can be used, and because it's no longer part of the render function it could possibly allow mounts or network access

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/hydrate p1 triaged Issue has been triaged by adding an `area/` label
Projects
None yet
Development

No branches or pull requests

9 participants