
[Feature Request] Run neurons using Kubernetes Jobs #347

Open
jaredjennings opened this issue Feb 24, 2021 · 9 comments

@jaredjennings

Request Type

Feature Request

Problem Description

Cortex can run analyzers and responders (collectively, neurons, if I'm using the term properly) as subprocesses (ProcessJobRunnerSrv) or using Docker (DockerJobRunnerSrv). When processes are used, all the neuron code ends up in the same filesystem, process, and network namespace as Cortex itself. When Docker is used, both Cortex itself and each neuron's code can run in their own containers. This is a maintainability triumph.

But in order to do this, Cortex's container has to have access to the Docker socket, and it has to share a directory with the neuron containers it runs. (With the DockerJobRunnerSrv, this filesystem sharing happens via access to a directory in the host OS' filesystem.) Access to the Docker socket is not a security best practice, because it's equivalent to local root; it also hinders scalability and flexibility in running neurons, and it means depending specifically on Docker rather than on any software that can run a container.

Kubernetes

Kubernetes offers APIs for running containers which are scalable to multi-node clusters, securable, and supported by multiple implementations. Some live natively in public clouds (EKS, AKS, GKE, etc.), some are trivial single-node clusters for development (minikube, KIND), and some are lightweight but production-grade (k3s, MicroK8s). Kubernetes has extension points for storage, networking, and container runtimes, so these functions can be provided by plugins.

The net effect is that while there is a dizzying array of choices about how to set up a Kubernetes cluster, applications consuming the Kubernetes APIs don't need to make those choices, only to signify what they need. The cluster will make the right thing appear, subject to the choices of its operators. And the people using the cluster need not be the same people as the operators: public clouds support quick deployments with but few questions, I've heard.

Jobs

One of the patterns Kubernetes supports for using containers to get work done is the Job. (I'll capitalize it here to avoid confusion with Cortex jobs.) You create a Job with a Pod spec (which in turn specifies some volumes and some containers), and the Job will create and run the Pod, retrying until it succeeds, subject to limits on time, number of tries, rate limits, and the like. Upon succeeding it remains until deleted, or until a configured timeout expires, etc.

Running a Job with the Kubernetes API would be not unlike running a container with the Docker API, except that it would be done using a different client, and filesystem sharing would be accomplished in a different way.
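For concreteness, here is a minimal sketch of creating a Job through io.fabric8:kubernetes-client (one of the client options weighed below); the image, names, and limits here are hypothetical, not something the eventual implementation must use:

```scala
import io.fabric8.kubernetes.api.model.batch.v1.JobBuilder
import io.fabric8.kubernetes.client.DefaultKubernetesClient

object JobSketch extends App {
  // Picks up kubeconfig or in-cluster service-account credentials.
  val client = new DefaultKubernetesClient()

  // A Job wraps a Pod template and retries it until it succeeds,
  // subject to limits like backoffLimit and activeDeadlineSeconds.
  val job = new JobBuilder()
    .withNewMetadata().withName("neuron-example").endMetadata()
    .withNewSpec()
      .withBackoffLimit(3)               // give up after 3 failed tries
      .withActiveDeadlineSeconds(600L)   // or after 10 minutes, whichever comes first
      .withTtlSecondsAfterFinished(3600) // clean up an hour after finishing
      .withNewTemplate()
        .withNewSpec()
          .withRestartPolicy("Never")
          .addNewContainer()
            .withName("neuron")
            .withImage("example/some-analyzer:latest") // hypothetical image
          .endContainer()
        .endSpec()
      .endTemplate()
    .endSpec()
    .build()

  client.batch().v1().jobs().inNamespace("cortex").create(job)
  client.close()
}
```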

Sharing files with a job

With the Docker job runner, a directory on the host is specified; it's mounted into the Cortex container; and when Cortex creates a neuron container, it's mounted into that neuron container too. This implicitly assumes that the Cortex container and the neuron container run on the same host and can access the same filesystem at the same time, and it relies on the host for persistent storage. None of these assumptions necessarily holds under Kubernetes.

Under Kubernetes, a PersistentVolumeClaim can be created, which backs a volume that can be mounted into a container (as signified in the spec for the container). That claim can have ReadWriteMany in its accessModes setting, which signifies to the cluster a requirement that multiple nodes be able to read and write files in the PersistentVolume the cluster provides to satisfy the claim. On a trivial, single-node cluster, the persistent volume can be a hostPath: the same way everything happens now with Docker, but more complicated. But different clusters can provide other kinds of volumes to satisfy such a claim: self-hosted clusters may use Longhorn or Rook to provide redundant, fault-tolerant storage, and public clouds may provide volume types of their own devising (Elastic File System, Azure Shared Disks, etc.). The PersistentVolumeClaim doesn't care.
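As a sketch (the claim name and size are made up), such a claim could look like this using the fabric8 client's model classes:

```scala
import io.fabric8.kubernetes.api.model.{PersistentVolumeClaimBuilder, Quantity}

// A claim for shared storage; what backs it (hostPath, Longhorn, EFS, ...)
// is the cluster's choice, not the application's.
val claim = new PersistentVolumeClaimBuilder()
  .withNewMetadata().withName("cortex-job-base").endMetadata()
  .withNewSpec()
    .withAccessModes("ReadWriteMany") // several nodes may read and write at once
    .withNewResources()
      .addToRequests("storage", new Quantity("10Gi"))
    .endResources()
  .endSpec()
  .build()
```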

So the Cortex container is created with a volume, backed by a ReadWriteMany PersistentVolumeClaim, mounted as its job base directory. When running a neuron, Cortex creates a directory for the job, then creates a Job whose volumes are job-specific subPaths of the same PersistentVolumeClaim, mounted as the /job/input (readOnly: true) and /job/output directories. How the files get shared is up to the cluster. The Job can only see and use the input and output directories for the Cortex job it belongs to. When the Job is finished, the output files are visible in the Cortex container under the job base directory, as with the other job-running methods.
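The volume wiring might look like the following sketch (the claim name, image, and job id are hypothetical); each mount narrows the shared claim to one job's directory via subPath:

```scala
import io.fabric8.kubernetes.api.model.batch.v1.JobBuilder

val jobId = "cortex-job-123" // hypothetical Cortex job identifier

// Both mounts draw on the same PersistentVolumeClaim that backs
// Cortex's job base directory; subPath confines the neuron to the
// input and output directories of this one job.
val neuronJob = new JobBuilder()
  .withNewMetadata().withName(s"neuron-$jobId").endMetadata()
  .withNewSpec()
    .withNewTemplate()
      .withNewSpec()
        .withRestartPolicy("Never")
        .addNewContainer()
          .withName("neuron")
          .withImage("example/some-analyzer:latest") // hypothetical image
          .addNewVolumeMount()
            .withName("job-base")
            .withMountPath("/job/input")
            .withSubPath(s"$jobId/input")
            .withReadOnly(true)
          .endVolumeMount()
          .addNewVolumeMount()
            .withName("job-base")
            .withMountPath("/job/output")
            .withSubPath(s"$jobId/output")
          .endVolumeMount()
        .endContainer()
        .addNewVolume()
          .withName("job-base")
          .withNewPersistentVolumeClaim()
            .withClaimName("cortex-job-base")
          .endPersistentVolumeClaim()
        .endVolume()
      .endSpec()
    .endTemplate()
  .endSpec()
  .build()
```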

How to implement

  1. Choose a Kubernetes client.
    • skuber exists specifically for Scala, but appears to have been last updated in September 2020, and it is not automatically generated from the Kubernetes API definition, so updates take manual work.
    • io.fabric8:kubernetes-client targets Java generally; it is automatically generated, and it was last updated this month.
  2. Write a KubernetesJobRunnerSrv, which takes in information about a persistent volume claim and uses it to create a Kubernetes Job for each Cortex job, hewing closely to the DockerJobRunnerSrv (see the sketch after this list).
  3. Follow dependencies and add code as necessary until Cortex can be configured to run jobs this way, and can run the jobs.
  4. Document several use cases.
    • The simplest way to run everything on a single machine.
    • A simple self-hosted setup.
  5. Write a Helm chart or Operator, which will make Cortex deployment quick and easy, given a cluster.
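To make step 2 concrete, here is a rough skeleton of submitting a Job and waiting for it, in the same blocking spirit as the DockerJobRunnerSrv. All names are hypothetical and this is not the code from PR #349; the claim name and namespace would come in through Cortex configuration:

```scala
import java.util.concurrent.TimeUnit
import io.fabric8.kubernetes.api.model.batch.v1.Job
import io.fabric8.kubernetes.client.{DefaultKubernetesClient, KubernetesClient}

// Hypothetical shape of the service.
class KubernetesJobRunnerSrv(client: KubernetesClient = new DefaultKubernetesClient(),
                             namespace: String = "cortex") {

  def run(job: Job, timeoutMinutes: Long): Boolean = {
    val jobs = client.batch().v1().jobs().inNamespace(namespace)
    jobs.create(job)
    // Block until the Job's status reports at least one success,
    // or the timeout elapses.
    val finished = jobs.withName(job.getMetadata.getName)
      .waitUntilCondition(
        (j: Job) => j != null && j.getStatus != null &&
          Option(j.getStatus.getSucceeded).exists(_ >= 1),
        timeoutMinutes, TimeUnit.MINUTES)
    finished != null
  }
}
```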
@martinspielmann

This would be awesome!

@jaredjennings (Author)

PR #349 seems to do steps 1-3 of my outline. I took notes on how I set up for building and running (step 4a), but they are not published anywhere yet.

@jaredjennings (Author)

Notes published. Sorry it has a certificate error! 💔

@jonpulsifer commented Mar 11, 2021

Great stuff, thanks for this!

Can we revisit the persistent disks? Since k8s Jobs are pretty much ephemeral, and carrying state around is generally challenging in larger deployments, why not prefer an ephemeral volume?

While ReadWriteMany operates as you describe, it doesn't actually do us many favours in the ☁️ 🌈 right now, as it's not really supported, ref https://kubernetes.io/docs/concepts/storage/persistent-volumes/#access-modes

In my use case(s) and experience so far, deploying NFS or another storage engine likely carries more debt than running Cortex in production on a VM.

For what it's worth, if you can access the Docker socket on the host machine, the current runner works just fine. I'm running jobs at home in a lab environment, here are some k8s manifests, and the container (ymmv 😄 )

@jaredjennings (Author)

@jonpulsifer, thanks for your suggestion! I'd love to see this capability without a ReadWriteMany volume. I'll tell you why I haven't found it a problem so far, and why I thought it necessary. This is my first code contribution to Cortex itself; any code improvements you see to be possible could be yours!

My initial tests for this functionality were carried out on a single-machine cluster, where, as I noted above, a path on the host filesystem can become a ReadWriteMany persistent volume. On my multinode test cluster, I'm upgrading to Longhorn 1.1, where ReadWriteMany persistent volumes will reputedly Just Work. Fingers crossed. Neither of these is in a public cloud, but I was able to rustle up blog articles with some kind of directions for the Amazon, DigitalOcean, and Google clouds.

In most Kubernetes Job examples, including the one you cited, the results of the Job are retrieved by inspecting the log output of a successfully completed container. But Cortex has moved away from the use of stdin/stdout, and toward input and output files. As far as I understand, any files written during a container's execution go away when the container finishes, unless they were written to a persistent volume. Only the log output remains, but it conflates stdout and stderr, so it can't be reliably parsed.

I hope that's a compelling case for the need for some kind of persistence. But why not use a ReadWriteOnce volume, which has many more implementations to choose from? As far as I know, the set of volumes available to a Kubernetes Pod is fixed when the Pod is created. But the Pod running Cortex is long-lived. So any job output files Cortex will read have to be stored on a volume that exists when the Cortex Pod is started, yet is writable by the job Pod.

But why use files at all? One Kubernetes Job guide I found (and have since lost) suggested that production Jobs needing to persist their output might store it in a database or message queue. But either of those would be an even bigger dependency than a shared filesystem, and I didn't want to presume to redesign the form of communication between Cortex and the jobs a third time.

@TheMatrix97

Hi @jaredjennings
First of all, thanks for your effort in providing an implementation using Kubernetes Jobs!
I agree that using files for communication between the central Cortex container and the workers is something that should be avoided; it brings a lot of technology overhead. For example, on AWS it forces you to have an EFS CSI controller, with the corresponding IAM roles, security groups, etc., all for a single application in a K8s cluster 0.0
Do you think it is viable to run the job processes inside the central Cortex container, using the ProcessJobRunnerSrv? That way all processes would share the same container filesystem. Or is that inviable in terms of performance? I'm not very familiar with the internals of Cortex processes.
Thanks again!

@jaredjennings (Author)

@TheMatrix97 thanks for filling in just what the consequences are on AWS. You can certainly run jobs as subprocesses - that was the original way to do it! You can run them all inside the Cortex container, but:

  • you have to build all the analyzer code, and all the code it depends on, into the Cortex container.
  • you have to rebuild and redeploy in order to add new analyzers.
  • the container will need more resources allocated to it, all the time: enough to do the maximum {analysis + web app} workload, keeping in mind that the analysis load is much more variable than the web app load. Most of the time it will not be using many of those resources, but you will still be paying for them.
  • the whole caboodle has to run (and fit!) on one computer.
  • any monitoring or accounting you do of analysis jobs has to reach inside your Cortex container, rather than asking Kubernetes.
  • you don't get a wall between your analysis code working on hostile data, and your web app.

If it costs a lot less to run Cortex in a container like this, rather than in a virtual machine, it might be worth it. But it's (tongue in cheek) all the simplicity of containers, with all the scalability of VMs. You definitely don't need my code to do it, and that is a plus: this pull request is a year and a half old.

@TheMatrix97

> If it costs a lot less to run Cortex in a container like this, rather than in a virtual machine, it might be worth it. But it's (tongue in cheek) all the simplicity of containers, with all the scalability of VMs. You definitely don't need my code to do it, and that is a plus: this pull request is a year and a half old.

I see... As you pointed out, it's not that easy to make it work in a single container: it requires some ad hoc patches, and a container whose resource consumption is high and variable could bring scalability problems... I'd rather use EFS...
We could also think about an approach using a service like MinIO providing this filesystem, without having persistence configured. I think that solution would be the preferred one, following Kubernetes principles, although it would require some special coding for the K8s use case and adapting the project to use object storage.
Sorry for "reopening" this old issue, but it got me intrigued... And thanks for your response, it helped me a lot! I'll probably take a look at it in the near future.

@jaredjennings (Author)

> MinIO providing this filesystem

That is a promising route, indeed. I think TheHive added support for storing attachments using MinIO, and I think that code is not in TheHive but in underlying libraries. So perhaps it could be easily co-opted. As for "reopening," I'm quite glad for your interest! I've just taken this back up because of an issue someone filed on my related Helm charts. What I meant by pointing out the age of this pull request is that it does not appear to be a priority for the current maintainers of Cortex to integrate this code. So you take on some risk, and some work, if you use it.
