
troychiu/kueue

Kueue

Kueue is a set of APIs and a controller for job queueing. It is a job-level manager that decides when a job should be admitted to start (that is, when its pods can be created) and when it should stop (that is, when its active pods should be deleted).

Read the overview to learn more.
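
As a rough mental model only, the admit/stop decision can be pictured as a check against a fixed pool of quota: a job is admitted when its resource request fits in the remaining quota, and that quota is returned when the job finishes or is stopped. The sketch below assumes a single resource (CPU in millicores) and a hypothetical `Quota` type with `Admit` and `Stop` methods; it is not Kueue's API and ignores flavors, cohorts, priorities, and preemption.

```go
package main

import "fmt"

// Quota is a hypothetical, single-resource model of a queue's quota
// (CPU in millicores). It is NOT a Kueue API type.
type Quota struct {
	Nominal int64            // total quota available
	Used    int64            // quota held by admitted jobs
	Running map[string]int64 // admitted job name -> requested millicores
}

// NewQuota returns an empty Quota with the given nominal capacity.
func NewQuota(nominal int64) *Quota {
	return &Quota{Nominal: nominal, Running: map[string]int64{}}
}

// Admit reserves quota for a job if its request fits in what is left.
// In this model, only an admitted job may have its pods created.
func (q *Quota) Admit(name string, request int64) bool {
	if _, ok := q.Running[name]; ok {
		return false // already admitted
	}
	if q.Used+request > q.Nominal {
		return false // does not fit: the job stays queued
	}
	q.Used += request
	q.Running[name] = request
	return true
}

// Stop releases the quota held by a job, e.g. when it finishes or is
// stopped (at which point its active pods would be deleted).
func (q *Quota) Stop(name string) {
	if req, ok := q.Running[name]; ok {
		q.Used -= req
		delete(q.Running, name)
	}
}

func main() {
	q := NewQuota(4000)                 // 4 CPUs
	fmt.Println(q.Admit("job-a", 3000)) // true
	fmt.Println(q.Admit("job-b", 2000)) // false: only 1000m left
	q.Stop("job-a")
	fmt.Println(q.Admit("job-b", 2000)) // true: job-a's quota was released
}
```

Kueue itself tracks quota per resource and per ResourceFlavor in a ClusterQueue; the single integer here only illustrates the gating idea.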

Features overview

  • Job management: Supports job queueing based on priorities, with two strategies: StrictFIFO and BestEffortFIFO.
  • Resource management: Supports fair sharing of resources and preemption between tenants, with a variety of policies.
  • Dynamic resource reclaim: A mechanism to release quota as the pods of a Job complete.
  • Resource flavor fungibility: Quota borrowing or preemption in ClusterQueue and Cohort.
  • Integrations: Built-in support for popular job types, e.g. BatchJob, Kubeflow training jobs, RayJob, RayCluster, JobSet, and plain Pods.
  • System insight: Built-in Prometheus metrics to help monitor the state of the system, as well as Conditions.
  • AdmissionChecks: A mechanism for internal or external components to influence whether a workload can be admitted.
  • Advanced autoscaling support: Integration with cluster-autoscaler's ProvisioningRequest via AdmissionChecks.
  • All-or-nothing with ready Pods: A timeout-based implementation of all-or-nothing scheduling.
  • Partial admission: Allows jobs to run with a smaller parallelism, based on available quota, if the application supports it.
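
The difference between the two queueing strategies can be sketched in a simplified model. In both, workloads are ordered by priority and then by creation time; with StrictFIFO an older workload that cannot be admitted blocks newer ones even if they would fit, whereas with BestEffortFIFO it does not. The `Workload` type, the `admitOrder` function, and the single-resource quota below are hypothetical simplifications, not Kueue's implementation; quota release and preemption are ignored.

```go
package main

import (
	"fmt"
	"sort"
)

// Workload is a hypothetical simplification of a queued job: a priority,
// a creation order, and a single resource request (CPU millicores).
type Workload struct {
	Name     string
	Priority int32
	Created  int // lower value = created earlier
	Request  int64
}

// admitOrder returns the names of workloads admitted from one queue given
// the free quota. strict=true models StrictFIFO; strict=false models
// BestEffortFIFO.
func admitOrder(queue []Workload, free int64, strict bool) []string {
	q := append([]Workload(nil), queue...) // do not reorder the caller's slice
	// Order by priority (descending), then by creation time (ascending).
	sort.SliceStable(q, func(i, j int) bool {
		if q[i].Priority != q[j].Priority {
			return q[i].Priority > q[j].Priority
		}
		return q[i].Created < q[j].Created
	})
	var admitted []string
	for _, w := range q {
		if w.Request <= free {
			free -= w.Request
			admitted = append(admitted, w.Name)
			continue
		}
		if strict {
			break // StrictFIFO: a head that does not fit blocks the rest
		}
		// BestEffortFIFO: skip it and keep trying newer workloads.
	}
	return admitted
}

func main() {
	queue := []Workload{
		{Name: "big", Priority: 0, Created: 1, Request: 3000},
		{Name: "small", Priority: 0, Created: 2, Request: 500},
	}
	fmt.Println(admitOrder(queue, 1000, true))  // []
	fmt.Println(admitOrder(queue, 1000, false)) // [small]
}
```

Here `big` is older but does not fit in the 1000m of free quota, so the StrictFIFO model admits nothing while the BestEffortFIFO model admits `small`.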

Production Readiness status

  • ✔️ API version: v1beta1, respecting Kubernetes Deprecation Policy

  • ✔️ Up-to-date documentation.

  • ✔️ Test coverage.

  • ✔️ Scalability verification via performance tests.

  • ✔️ Monitoring via metrics.

  • ✔️ Security: RBAC-based access control.

  • ✔️ Stable release cycle (2-3 months) for new features, bugfixes, and cleanups.

  • ✔️ Adopters running in production.

    Based on community feedback, we continue to simplify and evolve the API to address new use cases.

Installation

Requires Kubernetes 1.25 or newer.

To install the latest release of Kueue in your cluster, run the following command:

kubectl apply --server-side -f https://github.com/kubernetes-sigs/kueue/releases/download/v0.10.0/manifests.yaml

The controller runs in the kueue-system namespace.

Read the installation guide to learn more.

Usage

A minimal configuration can be set up by applying the example manifest:

kubectl apply -f examples/admin/single-clusterqueue-setup.yaml

Then you can run a job with:

kubectl create -f examples/jobs/sample-job.yaml

Learn more about:

Architecture

Learn more about the architecture of Kueue with the following design docs:

Roadmap

High-level overview of the main priorities for 2025:

  • Improve user experience for MultiKueue - multi-cluster Job dispatching, in particular:
    • sequential attempts to try worker clusters #3757
    • log retrieval from worker clusters #3526
  • Improve user experience for Topology Aware Scheduling, in particular:
    • make Topology Aware Scheduling compatible with cohorts and preemption #3761
    • optimize the algorithm to minimize fragmentation #3756
    • better accuracy of scheduling by tighter integration with kube-scheduler #3755
    • reduce friction by defaulting the PodSet annotations #3754
  • Productization of the Kueue dashboard #940
  • Support Hierarchical Cohorts with FairSharing #3759
  • Improved support for AI inference, including:
    • partial preemption of serving workloads #3762
    • LeaderWorkerSet support #3232
  • Progress towards the stable API (v1beta2) #768

Long-term aspirational goals:

  • Integration with workflow frameworks #74
  • Support dynamically-sized Jobs #77
  • Budget support #28
  • Flavor assignment strategies, e.g. minimizing cost vs minimizing borrowing #312
  • Cooperative preemption support for workloads that implement checkpointing #477
  • Delayed preemption for two-stage admission #3758
  • Support Structured Parameters (DRA) in Kueue #2941
  • Graduate the API to v1 #3476

Community, discussion, contribution, and support

Learn how to engage with the Kubernetes community on the community page and the contributor's guide.

You can reach the maintainers of this project at:

Code of conduct

Participation in the Kubernetes community is governed by the Kubernetes Code of Conduct.

About

Kubernetes-native Job Queueing

Languages

  • Go 95.9%
  • JavaScript 1.4%
  • Shell 1.2%
  • Makefile 0.6%
  • HTML 0.4%
  • Python 0.3%
  • Other 0.2%