add nri design and user guide (#142)
Signed-off-by: Kang.Zhang <[email protected]>
kangclzjc authored Aug 11, 2023
1 parent d84325f commit f090a87
Showing 7 changed files with 164 additions and 0 deletions.
152 changes: 152 additions & 0 deletions docs/designs/nri-mode-resource-management.md
@@ -0,0 +1,152 @@
# NRI Mode Resource Management

## Glossary

NRI: Node Resource Interface. See https://github.com/containerd/nri

## Summary

We propose enabling NRI mode resource management in Koordinator for easier deployment and in-time control.

## Motivation

Koordinator is a QoS-based scheduling system for hybrid workload orchestration on Kubernetes, and its runtime hooks support two working [modes](https://github.com/koordinator-sh/koordinator/blob/main/docs/design-archive/koordlet-runtime-hooks.md) for different scenarios: `Standalone` and `Proxy`. However, both of them have some [constraints](https://shimo.im/docs/m4kMLdgO1LIma9qD). NRI (Node Resource Interface) is a public interface for controlling node resources and a general framework for plug-in extensions to CRI-compatible container runtimes. It provides a mechanism for extensions to track the state of pods and containers and to make limited modifications to their configuration. We would like to integrate the NRI framework to address the `Standalone` and `Proxy` constraints with this community-recommended mechanism.

### Goals

- Support NRI mode resource management for koordinator.
- Support containerd container runtime.

### Non-Goals/Future Work

- Support docker runtime

## Proposal

Unlike standalone and proxy mode, koordlet will start an NRI plugin to subscribe to pod/container lifecycle events from the container runtime (e.g. containerd, CRI-O), and the koordlet NRI plugin will then call runtime hooks to adjust pod resources or the OCI spec. The flow should be:

- Get pod/container lifecycle events and OCI format information from the container runtime (e.g. containerd, CRI-O).
- Transform the OCI format information into internal protocols (e.g. PodContext, ContainerContext) to re-use existing runtime hook plugins.
- Transform the runtime hook plugins' response into OCI spec format.
- Return the OCI spec format response to the container runtime (e.g. containerd, CRI-O).
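
The four steps above can be sketched roughly as follows. This is a minimal, self-contained illustration rather than koordlet code: `OCIResources`, the simplified `ContainerContext`, `Hook`, `handleEvent`, and the example `minShares` hook are all hypothetical stand-ins, and the real implementation would use the NRI `api` types and koordlet's existing hook plugins.

```go
package main

import "fmt"

// OCIResources is a hypothetical, simplified stand-in for the OCI-format
// resource information received from the container runtime.
type OCIResources struct {
	CPUShares uint64
	CPUQuota  int64
}

// ContainerContext is a simplified stand-in for the internal protocol
// consumed by the existing runtime hook plugins.
type ContainerContext struct {
	PodName       string
	ContainerName string
	CPUShares     uint64
	CPUQuota      int64
}

// Hook is a hypothetical runtime hook plugin that mutates the context in place.
type Hook func(ctx *ContainerContext) error

// fromOCI transforms OCI-format information into the internal protocol.
func fromOCI(pod, container string, res OCIResources) *ContainerContext {
	return &ContainerContext{
		PodName:       pod,
		ContainerName: container,
		CPUShares:     res.CPUShares,
		CPUQuota:      res.CPUQuota,
	}
}

// toOCI transforms the hooks' result back into OCI-format information.
func (c *ContainerContext) toOCI() OCIResources {
	return OCIResources{CPUShares: c.CPUShares, CPUQuota: c.CPUQuota}
}

// handleEvent runs the whole flow for one lifecycle event: convert the input,
// run every hook, and convert the result back for the runtime.
func handleEvent(pod, container string, res OCIResources, hooks []Hook) (OCIResources, error) {
	ctx := fromOCI(pod, container, res)
	for _, h := range hooks {
		if err := h(ctx); err != nil {
			// On failure, hand back the unmodified input.
			return res, err
		}
	}
	return ctx.toOCI(), nil
}

// minShares is an example hook that lowers CPU shares to an illustrative value.
func minShares(ctx *ContainerContext) error {
	ctx.CPUShares = 2
	return nil
}

func main() {
	in := OCIResources{CPUShares: 1024, CPUQuota: -1}
	out, err := handleEvent("pod-a", "main", in, []Hook{minShares})
	fmt.Println(out, err)
}
```

Keeping the conversion at the edges means the hook plugins only ever see the internal context, which is what allows them to be shared across working modes.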

![nri-proposal.png](/img/nri-proposal.png)

### User Stories

#### Story 1
As a cluster administrator, I want to apply QoS policies before a pod's status becomes running.

#### Story 2
As a cluster administrator, I want to deploy Koordinator in a cluster without a restart.

#### Story 3
As a cluster administrator, I want to adjust resource policies at runtime.

#### Story 4
As a GPU user, I want to inject environment variables before the pod starts running.

### Requirements

- containerd must be upgraded to >= 1.7.0, or CRI-O to >= v1.25.0.

#### Functional Requirements

NRI mode should support all functionality currently supported by the standalone and proxy modes.

#### Non-Functional Requirements

Non-functional requirements are user expectations of the solution. Include
considerations for performance, reliability and security.

### Implementation Details/Notes/Constraints
1. koordlet [NRI plugin](https://github.com/containerd/nri/blob/main/plugins/template/plugin.go)
```go
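import (
	"github.com/containerd/nri/pkg/api"
	"github.com/containerd/nri/pkg/stub"
)

type nriServer struct {
	stub    stub.Stub
	mask    stub.EventMask
	options Options // server options
}

// Configure enables 3 hooks (RunPodSandbox, CreateContainer, UpdateContainer) in NRI.
func (p *nriServer) Configure(config, runtime, version string) (stub.EventMask, error) {
	// Placeholder return so the sketch compiles: report the subscribed events.
	return p.mask, nil
}

// Synchronize syncs all pod/container information before the koordlet NRI plugin runs.
func (p *nriServer) Synchronize(pods []*api.PodSandbox, containers []*api.Container) ([]*api.ContainerUpdate, error) {
	// Placeholder return so the sketch compiles.
	return nil, nil
}

func (p *nriServer) RunPodSandbox(pod *api.PodSandbox) error {
	podCtx := &PodContext{}
	podCtx.FromNri(pod)
	RunHooks(...) // hook arguments are elided in this design
	podCtx.NriDone()
	return nil
}

func (p *nriServer) CreateContainer(pod *api.PodSandbox, container *api.Container) (*api.ContainerAdjustment, []*api.ContainerUpdate, error) {
	containerCtx := &ContainerContext{}
	containerCtx.FromNri(pod, container)
	RunHooks(...) // hook arguments are elided in this design
	return containerCtx.NriDone()
}

func (p *nriServer) UpdateContainer(pod *api.PodSandbox, container *api.Container) ([]*api.ContainerUpdate, error) {
	containerCtx := &ContainerContext{}
	containerCtx.FromNri(pod, container)
	RunHooks(...) // hook arguments are elided in this design
	_, updates, err := containerCtx.NriDone()
	return updates, err
}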
```
2. koordlet enhancement for NRI
- PodContext
```go
// FromNri fills the PodContext from the OCI spec.
func (p *PodContext) FromNri(pod *api.PodSandbox) {
}

// NriDone applies QoS resource policies for the pod.
func (p *PodContext) NriDone() {
}
```
- ContainerContext
```go
// FromNri fills the ContainerContext from the OCI spec.
func (c *ContainerContext) FromNri(pod *api.PodSandbox, container *api.Container) {
}

// NriDone applies QoS resource policies for the container.
func (c *ContainerContext) NriDone() (*api.ContainerAdjustment, []*api.ContainerUpdate, error) {
	// Placeholder return so the stub compiles; the design leaves the body open.
	return nil, nil, nil
}
```
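
The `Configure` comment above mentions enabling three hooks. One way to picture that subscription is as a bitmask built from event names, as in the sketch below. It assumes nothing about the NRI library itself: `EventMask`, its constants, `parseEventMask`, and `IsSet` are hypothetical local definitions that only mirror the idea of an event mask.

```go
package main

import (
	"fmt"
	"strings"
)

// EventMask is a hypothetical bitmask of subscribed lifecycle events.
// It mirrors the idea of an NRI event mask but is not the NRI type.
type EventMask uint32

const (
	RunPodSandbox EventMask = 1 << iota
	StopPodSandbox
	CreateContainer
	UpdateContainer
)

// eventNames maps the names used in configuration to mask bits.
var eventNames = map[string]EventMask{
	"RunPodSandbox":   RunPodSandbox,
	"StopPodSandbox":  StopPodSandbox,
	"CreateContainer": CreateContainer,
	"UpdateContainer": UpdateContainer,
}

// parseEventMask builds a mask from a comma-separated list of event names.
func parseEventMask(events string) (EventMask, error) {
	var mask EventMask
	for _, name := range strings.Split(events, ",") {
		ev, ok := eventNames[strings.TrimSpace(name)]
		if !ok {
			return 0, fmt.Errorf("unknown event %q", name)
		}
		mask |= ev
	}
	return mask, nil
}

// IsSet reports whether the given event is part of the mask.
func (m EventMask) IsSet(ev EventMask) bool { return m&ev != 0 }

func main() {
	// The three hooks named in the design.
	mask, err := parseEventMask("RunPodSandbox,CreateContainer,UpdateContainer")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("RunPodSandbox subscribed:", mask.IsSet(RunPodSandbox))
	fmt.Println("StopPodSandbox subscribed:", mask.IsSet(StopPodSandbox))
}
```

With a mask like this, the runtime only needs to deliver the events the plugin actually asked for.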

### Risks and Mitigations

## Alternatives
There are several approaches to extending the Kubernetes CRI (Container Runtime Interface) to manage container resources, such as `standalone` and `proxy`. In `standalone` mode, resource isolation parameters are injected asynchronously. In `proxy` mode, the proxy intercepts CRI requests from kubelet for pods and then applies resource policies in time. However, `proxy` mode requires reconfiguring and restarting kubelet.

The execution timing of the `NRI` and `proxy` modes differs slightly: the hook points are not exactly the same. The biggest difference is that `proxy` calls koordlet hooks between kubelet and containerd, whereas NRI calls the NRI plugin (koordlet hooks) inside containerd, which means containerd can still do work before or after it calls the NRI plugin (koordlet hooks). For example, in `NRI` mode, containerd sets up the pod network first and then calls the NRI plugin (koordlet hooks) in RunPodSandbox, but in `proxy` mode, containerd cannot do anything before the koordlet hooks run when `proxy` handles the RunPodSandbox CRI request.
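
The ordering difference described above can be made concrete with a toy model. The sketch below is only an illustration under simplifying assumptions: `Trace`, `koordletHooks`, `containerdRunPodSandbox`, `proxyRunPodSandbox`, and `nriRunPodSandbox` are hypothetical names, and the runtime is reduced to a single "set up pod network" step.

```go
package main

import "fmt"

// Trace records the order in which steps happen for one RunPodSandbox request.
type Trace []string

// koordletHooks models the koordlet runtime hooks being executed.
func koordletHooks(t *Trace) {
	*t = append(*t, "koordlet: run hooks")
}

// containerdRunPodSandbox models the runtime: it does its own work first and
// then, if an NRI plugin is registered, calls it from inside the runtime.
func containerdRunPodSandbox(t *Trace, nriPlugin func(*Trace)) {
	*t = append(*t, "containerd: set up pod network")
	if nriPlugin != nil {
		nriPlugin(t)
	}
}

// proxyRunPodSandbox models proxy mode: hooks run between kubelet and
// containerd, so containerd has done nothing yet when they execute.
func proxyRunPodSandbox() Trace {
	var t Trace
	koordletHooks(&t)
	containerdRunPodSandbox(&t, nil)
	return t
}

// nriRunPodSandbox models NRI mode: containerd receives the request directly
// and invokes the hooks as a plugin partway through its own processing.
func nriRunPodSandbox() Trace {
	var t Trace
	containerdRunPodSandbox(&t, koordletHooks)
	return t
}

func main() {
	fmt.Println("proxy:", proxyRunPodSandbox())
	fmt.Println("nri:  ", nriRunPodSandbox())
}
```

The same hook function runs in both modes; only the caller, and therefore the point in the sequence, changes.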

- Standalone

- kubelet -- CRI Request -> CRI Runtime -- OCI Spec -> OCI compatible runtime -> containers
- kubelet -> Node Agent -> CRI Runtime / containers

![standalone.png](/img/standalone.png)

- Proxy

- kubelet -- CRI Request -> CRI Proxy -- CRI Request (hooked) -> CRI Runtime -- OCI Spec -> OCI compatible runtime -> containers

![proxy.png](/img/proxy.png)

- NRI

- kubelet -- CRI Request -> CRI Runtime -- OCI Spec --> OCI compatible runtime -> containers
&emsp;&emsp;&emsp;&emsp;&emsp;&emsp;&emsp;&emsp;&emsp;&emsp;&emsp;&emsp;&emsp;&emsp;&emsp;&emsp;&emsp;&searr; &emsp; &nearr;
&emsp;&emsp;&emsp;&emsp;&emsp;&emsp;&emsp;&emsp;&emsp;&emsp;&emsp;&emsp;&emsp;&emsp;&emsp;Koordlet NRI plugin

![nri.png](/img/nri.png)

## Upgrade Strategy

- Upgrade containerd to 1.7.0+ or CRI-O to 1.25.0+.
- Enable NRI in the container runtime.


11 changes: 11 additions & 0 deletions docs/installation.md
@@ -52,6 +52,17 @@ If you have problem with connecting to `https://koordinator-sh.github.io/charts/
$ helm install/upgrade koordinator /PATH/TO/CHART
```

## Enable NRI Mode Resource Management

### Prerequisite

- containerd >= 1.7.0 with NRI enabled. If NRI is not yet enabled, refer to [Enable NRI in Containerd](https://github.com/containerd/containerd/blob/main/docs/NRI.md).
- Koordinator >= 1.3

### Configurations

NRI mode resource management is *enabled* by default, so you can use it without modifying the koordlet configuration. To disable it, set `enable-nri-runtime-hook=false` in the koordlet start arguments. If some prerequisites are not met, this does no harm: all other features still work as expected.
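
The behaviour described above (on by default, switchable off by a flag, harmless when the prerequisites are missing) could be sketched as follows. Only the flag name `enable-nri-runtime-hook` comes from this guide. The function `shouldStartNRI`, the `nri-socket-path` flag, and the socket path used as its default are assumptions for illustration, not koordlet's actual options.

```go
package main

import (
	"flag"
	"fmt"
	"os"
)

// shouldStartNRI decides whether the NRI plugin should be started.
// socketExists is injected so the prerequisite check can be replaced in tests.
func shouldStartNRI(args []string, socketExists func(string) bool) (bool, error) {
	fs := flag.NewFlagSet("koordlet", flag.ContinueOnError)
	// Flag name taken from the guide; enabled by default.
	enable := fs.Bool("enable-nri-runtime-hook", true, "enable NRI mode runtime hooks")
	// Hypothetical flag and default path, used here only for illustration.
	socket := fs.String("nri-socket-path", "/var/run/nri/nri.sock", "path to the NRI socket")
	if err := fs.Parse(args); err != nil {
		return false, err
	}
	if !*enable {
		return false, nil
	}
	// If the runtime does not expose NRI, skip the plugin and let every
	// other feature keep working.
	return socketExists(*socket), nil
}

// fileExists reports whether the given path exists on disk.
func fileExists(path string) bool {
	_, err := os.Stat(path)
	return err == nil
}

func main() {
	start, err := shouldStartNRI([]string{"--enable-nri-runtime-hook=false"}, fileExists)
	fmt.Println("start NRI plugin:", start, "error:", err)
}
```

Treating a missing socket as "do not start" rather than as a fatal error is what keeps the default-on setting safe on nodes whose runtime has no NRI support.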

## Install koord-runtime-proxy

koord-runtime-proxy acts as a proxy between kubelet and containerd (dockerd in the dockershim scenario). It is designed to intercept CRI requests and apply resource management policies, such as setting different cgroup parameters by pod priority in hybrid workload orchestration scenarios, or applying new isolation policies for the latest Linux kernel, CPU architecture, etc.
1 change: 1 addition & 0 deletions sidebars.js
@@ -65,6 +65,7 @@ const sidebars = {
items: [
'designs/koordlet-overview',
'designs/runtime-proxy',
'designs/nri-mode-resource-management',
'designs/enhanced-scheduler-extension',
'designs/load-aware-scheduling',
'designs/fine-grained-cpu-orchestration',
Binary file added static/img/nri-proposal.png
Binary file added static/img/nri.png
Binary file added static/img/proxy.png
Binary file added static/img/standalone.png
