add Use WorkloadSpread with customized workload manual
AiRanthem committed Sep 10, 2024
1 parent e78f1c6 commit 9c19d97
# WorkloadSpread

**FEATURE STATE:** Kruise v0.10.0

WorkloadSpread can distribute Pods of a workload to different types of Nodes according to some policies, which gives a single workload the abilities of multi-domain deployment and elastic deployment.

Some common policies include:

- fault-tolerant spread (for example, spread evenly among hosts, AZs, etc.)
- spread according to a specified ratio (for example, deploy Pods to several specified AZs proportionally)
- subset management with priority, such as
  - deploy Pods to ecs first, and then deploy to eci when its resources are insufficient.
  - deploy a fixed number of Pods to ecs first, and the rest of the Pods are deployed to eci.
- subset management with customization, such as
  - control how many Pods in a workload are deployed on different CPU architectures
  - enable Pods on different CPU architectures to have different resource requirements

The WorkloadSpread feature is similar to **UnitedDeployment** in the OpenKruise community. Each WorkloadSpread defines multiple domains
called `subset`s. Each domain may limit the number of replicas it runs via `maxReplicas`.
WorkloadSpread injects the domain configuration into the Pod by webhook, and it also controls the order of scale in and scale out.

Kruise versions lower than `1.3.0` support `CloneSet`, `Deployment`, and `ReplicaSet`.

Since Kruise `1.3.0`, WorkloadSpread supports `StatefulSet`.

In particular, for `StatefulSet`, WorkloadSpread supports managing its subsets only during `scale up`. The order of `scale down` is still controlled by the StatefulSet controller. The subset management of StatefulSet is based on the ordinals of Pods, and more details can be found [here](https://github.com/openkruise/kruise/blob/f46097db1fa5a4ed9c002eba050b888344884e11/pkg/util/workloadspread/workloadspread.go#L305).

Since Kruise `1.5.0`, WorkloadSpread supports customized workloads that have a [scale sub-resource](https://kubernetes.io/docs/tasks/extend-kubernetes/custom-resources/custom-resource-definitions/#scale-subresource).

## Demo

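A minimal WorkloadSpread sketch illustrating the fields described below (the workload name, subset names, and zone values are illustrative):

```yaml
apiVersion: apps.kruise.io/v1alpha1
kind: WorkloadSpread
metadata:
  name: workloadspread-demo
spec:
  targetRef:                  # the workload whose Pods are spread
    apiVersion: apps.kruise.io/v1alpha1
    kind: CloneSet
    name: workload-demo
  subsets:
  - name: subset-a            # one topology domain
    requiredNodeSelectorTerm:
      matchExpressions:
      - key: topology.kubernetes.io/zone
        operator: In
        values:
        - zone-a
    maxReplicas: 3            # at most 3 Pods land in this subset
  - name: subset-b            # overflow domain, no replica limit
    requiredNodeSelectorTerm:
      matchExpressions:
      - key: topology.kubernetes.io/zone
        operator: In
        values:
        - zone-b
```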

- `name`: the name of the `subset`; it is distinct within a WorkloadSpread and represents a topology.

- `maxReplicas`: the replicas limit of the `subset`, which must be an integer >= 0. There is no replicas limit while `maxReplicas` is nil.

> The percentage type is not supported in the current version.

- `requiredNodeSelectorTerm`: the hard match for the zone.
`preferredNodeSelectorTerms` corresponds to the `preferredDuringSchedulingIgnoredDuringExecution` of nodeAffinity.

- `tolerations`: the tolerations of Pod in `subset`.

```yaml
tolerations:
- key: "key1"
  operator: "Equal"
  value: "value1"
  effect: "NoSchedule"
```

- `patch`: customize the Pod configuration of `subset`, such as Annotations, Labels, Env.

```yaml
# patch pod container resources:
patch:
  spec:
    containers:
    - name: main
      resources:
        limits:
          cpu: "2"
          memory: 800Mi
```

```yaml
# patch pod container env with a zone name:
patch:
  spec:
    containers:
    - name: main
      env:
      - name: K8S_AZ_NAME
        value: zone-a
```

## Schedule strategy
WorkloadSpread provides two kinds of strategies; the default strategy is `Fixed`.

- Adaptive:

  **Reschedule**: Kruise will check the unschedulable Pods of a subset. If they exceed the defined duration, the failed Pods will be rescheduled to another `subset`.
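A sketch of how the strategy is configured on a WorkloadSpread (field names follow the `v1alpha1` API; the timeout value is illustrative):

```yaml
spec:
  scheduleStrategy:
    type: Adaptive
    adaptive:
      # how long a Pod may stay unschedulable before it is
      # rescheduled to another subset
      rescheduleCriticalSeconds: 30
```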

## Requirements

WorkloadSpread is disabled by default. You have to configure the feature-gate *WorkloadSpread* when installing or upgrading Kruise:

```bash
$ helm install kruise https://... --set featureGates="WorkloadSpread=true"
```

### Pod Webhook

WorkloadSpread uses `webhook` to inject fault domain rules.

If the `PodWebhook` feature-gate is set to false, WorkloadSpread will also be disabled.

### deletion-cost feature

`CloneSet` has supported the deletion-cost feature in its latest versions.

The other native workloads need Kubernetes version >= 1.21. (In 1.21, users need to enable the PodDeletionCost feature-gate; since 1.22 it is enabled by default.)
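For those native workloads, WorkloadSpread steers the scale-in order through the standard Kubernetes deletion-cost annotation; a sketch of an annotated Pod (the cost value is illustrative):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: nginx-demo
  annotations:
    # Pods with a lower deletion cost are deleted first
    # by the workload controller.
    controller.kubernetes.io/pod-deletion-cost: "100"
```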

## Scale order

The workload managed by WorkloadSpread will scale according to the defined order of subsets.

### Scale out

- Pods are scheduled in the subset order defined in `spec.subsets`. They will be scheduled to the next `subset` when the replica number reaches the `maxReplicas` of the current `subset`.

### Scale in

- When the replica number of a `subset` is greater than its `maxReplicas`, the extra Pods will be removed with high priority.
- According to the `subset` order in `spec.subsets`, Pods of a `subset` at the back are deleted before Pods at the front.

```yaml
# subset-a subset-b subset-c
# deletion order: b -> a -> c
```

## Use WorkloadSpread with customized workload

Using WorkloadSpread with custom workloads is disabled by default and requires some additional configuration. This section uses the [Rollout workload from the Argo community](https://argoproj.github.io/argo-rollouts/) as an example to demonstrate how to integrate it with WorkloadSpread.

### Configure the custom workload watch whitelist

First, you need to add the custom workload to the `WorkloadSpread_Watch_Custom_Workload_WhiteList` to ensure it can be
read and understood by WorkloadSpread.

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
name: kruise-configuration
namespace: kruise-system
data:
"WorkloadSpread_Watch_Custom_Workload_WhiteList": |
{
"workloads": [
{
"Group": "argoproj.io",
"Kind": "Rollout",
"replicasPath": "spec.replicas",
"subResources": []
}
]
}
```

The specific configuration items are explained as follows:

- **Group:** the API group of the customized workload.
- **Kind:** the kind of the customized workload.
- **subResources:** the sub-resources of the customized workload, each with a Group and Kind. For example: Deployment's ReplicaSet.
- **replicasPath:** the path to the replicas field in the resource, for example `spec.replicas`.
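For a workload that owns its Pods through an intermediate resource, the `subResources` field lists that child resource. A hypothetical whitelist entry for Deployment (which creates Pods via ReplicaSets, and is shown here purely to illustrate the field shape — native Deployments are already supported without any whitelist):

```yaml
"WorkloadSpread_Watch_Custom_Workload_WhiteList": |
  {
    "workloads": [
      {
        "Group": "apps",
        "Kind": "Deployment",
        "replicasPath": "spec.replicas",
        "subResources": [
          {
            "Group": "apps",
            "Kind": "ReplicaSet"
          }
        ]
      }
    ]
  }
```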

### Authorize kruise-manager

To use WorkloadSpread with custom workloads, you need to grant the kruise-manager service account read permissions for
the respective resources.

```yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
name: kruise-rollouts-access
rules:
- apiGroups: [ "argoproj.io" ]
resources: [ "rollouts" ]
verbs: [ "get" ]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
name: kruise-rollouts-access-binding
subjects:
- kind: ServiceAccount
name: kruise-manager
namespace: kruise-system
roleRef:
kind: ClusterRole
name: kruise-rollouts-access
apiGroup: rbac.authorization.k8s.io
```
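You can check that the binding took effect with kubectl's access-review helper (assuming the default `kruise-manager` service account in the `kruise-system` namespace, as shown above):

```bash
# Prints "yes" if kruise-manager can read Argo Rollouts
kubectl auth can-i get rollouts.argoproj.io \
  --as=system:serviceaccount:kruise-system:kruise-manager
```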

### Reference the custom workload in WorkloadSpread

Once the configuration is complete, the custom workload can be referenced in the `targetRef` field of WorkloadSpread.

```yaml
apiVersion: apps.kruise.io/v1alpha1
kind: WorkloadSpread
metadata:
name: workloadspread-demo
spec:
targetRef:
apiVersion: argoproj.io/v1alpha1
kind: Rollout
name: rollouts-demo
subsets:
...
```

## feature-gates

The WorkloadSpread feature is turned off by default. If you want to turn it on, set the feature-gate *WorkloadSpread*.

```bash
$ helm install kruise https://... --set featureGates="WorkloadSpread=true"
```

### Elastic deployment
`zone-a`(ACK) holds 100 Pods, `zone-b`(ECI) as an elastic zone holds additional Pods.

1. Create a WorkloadSpread instance.

```yaml
apiVersion: apps.kruise.io/v1alpha1
kind: WorkloadSpread
metadata:
  name: workloadspread-demo
spec:
  targetRef:
    apiVersion: apps.kruise.io/v1alpha1
kind: CloneSet
name: workload-xxx
subsets:
- name: ACK # zone ACK
requiredNodeSelectorTerm:
matchExpressions:
- key: topology.kubernetes.io/zone
operator: In
values:
- ack
maxReplicas: 100
patch: # inject label.
metadata:
labels:
deploy/zone: ack
- name: ECI # zone ECI
requiredNodeSelectorTerm:
matchExpressions:
- key: topology.kubernetes.io/zone
operator: In
values:
- eci
patch:
metadata:
labels:
deploy/zone: eci
```

2. Create a corresponding workload; the number of replicas can be adjusted freely.

#### Effect
### Multi-domain deployment
Deploy 100 Pods to two `zone`(zone-a, zone-b) separately.

1. Create a WorkloadSpread instance.

```yaml
apiVersion: apps.kruise.io/v1alpha1
kind: WorkloadSpread
metadata:
  name: workloadspread-demo
spec:
  targetRef:
    apiVersion: apps.kruise.io/v1alpha1
kind: CloneSet
name: workload-xxx
subsets:
- name: subset-a
requiredNodeSelectorTerm:
matchExpressions:
- key: topology.kubernetes.io/zone
operator: In
values:
- zone-a
maxReplicas: 100
patch:
metadata:
labels:
deploy/zone: zone-a
- name: subset-b
requiredNodeSelectorTerm:
matchExpressions:
- key: topology.kubernetes.io/zone
operator: In
values:
- zone-b
maxReplicas: 100
patch:
metadata:
labels:
deploy/zone: zone-b
```

2. Create a corresponding workload with 200 replicas, or perform a rolling update on an existing workload.

3. If the spread of zones needs to be changed, first adjust the `maxReplicas` of each `subset`, and then change the `replicas` of the workload.
