Update error-code-poddrainfailure.md #1742

@@ -3,7 +3,7 @@ metadata:
title: Azure Kubernetes Service (AKS) common issues FAQ
description: Review a list of frequently asked questions (FAQ) about common issues when you're working with an Azure Kubernetes Service (AKS) cluster.
ms.topic: faq
ms.date: 11/14/2023
ms.date: 12/17/2024
ms.reviewer: chiragpa, nickoman, v-leedennis
ms.service: azure-kubernetes-service
ms.custom: sap:Create, Upgrade, Scale and Delete operations (cluster or nodepool)
@@ -26,7 +26,7 @@ sections:
- question: |
Can I move my cluster to a different subscription, or move my subscription with my cluster to a new tenant?
answer: |
If you've moved your AKS cluster to a different subscription or the cluster's subscription to a new tenant, the cluster won't function because of missing cluster identity permissions. AKS doesn't support moving clusters across subscriptions or tenants because of this constraint.
No. If you've moved your AKS cluster to a different subscription or the cluster's subscription to a new tenant, the cluster won't function because of missing cluster identity permissions. AKS doesn't support moving clusters across subscriptions or tenants because of this constraint. For more information, see the [Operations FAQ](https://learn.microsoft.com/en-us/azure/aks/faq#operations) documentation.

- question: |
What naming restrictions are enforced for AKS resources and parameters?
@@ -42,6 +42,9 @@ sections:
- AKS node pool names must be all lowercase. The names must be 1-12 characters in length for Linux node pools and 1-6 characters for Windows node pools. A name must start with a letter, and the only allowed characters are letters and numbers.

- The *admin-username*, which sets the administrator user name for Linux nodes, must start with a letter. This user name may only contain letters, numbers, hyphens, and underscores. It has a maximum length of 32 characters.

Further details about naming conventions are available in the following articles:

- [Resource naming rules for Microsoft.ContainerService](https://learn.microsoft.com/en-us/azure/azure-resource-manager/management/resource-name-rules#microsoftcontainerservice)
- [Recommended resource abbreviations for containers](https://learn.microsoft.com/en-us/azure/cloud-adoption-framework/ready/azure-best-practices/resource-abbreviations#containers)
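
For example, the following hypothetical Azure CLI commands use names that satisfy these rules (the resource group, cluster, and node pool names are illustrative only):

```output
# Linux node pool: 1-12 characters, lowercase letters and numbers, starting with a letter.
$ az aks nodepool add --resource-group myResourceGroup --cluster-name myAKSCluster --name userpool01 --node-count 1

# Windows node pool: 1-6 characters, lowercase letters and numbers, starting with a letter.
$ az aks nodepool add --resource-group myResourceGroup --cluster-name myAKSCluster --name npwin1 --os-type Windows --node-count 1
```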

additionalContent: |
[!INCLUDE [Third-party disclaimer](../../../includes/third-party-disclaimer.md)]
@@ -1,7 +1,7 @@
---
title: Troubleshoot UpgradeFailed errors due to eviction failures caused by PDBs
description: Learn how to troubleshoot UpgradeFailed errors due to eviction failures caused by Pod Disruption Budgets when you try to upgrade an Azure Kubernetes Service cluster.
ms.date: 12/21/2023
ms.date: 12/13/2024
editor: v-jsitser
ms.reviewer: chiragpa, v-leedennis, v-weizhu
ms.service: azure-kubernetes-service
@@ -15,24 +15,55 @@ This article discusses how to identify and resolve UpgradeFailed errors due to e

## Prerequisites

This article requires Azure CLI version 2.0.65 or a later version. To find the version number, run `az --version`. If you have to install or upgrade Azure CLI, see [How to install the Azure CLI](/cli/azure/install-azure-cli).
This article requires Azure CLI version 2.67.0 or a later version. To find the version number, run `az --version`. If you have to install or upgrade Azure CLI, see [How to install the Azure CLI](/cli/azure/install-azure-cli).
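
To confirm the installed version, the check might resemble the following (output abbreviated; the version number is only an example):

```output
$ az --version
azure-cli                         2.67.0
...
```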

For more detailed information about the upgrade process, see the "Upgrade an AKS cluster" section in [Upgrade an Azure Kubernetes Service (AKS) cluster](/azure/aks/upgrade-cluster#upgrade-an-aks-cluster).

## Symptoms

An AKS cluster upgrade operation fails with the following error message:
An AKS cluster upgrade operation fails with one of the following error messages:

> Code: UpgradeFailed
> Message: Drain node \<node-name> failed when evicting pod \<pod-name>. Eviction failed with Too many Requests error. This is often caused by a restrictive Pod Disruption Budget (PDB) policy. See `http://aka.ms/aks/debugdrainfailures`. Original error: API call to Kubernetes API Server failed.
> (UpgradeFailed) Drain node `aks-<nodepool-name>-xxxxxxxx-vmssxxxxxx` failed when evicting pod `<pod-name>` failed with Too Many Requests error. This is often caused by a restrictive Pod Disruption Budget (PDB) policy. See https://aka.ms/aks/debugdrainfailures. Original error: Cannot evict pod as it would violate the pod's disruption budget.. PDB debug info: `<namespace>/<pod-name>` blocked by pdb `<pdb-name>` with 0 unready pods.

> Code: UpgradeFailed
> Message: Drain node `aks-<nodepool-name>-xxxxxxxx-vmssxxxxxx` failed when evicting pod `<pod-name>` failed with Too Many Requests error. This is often caused by a restrictive Pod Disruption Budget (PDB) policy. See https://aka.ms/aks/debugdrainfailures. Original error: Cannot evict pod as it would violate the pod's disruption budget.. PDB debug info: `<namespace>/<pod-name>` blocked by pdb `<pdb-name>` with 0 unready pods.

## Cause

This error might occur if a pod is protected by the Pod Disruption Budget (PDB) policy. In this situation, the pod resists being drained.
This error might occur if a pod is protected by a Pod Disruption Budget (PDB) policy. In this situation, the pod resists being drained, the upgrade operation fails after several attempts, and the cluster or node pool falls into a `Failed` state.

Check the **ALLOWED DISRUPTIONS** value in the PDB configuration. The value should be **1** or greater. For more information, see [Plan for availability using pod disruption budgets](/azure/aks/operator-best-practices-scheduler#plan-for-availability-using-pod-disruption-budgets). For example, you can check the workload and its PDB as follows. If the **ALLOWED DISRUPTIONS** column shows **0**, the pods can't be evicted, and the node drain fails during the upgrade process:

```output
$ kubectl get deployments.apps nginx
NAME READY UP-TO-DATE AVAILABLE AGE
nginx 2/2 2 2 62s

$ kubectl get pod
NAME READY STATUS RESTARTS AGE
nginx-7854ff8877-gbr4m 1/1 Running 0 68s
nginx-7854ff8877-gnltd 1/1 Running 0 68s

$ kubectl get pdb
NAME MIN AVAILABLE MAX UNAVAILABLE ALLOWED DISRUPTIONS AGE
nginx-pdb 2 N/A 0 24s

```
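
For context, a restrictive PDB like the hypothetical `nginx-pdb` shown above could have been created as follows. With `--min-available=2` and only two running replicas, no voluntary disruption is allowed:

```output
# Hypothetical example only: requiring both of the two nginx replicas to stay available
# leaves ALLOWED DISRUPTIONS at 0, which blocks node drain during an upgrade.
$ kubectl create poddisruptionbudget nginx-pdb --selector=app=nginx --min-available=2
poddisruptionbudget.policy/nginx-pdb created
```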

To test this situation, run `kubectl get pdb -A`, and then check the **Allowed Disruption** value. The value should be **1** or greater. For more information, see [Plan for availability using pod disruption budgets](/azure/aks/operator-best-practices-scheduler#plan-for-availability-using-pod-disruption-budgets).
You can also check for related entries in the Kubernetes events by running `kubectl get events | grep -i drain`. Output similar to the following shows the message `Eviction blocked by Too Many Requests (usually a pdb)`:

```output
$ kubectl get events | grep -i drain
LAST SEEN TYPE REASON OBJECT MESSAGE
(...)
32m Normal Drain node/aks-<nodepool-name>-xxxxxxxx-vmssxxxxxx Draining node: aks-<nodepool-name>-xxxxxxxx-vmssxxxxxx
2m57s Warning Drain node/aks-<nodepool-name>-xxxxxxxx-vmssxxxxxx Eviction blocked by Too Many Requests (usually a pdb): <pod-name>
12m Warning Drain node/aks-<nodepool-name>-xxxxxxxx-vmssxxxxxx Eviction blocked by Too Many Requests (usually a pdb): <pod-name>
32m Warning Drain node/aks-<nodepool-name>-xxxxxxxx-vmssxxxxxx Eviction blocked by Too Many Requests (usually a pdb): <pod-name>
32m Warning Drain node/aks-<nodepool-name>-xxxxxxxx-vmssxxxxxx Eviction blocked by Too Many Requests (usually a pdb): <pod-name>
31m Warning Drain node/aks-<nodepool-name>-xxxxxxxx-vmssxxxxxx Eviction blocked by Too Many Requests (usually a pdb): <pod-name>
```

If the **Allowed Disruption** value is **0**, the node drain will fail during the upgrade process.

To resolve this issue, use one of the following solutions.

@@ -41,18 +72,41 @@ To resolve this issue, use one of the following solutions.
1. Adjust the PDB to enable pod draining. Generally, the allowed disruptions value is controlled by the `Min Available / Max unavailable` or `Running pods / Replicas` parameter. You can modify the `Min Available / Max unavailable` parameter at the PDB level, or increase the number of `Running pods / Replicas`, to push the **ALLOWED DISRUPTIONS** value to **1** or greater (see the sketch after the following output block).
2. Try again to upgrade the AKS cluster to the same version that you tried to upgrade to previously. This process will trigger a reconciliation.

```output
$ az aks upgrade --name <aksName> --resource-group <resourceGroupName>
Are you sure you want to perform this operation? (y/N): y
Cluster currently in failed state. Proceeding with upgrade to existing version 1.28.3 to attempt resolution of failed cluster state.
Since control-plane-only argument is not specified, this will upgrade the control plane AND all nodepools to version . Continue? (y/N): y
```
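
For step 1, a minimal sketch of the two adjustment options, based on the hypothetical `nginx` deployment and `nginx-pdb` example from the Cause section (names, namespace, and values are illustrative):

```output
# Option A: relax the PDB so that one pod may be voluntarily disrupted.
$ kubectl patch pdb nginx-pdb -n default --type merge -p '{"spec":{"minAvailable":1}}'
poddisruptionbudget.policy/nginx-pdb patched

# Option B: add a replica so that minAvailable=2 can still be met while a pod is evicted.
$ kubectl scale deployment nginx -n default --replicas=3
deployment.apps/nginx scaled
```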

## Solution 2: Back up, delete, and redeploy the PDB

1. Take a backup of the PDB `kubectl get pdb <pdb-name> -n <pdb-namespace> -o yaml > pdb_backup.yaml`, and then delete the PDB `kubectl delete pdb <pdb-name> -n /<pdb-namespace>`. After the upgrade is finished, you can redeploy the PDB `kubectl apply -f pdb_backup.yaml`.
1. Take a backup of the PDB by running `kubectl get pdb <pdb-name> -n <pdb-namespace> -o yaml > pdb-name-backup.yaml`, and then delete the PDB by running `kubectl delete pdb <pdb-name> -n <pdb-namespace>`. After the new upgrade attempt is finished, you can redeploy the PDB by applying the backup file: `kubectl apply -f pdb-name-backup.yaml`. A sketch of this sequence follows the output block below.
2. Try again to upgrade the AKS cluster to the same version that you tried to upgrade to previously. This process will trigger a reconciliation.

## Solution 3: Delete the pods that can't be drained
```output
$ az aks upgrade --name <aksName> --resource-group <resourceGroupName>
Are you sure you want to perform this operation? (y/N): y
Cluster currently in failed state. Proceeding with upgrade to existing version 1.28.3 to attempt resolution of failed cluster state.
Since control-plane-only argument is not specified, this will upgrade the control plane AND all nodepools to version . Continue? (y/N): y
```
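
For step 1, the backup, delete, and later redeploy sequence might look like the following. The PDB name and namespace are placeholders:

```output
# Back up the PDB manifest, then remove the PDB so that node drain can proceed.
$ kubectl get pdb <pdb-name> -n <pdb-namespace> -o yaml > pdb-name-backup.yaml
$ kubectl delete pdb <pdb-name> -n <pdb-namespace>
poddisruptionbudget.policy "<pdb-name>" deleted

# After the upgrade attempt finishes, redeploy the PDB from the backup file.
# If the apply is rejected because of server-generated fields (such as resourceVersion),
# remove those fields from the backup file first.
$ kubectl apply -f pdb-name-backup.yaml
```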

## Solution 3: Delete the pods that can't be drained or scale the workload down to zero (0)

1. Delete the pods that can't be drained.

> [!NOTE]
> If the pods were created by a deployment or StatefulSet, they'll be controlled by a ReplicaSet. If that's the case, you might have to delete the deployment or StatefulSet. Before you do that, we recommend that you make a backup: `kubectl get <kubernetes-object> <name> -n <namespace> -o yaml > backup.yaml`.
> If the pods were created by a Deployment, they're controlled by a ReplicaSet; if they were created by a StatefulSet, they're controlled directly by the StatefulSet. In either case, you might have to delete the Deployment or StatefulSet, or scale its replicas to zero (0). Before you do that, we recommend that you make a backup: `kubectl get <deployment.apps -or- statefulset.apps> <name> -n <namespace> -o yaml > backup.yaml`.

2. Try again to upgrade the AKS cluster to the same version that you tried to upgrade to previously. This process will trigger a reconciliation.
2. To scale down, you can run `kubectl scale --replicas=0 <deployment.apps -or- statefulset.apps> <name> -n <namespace>` before the reconciliation. A sketch of the full sequence follows the output block below.

3. Try again to upgrade the AKS cluster to the same version that you tried to upgrade to previously. This process will trigger a reconciliation.

```output
$ az aks upgrade --name <aksName> --resource-group <resourceGroupName>
Are you sure you want to perform this operation? (y/N): y
Cluster currently in failed state. Proceeding with upgrade to existing version 1.28.3 to attempt resolution of failed cluster state.
Since control-plane-only argument is not specified, this will upgrade the control plane AND all nodepools to version . Continue? (y/N): y
```
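
For steps 1 and 2, a minimal sketch that backs up the controlling workload, scales it down before the new upgrade attempt, and restores it afterward. The workload type, name, namespace, and replica count are placeholders:

```output
# Back up the controlling workload, then scale it to zero so its pods no longer block the drain.
$ kubectl get deployment.apps <name> -n <namespace> -o yaml > backup.yaml
$ kubectl scale --replicas=0 deployment.apps/<name> -n <namespace>
deployment.apps/<name> scaled

# After the upgrade attempt finishes, restore the original replica count (for example, 2).
$ kubectl scale --replicas=2 deployment.apps/<name> -n <namespace>
deployment.apps/<name> scaled
```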

[!INCLUDE [Azure Help Support](../../../includes/azure-help-support.md)]