Custom taints and toleration node operation #9920

vkathole · 2024-06-07T10:27:17Z

No description provided.

openshift-ci · 2024-06-07T10:27:27Z

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: vkathole

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

OWNERS

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

ocs_ci/ocs/resources/pod.py

tests/functional/z_cluster/nodes/test_non_ocs_taint_and_toleration.py

PrasadDesala · 2024-08-01T07:25:46Z

ocs_ci/ocs/resources/pod.py

+        toleration_key (str): The toleration key to check
+    """
+
+    sub_list = ocp.get_all_resource_names_of_a_kind(kind=constants.SUBSCRIPTION)


You are getting all the resource names of a particular kind and then instantiating the OCP class to get each object. Instead, you can get the list of subscription objects directly:
sub_obj_list = ocp.OCP(namespace=config.ENV_DATA["cluster_namespace"], kind=constants.SUBSCRIPTION).get()
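A minimal sketch of working with that object list directly. It assumes that calling `.get()` without a `resource_name` returns the parsed `oc get subscription -o json` output, i.e. a dict with an `items` list. The helper `subscription_names` and the sample data are hypothetical and only illustrate the shape.

```python
from typing import Any, Dict, List


def subscription_names(sub_list_data: Dict[str, Any]) -> List[str]:
    """Return metadata.name for every Subscription in a List-kind dict.

    Assumes the shape of `oc get subscription -o json` output:
    {"kind": "List", "items": [{"metadata": {"name": ...}, ...}, ...]}
    """
    return [item["metadata"]["name"] for item in sub_list_data.get("items", [])]


# Hypothetical data shaped like the result of
# ocp.OCP(namespace=..., kind=constants.SUBSCRIPTION).get()
sample = {
    "kind": "List",
    "items": [
        {"metadata": {"name": "odf-operator"}, "spec": {}},
        {"metadata": {"name": "mcg-operator"}, "spec": {}},
    ],
}
print(subscription_names(sample))  # ['odf-operator', 'mcg-operator']
```

Each item already carries its full `spec`, so no per-name lookup is needed.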

ocs_ci/ocs/resources/pod.py

PrasadDesala · 2024-08-01T08:09:42Z

tests/functional/z_cluster/nodes/test_non_ocs_taint_and_toleration.py

+        """
+
+        logger.info("Taint all nodes with custom taint")
+        ocs_nodes = get_worker_nodes()


This function returns all the worker nodes. Should we taint only the nodes running ODF pods?

You can use get_ocs_nodes(), which returns only the OCS nodes.
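To illustrate the difference between "all workers" and "OCS nodes", here is a sketch that filters node dicts by the storage label. The label `cluster.ocs.openshift.io/openshift-storage` is an assumption to confirm on the cluster, and `ocs_node_names` is a hypothetical helper, not the implementation of `get_ocs_nodes()`.

```python
from typing import Any, Dict, List

# Assumed label that ODF/OCS puts on storage nodes; verify on the cluster.
OCS_NODE_LABEL = "cluster.ocs.openshift.io/openshift-storage"


def ocs_node_names(nodes: List[Dict[str, Any]]) -> List[str]:
    """Return the names of nodes that carry the OCS storage label."""
    return [
        node["metadata"]["name"]
        for node in nodes
        if OCS_NODE_LABEL in node["metadata"].get("labels", {})
    ]


# Hypothetical worker nodes: only two of the three are storage nodes.
workers = [
    {"metadata": {"name": "worker-0", "labels": {OCS_NODE_LABEL: ""}}},
    {"metadata": {"name": "worker-1", "labels": {}}},
    {"metadata": {"name": "worker-2", "labels": {OCS_NODE_LABEL: ""}}},
]
print(ocs_node_names(workers))  # ['worker-0', 'worker-2']
```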

tests/functional/z_cluster/nodes/test_non_ocs_taint_and_toleration.py

Signed-off-by: Vishakha Kathole <[email protected]>

Signed-off-by: vkathole <[email protected]>

ocs-ci

PR validation

Cluster Name:
Cluster Configuration:
PR Test Suite: tier4b
PR Test Path: tests/functional/z_cluster/nodes/test_non_ocs_taint_and_toleration.py
Additional Test Params:
OCP VERSION: 4.17
OCS VERSION: 4.17
tested against branch: master

Job UNSTABLE (some or all tests failed).

Signed-off-by: vkathole <[email protected]>

ocs-ci

PR validation on existing cluster

Cluster Name: vkathole-t26
Cluster Configuration:
PR Test Suite: tier4b
PR Test Path: tests/functional/z_cluster/nodes/test_non_ocs_taint_and_toleration.py
Additional Test Params:
OCP VERSION: 4.17
OCS VERSION: 4.17
tested against branch: master

Job UNSTABLE (some or all tests failed).

Signed-off-by: vkathole <[email protected]>

ocs-ci

PR validation

Cluster Name:
Cluster Configuration:
PR Test Suite: tier4b
PR Test Path: tests/functional/z_cluster/nodes/test_non_ocs_taint_and_toleration.py
Additional Test Params:
OCP VERSION: 4.17
OCS VERSION: 4.17
tested against branch: master

Job UNSTABLE (some or all tests failed).

Signed-off-by: vkathole <[email protected]>

ocs-ci

PR validation

Cluster Name:
Cluster Configuration:
PR Test Suite: tier4b
PR Test Path: tests/functional/z_cluster/nodes/test_non_ocs_taint_and_toleration.py
Additional Test Params:
OCP VERSION: 4.17
OCS VERSION: 4.17
tested against branch: master

Job UNSTABLE (some or all tests failed).

ocs-ci

PR validation on existing cluster

Cluster Name: vkathole-o1
Cluster Configuration:
PR Test Suite: tier4b
PR Test Path: tests/functional/z_cluster/nodes/test_non_ocs_taint_and_toleration.py
Additional Test Params:
OCP VERSION: 4.17
OCS VERSION: 4.17
tested against branch: master

Job UNSTABLE (some or all tests failed).

Signed-off-by: vkathole <[email protected]>

ocs-ci

PR validation on existing cluster

Cluster Name: vkathole-o1
Cluster Configuration:
PR Test Suite: tier4b
PR Test Path: tests/functional/z_cluster/nodes/test_non_ocs_taint_and_toleration.py
Additional Test Params:
OCP VERSION: 4.17
OCS VERSION: 4.17
tested against branch: master

Job UNSTABLE (some or all tests failed).

Signed-off-by: vkathole <[email protected]>

ocs-ci

PR validation

Cluster Name:
Cluster Configuration:
PR Test Suite: tier4b
PR Test Path: tests/functional/z_cluster/nodes/test_non_ocs_taint_and_toleration.py
Additional Test Params:
OCP VERSION: 4.17
OCS VERSION: 4.17
tested against branch: master

Job UNSTABLE (some or all tests failed).

PrasadDesala · 2024-10-10T12:39:14Z

tests/functional/z_cluster/nodes/test_non_ocs_taint_and_toleration.py

@@ -255,3 +299,145 @@ def test_non_ocs_taint_and_tolerations(self):
                resource_count=count * replica_count,
            ), "New OSDs failed to reach running state"
            check_ceph_health_after_add_capacity(ceph_rebalance_timeout=2500)
+
+        # Reboot one of the nodes


You are performing four admin operations in the same test. Should we consider moving the newly added operations into a separate test function? The repetitive code for setting the non-OCS taints can be moved to a common function and called from each test.

Please get input from the Brown squad as well.
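One possible shape for such a common function is sketched below. It only builds the `oc adm taint nodes` command lines (nothing is executed); the function name, its defaults, and the idea of running the commands through whatever helper the tests already use are assumptions. Adding uses `key=value:effect` with `--overwrite`, and removal uses the `key:effect-` form.

```python
import shlex
from typing import List


def build_taint_commands(
    node_names: List[str],
    key: str = "xyz",
    value: str = "true",
    effect: str = "NoSchedule",
    remove: bool = False,
) -> List[List[str]]:
    """Return one `oc adm taint nodes` argv per node (hypothetical helper)."""
    if remove:
        # "key:effect-" removes the taint with this key and effect.
        spec = f"{key}:{effect}-"
        return [["oc", "adm", "taint", "nodes", name, spec] for name in node_names]
    # "key=value:effect" adds the taint; --overwrite replaces an existing one.
    spec = f"{key}={value}:{effect}"
    return [
        ["oc", "adm", "taint", "nodes", name, spec, "--overwrite"]
        for name in node_names
    ]


for argv in build_taint_commands(["worker-0", "worker-1"]):
    print(shlex.join(argv))
# oc adm taint nodes worker-0 xyz=true:NoSchedule --overwrite
# oc adm taint nodes worker-1 xyz=true:NoSchedule --overwrite
```

Keeping add and remove in one helper lets each split-out test taint in setup and untaint in teardown with the same parameters.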

PrasadDesala · 2024-10-10T12:44:02Z

tests/functional/z_cluster/nodes/test_non_ocs_taint_and_toleration.py

+        if is_vsphere_ipi_cluster():
+            nodes.restart_nodes(nodes=node, wait=False)
+            node_names = [n.name for n in node]
+            wait_for_nodes_status(node_names, constants.STATUS_READY, timeout=420)
+        else:


IIUC, the vSphere node-related API is the same for both IPI and UPI. If that is the case, you don't need to handle the vSphere IPI case separately here.

PrasadDesala · 2024-10-10T12:44:35Z

tests/functional/z_cluster/nodes/test_non_ocs_taint_and_toleration.py

+        else:
+            nodes.restart_nodes_by_stop_and_start(nodes=node)
+
+        # Wait some time after rebooting master


You are rebooting a worker node, not a master; please correct the code comment.

PrasadDesala · 2024-10-10T12:46:49Z

tests/functional/z_cluster/nodes/test_non_ocs_taint_and_toleration.py

@@ -255,3 +299,145 @@ def test_non_ocs_taint_and_tolerations(self):
                resource_count=count * replica_count,
            ), "New OSDs failed to reach running state"
            check_ceph_health_after_add_capacity(ceph_rebalance_timeout=2500)
+
+        # Reboot one of the nodes
+        node = get_nodes("worker", num_of_nodes=1)


It is better to use the OCS nodes here; otherwise a non-OCS node could be picked. I would also suggest selecting a node at random instead of selecting the same node every time.
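A minimal sketch of the random selection, assuming the OCS node names are already available as a list (for example from `get_ocs_nodes()`). The helper `pick_random_node` is hypothetical; accepting an optional `random.Random` lets a run be reproduced by seeding it.

```python
import random
from typing import List, Optional


def pick_random_node(node_names: List[str], rng: Optional[random.Random] = None) -> str:
    """Pick one node at random; fail loudly if there is nothing to pick."""
    if not node_names:
        raise ValueError("No OCS nodes available to reboot")
    rng = rng or random.Random()
    return rng.choice(node_names)


ocs_nodes = ["worker-0", "worker-1", "worker-2"]  # hypothetical names
chosen = pick_random_node(ocs_nodes)
print(f"Rebooting node {chosen}")
```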

PrasadDesala · 2024-10-10T12:48:36Z

tests/functional/z_cluster/nodes/test_non_ocs_taint_and_toleration.py

+        logger.info(f"Waiting {waiting_time} seconds.")
+        time.sleep(waiting_time)


You wait for cluster connectivity below; do we still need this sleep?
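The usual alternative to a fixed sleep is to poll the condition you actually care about with a timeout. The sketch below is a generic, hypothetical `wait_until` helper; ocs-ci's `TimeoutSampler` utility may already cover this, so treat this only as an illustration of the pattern.

```python
import time
from typing import Callable


def wait_until(condition: Callable[[], bool], timeout: float, interval: float = 5.0) -> bool:
    """Poll condition() until it returns True or `timeout` seconds pass.

    Returns True as soon as the condition holds, False on timeout.
    """
    deadline = time.monotonic() + timeout
    while True:
        if condition():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

Used in place of `time.sleep(waiting_time)`, the test proceeds as soon as the cluster is reachable instead of always paying the full wait.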

PrasadDesala · 2024-10-10T12:51:17Z

tests/functional/z_cluster/nodes/test_non_ocs_taint_and_toleration.py

+        """
+
+        logger.info("Taint all nodes with custom taint")
+        ocs_nodes = get_worker_nodes()


You can use get_ocs_nodes(), which returns only the OCS nodes.

PrasadDesala · 2024-10-10T12:53:15Z

tests/functional/z_cluster/nodes/test_non_ocs_taint_and_toleration.py

+            '[{"effect": "NoSchedule", "key": "xyz", "operator": "Equal", '
+            '"value": "true"}]}}}'
+        )
+        # Select one subscription other than odf subscription


Please add a logger.info message describing the step the test performs here.
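The patch string above is assembled from concatenated literals ending in `}]}}}`, which is easy to unbalance. A sketch of building the same structure with `json.dumps`, assuming the intent is to set `spec.config.tolerations` on the OLM Subscription (the full patch string and how it is applied are not shown in this excerpt). The helper name is hypothetical.

```python
import json
import logging
from typing import Dict, List

logger = logging.getLogger(__name__)


def subscription_toleration_patch(tolerations: List[Dict[str, str]]) -> str:
    """Return a patch body that sets spec.config.tolerations on a Subscription."""
    return json.dumps({"spec": {"config": {"tolerations": tolerations}}})


custom_toleration = {
    "effect": "NoSchedule",
    "key": "xyz",
    "operator": "Equal",
    "value": "true",
}
logger.info("Add custom toleration %s to the subscription", custom_toleration)
patch = subscription_toleration_patch([custom_toleration])
```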

PrasadDesala · 2024-10-10T12:54:17Z

tests/functional/z_cluster/nodes/test_non_ocs_taint_and_toleration.py

+        for sub in sub_list:
+            if sub != constants.ODF_SUBSCRIPTION:
+                selected_sub = sub
+                break


Add a log message with the selected subscription (the one other than ODF).

Which subscription do we expect here, other than ODF?
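A sketch of the selection loop with the requested logging. In the loop under review, `selected_sub` stays unset if every subscription is the ODF one; the version below returns `None` and logs a warning in that case. `select_non_odf_subscription` is a hypothetical helper, and the ODF subscription name is passed in rather than assumed.

```python
import logging
from typing import Iterable, Optional

logger = logging.getLogger(__name__)


def select_non_odf_subscription(sub_names: Iterable[str], odf_sub_name: str) -> Optional[str]:
    """Return the first subscription that is not the ODF one, or None."""
    for name in sub_names:
        if name != odf_sub_name:
            logger.info("Selected subscription other than ODF: %s", name)
            return name
    logger.warning("No subscription other than %s was found", odf_sub_name)
    return None
```

The caller can then assert on `None` with a clear message instead of hitting a `NameError` later.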

PrasadDesala · 2024-10-10T13:01:31Z

tests/functional/z_cluster/nodes/test_non_ocs_taint_and_toleration.py

+            check_toleration_on_pods(toleration_key="xyz")
+            raise AssertionError("Toleration was found, but it should not exist.")
+        except TolerationNotFoundException:
+            pass


You can add a log message here instead of pass, and likewise at line 423.
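A sketch of the negative check with a log message in place of `pass`, using `try`/`except`/`else` so the "toleration was found" failure sits outside the `try`. The original form already works, because `AssertionError` is not caught by `except TolerationNotFoundException`; `else` just makes the intent explicit. `TolerationNotFoundException` and `check_toleration_on_pods` are local stand-ins here: the stand-in raises if any pod lacks the toleration, which may not match the real helper exactly.

```python
import logging
from typing import Dict, List

logger = logging.getLogger(__name__)


class TolerationNotFoundException(Exception):
    """Local stand-in for the exception the test imports."""


def check_toleration_on_pods(toleration_key: str, pods_tolerations: Dict[str, List[dict]]) -> None:
    # Stand-in: raise if any pod has no toleration with this key.
    for pod, tolerations in pods_tolerations.items():
        if not any(t.get("key") == toleration_key for t in tolerations):
            raise TolerationNotFoundException(f"{toleration_key} not found on pod {pod}")


def assert_toleration_absent(toleration_key: str, pods_tolerations: Dict[str, List[dict]]) -> None:
    try:
        check_toleration_on_pods(toleration_key, pods_tolerations)
    except TolerationNotFoundException:
        logger.info("Toleration %r not found on pods, as expected", toleration_key)
    else:
        raise AssertionError("Toleration was found, but it should not exist.")
```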

Shrivaibavi · 2024-10-10T17:26:30Z

tests/functional/z_cluster/nodes/test_non_ocs_taint_and_toleration.py

+            params = '[{"op": "remove", "path": "/spec/placement"},]'
+            storagecluster_obj.patch(params=params, format_type="json")
+
+            logger.info("Remove tolerations to the subscription")


Suggested change

logger.info("Remove tolerations to the subscription")

logger.info("Remove tolerations from the subscriptions")

Shrivaibavi · 2024-10-10T17:28:35Z

tests/functional/z_cluster/nodes/test_non_ocs_taint_and_toleration.py

+            time.sleep(180)
+            assert wait_for_pods_to_be_running(
+                timeout=900, sleep=15
+            ), "some of the pods didn't came up running"


Suggested change

), "some of the pods didn't came up running"

), "Few pods failed to reach the desired running state"

Shrivaibavi · 2024-10-10T17:32:53Z

tests/functional/z_cluster/nodes/test_non_ocs_taint_and_toleration.py

+                if "config" in subscription_data.get("spec", {}):
+                    params = '[{"op": "remove", "path": "/spec/config"}]'
+                    sub_obj.patch(resource_name=sub, params=params, format_type="json")
+            time.sleep(180)


Aren't we also supposed to remove the tolerations from the rook-ceph operator configmap and from ocsinitializations.ocs.openshift.io?
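Cleanup across several resources is simpler if the JSON Patch `remove` operations are generated only for paths that actually exist, since removing a missing path makes the whole patch fail. A sketch, assuming RFC 6901 JSON Pointer paths; the configmap keys in the example (`CSI_PLUGIN_TOLERATIONS`, `CSI_PROVISIONER_TOLERATIONS`) and the helper names are assumptions, and the fields to clear on the OCSInitialization resource would need to be confirmed.

```python
import json
from typing import Any, Dict, Iterable


def pointer_exists(doc: Any, pointer: str) -> bool:
    """Return True if the JSON Pointer (RFC 6901) resolves inside doc."""
    node = doc
    for raw in pointer.lstrip("/").split("/"):
        token = raw.replace("~1", "/").replace("~0", "~")
        if isinstance(node, dict) and token in node:
            node = node[token]
        elif isinstance(node, list) and token.isdigit() and int(token) < len(node):
            node = node[int(token)]
        else:
            return False
    return True


def build_remove_patch(doc: Dict[str, Any], pointers: Iterable[str]) -> str:
    """Return a JSON Patch string removing only the paths present in doc."""
    ops = [{"op": "remove", "path": p} for p in pointers if pointer_exists(doc, p)]
    return json.dumps(ops)


# Hypothetical configmap content: only one of the two keys is set.
configmap = {"data": {"CSI_PLUGIN_TOLERATIONS": "- key: xyz\n  operator: Equal\n"}}
patch = build_remove_patch(
    configmap, ["/data/CSI_PLUGIN_TOLERATIONS", "/data/CSI_PROVISIONER_TOLERATIONS"]
)
```

The resulting string can be passed wherever the test already applies JSON patches (`format_type="json"` in the excerpts above).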

Shrivaibavi · 2024-10-10T17:33:51Z

tests/functional/z_cluster/nodes/test_non_ocs_taint_and_toleration.py

@@ -255,3 +299,145 @@ def test_non_ocs_taint_and_tolerations(self):
                resource_count=count * replica_count,
            ), "New OSDs failed to reach running state"
            check_ceph_health_after_add_capacity(ceph_rebalance_timeout=2500)
+
+        # Reboot one of the nodes


Shrivaibavi · 2024-10-10T17:35:36Z

tests/functional/z_cluster/nodes/test_non_ocs_taint_and_toleration.py

+    def test_negative_custom_taint(self, nodes):
+        """
+        Test runs the following steps
+        1. Taint odf nodes with non-ocs taint


Suggested change

1. Taint odf nodes with non-ocs taint

1. Taint odf worker nodes with non-ocs taint

Shrivaibavi · 2024-10-10T17:38:13Z

tests/functional/z_cluster/nodes/test_non_ocs_taint_and_toleration.py

+
+        assert not wait_for_pods_to_be_running(
+            timeout=120, sleep=15
+        ), "Pods are running when they should not be."


Are we expecting all pods to go into a bad state?

I see that we apply tolerations on the StorageCluster and on a subscription other than ODF. Are we sure that no pods will be running when the toleration is applied incorrectly only on the subscription while it is set correctly on the StorageCluster? Please check the scenario again: if the toleration is set correctly on the StorageCluster, some pods should be up and running.
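Rather than asserting that all pods fail, the test could split pods by phase and assert on each group separately. A sketch over pod dicts shaped like `oc get pods -o json` items; which pods belong in each group for this scenario is for the test authors to determine, and the pod names below are hypothetical. Note that phase `Running` does not imply readiness.

```python
from typing import Any, Dict, List, Tuple


def split_pods_by_phase(pods: List[Dict[str, Any]]) -> Tuple[List[str], List[str]]:
    """Return (running, not_running) pod names based on status.phase."""
    running: List[str] = []
    not_running: List[str] = []
    for pod in pods:
        name = pod["metadata"]["name"]
        phase = pod.get("status", {}).get("phase")
        (running if phase == "Running" else not_running).append(name)
    return running, not_running


pods = [  # hypothetical pods
    {"metadata": {"name": "rook-ceph-osd-0-abc"}, "status": {"phase": "Running"}},
    {"metadata": {"name": "noobaa-core-0"}, "status": {"phase": "Pending"}},
]
running, not_running = split_pods_by_phase(pods)
assert running, "Expected some pods to keep running with the StorageCluster toleration"
print(f"Running: {running}; not running: {not_running}")
```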

Shrivaibavi · 2024-10-10T17:49:28Z

tests/functional/z_cluster/nodes/test_non_ocs_taint_and_toleration.py

+        ), "Pods are running when they should not be."
+
+        logger.info(
+            "Check custom toleration on all newly created pods under openshift-storage"


Suggested change

"Check custom toleration on all newly created pods under openshift-storage"

"Validate custom toleration not found on all newly created pods in openshift-storage"

vkathole requested review from a team as code owners June 7, 2024 10:27

pull-request-size bot added the size/L PR that changes 100-499 lines label Jun 7, 2024

vkathole added team/e2e E2E team related issues/PRs and removed size/L PR that changes 100-499 lines labels Jun 7, 2024

vkathole self-assigned this Jun 7, 2024

openshift-merge-robot added the needs-rebase label Jun 14, 2024

Shrivaibavi reviewed Jul 31, 2024


PrasadDesala reviewed Aug 1, 2024


Support custom taints node operation

ee87c8d

Signed-off-by: Vishakha Kathole <[email protected]>

vkathole force-pushed the custom_taints_node_operation branch from 273c9cf to ee87c8d Compare September 19, 2024 10:38

pull-request-size bot added the size/M PR that changes 30-99 lines label Sep 19, 2024

openshift-merge-robot removed the needs-rebase label Sep 19, 2024

vkathole added 2 commits September 19, 2024 16:12

Add changes

7564fce

Signed-off-by: vkathole <[email protected]>

Fix test

a3d05eb

Signed-off-by: vkathole <[email protected]>

ocs-ci reviewed Sep 20, 2024


add negative test case

84f36dd

Signed-off-by: vkathole <[email protected]>

pull-request-size bot added size/L PR that changes 100-499 lines and removed size/M PR that changes 30-99 lines labels Sep 26, 2024

ocs-ci reviewed Sep 26, 2024


Fix test

e5b9f8e

Signed-off-by: vkathole <[email protected]>

ocs-ci reviewed Oct 1, 2024


change for ipi cluster

8a2d775

Signed-off-by: vkathole <[email protected]>

ocs-ci reviewed Oct 2, 2024


ocs-ci reviewed Oct 3, 2024


add exception

6727cea

Signed-off-by: vkathole <[email protected]>

ocs-ci reviewed Oct 6, 2024


increase timeout

644f217

Signed-off-by: vkathole <[email protected]>

ocs-ci reviewed Oct 10, 2024


PrasadDesala reviewed Oct 10, 2024


Shrivaibavi reviewed Oct 10, 2024


Shrivaibavi added Squad/Brown and removed Squad/Brown labels Oct 10, 2024

Shrivaibavi requested review from a team and removed request for a team October 10, 2024 17:51
