Describe the solution you'd like
Currently there is an issue in this operator that makes it very hard to deploy into clusters that contain other mutating operators. Namely, this operator does not tolerate Pod changes made by other operators: it sends the Jenkins master Pod into a restart loop if there is even the slightest mismatch between the provisioned Pod and the containers described in the Jenkins CRD.
An example of this behavior occurs when we use other operators or controllers that inject additional environment variables into the provisioned Pods. As far as I am aware this happens when the New Relic operators are used, but it also happens in many other scenarios where outside systems modify the provisioned Pods.
My concrete problem is that we are using AWS EKS and assigning AWS IAM roles to the ServiceAccount bound to the Jenkins master Pod. When roles are assumed via this mechanism, AWS injects additional environment variables and volume mounts into any Pod that assumes such a role. This causes the Jenkins Operator to observe a change in volume mounts and environment variables, and it terminates the Jenkins master Pod.
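For illustration, here is roughly what the EKS IAM-roles-for-service-accounts webhook adds to a Pod, expressed with the Kubernetes Go API types an operator would compare against. The role ARN is a made-up placeholder and the exact values depend on the cluster, so treat this as a sketch of the mechanism, not a definitive listing:

```go
package main

import (
	"fmt"

	corev1 "k8s.io/api/core/v1"
)

// Approximately what the EKS pod identity webhook injects into a Pod whose
// ServiceAccount is annotated with an IAM role (values are illustrative).
var injectedEnv = []corev1.EnvVar{
	{Name: "AWS_ROLE_ARN", Value: "arn:aws:iam::123456789012:role/jenkins-master"}, // placeholder ARN
	{Name: "AWS_WEB_IDENTITY_TOKEN_FILE", Value: "/var/run/secrets/eks.amazonaws.com/serviceaccount/token"},
}

var injectedMount = corev1.VolumeMount{
	Name:      "aws-iam-token",
	ReadOnly:  true,
	MountPath: "/var/run/secrets/eks.amazonaws.com/serviceaccount",
}

func main() {
	// None of these entries appear in the Jenkins CRD, so a strict equality
	// comparison between the expected and actual containers fails.
	fmt.Println(injectedEnv, injectedMount)
}
```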
This is a highly confusing situation because users are usually not aware of the additional environment variables or volume mounts introduced by other controllers deployed in their clusters. Since the user does not know there are differences, it is practically impossible to debug without digging deep into this operator's source code and then adding the extra variables to the Jenkins CRD so that the operator does not detect any changes.
My proposal is that the comparison rules between the actual and expected Jenkins master Pod be relaxed. I think the following changes would help:
1. Currently, when there are differences, only the expected state is logged, not the actual state. This makes it hard to find the difference because users have to download their own resource definitions and compare by hand. When a difference is detected, both the expected and the actual state should be logged to make debugging easier.
2. If an environment variable ENV_VAR is present in the Jenkins master Pod but not in the Jenkins CRD, it should be excluded from the comparison. This way, if some other operator injects environment variables into Pods, we should not care about them as long as they do not overwrite any variables managed by the Jenkins Operator (see the sketch after this list).
3. The principle outlined above for environment variables should be applied to all other resources: as long as every desired configuration from the Jenkins CRD is present in the Jenkins Pod, everything is fine and no restart should happen.
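Here is a minimal sketch of the relaxed, subset-based comparison proposed above, with a hypothetical helper name; this is not the operator's actual code, just an illustration of the rule that extra injected variables are ignored while both sides of any real mismatch are logged:

```go
package main

import (
	"fmt"

	corev1 "k8s.io/api/core/v1"
)

// envSubset reports whether every environment variable the Jenkins CRD
// expects is present in the running container with the same value. Extra
// variables injected by other controllers are deliberately ignored.
// (ValueFrom comparison is elided for brevity.)
func envSubset(expected, actual []corev1.EnvVar) bool {
	actualByName := make(map[string]corev1.EnvVar, len(actual))
	for _, v := range actual {
		actualByName[v.Name] = v
	}
	for _, want := range expected {
		got, ok := actualByName[want.Name]
		if !ok || got.Value != want.Value {
			// Log BOTH sides so users can see exactly what differs.
			fmt.Printf("env mismatch: expected %+v, actual %+v\n", want, got)
			return false
		}
	}
	return true
}

func main() {
	expected := []corev1.EnvVar{{Name: "JENKINS_HOME", Value: "/var/jenkins/home"}}
	actual := append(expected,
		corev1.EnvVar{Name: "AWS_ROLE_ARN", Value: "arn:aws:iam::123456789012:role/jenkins-master"},
	)
	fmt.Println(envSubset(expected, actual)) // true: the injected extra is ignored
}
```

Under this rule the IRSA-injected variables from the earlier example no longer trigger a restart, while a genuine drift in an operator-managed variable still does.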
The changes above would make this operator work better in conjunction with other operators, of which there are more and more.
While thinking about this I realized that these issues might be resolved by switching from running Jenkins as a bare Pod to running it as a Deployment. If that is the plan, I would appreciate a list of open tasks related to migrating the operator from Pod-based to Deployment-based provisioning.