Add option to set Topology Spread Constraints, rather than just Affinity #217

Merged

Conversation

DavidRobertsOrbis
Contributor

Description of the change

Adds the ability to use Topology Spread Constraints (https://kubernetes.io/docs/concepts/scheduling-eviction/assign-pod-node/#pod-topology-spread-constraints).

Existing or Associated Issue(s)

None

Additional Information

Default behaviour remains unchanged. Nothing removed.

The chart currently supports affinity rules, but topologySpreadConstraints give finer control - see https://kubernetes.io/docs/concepts/scheduling-eviction/topology-spread-constraints/#comparison-with-podaffinity-podantiaffinity.

In my specific use case, https://kubernetes.io/blog/2023/04/17/fine-grained-pod-topology-spread-features-beta/#kep-3243-respect-pod-topology-spread-after-rolling-upgrades allows nodes to be reused during the rollout of a deployment, while still ending up with an appropriate spread of pods at the end. Similar changes are coming to affinity in k8s 1.31, so we could just wait; but by enabling topology spread constraints now, we let people running older versions of k8s take advantage. There are workarounds, but they're not as clean.

Topology spread constraints are also helpful in other circumstances (e.g. if you have more replicas than zones, and want an even-ish spread).
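
For illustration, a values entry using the new option might look like the sketch below. (Assumptions: the chart exposes a top-level topologySpreadConstraints value that is rendered verbatim into the pod spec, and pods carry an app.kubernetes.io/name: backstage label; neither is confirmed by this thread.)

```yaml
topologySpreadConstraints:
  - maxSkew: 1
    topologyKey: topology.kubernetes.io/zone
    whenUnsatisfiable: ScheduleAnyway
    labelSelector:
      matchLabels:
        app.kubernetes.io/name: backstage  # assumed pod label
    # matchLabelKeys (KEP-3243, beta since k8s 1.27) scopes the spread
    # calculation to the current ReplicaSet via pod-template-hash, so
    # nodes can be reused during a rolling update.
    matchLabelKeys:
      - pod-template-hash
```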

Checklist

  • Chart version bumped in Chart.yaml according to semver.
  • Variables are documented in the values.yaml and added to the README.md. The helm-docs utility can be used to generate the necessary content. Use helm-docs --dry-run to preview the content.
  • JSON Schema generated.
  • Lint tests pass for the chart using the Chart Testing tool and the ct lint command.

@DavidRobertsOrbis DavidRobertsOrbis requested a review from a team as a code owner August 29, 2024 13:16
@DavidRobertsOrbis
Contributor Author

Asides:

  • I notice that the chart specifies a minimum k8s version of 1.19; meanwhile, the values spec template pulls down the latest k8s spec for e.g. the affinity property (and the new topologySpreadConstraints property). Is this OK? I suspect it's fine, because these are optional values anyway.
  • The CONTRIBUTING.md docs suggest I need to sign with GPG etc. (-S), but my previous PR into the main backstage repo only required Signed-off-by (e.g. -s). I notice that these contributing docs are relatively new. Can you confirm which is needed here?


github-actions bot commented Sep 6, 2024

This PR has been automatically marked as stale because it has not had recent activity from the author. It will be closed if no further activity occurs. If the PR was closed and you want it re-opened, let us know and we'll re-open the PR so that you can continue the contribution!

@github-actions github-actions bot added the stale label Sep 6, 2024
@DavidRobertsOrbis
Contributor Author

Reopen, please

@github-actions github-actions bot removed the stale label Sep 7, 2024
@ChrisJBurns
Contributor

Thanks @DavidRobertsOrbis are you able to add some CI tests for the new additions?

@DavidRobertsOrbis
Contributor Author

> Thanks @DavidRobertsOrbis are you able to add some CI tests for the new additions?

I see the *-values.yaml files in the ci folder; I've added a new file there to exercise the new value. ct lint passes locally. I don't see anything that specifies what the expected result is - am I missing something, or does this just check that we generate a valid k8s manifest?

If you're referring to something other than the linting, such as https://helm.sh/docs/topics/chart_tests/, can you point me at an existing example?
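
A ci values file for this would typically just set the new value; here is a minimal sketch (the file name, constraint, and label selector are assumptions for illustration, not the actual file in this PR):

```yaml
# ci/topology-spread-constraints-values.yaml (hypothetical name)
# ct installs the chart with these values and verifies the release
# deploys successfully on the test cluster.
topologySpreadConstraints:
  - maxSkew: 1
    topologyKey: topology.kubernetes.io/zone
    whenUnsatisfiable: DoNotSchedule
    labelSelector:
      matchLabels:
        app.kubernetes.io/name: backstage  # assumed pod label
```

Note that with whenUnsatisfiable: DoNotSchedule, a node that lacks the topologyKey label cannot satisfy the constraint at all, which matters on minimal test clusters.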

github-actions bot

This PR has been automatically marked as stale because it has not had recent activity from the author. It will be closed if no further activity occurs. If the PR was closed and you want it re-opened, let us know and we'll re-open the PR so that you can continue the contribution!

@github-actions github-actions bot added the stale label Sep 19, 2024
@DavidRobertsOrbis
Contributor Author

Not stale.

@ChrisJBurns anything else you need from me? I see that #216 has fixed the chart tests - are we waiting for that to land, so I can rebase?

@github-actions github-actions bot removed the stale label Sep 20, 2024
github-actions bot

This PR has been automatically marked as stale because it has not had recent activity from the author. It will be closed if no further activity occurs. If the PR was closed and you want it re-opened, let us know and we'll re-open the PR so that you can continue the contribution!

@github-actions github-actions bot added the stale label Sep 27, 2024
@DavidRobertsOrbis
Contributor Author

Not stale.

@github-actions github-actions bot removed the stale label Sep 28, 2024
@ChrisJBurns
Contributor

@DavidRobertsOrbis Yep, once #216 is merged you can rebase and that should sort the tests out. Additionally, on the GPG signing point: for this specific repo we ask that every commit both carries a Signed-off-by trailer and is GPG-signed with a key, so you'll need a combo of both -s and -S (i.e. git commit -s -S). Hope that helps.

github-actions bot

This PR has been automatically marked as stale because it has not had recent activity from the author. It will be closed if no further activity occurs. If the PR was closed and you want it re-opened, let us know and we'll re-open the PR so that you can continue the contribution!

@github-actions github-actions bot added the stale label Oct 10, 2024
@vinzscam
Member

Hi @DavidRobertsOrbis,
the issue has been fixed, but your new test is failing. Could you have a look? 🙏

@github-actions github-actions bot removed the stale label Oct 11, 2024
@DavidRobertsOrbis
Contributor Author

@vinzscam Looks like a "legitimate" failure this time, though it's caused by the test environment differing from my expectations. To some extent, this proves that the chart is behaving appropriately :)

0/1 nodes are available: 1 node(s) didn't match pod topology spread constraints (missing required label)

I've removed the zone constraint from the test case, so it's only relying on the hostname. Can you run it again? If this still fails, can you give me any insight into what labels are present on the test cluster nodes?
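
The hostname-only version might look like this (a sketch under the same assumptions as above; kubernetes.io/hostname is set by the kubelet on every node, unlike zone labels, which a single-node test cluster typically lacks):

```yaml
topologySpreadConstraints:
  - maxSkew: 1
    topologyKey: kubernetes.io/hostname
    whenUnsatisfiable: DoNotSchedule
    labelSelector:
      matchLabels:
        app.kubernetes.io/name: backstage  # assumed pod label
```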

@vinzscam vinzscam left a comment
Member

thank you! 🙏

@vinzscam vinzscam merged commit e7efd3e into backstage:main Oct 18, 2024
3 checks passed
@DavidRobertsOrbis DavidRobertsOrbis deleted the topology-spread-constraints branch October 18, 2024 13:25