Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Document the force-skip-ci usage #248

Open
fidencio opened this issue Jan 20, 2022 · 13 comments
Open

Document the force-skip-ci usage #248

fidencio opened this issue Jan 20, 2022 · 13 comments
Labels
enhancement Improvement to an existing feature needs-review Needs to be assessed by the team.

Comments

@fidencio
Copy link
Member

A force-skip-ci label has been added a long time ago in order to allow folks to by-pass the CI for one reason or another. However, there's the concern of having this misused (not due to bad intention, but rather due to a mistake), which brings up the need of a written policy to use it.

We can also extend this issue to discuss whether having such label is valid or not.

/cc @kata-containers/architecture-committee

@fidencio fidencio added enhancement Improvement to an existing feature needs-review Needs to be assessed by the team. labels Jan 20, 2022
@egernst
Copy link
Member

egernst commented Jan 20, 2022

I wasn't aware of the flag, so indeed we should setup documentation around it (or I should be more involved and pay attention!).

I worry that this is a very big hammer that is available to too many people without any process defined currently.

In the past, we've done similar, maybe more terrible, things by editing the required jobs associated with a repository, based on the repository's settings.

I would hope, but not necessarily expect, that the list of people who can apply a label to a PR/issue is a much larger superset of the people who have administrative rights to modify repository settings in GitHub. If that is the case, I'd prefer we define a process, and rely on changing the settings rather than using a flag. If that is not the case, I want to see what we can do to make it the case.

@fidencio
Copy link
Member Author

fidencio commented Jan 20, 2022

I am afraid that's not the case, @egernst , and I think it deserves yet another issue (and a nice discussion) on who are the folks who should have "admin" rights to the repos from now on, and revisit everyone's access based on that decision.

@fidencio
Copy link
Member Author

Still about this one, even setting the force-skip-ci we still need the approval of two other maintainers. This part is not skipped.
So, yes, this is a big hammer, but this is a big hammer that at least 3 maintainers should agree on using it and any other reviewer can block it.

@bergwolf
Copy link
Member

My other concern is the consequence of using the flag. IMO we need the flag because we want to bypass some tests in order to merge a PR that cannot pass CI but is not supposed to break anything in CI after being merged. So the CI jobs that we can bypass must fall into the category that it ONLY check something that belongs to the targeting PR and that something will not be checked after the PR is merged.

So checking the commit message format is one example of such CI jobs that can safely be skipped. But in the case of force-skip-ci flag, we are skipping much more than that. E.g., the static checker is skipped in kata-containers/kata-containers#3468 and that's why it broke the CI after it was merged.

So my suggestion would be, let's remove force-skip-ci, and add something that is really needed but nothing more, skip-commit-message-check. I am not sure if there are other types of such safe jobs to skip, but we can add things incrementally.

@fidencio
Copy link
Member Author

fidencio commented Jan 21, 2022

E.g., the static checker is skipped in kata-containers/kata-containers#3468 and that's why it broke the CI after it was merged.

I disagree here, @bergwolf.

So, let's take a step back and evaluate what actually happened with kata-containers/kata-containers#3468.

  • I raised a PR that I knew that would break the CI.
  • I advertised that in the first message of the PR.
    We'll need further commits to actually start using govmm from the kata-containers repo, but those will come later as this commit will, most likely, have to be merged with a force-skip-ci for now.
    
  • I worked on solving the static-checks issues before having the PR merged
    I am adding the do-not-merge back, in order to ensure I can test things on my fork first (there's a short series making static checks and other things happy that depends on this one).
    
  • After that I had everything working on my fork, I proceeded with merging govmm: Bring the project in kata-containers#3468
  • A few minutes after having govmm: Bring the project in kata-containers#3468 merged, I raised govmm: Use it from our own repo kata-containers#3496, which included the fixes.
    • There I found out issues that I was NOT able to catch from the static-checks, which were architecture specific issues, and I worked to solve them (as in, report and work them around) before the end of my day, and I only finished my day being sure CI would NOT be blocked.
  • A NON related failure happens on one of the CIs
  • Absolutely no-one took the time to check whether the failure was related or not
  • The CI had to wait for me to come back, 8 hours later, to re-run the job with the NON related failure
  • Situation got fixed

Looking at what happened I don't think the CI was broken for 12 to 18 hours because of the force-skip-ci, although I think we should discuss its existence. Looking at what happened I think the CI was broken for 12 to 18 hours because of a combination of:

  • Not having a way to test on different architectures locally, which is quite hard to solve by the way.
  • Not having a stable enough CI (so we could avoid the non related failures)
  • Not having folks engaged enough to actually look at the failures and re-run a CI

I'm all in to solve all those issues, but the third one has to come from each contributor.

@fidencio
Copy link
Member Author

And just for the archaeological sake, kata-containers/tests#2915 is when this was introduced.

@fidencio
Copy link
Member Author

@bergwolf and I had a call Today, where we discussed this topic and here's pretty much the summary of the discussion.

We should, instead of having the force-skip-ci label, have ways to bypass some specific checks that break the current CI status, but that will NOT leave the CI broken after that. However, in order to better identify and act on those we need to do some work beforehand, as:

Meanwhile, we should really re-visit who has permissions to do what in our repo and also get an agreement on #249.

This is just a summary of what's been discussed and issues will be opened later on.

@jodh-intel
Copy link
Contributor

jodh-intel commented Jan 21, 2022

Firstly, I'd better own up as the author of the force-skip-ci feature.

  • Once we have it in place, drop the force-skip-ci label, in favour of its smaller and self-contained new parts.

I'm not necessarily against this, but sometimes we just need a way to bypass the CI (almost) entirely, and quickly. We could just disable all the PR checks for the repo in the GH settings, but that is not something we would actually want to do.

I like the idea of more granular skipping, but the worst case might be to specify quite a large bunch of magic labels / comments to skip the CI which sounds a bit awkward (and slow).

Maybe we could retain the force-skip-ci feature "for emergencies", but only allow it for named users (aka the AC members plus CI owners?) That way we'd have a small number of users with "CI super user" abilities".

This sounds like a topic for the AC to discuss.

@jodh-intel
Copy link
Contributor

And once we've got an agreed plan for this, it would be great to have commitment from all interested parties to help implement, test, document and review the changes. And help review the existing user rights on the two main repos.

@bergwolf
Copy link
Member

Just deleted my last comment as I refreshed the page and saw your last comment

@fidencio To add more details about this one (which was also part of our discussion, and agreement if I understand correctly)

Evalute from each one of the new pieces could have a "skip" label (for urgent cases).

The standard of whether a piece can be skipped or not, is that

  • It should not have impact on the CIs for PRs after the targeting PR is merged

So it narrows the candidate jobs to be almost only the commit message check, and the github issus mover. And it won't result in a situation that the worst case might be to specify quite a large bunch of magic labels / comments to skip the CI which sounds a bit awkward (and slow). as being raised by @jodh-intel

@bergwolf
Copy link
Member

@jodh-intel

sometimes we just need a way to bypass the CI (almost) entirely, and quickly

Yes we do. And I agree with @egernst that we need a process to handle it through GH settings instead of trying to solve it with labels.

@bergwolf
Copy link
Member

The commit message check is already done in a separate action. So this looks much easier to solve:

  1. remove force-skip-ci label
  2. add skip-commit-message-check label
  3. skip commit message checker when skip-commit-message-check label is set (which depends on force-skip-ci right now)

There is also checkcommits in the tests repo. IMO we can just rely on the GH action for commit message check, no need to check twice.

@bergwolf
Copy link
Member

@fidencio @jodh-intel @egernst
Looking at .github/workflows in the kata-containers repo, we have these actions respecting force-skip-ci:

  • PR-wip-checks.yaml
  • commit-message-check.yaml
  • kata-deploy-push.yaml
  • move-issues-to-in-progress.yaml
  • require-pr-porting-labels.yaml
  • snap.yaml
  • static-checks.yaml

Among them, kata-deploy-push.yaml, snap.yaml and static-checks.yaml will impact other PRs if they are skipped. So how about removing force-skip-ci from these three, and replace it with skip-side-effectless-ci in all the other actions? This will be much easier to implement than splitting static-check.sh, which is a good target btw, but may not be a MUST-have to fix force-skip-ci.

Jenkins jobs also check for the force-skip-ci label and skip all tests. I think it should be dropped too there as well. WDYT?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement Improvement to an existing feature needs-review Needs to be assessed by the team.
Projects
None yet
Development

No branches or pull requests

4 participants