Persistent removal of trailing whitespace for clean diffs
#505
Comments
Hi Sadie, Thanks for taking the lead here - if we were all devs I'd say let's use a pre-commit hook, but considering the diversity of technical setups that our contributors have, my recommendation would be to set up an automation that removes trailing whitespace on top of PRs. Thinking out loud here, not sure about feasibility: what if at every PR there's a check that looks for trailing whitespace and, if any is found, adds a commit on top that kills it? In fact, with such an action, a manual blitz would no longer be necessary...
Thanks @erget, I agree that pre-commit hooks are probably too much of a barrier to employ and that some automation solution is best.
Sounds good to me. I will look into options and see if I can get a job running that implements this, once I've waited a few working days to see if anyone has any issue with the idea in general.
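For illustration only, the fixing step of such a job could be as small as the Python sketch below. It is not what was set up for this repository: the names `strip_trailing_whitespace` and `fix_tree` are hypothetical, and the `*.adoc` pattern is an assumption about how the AsciiDoc sources are named. The sketch only rewrites files; committing the result back to the PR branch would be left to the CI system.

```python
import re
from pathlib import Path

# Runs of spaces/tabs at the end of a line. The optional "\r" is captured
# so that Windows (CRLF) line endings survive the substitution.
_TRAILING = re.compile(r"[ \t]+(\r?)$", re.MULTILINE)


def strip_trailing_whitespace(text: str) -> str:
    """Return text with trailing spaces and tabs removed from every line."""
    return _TRAILING.sub(r"\1", text)


def fix_tree(root: str, pattern: str = "*.adoc") -> list[Path]:
    """Clean every matching file under root; return the files that changed.

    The "*.adoc" default is an assumption, not taken from the repository.
    """
    changed = []
    for path in sorted(Path(root).rglob(pattern)):
        # Work on bytes so that existing line endings are left untouched.
        original = path.read_bytes().decode("utf-8")
        cleaned = strip_trailing_whitespace(original)
        if cleaned != original:
            path.write_bytes(cleaned.encode("utf-8"))
            changed.append(path)
    return changed
```

A job would call something like `fix_tree(".")` and add a commit only if the returned list is non-empty, so PRs that are already clean get no extra commit.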
I'm no expert in git -- but I'm pretty sure that there is a way to set up a pre-commit hook that does this before the commit gets pushed.
This is less than ideal, because you get an extra "wasted" commit and a somewhat ugly history. As an example, "git blame" would show that every line that had whitespace removed was last edited by the system, rather than by whoever actually made a change. Some quick googling did not reveal an answer, but this is a VERY common issue, so there has got to be a standard solution out there!
I agree that pre-commit will probably be a barrier. Is the auto-blitz intended to happen when a PR is opened, and every time the PR is updated? If so, I agree with @ChrisBarker-NOAA that this will complicate the history, and that there ought to be a smart solution already out there. Could the auto-blitz happen when the PR is merged?
I expect most PRs contain more than one commit. So I don't think we should worry too much about an extra commit (or two) on a PR messing up the commit history. There are lots of style guide checkers for code (PEP8, Spotless, etc.) but I haven't found any for Asciidoc. I've worked on other projects where the workflow involves a test on PRs that fails if the style isn't followed. A committer must then push another commit with the fix so the test passes before the PR can be merged. Not sure if that is better or easier than an automated process, but it is a bit more visible (which might help with awareness of the style guide among contributors).
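To make the "test that fails" variant concrete, here is a minimal Python sketch of what such a check could look like. It is an assumption about how one might do it, not a description of any existing tool: `find_trailing_whitespace` and `main` are hypothetical names, and the list of files to check is assumed to be supplied by the CI job.

```python
from pathlib import Path


def find_trailing_whitespace(text: str) -> list[int]:
    """Return the 1-based numbers of lines that end in a space or tab."""
    offending = []
    for number, line in enumerate(text.split("\n"), start=1):
        # Ignore the "\r" of a CRLF line ending; it is not a style problem.
        line = line.removesuffix("\r")
        if line != line.rstrip(" \t"):
            offending.append(number)
    return offending


def main(paths: list[str]) -> int:
    """Report offending lines; return 1 if any were found, else 0."""
    failures = 0
    for name in paths:
        text = Path(name).read_bytes().decode("utf-8")
        for number in find_trailing_whitespace(text):
            print(f"{name}:{number}: trailing whitespace")
            failures += 1
    return 1 if failures else 0
```

A CI step could pass the changed files to `main` and use its return value as the exit status, so the PR shows a failed check together with the file and line numbers a committer needs to fix.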
Thanks all for your thoughts and ideas. And it is good to hear some background to this issue, e.g. from @ethanrd. It's probably wise, as Ethan suggests, to consider style linting as a whole whilst we think about this, because if we implement a solution here we would want it to be extensible, eventually at least, to any further style fixes we might want to apply. And I think trailing whitespace removal is a good thing to start with, since it's non-controversial, easy to remove with simple operations, and easy to review manually to check that our solution is doing what we want. As everyone's indicated, there's no perfect solution, but plenty of (fairly-)standard approaches to choose from. It seems we have collectively thought of the following possibilities, which I've numbered to help us discuss. Moreover, we seem to be agreed (please do correct me if anyone feels this is wrong or mis-representative) that we don't want the first, so I've crossed it out:
Maybe if everyone could outline their favourite (or unfavoured) option(s), or ask any questions about them, then we can determine the best way forward? If it helps, based on past experience I think it would be simple enough for me to set up any of these solutions, so ease of implementation isn't a concern (to me, at least).
Thanks @sadielbartholomew, excellently summarised :) I'm happy with both 2 and 3, but I could see a certain charm in the traceability of implementing 2 and agreeing to do this before minting a new release. That means that it would take place en bloc before the release goes out. Of course, I don't mint the releases so that's easy for me to say; @davidhassell and others may have a different perspective.
(2) has the disadvantage of creating gaps in the history -- any lines that were fixed by the job will show as having been edited by the CI, not by whoever actually made the change (and the date will be wrong, too, though if it's run regularly, not by much). On the other hand, maybe history isn't important for this kind of project. If there's something of concern with a part of the doc, does it matter exactly who last touched it or when they did? I think (3 i.) is not great -- similar to commit hooks, it puts a burden on the author of the PR -- and depending on their skill set it could be a barrier to entry. I'm not sure I fully understand (3 ii.) -- where / when would the fixing occur? Depending on the answer to that, it could introduce a similar history issue. So I suggest (4) (which might be 3 ii., if I've misunderstood). (4) Summary: all changes, including auto-linting, are squashed as a single commit when merged into main. Some of this is (hopefully) policy already, but I'm putting it here, as it's important to the process.
This could be a bit complicated by PRs from a third-party repo -- I'd have to think about that more, as asking folks to first merge into a temporary branch, and then having us merge into main, could be a bit challenging.
I am nothing like an expert in git or github, more the opposite. Maybe that is why I do not fully see the same disadvantage with alternative 2 as @ChrisBarker-NOAA does:
yes, I'm pretty sure that's possible.
That's the problem -- for example:
And git blame shows:
All good. We know that lines 95 and 96 were touched by Daniel on 2024-01-24. Now the bot runs, and corrects whitespace, etc. Now git log will look like:
not too bad, you can always look at previous commits. But now git blame looks like this:
So we know when those lines were changed, but not by whom. Anyway, not fatal -- in some cases with real code, you really want to know who made the changes, but for this kind of thing, maybe we don't care (and it is there in the history if you really want to dig for it) -- but that's my point. If we can find a way to get the autoformat changes squashed with the other changes as one merge, then it'll be a clearer history. Maybe that's not important, though.
TL;DR: I'd keep it simple for the merging procedures because merging takes place a lot more often than forensics do, and I don't think we incur huge debts by having a bit of a dirty history. Note that we already have a lot of commits with very undescriptive messages. I can unpack that a bit: Yes, in essence you can still find out who really authored each line by going back to the commit previous to the one made by the autoformatter, and then you'd still get the line-by-line attribution to (probably) humans. In the end though I think the squashing or rebasing or whatever is probably down to the release procedures, because there you have similar concerns. One could e.g. rebase such that the autoformatter changes are always squashed into the previous commit. That'd be how I'd do it, personally. It really depends on how we do the branching and merging - currently after agreement essentially the work branch is merged into
In my view all of those are feasible but come at costs that outweigh the benefits. In the end, in the event that one really wants line-by-line forensics, it's gonna involve a bit of digging anyway, but that digging doesn't take place often and people who are expert in git can be asked for advice at that point.
Hi all - I favor option 3.1 (test and fix by hand-ish [1]) because it increases the visibility of the style guide rather than hiding it behind an automatic process. I think awareness of the style guide/rules is worth a bit of extra effort. Also, that effort doesn't need to be provided by the original committer. The committer can allow CF repo maintainers to push to the PR branch. [1] By hand or by a CI job that runs after the failed test and adds a commit to the PR.
After reading @ChrisBarker-NOAA's "squash" comment, I took a closer look and noticed that we have a combination of PRs being merged and PRs being squashed onto the 'main' branch. While I think a clean commit history is important to work towards, I think we might want to develop some process around how to decide on what makes a clean commit for a given PR. Encouraging PR committer(s) to decide on and rebase to get a clean (minimal but informative) commit history would be my go-to preference. But that adds yet another barrier. Though, again, I think it could be handled by CF repo maintainers. This should probably be a separate discussion. I'll try and start an issue on PR merge process and clean commit history later today.
As per my review comment in #401 (review), it has been agreed to re-remove trailing whitespace (and any other non-visible characters and features) from the document (AsciiDoc format) files. I will do this later today.
That said, I've raised this before in #393, where the blitzing of all whitespace was previously completed, but trailing whitespace can easily get added back to the document pages as PRs go in. For that reason, I propose we have a means to automate or at least regulate the removal of any trailing whitespace that gets added. This could be part of further linting automation, or standalone, though removing trailing whitespace from all files is quite trivial in terms of the operations needed, so it would be particularly easy to set up, maybe as a first step.
I would suggest something along the following lines. Either we:
pre-commit hooks to ensure all commits obey the linting standards we configure;
I am happy to set something up towards the above.
If folk, esp. on the Info. Management Committee, could let me know if they agree with the proposal, that would be appreciated. In the meantime, I'll do a PR to remove the whitespace introduced after more recent PRs including #401.