Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Don't Glob if Glob Ain't Glob 2: The Globbening #10518

Open
wants to merge 4 commits into
base: master
Choose a base branch
from

Conversation

parsonsmatt
Copy link
Collaborator

This PR is a less invasive variant of #10502 which moves the special treatment of glob patterns into runDirFileGlob directly. Locally, I'm observing a comparable performance improvement to that PR: #10502 takes about 150ms, and this one takes about 250ms.

Template Α: This PR modifies behaviour or interface

Include the following checklist in your PR:

  • Patches conform to the coding conventions.
  • Any changes that could be relevant to users have been recorded in the changelog.
  • The documentation has been updated, if necessary.
  • Manual QA notes have been included.
  • Tests have been added. (Ask for help if you don’t know how to write them! Ask for an exemption if tests are too complex for too little coverage!)

Template B: This PR does not modify behaviour or interface

E.g. the PR only touches documentation or tests, does refactorings, etc.

Include the following checklist in your PR:

  • Patches conform to the coding conventions.
  • Is this a PR that fixes CI? If so, it will need to be backported to older cabal release branches (ask maintainers for directions).

@parsonsmatt parsonsmatt marked this pull request as draft November 4, 2024 16:59
@parsonsmatt parsonsmatt mentioned this pull request Nov 4, 2024
5 tasks
Copy link
Collaborator

@alt-romes alt-romes left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great work!

Minor points:

  • Please squash the "throwaway" commits into the base one
  • I wonder if we can try to be even smarter, or whether GlobFile [Literal filename] is the only relevant shortcut we can take

@alt-romes
Copy link
Collaborator

The error message in one of the glob tests seems to regress a bit:

-Warning: [glob-missing-dir] In 'extra-source-files': the pattern '/home/user/file' attempts to match files in the directory '/home/user', but there is no directory by that name.
+Warning: [no-glob-match] In 'extra-source-files': the pattern '/home/user/file' does not match any files.

My intuition is that go in the glob matching function would traverse glob piece by piece, so if /home/user is not a valid directory it gets reported as such before file being a non-existent file.

I think this is fine, since the speedup is likely more relevant than reporting non-existent directory root rather vs invalid filepath.

Do check whether the test itself is meant to test specifically for the behaviour which we're losing, so you can update it accordingly and say the behaviour is now something else (something something, "despite /home/user not existing, we report the error with the full path because of the early "fast" check..."), if that makes sense.

@parsonsmatt
Copy link
Collaborator Author

[glob-missing-dir] In 'extra-source-files': the pattern '/home/user/file' attempts to match files in the directory '/home/user', but there is no directory by that name.

I actually think this is a pretty confusing message: /home/user/file is not a glob pattern at all, it's a literal filepath. I can see there being some value to reporting "This file was not found, additionally, the entire directory isn't present," but I think we'd want to construct this error message after a file isn't found - otherwise we're going to be doing number of directories in segment redundant lookups on the common path (file exists, is a literal).

I'll review the tests, but I'm unsure what's actually failing. ./validate.sh passes for me locally.

@parsonsmatt
Copy link
Collaborator Author

I wonder if we can try to be even smarter, or whether GlobFile [Literal filename] is the only relevant shortcut we can take

There's a prior check that separates out the GlobDirectory [Literal segment] globs already, so if we get to a GlobFile [Literal filename], then the whole chain would have been GlobDirectory [Literal seg0] (GlobDirectory [Literal seg1] (... (GlobFile [Literal filename])...).

It is slightly faster to bypass the entire parsing step and joining of prefixes like that (as seen in #10502 ), but doing so in a type-safe way requires changing the API. I don't think the marginal performance impact there is worth breaking it.

Do check whether the test itself is meant to test specifically for the behaviour which we're losing, so you can update it accordingly and say the behaviour is now something else (something something, "despite /home/user not existing, we report the error with the full path because of the early "fast" check..."), if that makes sense.

I looked into the git blame and found this PR which introduced it: #8250 with related PR #8248. It seems like the intent of the test was just to capture the behavior so that the check rewrite would have identical behavior to to what was previously done. I don't think the test was intended to capture any particular error message.

@parsonsmatt
Copy link
Collaborator Author

RE squashing commits: I'm not great at git, any chance y'all can do a squash and merge on your side?

@ulysses4ever
Copy link
Collaborator

@parsonsmatt yes, squashing is easy on our side: we should make sure that the merge label we use is "squash+merge". Moreover, it's the author's job to pick the label, normally.

Copy link
Collaborator

@ulysses4ever ulysses4ever left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks!

Cabal/src/Distribution/Simple/Glob.hs Show resolved Hide resolved
@alt-romes
Copy link
Collaborator

Very good @parsonsmatt. Ready to land.

@ulysses4ever ulysses4ever added the squash+merge me Tell Mergify Bot to squash-merge label Nov 7, 2024
@parsonsmatt
Copy link
Collaborator Author

@parsonsmatt yes, squashing is easy on our side: we should make sure that the merge label we use is "squash+merge". Moreover, it's the author's job to pick the label, normally.

Thanks! I'm unable to add labels to the PR though.

@mergify mergify bot added the ready and waiting Mergify is waiting out the cooldown period label Nov 7, 2024
@ulysses4ever
Copy link
Collaborator

@parsonsmatt I just added respective permissions to your account I hope. But I also applied the label to this PR.

@mergify mergify bot added the merge delay passed Applied (usually by Mergify) when PR approved and received no updates for 2 days label Nov 9, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
merge delay passed Applied (usually by Mergify) when PR approved and received no updates for 2 days ready and waiting Mergify is waiting out the cooldown period squash+merge me Tell Mergify Bot to squash-merge
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants