-
Notifications
You must be signed in to change notification settings - Fork 697
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Don't glob if glob ain't glob #10502
Conversation
Just to be clear, you'll have to use cabal |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Although Fourmolu seems unhappy with it.
Of course! This patch is meant as a quick proof-of-concept to show perf improvements. We'll likely use a patched |
-- TODO: This is kind of confusing - I feel like someone writing that they | ||
-- expect a file `foo/c.html` in `extra-source-files` should probably | ||
-- get a Problem if the match in question is `foo/c.html/blah` ? | ||
-- , testCase "literal no match on prefix" $ | ||
-- testMatches "foo/c.html" (Left "foo/c.html") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
open question here still
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This seems odd, yeah. I'd support making it an error, although it would be really nice if we had a crater
-like tool we could use to check the impact of a change like this.
=> [Either FilePath Glob] -- data-files globs. | ||
-> [Either FilePath Glob] -- extra-source-files globs. | ||
-> [Either FilePath Glob] -- extra-doc-files globs. | ||
-> [Either FilePath Glob] -- extra-files globs. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the Either
blindness here is a bit of a smell imo, probably better for Glob
to have a OopsAllLiterals
constructor?
-> CheckM m () | ||
checkMissingDocs dgs esgs edgs efgs = do | ||
extraDocSupport <- (>= CabalSpecV1_18) <$> asksCM ccSpecVersion | ||
|
||
mpackageOps <- asksCM $ ciPackageOps . ccInterface |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this feels pretty awkward to me - liftInt
doesn't give a way to use two interfaces, so I have to smuggle this one in here. Could push the liftInt
calls down to using the appropriate interfaces more locally.
parseFileGlob :: CabalSpecVersion -> FilePath -> Either GlobSyntaxError (Either FilePath Glob) | ||
parseFileGlob _version filepath | ||
| all (`notElem` filepath) "*{}," = | ||
Right (Left filepath) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is the core change in this PR. If we don't actually use any glob-like characters, it's not a glob, so we can skip the slow glob treatment.
@@ -1019,7 +1026,23 @@ checkMissingDocs dgs esgs edgs efgs = do | |||
-- 2. Realise Globs. | |||
let realGlob t = | |||
concatMap globMatches | |||
<$> mapM (runDirFileGlobM ops "") t | |||
<$> mapM | |||
( \efpg -> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
( \efpg -> | |
( \pathOrGlob -> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. efpg
is a bad variable name though, fix that one if you would :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've marked the PR as ready for review. I have two main questions at this point:
- What's the recommendation for a less invasive change to
Cabal
here, so we can potentially get this back-ported to earlier versions? - There are a few TODOs left in PR - would appreciate some guidance on those.
- The
Either
profileration here feels a little gross - not generally a fan of primitive blindness like that. Would we prefer to see something likedata PathOrGlob = PlainFilepath Filepath | ActuallyGlob Glob
? Happy to take suggestions on naming here.
pure $ | ||
if exist | ||
then [GlobMatch filepath] | ||
else [] -- TODO: probably should signal failure here |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
On the one hand, we'll have already reported a problem if the filepath literal glob didn't match something previously. This check is trying to see if there are any other notable files present in these globs. So this triggering an error condition is unlikely: we'd need to have, like, extra-source-files: README.md
and then README.md
not being present in the directory.
I think this is redundant, so it's probably fine to have []
here.
A future improvement will remove the realGlob
function from this file and accept the [GlobResult FilePath]
directly as arguments, removing the redundant call to runDirFileGlobM
.
I'm going to close in favor of #10518 which should be back-port-able and is a much smaller change actually. Should let me work on the "avoid globbing twice" problem separately |
This PR modifies the behavior around globs, such that literal filepaths (ie a string that does not contain a glob special character
"{,}*"
) skip the glob logic and are treated as bare filepaths.Why?
I've identified that the glob logic is quite slow. Locally, it takes about 5-10ms to run through the glob checker on literal filepaths. Meanwhile,
doesFileExist
is measured in ~200ns or so. This wouldn't be a huge deal on it's own, but we have about 1,800 lines ofextra-source-files
, and the time spent inExpanding glob
on our codebase is about 35 seconds, the vast majority of the 45-50 second lag time betweencabal repl --repl-no-load
and an empty GHCi prompt being available.On our codebase, this patch reduces the initial boot time to just 14 seconds.
This patch followed a "type directed" approach, and this resulted in several breaking changes to the API. I can rework this to be a less breaking change, but it felt like the easiest way to ensure I covered all points of behavior.
This PR addresses the primary slowdown experienced in #10495.
Template Α: This PR modifies behaviour or interface
Include the following checklist in your PR: