Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve preprocessing performance #10534

Draft
wants to merge 5 commits into
base: master
Choose a base branch
from
Draft
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
115 changes: 59 additions & 56 deletions Cabal/src/Distribution/Simple/PreProcess.hs
Original file line number Diff line number Diff line change
Expand Up @@ -292,63 +292,66 @@ preprocessFile
-- ^ fail on missing file
-> IO ()
preprocessFile mbWorkDir searchLoc buildLoc forSDist baseFile verbosity builtinSuffixes handlers failOnMissing = do
-- look for files in the various source dirs with this module name
-- and a file extension of a known preprocessor
psrcFiles <- findFileCwdWithExtension' mbWorkDir (map fst handlers) searchLoc baseFile
case psrcFiles of
-- no preprocessor file exists, look for an ordinary source file
-- just to make sure one actually exists at all for this module.

-- Note [Dodgy build dirs for preprocessors]
-- ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
-- By looking in the target/output build dir too, we allow
-- source files to appear magically in the target build dir without
-- any corresponding "real" source file. This lets custom Setup.hs
-- files generate source modules directly into the build dir without
-- the rest of the build system being aware of it (somewhat dodgy)
bsrcFiles <- findFileCwdWithExtension mbWorkDir builtinSuffixes (searchLoc ++ [buildAsSrcLoc]) baseFile
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

First, we try and find a regular Haskell file.

We also add buildAsSrcLoc] to the end of the list, since the locations are tried in-order. Most likely, we're going to be in the first directory we're searching in, so this should eliminate a lot of file lookups.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if this is a change in behaviour? Before, if you had a .hs and a .y file in the directory, would it prefer to regenerate the .hs from the .y file?

It seems that it would then prefer the generated .y file rather than the .hs file in the directory, because it searched the build directory first?

Could we have some tests to check this behaviour?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's a good question - what is the expected behavior of having multiple files with different extensions in directory? Rather, what should happen if we have a directory project like this:

src/Foo.hs
src/Foo.y

The old code would ignore src/Foo.hs in favor of src/Foo.y. The new code ignores src/Foo.y in favor of src/Foo.hs.

This is definitely a change in behavior, but it would require a project to be poorly structured - somehow having Foo.hs and Foo.y together, with the expectation that Foo.hs is never used.

case bsrcFiles of
-- found a non-processable file in one of the source dirs
Just _ -> do
pure ()
Nothing -> do
bsrcFiles <- findFileCwdWithExtension mbWorkDir builtinSuffixes (buildAsSrcLoc : searchLoc) baseFile
case (bsrcFiles, failOnMissing) of
(Nothing, True) ->
dieWithException verbosity $
CantFindSourceForPreProcessFile $
"can't find source for "
++ getSymbolicPath baseFile
++ " in "
++ intercalate ", " (map getSymbolicPath searchLoc)
_ -> return ()
-- found a pre-processable file in one of the source dirs
Just (psrcLoc, psrcRelFile) -> do
let (srcStem, ext) = splitExtension $ getSymbolicPath psrcRelFile
psrcFile = psrcLoc </> psrcRelFile
pp =
fromMaybe
(error "Distribution.Simple.PreProcess: Just expected")
(lookup (Suffix $ safeTail ext) handlers)
-- Preprocessing files for 'sdist' is different from preprocessing
-- for 'build'. When preprocessing for sdist we preprocess to
-- avoid that the user has to have the preprocessors available.
-- ATM, we don't have a way to specify which files are to be
-- preprocessed and which not, so for sdist we only process
-- platform independent files and put them into the 'buildLoc'
-- (which we assume is set to the temp. directory that will become
-- the tarball).
-- TODO: eliminate sdist variant, just supply different handlers
when (not forSDist || forSDist && platformIndependent pp) $ do
-- look for existing pre-processed source file in the dest dir to
-- see if we really have to re-run the preprocessor.
ppsrcFiles <- findFileCwdWithExtension mbWorkDir builtinSuffixes [buildAsSrcLoc] baseFile
recomp <- case ppsrcFiles of
Nothing -> return True
Just ppsrcFile ->
i psrcFile `moreRecentFile` i ppsrcFile
when recomp $ do
let destDir = i buildLoc </> takeDirectory srcStem
createDirectoryIfMissingVerbose verbosity True destDir
runPreProcessorWithHsBootHack
pp
(getSymbolicPath $ psrcLoc, getSymbolicPath $ psrcRelFile)
(getSymbolicPath $ buildLoc, srcStem <.> "hs")
-- look for files in the various source dirs with this module name
-- and a file extension of a known preprocessor
psrcFiles <- findFileCwdWithExtension' mbWorkDir (map fst handlers) searchLoc baseFile
case psrcFiles of
-- no preprocessor file exists, look for an ordinary source file
-- just to make sure one actually exists at all for this module.

-- Note [Dodgy build dirs for preprocessors]
-- ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
-- By looking in the target/output build dir too, we allow
-- source files to appear magically in the target build dir without
-- any corresponding "real" source file. This lets custom Setup.hs
-- files generate source modules directly into the build dir without
-- the rest of the build system being aware of it (somewhat dodgy)
Nothing ->
when failOnMissing $ do
dieWithException verbosity $
CantFindSourceForPreProcessFile $
"can't find source for "
++ getSymbolicPath baseFile
++ " in "
++ intercalate ", " (map getSymbolicPath searchLoc)
Just (psrcLoc, psrcRelFile) -> do
let (srcStem, ext) = splitExtension $ getSymbolicPath psrcRelFile
psrcFile = psrcLoc </> psrcRelFile
pp =
fromMaybe
(error "Distribution.Simple.PreProcess: Just expected")
(lookup (Suffix $ safeTail ext) handlers)
-- Preprocessing files for 'sdist' is different from preprocessing
-- for 'build'. When preprocessing for sdist we preprocess to
-- avoid that the user has to have the preprocessors available.
-- ATM, we don't have a way to specify which files are to be
-- preprocessed and which not, so for sdist we only process
-- platform independent files and put them into the 'buildLoc'
-- (which we assume is set to the temp. directory that will become
-- the tarball).
-- TODO: eliminate sdist variant, just supply different handlers
when (not forSDist || forSDist && platformIndependent pp) $ do
-- look for existing pre-processed source file in the dest dir to
-- see if we really have to re-run the preprocessor.
ppsrcFiles <- findFileCwdWithExtension mbWorkDir builtinSuffixes [buildAsSrcLoc] baseFile
recomp <- case ppsrcFiles of
Nothing -> return True
Just ppsrcFile ->
i psrcFile `moreRecentFile` i ppsrcFile
when recomp $ do
let destDir = i buildLoc </> takeDirectory srcStem
createDirectoryIfMissingVerbose verbosity True destDir
runPreProcessorWithHsBootHack
pp
(getSymbolicPath $ psrcLoc, getSymbolicPath $ psrcRelFile)
(getSymbolicPath $ buildLoc, srcStem <.> "hs")

where
i = interpretSymbolicPath mbWorkDir -- See Note [Symbolic paths] in Distribution.Utils.Path
buildAsSrcLoc :: SymbolicPath Pkg (Dir Source)
Expand Down
Loading