ghc-dump
is a library, GHC plugin, and set of tools for recording and
analysing GHC's Core representation. The plugin is compatible with GHC 7.10
through 9.0, exporting a consistent (albeit somewhat lossy) representation
across these versions. The AST is encoded as CBOR, which is small and easy to
deserialise.
The GHC plugin GhcDump.Plugin
provides a Core-to-Core plugin which dumps a
representation of the Core AST to a file after every Core-to-Core pass. To use
it on a cabal
package add the following to the relevant stanza of your cabal
file (e.g. the library
stanza in the case of a library):
build-depends: ghc-dump-core
ghc-options: -fplugin GhcDump.Plugin
Now building the project will produce a number of .cbor
files in the
dist-newstyle
directory, one for each step of GHC's Core-to-Core pipeline.
$ cabal build
$ ls **/*.cbor
dist-newstyle/build/x86_64-linux/ghc-8.10.4/aeson-1.5.6.0/build/attoparsec-iso8601/src/Data/Attoparsec/Time/Internal.pass-0000.cbor
dist-newstyle/build/x86_64-linux/ghc-8.10.4/aeson-1.5.6.0/build/attoparsec-iso8601/src/Data/Attoparsec/Time/Internal.pass-0001.cbor
dist-newstyle/build/x86_64-linux/ghc-8.10.4/aeson-1.5.6.0/build/attoparsec-iso8601/src/Data/Attoparsec/Time/Internal.pass-0002.cbor
dist-newstyle/build/x86_64-linux/ghc-8.10.4/aeson-1.5.6.0/build/attoparsec-iso8601/src/Data/Attoparsec/Time/Internal.pass-0003.cbor
dist-newstyle/build/x86_64-linux/ghc-8.10.4/aeson-1.5.6.0/build/attoparsec-iso8601/src/Data/Attoparsec/Time/Internal.pass-0004.cbor
dist-newstyle/build/x86_64-linux/ghc-8.10.4/aeson-1.5.6.0/build/attoparsec-iso8601/src/Data/Attoparsec/Time/Internal.pass-0005.cbor
...
Here we see a pass-N.cbor
file was produced for each Core-to-Core pass.
We can produce a brief summary of these files using the summarize
mode of the ghc-dump
CLI tool:
Name Terms Types Coerc. Previous phase
Data/Attoparsec/Time/Internal.pass-0000.cbor
124 63 9 desugar
Data/Attoparsec/Time/Internal.pass-0001.cbor
138 59 38 Simplifier: Max iterations = 4
SimplMode {Phase = InitialPhase [Gentle],
inline,
rules,
eta-expand,
no case-of-case}
Data/Attoparsec/Time/Internal.pass-0002.cbor
138 59 38 Specialise:
Data/Attoparsec/Time/Internal.pass-0003.cbor
142 59 38 Float out(FOS {Lam = Just 0, Consts = True, OverSatApps = False}):
Data/Attoparsec/Time/Internal.pass-0004.cbor
118 41 38 Simplifier: Max iterations = 4
SimplMode {Phase = 2 [main],
inline,
rules,
eta-expand,
case-of-case}
Data/Attoparsec/Time/Internal.pass-0005.cbor
112 32 38 Simplifier: Max iterations = 4
SimplMode {Phase = 1 [main],
inline,
rules,
eta-expand,
case-of-case}
Here we see for each dump file:
- the file name
- the size of the program measured in terms, types, and coercions
- the name of the phase that produced the program
One can then load this into ghci
for analysis,
$ ghci
GHCi, version 8.3.20170413: http://www.haskell.org/ghc/ :? for help
Loaded GHCi configuration from /home/ben/.ghci
λ> import GhcDump.Repl as Dump
λ> mod <- readDump "Test.pass-0.cbor"
λ> pretty mod
module Main where
nsoln :: Int-> Int
{- Core Size{terms=98 types=66 cos=0 vbinds=0 jbinds=0} -}
nsoln =
λ nq →
let safe =
λ x d ds →
case ds of wild {
[] → GHC.Types.True
: q l →
GHC.Classes.&&
...
Alternatively, the ghc-dump
utility can be used to render the representation
in human-readable form. For instance, we can filter the dump to include only
top-level binders containing main
in its name,
$ ghc-dump show --filter='.*main.*' Test.pass-0.cbor
You can conveniently summarize the top-level bindings of the program,
$ ghc-dump list-bindings --sort=terms Test.pass-0.cbor
Name Terms Types Coerc. Type
nsoln 98 66 0 Int-> Int
main 30 28 0 IO ()
$trModule 5 0 0 Module
main 2 1 0 IO ()
...