Add section about Haskell execution model #95

Open · wants to merge 2 commits into base: main
127 changes: 127 additions & 0 deletions src/Measurement_Observation/haskell_model.rst
@@ -0,0 +1,127 @@

Haskell Compilation and Execution Model
=======================================


GHC doesn't use a VM but compiles to native code (when a native code generator
is used), so we can use tools such as ``perf`` to profile native code execution.
However, the native code we execute doesn't look exactly like the code produced
by imperative languages (C, C++, Rust, etc.), and this confuses tools that make
assumptions about its shape.
For example:

1. we don't have clear procedure boundaries with call/return instruction pairs
(we use tail calls, i.e. jump instructions).

2. we don't use the so-called C stack in the usual way, i.e. the usual stack
registers (e.g. ``rsp``/``rbp`` on x86-64) are not used for the Haskell stack.

3. GHC uses its own calling convention.
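
The first point can be made concrete with a small, hypothetical example (``sumTo`` is illustrative, not taken from GHC's sources): the recursive call to ``go`` is in tail position, so GHC compiles it to a jump back to the start of ``go`` rather than a call/return pair.

.. code-block:: haskell

    {-# LANGUAGE BangPatterns #-}

    -- Sum the integers from 1 to n with a strict accumulator.
    -- The recursive call to `go` is a tail call: GHC emits a jump,
    -- so no call/return pair appears in the native code.
    sumTo :: Int -> Int
    sumTo n = go 0 1
      where
        go !acc i
          | i > n     = acc
          | otherwise = go (acc + i) (i + 1)

    main :: IO ()
    main = print (sumTo 100)  -- prints 5050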

And finally, lazy evaluation can make control-flow and memory usage difficult to
understand.
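
A small sketch of why laziness obscures memory usage: binding a value allocates only a thunk, and the real work (and memory traffic) happens later, wherever the value is first demanded.

.. code-block:: haskell

    main :: IO ()
    main = do
      -- `big` is bound but not computed: GHC allocates a thunk.
      let big = sum [1 .. 1000000 :: Int]
      putStrLn "no evaluation has happened yet"
      -- `print` demands the value, forcing the thunk only here.
      print big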

In this chapter we describe the compilation pipeline and the execution model of GHC Haskell.
We'll begin from the familiar world of the Haskell language and descend into the depths of the machine. Then, for each stage or `intermediate representation <https://en.wikipedia.org/wiki/Intermediate_representation>`_ in the pipeline we'll list what can be measured at runtime.


Compilation pipeline overview
-----------------------------

GHC's compilation pipeline is roughly composed of four stages, each with its own intermediate representation. They are:

Haskell: the functional language we love with its bazillion extensions.

Core: a simple and explicitly typed functional language. Types are basically
Haskell types. Type-class dictionaries are passed explicitly as records,
type applications are inserted everywhere they are needed, coercions (proofs that one type can be
converted into another) are first-class values, etc.
Core is a good intermediate representation for observing the effect of polymorphism and of optimization passes on your code. Core should also be the first stop on the journey into the pipeline because it is the intermediate representation closest to the familiar Haskell code. See the :ref:`Reading Core <Reading Core>` chapter for more.
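
To illustrate dictionary passing, consider a small class-constrained function. The Core shown in the comment is a simplified sketch of what ``-ddump-simpl`` would display, not verbatim compiler output:

.. code-block:: haskell

    -- A class-constrained polymorphic function:
    double :: Num a => a -> a
    double x = x + x

    -- In Core (simplified sketch, not verbatim -ddump-simpl output),
    -- the `Num a` constraint becomes an explicit dictionary argument
    -- and the type is passed explicitly as `@a`:
    --
    --   double = \ (@a) ($dNum :: Num a) (x :: a) -> + @a $dNum x x

    main :: IO ()
    main = print (double (21 :: Int))  -- prints 42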

STG: a functional language that describes programs which run on an abstract machine called the Spineless Tagless G-machine. STG is closer to the execution model. It's still typed but
with primitive types. E.g. it tracks whether a value is a heap object or an
:term:`unboxed` word, but not the Haskell
type of the heap object. Complex forms are lowered to simpler ones (e.g. unboxed
sums are lowered into unboxed tuples); this lowering is called unarisation.

STG conforms to `A-normal form <https://en.wikipedia.org/wiki/A-normal_form>`_ by upholding two invariants. First, every heap allocation is made explicit with a let-binding; for
example, instead of ``foo (MkBar x y)`` in STG we have ``let b = MkBar x y in foo b``. Second, all functions are applied to
"simple" arguments only (variables, constants, etc.).
These features make STG a good intermediate representation to read to understand exactly where memory is being allocated. By the time the code is transformed into STG, all polymorphism and type-level features have been compiled away, simplifying the program. Because STG is in a normal form, one can be confident that every ``let`` observed in the STG representation of a program performs allocation, which is not true for Haskell or Core. See the :ref:`Reading STG <Reading STG>` chapter for more.
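
Reusing the ``foo``/``MkBar`` example from above, we can mimic the A-normal form by hand in Haskell (a sketch; the definitions of ``Bar`` and ``foo`` are hypothetical):

.. code-block:: haskell

    data Bar = MkBar Int Int

    foo :: Bar -> Int
    foo (MkBar x y) = x + y

    -- In Haskell we can nest the constructor application directly:
    nested :: Int
    nested = foo (MkBar 1 2)

    -- STG makes the allocation explicit with a let-binding; we can
    -- mimic its A-normal form by hand:
    anf :: Int
    anf = let b = MkBar 1 2 in foo b

    main :: IO ()
    main = print (nested + anf)  -- prints 6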

Following the approach pioneered with "super-combinators", every top-level STG

binding is then compiled into imperative code. The idea is that executing this
imperative code has the same result as interpreting the functional code. For
example, ``let b = MkBar x y in foo b`` compiles to:

.. code-block:: C

b <- allocate heap object for `MkBar x y`
evaluate foo
apply `foo` to `b`

STG is the last stage in the pipeline that is shared between all of GHC's backends; different backends use different imperative representations for their exact needs. For example, the interpreter
uses ByteCode, the JavaScript backend uses a subset of JavaScript, all other
backends use Cmm (pronounced *C minus minus*).

Cmm: an imperative language that looks like `LLVM IR <https://llvm.org/docs/LangRef.html#introduction>`_ and is the input to GHC's code generator. It supports expressions,
statements, and it abstracts over machine primops, machine registers, stack
usage, and calling conventions. A dedicated pass performs register assignment and stack
layout for the target architecture. Then (textual) assembly code is generated
for the target architecture and an external assembler program generates (binary)
machine code from it. Finally, an external linker transforms the
resulting code objects into a single executable or library.
Reading Cmm reveals details that are invisible in Core and STG, such as the
concrete memory operations, heap and stack checks, and calling conventions used
by the generated code. See the :ref:`Reading Cmm <Reading Cmm>` chapter for more.

Executables are linked against a runtime system (RTS) that provides primitives
to manage:
- memory: allocation, garbage collection...
- scheduling: thread scheduling, blocking queues...
- IO: primitives to interact with the operating system
- dynamic code loading: loading and unloading code objects at runtime.

Note that the RTS itself comes in different flavors: for example, the threaded RTS uses multiple OS threads to execute Haskell code, while the non-threaded RTS does not.

All these compilation stages and the RTS provide knobs to tweak the generated
code and the behavior of the runtime system. In particular, some probes can be
optionally inserted in the generated code at various stages to produce different
profiling information at runtime. We refer those interested in these tweaks to the :userGuide:`User Guide <flags.html>`.


Execution model overview
------------------------

The heap of a Haskell program contains objects that reflect its current execution
state: suspended computations (:term:`thunk`), partially applied functions (:term:`PAP`),
values (data constructors and their payload), threads (:term:`TSO`) and their stacks, etc.
When the computer executes the compiled code for a :term:`top-level` STG binding
(starting from ``main``), it asks for more objects to be allocated in the heap,
for some thunks to be reduced to another heap object, or for existing objects to
be used as arguments for function applications.
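
A small sketch of where such heap objects come from (whether each binding really ends up as a separate heap object depends on the optimization level):

.. code-block:: haskell

    main :: IO ()
    main = do
      let thunk = product [1 .. 10 :: Integer]  -- suspended computation (thunk)
          pap   = (+) (1 :: Int)                -- partially applied function (PAP)
          val   = Just (42 :: Int)              -- value: data constructor and payload
      print thunk    -- forcing the thunk prints 3628800
      print (pap 2)  -- applying the PAP prints 3
      print val      -- prints Just 42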

The garbage collector is responsible for freeing space in the heap. It runs when
there is not enough space left or depending on other heuristics. GHC provides
several :userGuide:`knobs <runtime_control.html#rts-options-to-control-the-garbage-collector>` to configure the garbage collector strategies to use and to tweak
their properties.

The garbage collector used has an impact on profiling. As an extreme example,
if the heap size is configured to be large enough that your program never has to
perform garbage collection, you'll find that you spend 0% of the time doing garbage
collection and 100% executing your program ("mutator" time): all garbage
collection occurs at once implicitly when your program exits. On the other hand,
if you configure the heap to be very small, most of the time can be spent doing
garbage collection even if you haven't changed anything in your program.
Thus, **you must be very aware of the RTS options you use when profiling**.
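
As a concrete starting point, a program compiled with ``-rtsopts`` can report its GC behavior under different heap settings (``Main.hs`` is a hypothetical file name; the flags shown are standard GHC RTS options):

.. code-block:: haskell

    -- Compile with RTS options enabled, then ask the RTS for GC statistics:
    --
    --   ghc -rtsopts Main.hs
    --   ./Main +RTS -s         -- print a GC vs. mutator time summary
    --   ./Main +RTS -A128m -s  -- larger allocation area, typically fewer GCs
    --
    -- This program allocates enough to trigger several collections
    -- under default RTS settings:
    main :: IO ()
    main = print (sum [1 .. 5000000 :: Int])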

Similarly, some profiling options have an impact on the size of heap
objects: e.g. the heap object that represents ``10 :: Int64`` uses 24 bytes
with profiling enabled instead of 16 bytes without. Because of this, you may find that your
code with this profiling enabled triggers more garbage collections than your
code without it. Therefore, be careful: **you have to be aware of the compilation
options you use when you make runtime measurements.**

Consequences on Profiling
-------------------------

As a consequence of the compilation pipeline and of the execution model
described above, we can measure many different things at many different levels.
1 change: 1 addition & 0 deletions src/Measurement_Observation/index.rst
@@ -6,6 +6,7 @@ Measurement, Profiling, and Observation
:name: Heap_Ghc

the_recipe
haskell_model
Heap_Ghc/index
Heap_Third/index
Measurement_Ghc/index