Add section about Haskell execution model #95

Open · wants to merge 2 commits into base: main
127 changes: 127 additions & 0 deletions src/Measurement_Observation/haskell_model.rst
@@ -0,0 +1,127 @@

Haskell Compilation and Execution Model
=======================================


GHC doesn't use a VM but compiles to native code (when a native code generator
is used), so we can use tools such as ``perf`` to profile native code execution.
However, the native code we execute doesn't look exactly like the code produced
by imperative languages (C, C++, Rust, etc.), and this confuses tools that make
assumptions about its shape.
For example:

1. we don't have clear procedure boundaries with call/return instruction pairs
(we use tail calls, i.e. jump instructions).

2. we don't use the so-called C stack in the usual way, i.e. the usual stack
registers (e.g. ``rsp``/``rbp`` on x86-64) are not used for the Haskell stack.

3. GHC uses its own calling convention.
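
The first point can be made concrete with a small, hypothetical example (``sumTo`` is illustrative, not taken from GHC's sources): the recursive call to ``go`` is in tail position, so GHC compiles it to a jump back to the start of ``go`` rather than a call/return pair.

.. code-block:: haskell

    {-# LANGUAGE BangPatterns #-}

    -- Sum the integers from 1 to n with a strict accumulator.
    -- The recursive call to `go` is a tail call: GHC emits a jump,
    -- so no call/return pair appears in the native code.
    sumTo :: Int -> Int
    sumTo n = go 0 1
      where
        go !acc i
          | i > n     = acc
          | otherwise = go (acc + i) (i + 1)

    main :: IO ()
    main = print (sumTo 100)  -- prints 5050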

And finally, lazy evaluation can make control-flow and memory usage difficult to
understand.
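
A small sketch of why laziness obscures memory usage: binding a value allocates only a thunk, and the real work (and memory traffic) happens later, wherever the value is first demanded.

.. code-block:: haskell

    main :: IO ()
    main = do
      -- `big` is bound but not computed: GHC allocates a thunk.
      let big = sum [1 .. 1000000 :: Int]
      putStrLn "no evaluation has happened yet"
      -- `print` demands the value, forcing the thunk only here.
      print big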

In this chapter we describe the compilation pipeline and the execution model of GHC Haskell.
We'll begin from the familiar world of the Haskell language and descend into the depths of the machine. Then, for each stage or `intermediate representation <https://en.wikipedia.org/wiki/Intermediate_representation>`_ in the pipeline we'll list what can be measured at runtime.


Compilation pipeline overview
-----------------------------

GHC's compilation pipeline is roughly composed of four stages, each with its own intermediate representation. They are:

Haskell: the functional language we love with its bazillion extensions.

Core: a simple and explicitly typed functional language. Types are basically
Haskell types. Type-class dictionaries are passed explicitly as records,
type applications are inserted everywhere they are needed, coercions (proofs that one type can be
converted into another) are first-class values, etc.
Core is a good intermediate representation for observing the effect of polymorphism and of optimization passes on your code. Core should also be the first stop on the journey into the pipeline because it is the intermediate representation closest to the familiar Haskell code. See the :ref:`Reading Core <Reading Core>` chapter for more.
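
To illustrate dictionary passing, consider a small class-constrained function. The Core shown in the comment is a simplified sketch of what ``-ddump-simpl`` would display, not verbatim compiler output:

.. code-block:: haskell

    -- A class-constrained polymorphic function:
    double :: Num a => a -> a
    double x = x + x

    -- In Core (simplified sketch, not verbatim -ddump-simpl output),
    -- the `Num a` constraint becomes an explicit dictionary argument
    -- and the type is passed explicitly as `@a`:
    --
    --   double = \ (@a) ($dNum :: Num a) (x :: a) -> + @a $dNum x x

    main :: IO ()
    main = print (double (21 :: Int))  -- prints 42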

STG: a functional language that describes programs which run on an abstract machine called the Spineless Tagless G-machine. STG is closer to the execution model. It's still typed but
with primitive types. E.g. it tracks whether a value is a heap object or an
:term:`unboxed` word, but not the Haskell
type of the heap object. Complex forms are lowered to simpler ones (e.g. unboxed
sums are lowered into unboxed tuples); this lowering is called unarisation.

STG conforms to `A-normal form <https://en.wikipedia.org/wiki/A-normal_form>`_ by upholding two invariants. First, every heap allocation is made explicit with a let-binding; for
example, instead of ``foo (MkBar x y)`` in STG we have ``let b = MkBar x y in foo b``. Second, all functions are applied to
"simple" arguments only (variables, constants, etc.).
These features make STG a good intermediate representation to read to understand exactly where memory is being allocated. By the time the code is transformed into STG, all polymorphism and type-level features have been compiled away, simplifying the program. Because STG is in a normal form, one can be confident that every ``let`` observed in the STG representation of a program performs allocation, which is not true for Haskell or Core. See the :ref:`Reading STG <Reading STG>` chapter for more.
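
Reusing the ``foo``/``MkBar`` example from above, we can mimic the A-normal form by hand in Haskell (a sketch; the definitions of ``Bar`` and ``foo`` are hypothetical):

.. code-block:: haskell

    data Bar = MkBar Int Int

    foo :: Bar -> Int
    foo (MkBar x y) = x + y

    -- In Haskell we can nest the constructor application directly:
    nested :: Int
    nested = foo (MkBar 1 2)

    -- STG makes the allocation explicit with a let-binding; we can
    -- mimic its A-normal form by hand:
    anf :: Int
    anf = let b = MkBar 1 2 in foo b

    main :: IO ()
    main = print (nested + anf)  -- prints 6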

Following the approach pioneered with "super-combinators", every top-level STG

binding is then compiled into imperative code. The idea is that executing this
imperative code has the same result as interpreting the functional code. For
example, ``let b = MkBar x y in foo b`` compiles to:

.. code-block:: C

b <- allocate heap object for `MkBar x y`
evaluate foo
apply `foo` to `b`

STG is the last stage in the pipeline that is shared between all of GHC's backends; different backends use different imperative representations for their exact needs. For example, the interpreter
uses ByteCode, the JavaScript backend uses a subset of JavaScript, all other
backends use Cmm (pronounced *C minus minus*).

Cmm: an imperative language that looks like `LLVM IR <https://llvm.org/docs/LangRef.html#introduction>`_ and is the input to GHC's code generator. It supports expressions,
statements, and it abstracts over machine primops, machine registers, stack
usage, and calling conventions. A dedicated pass performs register assignment and stack
layout for the target architecture. Then (textual) assembly code is generated
for the target architecture and an external assembler program generates (binary)
machine code from it. Finally, an external linker transforms the
resulting code objects into a single executable or library.
Reading Cmm reveals details that are invisible in Core and STG, such as the
concrete memory operations, heap and stack checks, and calling conventions used
by the generated code. See the :ref:`Reading Cmm <Reading Cmm>` chapter for more.

Executables are linked against a runtime system (RTS) that provides primitives
to manage:
- memory: allocation, garbage collection...
- scheduling: thread scheduling, blocking queues...
- IO: primitives to interact with the operating system
- dynamic code loading: loading and unloading code objects at runtime.

Note that the RTS itself comes in different flavors: for example, the threaded RTS uses multiple OS threads to execute Haskell code, while the non-threaded RTS does not.

All these compilation stages and the RTS provide knobs to tweak the generated
code and the behavior of the runtime system. In particular, some probes can be
optionally inserted in the generated code at various stages to produce different
profiling information at runtime. We refer those interested in these tweaks to the :userGuide:`User Guide <flags.html>`.


Execution model overview
------------------------

The heap of a Haskell program contains objects that reflect its current execution
state: suspended computations (:term:`thunk`), partially applied functions (:term:`PAP`),
values (data constructors and their payload), threads (:term:`TSO`) and their stacks, etc.
When the computer executes the compiled code for a :term:`top-level` STG binding
(starting from ``main``), it asks for more objects to be allocated in the heap,
for some thunks to be reduced to another heap object, or for existing objects to
be used as arguments for function applications.
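
A small sketch of where such heap objects come from (whether each binding really ends up as a separate heap object depends on the optimization level):

.. code-block:: haskell

    main :: IO ()
    main = do
      let thunk = product [1 .. 10 :: Integer]  -- suspended computation (thunk)
          pap   = (+) (1 :: Int)                -- partially applied function (PAP)
          val   = Just (42 :: Int)              -- value: data constructor and payload
      print thunk    -- forcing the thunk prints 3628800
      print (pap 2)  -- applying the PAP prints 3
      print val      -- prints Just 42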

The garbage collector is responsible for freeing space in the heap. It runs when
there is not enough space left or depending on other heuristics. GHC provides
several :userGuide:`knobs <runtime_control.html#rts-options-to-control-the-garbage-collector>` to configure the garbage collector strategies to use and to tweak
their properties.

The garbage collector used has an impact on profiling. As an extreme example,
if the heap size is configured to be large enough that your program never has to
perform garbage collection, you'll find that you spend 0% of the time doing garbage
collection and 100% executing your program ("mutator" time): all garbage
collection occurs at once implicitly when your program exits. On the other hand,
if you configure the heap to be very small, most of the time can be spent doing
garbage collection even if you haven't changed anything in your program.
Thus, **you must be very aware of the RTS options you use when profiling**.
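
As a concrete starting point, a program compiled with ``-rtsopts`` can report its GC behavior under different heap settings (``Main.hs`` is a hypothetical file name; the flags shown are standard GHC RTS options):

.. code-block:: haskell

    -- Compile with RTS options enabled, then ask the RTS for GC statistics:
    --
    --   ghc -rtsopts Main.hs
    --   ./Main +RTS -s         -- print a GC vs. mutator time summary
    --   ./Main +RTS -A128m -s  -- larger allocation area, typically fewer GCs
    --
    -- This program allocates enough to trigger several collections
    -- under default RTS settings:
    main :: IO ()
    main = print (sum [1 .. 5000000 :: Int])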

Similarly, some profiling options have an impact on the size of heap
objects: e.g. the heap object that represents ``10 :: Int64`` uses 24 bytes
with profiling enabled instead of 16 bytes without. Because of this, you may find that your
code with this profiling enabled triggers more garbage collections than your
code without it. Therefore, be careful: **you have to be aware of the compilation
options you use when you make runtime measurements.**

Consequences on Profiling
-------------------------

As a consequence of the compilation pipeline and of the execution model
described above, we can measure many different things at many different levels.
1 change: 1 addition & 0 deletions src/Measurement_Observation/index.rst
@@ -6,6 +6,7 @@ Measurement, Profiling, and Observation
:name: Heap_Ghc

the_recipe
haskell_model
Heap_Ghc/index
Heap_Third/index
Measurement_Ghc/index