Thunks tutorial #824

Draft
wants to merge 1 commit into
base: master

infinisil

Almost copied verbatim from https://nixos.wiki/wiki/Nix_Evaluation_Performance which I wrote some time ago.

This needs a bit of work, but I think it's not bad as is either. Feedback appreciated!


github-actions bot commented Dec 4, 2023

@fricklerhandwerk

fricklerhandwerk commented Dec 4, 2023

Looks like concepts rather than tutorial, but cool stuff! Seems it only needs a bit of polish.


@roberth roberth left a comment

It would be great to have this as more official documentation, even if not in the Nix manual.

Regarding the text, I have a problem in general with the phrasing of "allocating a thunk".
You allocate memory, and in that memory you put a state called a thunk.
If you were to put an actual value there, you would still allocate the same memory.
Thunks and values are necessarily of the same size, so allocating either one is really the same operation as the other.
Furthermore, you may allocate on the stack, and we actually do this for some values/thunks. This is super cheap when permissible, i.e. when it doesn't exacerbate the risk of stack overflows (and since 2.20 we'll stop doing stack allocations of arbitrary size). So what we really care about is heap allocations.

A more appropriate classification would be

  • heap objects vs stack objects
  • values vs thunks, but both of them being states of objects (unfortunately not the exact terminology used in the implementation though, where any nThunk can be in a Value struct)

I think we should care mostly about heap objects, many of which happen to be thunks at first. Conversely the thunks we care about tend to be on the heap, because those allocations are more expensive, and they may stick around for a long time.

It is only evaluated once needed.
It consists of two parts:
- The expression that the value should be evaluated from
- The variables the expression has access to

Nix also has tApp thunks, which represent function applications that haven't been performed yet.
They're used in cases where Nix knows which functions and arguments need to be paired up, but the results haven't been demanded yet. Example: in map f [e1 e2], tApp is used for the f e1 and f e2 thunks, because for the primop there's no expression that represents these applications.
Other thunks may exist, but either way, and partly because such details may not be permanent, I believe the representation of thunks is not actually that relevant.
They're just delayed computations.

What may be relevant is that call by need is implemented by mutating an object in memory, changing it from a representation of a delayed computation to a representation of a value in weak head normal form.
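
A minimal sketch of the sharing this mutation makes possible, assuming a stock Nix evaluator; the names x and ys are made up for illustration. builtins.trace prints its message when its thunk is forced, so the message should appear only once even though x is used twice, and the second map application is never forced at all:

```nix
let
  # `builtins.trace` prints its first argument when this thunk is forced.
  x = builtins.trace "forcing x" (1 + 2);

  # `map` pairs the function with each element lazily. The second
  # application is never demanded, so its `throw` never runs.
  ys = map (n: n * 2) [ x (throw "never forced") ];
in
  # The first use forces `x` and overwrites the thunk with the value 3;
  # the second use reads that stored value. Evaluates to 9.
  builtins.elemAt ys 0 + x
```

Saved as a file and evaluated with nix-instantiate --eval, this should print the trace message a single time before the result.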


It is very easy to introduce a lot of thunks in Nix code, which can have negative consequences:

- Every new thunk requires heap memory allocations.

Not sure if this is 100% true, but Values on the stack are cheap anyway and can not be allowed to be referenced by any expression whose lifetime extends beyond that of the stack frame, making their use somewhat limited (although we have plenty on-stack values at various points, I must say).

Also allocations for let should be cheaper than other allocations.


- A thunk prevents the evaluation garbage collector from collecting any variables it needs,
causing not only the memory of the thunk itself to be kept alive, but also all its references.

Such references are often referred to as a closure, and the term applies at least at a conceptual level, where we can say that the free variables are closed over.

Nix is perhaps simplistic in how it represents closures: a reference to a singly linked list is retained, which has all scopes all the way to the top of the file, even if no references are made to many of the variables in those scope layers. (An Env is what I would call a layer here.)
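
A small sketch of what that can mean in practice, assuming the Env-chain representation described above; big, total and f are hypothetical names. The retention itself is not observable from inside the language, so the comments only describe what the evaluator is expected to hold on to:

```nix
let
  # A list that becomes fully allocated once `total` is forced.
  big = builtins.genList (i: i) 1000;

  # Forcing `total` forces `big`; afterwards `total` is just an integer.
  total = builtins.length big;

  # `f` mentions only `total`, yet its closure points at the whole
  # enclosing scope, which also has a slot for `big`. As long as `f`
  # is reachable, `big` is expected to stay reachable too.
  f = x: x + total;
in
  f 1 # evaluates to 1001
```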

- Too deeply nested thunks can lead to stack overflows when evaluated.

Fun fact: if Nix changes its forceValue from an if (isThunk) to a while (isThunk) it could do a bunch of tail recursion, but traces might be worse or more expensive.

May not solve the problem for the mutual nesting of thunks and e.g. attrsets, so this stands. Probably for other patterns as well.
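
For the common case of an accumulator, one way to avoid building a deep chain of nested thunks is a strict fold. This is a sketch using builtins.foldl', which evaluates the result of each step immediately; the list and the lambda are illustrative only:

```nix
let
  # 0, 1, ..., 99999
  xs = builtins.genList (i: i) 100000;
in
  # Each intermediate `acc + x` is forced before the next step, so the
  # accumulator is always a plain integer rather than a thunk that
  # refers to the previous thunk. Evaluates to 4999950000.
  builtins.foldl' (acc: x: acc + x) 0 xs
```

A fold that leaves the accumulator unevaluated would instead produce one thunk per element, each referring to the previous one, and forcing the last one would have to descend through all of them.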

Of course, thunks are essential to Nix, so it's not possible to avoid them.

Suggested change
Of course, thunks are essential to Nix, so it's not possible to avoid them.
Thunks are essential to the implementation of Nix, or any lazy functional language, so it's not possible to avoid them.

Less subjective and more informative respectively.


- `let ... in` expressions attempt to create a thunk for each variable
- `{ ... }` (attribute set) expressions attempt to create a thunk for each attribute
- `[ ... ]` (list) expressions attempt to create a thunk for each element

Suggested change
- `[ ... ]` (list) expressions attempt to create a thunk for each element
- `[ ... ]` (list) literals behave similarly to attribute values

- `f a` (function application) expressions attempt to create a thunk for the argument

I think it depends. The arguments are pointers on the stack (since recently up to 4, falling back to the heap, but that's actually a lot for currying), and those pointers can be acquired through maybeThunk, i.e. an ExprVar, for example, doesn't need to allocate. The return value may be written to the stack.
That's if the call needs to be made directly. Otherwise you might be looking at a tApp from a higher order function primop or something.

The allocation of an Env seems more certain in this situation. All it takes for that is that the function is not a primop.
Partially applied primops do allocate thunks though. Isn't this fun.

- `{ attr ? def }: ...`:
For every function evaluation where the function takes an attribute set where an attribute has a default value which doesn't exist in the passed argument,
a thunk for the default value is attempted to be created.

Suggested change
a thunk for the default value is attempted to be created.
a thunk for the default value is to be created.

What's this "attempting" about?
If it fails, it crashes and it seems to be an unlikely cause, or not really worth considering for the purpose of optimization.
I would think it's for the final value. More something representing an ExprSelect with the default. (Probably even a fake ExprSelect, but that's unnecessary detail.)
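
A short sketch showing that the default is a delayed computation rather than something evaluated at call time; f and name are hypothetical, and the example says nothing about how the thunk is represented internally:

```nix
let
  # The default for `name` would abort evaluation if it were forced.
  f = { name ? throw "default was forced" }: "hello";
in
  # `name` is missing from the argument, so the default applies, but
  # the body never uses `name`, so the default is never evaluated.
  # Evaluates to "hello".
  f { }
```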

# let in expressions can allocate thunks
let

# 0 (+0) No thunk allocated because strings are atomic value expressions

Suggested change
# 0 (+0) No thunk allocated because strings are atomic value expressions
# 0 (+0) No thunk allocated because simple string literals in the parsed expression are accompanied by a reusable value which does not even start as a thunk.


# 1 (+1) Thunk is allocated, because the + operator is neither an atomic
# value nor a direct variable
greeting = "Hello, " + name;

Suggested change
greeting = "Hello, " + name;
greeting = "Hello, " + "world";

We don't have general constant expression elimination, and I don't think strings are an exception. (but strings are special, so maybe check)
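
For reference, a complete expression covering both variants discussed here. The comments restate the claims from the tutorial and this review rather than anything measured; the binding names besides greeting and name are made up:

```nix
let
  # Plain string literal: per the tutorial, no thunk is allocated.
  name = "world";

  # Operator applied to a variable: delayed until `greeting` is forced.
  greeting = "Hello, " + name;

  # Operator applied to two literals: without constant folding this is
  # expected to be delayed in the same way.
  greetingLiteral = "Hello, " + "world";
in
  # Both strings are "Hello, world", so this evaluates to true.
  greeting == greetingLiteral
```

As far as I know, evaluating with the NIX_SHOW_STATS=1 environment variable set prints evaluator counters, including the number of thunks, which is one way to check counts like these instead of reasoning about them.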
