Subninja support and multithreaded loading #112

Colecf · 2024-02-28T02:22:13Z

Fixes #109
Fixes #108

This PR ports over most of the enhancements that are in Android's fork of ninja to n2. N2 goes from taking ~16 seconds to parse the AOSP ninja files, to ~1 second. (at least on my work computer where available_parallelism returns 128) It still seems to be about ~0.1 seconds slower than the android fork of ninja, but it's at least in the range where it's a little difficult to measure.

A general list of the changes in this PR:

Add support for subninja, with scoped variables.
mmap input files instead of reading them into memory.
Remove the FileIds and the Vec of files, replacing them with Arc. Having to maintain a separate vec mapping FileIds to files is too expensive.
Use the DashMap crate to map from filename to File object, so that id_from_canonical can be called in parallel.
Use rayon for multithreading.
Split input files into chunks and parse the chunks in parallel
Parse subninjas in parallel. Subninja support was implemented because regular includes cannot be processed in parallel.
Revamp the evaluation of variable assignments. Now, every statement in the ninja file gets a "scope position". When evaluating global variables, they are evaluated relative to a certain scope position, so you can evaluate the variable at a particular point in the file, even if it is reassigned later.
Introduce "clumps" of parse results, as returning all the parsed statements in a flat vector causes too many large vector concatenations.
Box Build objects, as they are quite large, and Vecs of unboxed builds are too large and slow to concatenate.
Don't evaluate all the build bindings at load time. Only evaluate the input/output files, and defer evaluation of other bindings until the build is actually run.
Evalstrings are now just regular strings instead of Vecs of strings. This means when they're evaluated they're re-parsed. Doing it this way saves of the Vec allocations, and in the case of owned evalstrings, does one big string allocation instead of a bunch of little ones. There are a lot more owned evalstrings now due to deferred evaluation of build bindings, so it makes a bit of a difference.

One thing that the android fork has that I didn't include was precomputed hashes. The android fork has a HashedString and HashedStrView type that it uses to ensure hashes aren't recalculated. This is somewhat difficult to do in rust, you have to either make concessions about being able to look up map values by their borrowed representation, use the raw entry map apis on nightly rust, or use hashbrown's RawTable. And even when I was experimenting with RawTable it didn't seem to provide a noticeable speedup.

Some of these optimizations are memory optimizations. This turned out to be necessary because apparently the exit_group syscall that quits the process spends time cleaning up memory. The exit_group syscall was taking over a second to run before some of these optimizations.

I've also bumped the rust version needed to compile n2 from 1.70 to 1.73, to be able to use usize::next_multiple_of. But if we want to keep the rust version low I can just manually write a next_multiple_of.

Colecf · 2024-02-28T19:08:45Z

CI is passing now.

evmar · 2024-02-28T22:50:32Z

Wow, impressive! I was just wondering this weekend whether you were still tinkering in this area...

Some minor thoughts I had while skimming this:

I wonder if an arena would help with the all the alloc/dealloc traffic, in particular parsing creates a zillion Builds and we never want to free any except all at once
re scope positions, I don't understand what you've done yet but in ninja we had each scope contain a pointer to its immutable parent scope, so in a series like a = ...; { b = $a; } a = ... the nested bit in the middle would see only the first value of a
have you looked at the perf of how quickly it starts on build tasks? I imagine your 10% is just parse-time perf, and I'm curious whether n2's hash-based dirty checking impacts you

Colecf · 2024-02-29T00:42:58Z

I did briefly look into bumpalo, but didn't use it because dashmap and stdlib hashmaps don't support custom allocators. Also bumpalo doesn't support multithreading directly, you have to create one arena per thread (bumpalo_herd). However the hashbrown crate supports them, so maybe that's worth trying... I did use std::mem::forget on the graph at the end of the program so we don't spend time deallocating it.
Yes, I think that's essentially the same thing. Though now with the multithreading every variable assignment, rule, build, default, and subninja gets a scope position so all these things can be evaluated in parallel, not just subninjas.
I haven't profiled build task start time, that might be worth doing but it should be much smaller than the parse time. I'm not sure what 10% you're referring to though.

By the way, did you ever get the copy of the android ninja files I shared with you after you requested them in #94? You should have a shared google drive zip.

evmar · 2024-03-05T19:07:06Z

By the way, did you ever get the copy of the android ninja files I shared with you after you requested them in #94? You should have a shared google drive zip.

Thanks! I grabbed a copy onto my Google Drive.

I haven't looked into the rest of this PR yet, sorry. :( It seems pretty fancy though, do you think you'll use it for Android going forward?

This removes the need for maintaining 2 maps from ids to files, and strings to ids.

These turn out to be faster than spawning individual tasks, even if it requires more single-threaded parts of the code.

Using Arc<String> allows us to not copy the string into both the hashmap key and the File object.

Keep the statements around as a Vec<Vec<Statement>> even after parsing, because copying all the builds into a new vector takes too long and causes more memory usage.

Another attempt to reduce copies / memory usage

Using the final graph::Build object is faster due to the need for fewer copies between different vector types. It will also allow us to only evaluate the bindings of the Build objects we actually use.

This saves ~0.3 seconds of merging the vecs of builds.

This allows us to not spend time evaluating all the build bindings, but we still need to evaluate all the global variables if we want to not keep the file in memory.

We only need these within the lifetime of 'text, so we can make them references instead.

Surprisingly this seems to have a measurable ~0.1s performance increase. Also removed some unwraps in the VariableAssignment's evaluation, but those didn't seem to help as much.

unmapping files is slow.

So that we don't have to allocate memory for their parsed representations.

This should also improve performance of evaluation in general slightly.

Windows doesn't have support for mmap (at least not with the same apis)

Needed for usize::next_multiple_of().

Colecf · 2024-03-05T19:47:09Z

I haven't looked into the rest of this PR yet, sorry. :( It seems pretty fancy though, do you think you'll use it for Android going forward?

Do you mean this PR or n2 in general? For the PR, yes, we definitely need some kind of load speed improvement, 16 seconds to start ninja would cause riots.

For n2 in general, I've started the process of forking it into android, but there are still a number of features that need to be added. I'm hoping to switch to it soon because there are some other teams who either want to add more ninja features, or who have already made their own forks with new features and are now contemplating merging their code back into the android fork. (e.g. abfs, a filesystem that communicates the files read by actions back to the build system) I also want to implement action sandboxing. I think it would be easier to scale to all these new features if they were written in rust as opposed to C++.

evmar · 2024-03-05T20:05:59Z

I am glad it is working out! I agree that working from Rust will hopefully make it easier to extend. One of my goals with n2 was to make it separable enough that it's hopefully easier to reason about.

BTW I think you'll need #50 if you want to use n2 in production. Or maybe even consider using a real database, I dunno -- the load time is in the critical path so I'm kind of fond of my tricky hack.

In any case I'd be happy to give advice on any pieces you end up needing to interact with, feel free to email about them or file bugs. (In particular I have some ideas on how n2 might implement content hashing for files, which is useful both for dirty checking but also for interacting with cloud build systems, and how that can nicely fit into n2's model.)

Colecf · 2024-03-05T20:12:24Z

BTW I think you'll need #50 if you want to use n2 in production. Or maybe even consider using a real database, I dunno -- the load time is in the critical path so I'm kind of fond of my tricky hack.

Yeah I'm aware of that issue. Though what "tricky hack" are you talking about?

In any case I'd be happy to give advice on any pieces you end up needing to interact with, feel free to email about them or file bugs.

Thanks! Though I think I have a good idea about what needs to be done. What would be most helpful would be if you could review changes like this one faster, but I understand that's a lot of work to ask of you. So I'm going to try to get a fork going so we can get the fork up to speed faster.

In particular I have some ideas on how n2 might implement content hashing for files

Yeah I'd like to add that feature as well.

evmar · 2024-03-06T00:27:57Z

Though what "tricky hack" are you talking about?

The custom "database" defined in db.rs. I think Ninja has some extra goop in it for the databasey corner cases, where like each row is suffixed with a parity check to catch partial writes to it.

dashmap is currently unused, but will be required after merging in evmar/n2#112 Bug: 318434287 Bug: 328273370 Test: added n2 to build-prebuilts.sh, and ran OUT_DIR=out prebuilts/build-tools/build-prebuilts.sh Change-Id: Ied5af144b8b2c21b385521be430f9d0313c00d93

Colecf force-pushed the multithreaded_loading branch 3 times, most recently from aa7d8f3 to 60055f0 Compare February 28, 2024 18:24

Colecf mentioned this pull request Feb 28, 2024

10% of parse time is tracking line numbers #107

Open

Colecf and others added 22 commits March 5, 2024 11:31

Multithreading WIP

f6670aa

Multithreaded loading

c690522

Remove FileIds and Use Arc<File> instead

dd80981

This removes the need for maintaining 2 maps from ids to files, and strings to ids.

Switch to parallel iterators

248d161

These turn out to be faster than spawning individual tasks, even if it requires more single-threaded parts of the code.

Use Arc<String> for filenames

a0600fd

Using Arc<String> allows us to not copy the string into both the hashmap key and the File object.

Don't flatten build vector

0748d21

Keep the statements around as a Vec<Vec<Statement>> even after parsing, because copying all the builds into a new vector takes too long and causes more memory usage.

Parse by chunks

3ebeec7

Another attempt to reduce copies / memory usage

Remove parse::Build, using graph::Build instead

740f593

Using the final graph::Build object is faster due to the need for fewer copies between different vector types. It will also allow us to only evaluate the bindings of the Build objects we actually use.

Standardize on clumps

127cb25

Reserve some vector sizes and add traces

fbfe7ee

Fix bug with evaluating variables from parent scope

1e994ee

Box Builds

4889576

This saves ~0.3 seconds of merging the vecs of builds.

Switch to FxHashMap and deallocate scopes in parallel

b1213e6

Defer build binding evaluation

ec38455

This allows us to not spend time evaluating all the build bindings, but we still need to evaluate all the global variables if we want to not keep the file in memory.

Don't own the unevaluated build ins/outs

d10ab6d

We only need these within the lifetime of 'text, so we can make them references instead.

Remove executor argument

4389ee7

Put unevaluated build ins/outs in 1 vec

49e371a

Make the files map use FxHasher

983f498

Use unwrap_err_unchecked in Scope::evaluate

9212a9b

Surprisingly this seems to have a measurable ~0.1s performance increase. Also removed some unwraps in the VariableAssignment's evaluation, but those didn't seem to help as much.

Unmap files in parallel

5fdd13c

unmapping files is slow.

Call format_parse_error again

86dacb3

Fix bug in evaluation order

d9d54f9

Colecf added 7 commits March 5, 2024 11:31

Add a test for subninjas

01c246d

Double parse evalstrings

a99843a

So that we don't have to allocate memory for their parsed representations.

owned evalstring parsing

1718839

Fix bug where build paths were evaluated an extra time

04ec995

This should also improve performance of evaluation in general slightly.

Cleanup and prepare for submitting

0d9bf01

Fall back to reading instead of mmap on windows

98867a2

Windows doesn't have support for mmap (at least not with the same apis)

Update rust to 1.73.0

9cc13a8

Needed for usize::next_multiple_of().

Colecf force-pushed the multithreaded_loading branch from 60055f0 to 9cc13a8 Compare March 5, 2024 19:37

Colecf mentioned this pull request May 7, 2024

Shell-escape $in, $in_newline, $out, and $out_newline #117

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Subninja support and multithreaded loading #112

Subninja support and multithreaded loading #112

Colecf commented Feb 28, 2024 •

edited

Loading

Colecf commented Feb 28, 2024

evmar commented Feb 28, 2024

Colecf commented Feb 29, 2024 •

edited

Loading

evmar commented Mar 5, 2024

Colecf commented Mar 5, 2024

evmar commented Mar 5, 2024 •

edited

Loading

Colecf commented Mar 5, 2024

evmar commented Mar 6, 2024

Subninja support and multithreaded loading #112

Are you sure you want to change the base?

Subninja support and multithreaded loading #112

Conversation

Colecf commented Feb 28, 2024 • edited Loading

Colecf commented Feb 28, 2024

evmar commented Feb 28, 2024

Colecf commented Feb 29, 2024 • edited Loading

evmar commented Mar 5, 2024

Colecf commented Mar 5, 2024

evmar commented Mar 5, 2024 • edited Loading

Colecf commented Mar 5, 2024

evmar commented Mar 6, 2024

Colecf commented Feb 28, 2024 •

edited

Loading

Colecf commented Feb 29, 2024 •

edited

Loading

evmar commented Mar 5, 2024 •

edited

Loading