A Plan for Making Rust Analyzer Faster #17491

Open
davidbarsky opened this issue Jun 24, 2024 · 5 comments
Labels
A-perf performance issues C-Architecture Big architectural things which we need to figure up-front (or suggestions for rewrites :0) ) C-tracking-issue Category: tracking issue

Comments

@davidbarsky
Contributor

davidbarsky commented Jun 24, 2024

Some background context:

  • For a while, I've been investigating rust-analyzer performance issues.
  • @Wilfred, @darichey, and I all work together, and our employer is paying us to improve rust-analyzer's performance over the next 6 months on a full-time basis. We've determined that rust-analyzer's current performance is the biggest issue faced by Rust programmers at our employer.
    • By listing these issues out, we're not attempting to "lick the cookie" or claim them: it's a lot of work for 3 people, and if anyone gets to an issue before we do, please go for it! Every rust-analyzer user would benefit from these things happening more quickly.

Anyways, here's a non-exhaustive (but decently comprehensive!) list of things that we'd like to fix.

Non-Parallel, Single-Threaded Improvements

  • Rowan is slower than we'd like.
  • macro_rules! expansion is quadratic. This is most visible with tt-munchers (discussion); a minimal sketch of the pattern follows this list.
  • Switch from an LRU cache to a scan-resistant cache (LRU-K)
    • This may or may not be impactful, but rust-analyzer implements a limited form of garbage collection in the form of an LRU cache. rust-analyzer also does a lot of scan-like operations, which is a classic pathological case for a naive LRU cache. Replacing the LRU cache with LRU-K (a scan-resistant cache) might help with how often the syntax tree cache is rebuilt (discussion on Zulip).
  • Switch the crate graph to a Salsa database.
    • Migrating the crate graph from an arena to a Salsa database would make the workspace reindexing that occurs after a user adds or removes a dependency in their project substantially faster and more incremental.
    • Crate IDs are not currently stable, as they are derived from cargo-metadata or a rust-project.json. Deriving the crate ID from the crate data itself (via Salsa’s interning infrastructure) will provide a stable ID (discussion on Zulip).
    • This is also necessary to properly support Cargo scripts.
  • During project loading, rust-analyzer repeatedly queries the project layout from rustc, as there isn’t a clear end-state for project loading. Deferring this until the very end would likely help with loading large Rust workspaces.
    • I did some rough, back-of-napkin benchmarks: rust-analyzer spends 5 seconds waiting on rustc for every 100 crates it loads (established by printing Instant::elapsed::as_millis). Given that some Rust projects can easily have over 1,000 dependencies, it’s possible that people are waiting for nearly a minute.
  • Moving to Salsa 3.0 might reduce how many allocations and Arc increments regularly occur, because Salsa 3.0 hands out references instead of cloned values.
    • I don’t know how much of an impact this might have on the overall performance of rust-analyzer, but if it does have one, it’d be because allocations and Arc increments/decrements are smeared across the entire codebase.
    • From a best practices perspective, we should move to Salsa 3.0 regardless, especially if it doesn't regress rust-analyzer's performance.
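To make the quadratic macro_rules! point concrete, here's a minimal tt-muncher (not taken from any particular crate): each recursion step consumes one token tree and recurses on the rest, so a call with N tokens goes through N nested expansions that each re-match the remaining input.

```rust
// A minimal tt-muncher. Each step peels off one token tree and recurses on
// the remainder, so a call with N tokens expands through N nested macro
// invocations, each of which re-matches everything that's left.
macro_rules! count_tts {
    () => { 0usize };
    ($head:tt $($tail:tt)*) => { 1usize + count_tts!($($tail)*) };
}

fn main() {
    // Eight tokens here; real-world munchers are routinely fed hundreds.
    assert_eq!(count_tts!(a b c d e f g h), 8);
}
```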

Parallel Improvements

From a sequencing standpoint, most parallelism improvements in rust-analyzer should happen after the single-threaded improvements occur, as introducing parallelism without making single-core performance improvements simply papers over existing, resolvable performance issues; I believe it is better to have those improvements compound. Besides, a large number of the changes below are waiting on Salsa 3.0, as it will enable trivial parallel computation for all databases (salsa-rs/salsa#495) 1.

  • Parallelize autocomplete
  • Parallelize VFS loading
    • At the moment, the VFS loads all sources serially (Load files into the VFS in parallel on startup #17373). rust-analyzer should attempt to saturate disk IO, as I've observed that the disk throughput is substantially lower than what I'd expect from rust-analyzer on an SSD. This work does not depend on Salsa 3.0; a rough sketch of what a parallel read pass could look like follows this list.
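As an illustration of that kind of parallel read pass, here's a minimal sketch. It assumes a rayon dependency and is not the actual vfs loader API, just the shape of the idea:

```rust
use rayon::prelude::*;
use std::{fs, path::PathBuf};

/// Read every file in `paths` on a thread pool, returning each path together
/// with its contents (or IO error). A parallel pass like this keeps the disk
/// busy instead of waiting on one file at a time, as the serial loader does.
fn load_all(paths: Vec<PathBuf>) -> Vec<(PathBuf, std::io::Result<Vec<u8>>)> {
    paths
        .into_par_iter()
        .map(|path| {
            let contents = fs::read(&path);
            (path, contents)
        })
        .collect()
}
```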

I'll add more items as they come up, but I figured I should combine the different discussion topics into a single location.

Footnotes

  1. The version of Salsa used by rust-analyzer today requires getting a snapshot from the ParallelDatabase trait, and ParallelDatabase is only implemented on the RootDatabase. Similarly, parallel query dependencies are not legible to Salsa, which complicates the invalidation story.

@Wilfred
Contributor

Wilfred commented Jun 24, 2024

FWIW I see parallelism fixes as strictly orthogonal. Some are blocked on salsa, but parallel crate graph loading or parallel VFS loading could happen today, and I think they could be decent performance wins.

@lnicola
Member

lnicola commented Jun 24, 2024

Switch from an LRU cache to a scan-resistant cache (LRU-K)

Regarding this, I think the original Salsa used a scan-resistant strategy (SLRU?); see the doc comment "Promotes the node `node`, stored at `red_index` (in the red ...)" in its LRU implementation. It was dropped in favor of https://docs.rs/lru-cache/ in 2.0 / 2022 (and 3.0 too). I don't know how effective it was, though.
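For illustration only (this isn't the old Salsa code or a concrete proposal), a toy sketch of the LRU-K idea with K = 2: eviction is decided by each entry's second-most-recent access, so a linear scan that touches everything exactly once can't push out entries that have a history of reuse.

```rust
use std::collections::HashMap;

/// Toy LRU-2 bookkeeping: the eviction victim is the key whose
/// second-most-recent access is oldest. Keys touched only once (e.g. by a
/// scan) have no second access at all and are preferred victims.
struct Lru2<K> {
    tick: u64,
    /// For each key: (most recent access, previous access if any).
    history: HashMap<K, (u64, Option<u64>)>,
}

impl<K: std::hash::Hash + Eq> Lru2<K> {
    fn new() -> Self {
        Self { tick: 0, history: HashMap::new() }
    }

    /// Record an access to `key`.
    fn touch(&mut self, key: K) {
        self.tick += 1;
        let tick = self.tick;
        self.history
            .entry(key)
            .and_modify(|e| *e = (tick, Some(e.0)))
            .or_insert((tick, None));
    }

    /// The key a capacity-limited cache should evict next.
    fn victim(&self) -> Option<&K> {
        self.history
            .iter()
            .min_by_key(|(_, (last, prev))| (prev.unwrap_or(0), *last))
            .map(|(k, _)| k)
    }
}
```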

@Veykril
Member

Veykril commented Jun 25, 2024

During project loading, rust-analyzer repeatedly queries the project layout from rustc, as there isn’t a clear end-state for project loading. Deferring this until the very end would likely help with loading large Rust workspaces.

Not sure I fully understand this, but if I do, then I think this is caused by the VFS being processed fairly slowly. Whenever a file gets created (or something along those lines, which is the case on startup for all the files we load), we will requery the project layout. On fairly large projects the VFS just takes ages to report and be fully loaded.

@Veykril
Member

Veykril commented Jun 25, 2024

This isn’t the most fair comparison, as Rowan is optimized for editing trees (hence the structural sharing), but a syntax tree library with a more contiguous representation of the trees themselves would mean that the initial workspace loading/indexing could be substantially faster.

Given that I'm of the opinion that the mutable API is more problematic than useful, I think we should definitely go for a flat structure. Edits and construction of trees from multiple trees becoming more expensive shouldn't be a big problem: there is little that would care about that, assists are lazy in terms of their edits, and aside from those not much really does tree-style edits in the first place! So we should absolutely replace the current rowan -> #6584 (also to note, rowan makes heavy use of the header pattern, which is unsound under SB/TB, so swapping that out would generally be good in that regard as well).
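To make the flat-structure idea a bit more concrete, here's a rough sketch of an index-based, contiguous layout, as a contrast to rowan's pointer/header-based nodes. The names and fields are purely illustrative; the actual design discussion lives in #6584.

```rust
// A contiguous, index-based syntax tree: every node lives in one Vec and
// refers to its relatives by index, so the whole tree is a handful of flat
// allocations and traversal is cache-friendly. Edits are pricier than with
// rowan's shared green nodes, but as noted above almost nothing does
// in-place tree edits.
#[derive(Clone, Copy)]
struct NodeId(u32);

struct Node {
    kind: u16,              // SyntaxKind as a small integer
    text_range: (u32, u32), // byte range into `SyntaxTree::text`
    parent: Option<NodeId>,
    first_child: Option<NodeId>,
    next_sibling: Option<NodeId>,
}

struct SyntaxTree {
    nodes: Vec<Node>,
    /// All token text, concatenated; nodes point into it via `text_range`.
    text: String,
}

impl SyntaxTree {
    fn children(&self, node: NodeId) -> impl Iterator<Item = NodeId> + '_ {
        let mut next = self.nodes[node.0 as usize].first_child;
        std::iter::from_fn(move || {
            let id = next?;
            next = self.nodes[id.0 as usize].next_sibling;
            Some(id)
        })
    }
}
```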

@Veykril Veykril added C-Architecture Big architectural things which we need to figure up-front (or suggestions for rewrites :0) ) A-perf performance issues C-tracking-issue Category: tracking issue labels Jun 25, 2024
@davidbarsky
Contributor Author

Not sure I fully understand this, but if I do, then I think this is caused by the VFS being processed fairly slowly.

I think the VFS is partly responsible, but I've come to this conclusion through a combination of debugging this and setting "RA_LOG": "rust_analyzer=info". When opening rust-analyzer, I saw 40 instances of INFO did fetch workspaces workspaces=[Ok(Json { n_crates: 230, n_sysroot_crates: 11, n_rustc_cfg: 36, toolchain: Some(Version { major: 1, minor: 78, patch: 0 }), data_layout: Ok("e-m:o-p270:32:32-p271:32:32-p272:64:64-i64:64-i128:128-f80:128-n8:16:32:64-S128"), n_cfg_overrides: 2 })] followed by GlobalState::switch_workspaces switching workspaces.

I also did see a lot of GlobalState::handle_event{event=Event::Vfs}: task queue len: 1 (963, to be exact), but I'm not sure if that's evidence of the VFS being slow or rust-analyzer trying to be concurrent when, strictly speaking, it doesn't really need to be.

Whenever a file gets created (or something along those lines, which is the case on startup for all the files we load), we will requery the project layout.

I have observed this, and while that certainly doesn't help, I think that fixing that issue would be pretty easy if it were possible for the VFS (at least during initial indexing) to read all sources at once. I wrote a small, incomplete file loader and ran it against buck2's rust-project.json. It "loaded" all 876 crates in 1.5 seconds on a cold file system and 1.14 seconds on a warm file system, whereas rust-analyzer normally takes over a minute for the same kind of work. I think that one viable approach would be to add some sort of bulk loading functionality to the VFS that is able to create the graph once.
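A minimal sketch of that kind of bulk loader, assuming rust-project.json's crates[].root_module field, a serde_json dependency, and one std thread per file rather than the real vfs-loader machinery:

```rust
use std::{fs, path::PathBuf, time::Instant};

// Bulk-load sketch: parse rust-project.json, collect each crate's root
// module path, read them all in parallel, and time the whole pass.
// Error handling and non-root-module files are deliberately omitted.
fn main() -> std::io::Result<()> {
    let start = Instant::now();
    let manifest = fs::read_to_string("rust-project.json")?;
    let json: serde_json::Value = serde_json::from_str(&manifest).expect("valid JSON");

    let roots: Vec<PathBuf> = json["crates"]
        .as_array()
        .map(|crates| {
            crates
                .iter()
                .filter_map(|krate| krate["root_module"].as_str().map(PathBuf::from))
                .collect()
        })
        .unwrap_or_default();

    // One thread per file is crude, but it's enough to keep an SSD busy.
    let handles: Vec<_> = roots
        .into_iter()
        .map(|path| std::thread::spawn(move || fs::read(path)))
        .collect();
    let loaded = handles
        .into_iter()
        .filter_map(|handle| handle.join().ok()?.ok())
        .count();

    println!(
        "loaded {loaded} crate roots in {} ms",
        start.elapsed().as_millis()
    );
    Ok(())
}
```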
