AST: More efficient way to collect referenced source units #15579
Here are the results I measured on my laptop. Before:
After:
There is a big difference for
I'm approving, since the change looks reasonable even if it does not improve performance significantly (I'm still running benchmarks to see whether the sablier case is a fluke), but I have some comments.
```diff
 if (_recurse)
-	sourceUnits += sourceUnit->referencedSourceUnits(true, _skipList);
+	sourceUnit->referencedSourceUnits(_referencedSourceUnits, true, _skipList);
```
Note that passing `_skipList` by reference technically changes behavior here: the items added in recursive calls now affect the next loop iteration, whereas previously they were discarded.
Not sure if this was intentional, but fortunately it looks like it works to our advantage here, because `_skipList` can only grow, and that way we avoid revisiting the same unit multiple times. It could have caused issues if that weren't the case, though.
EDIT: Just noticed the PR description, and it looks like it was fully intentional :)
Even with that, there's still some redundancy here: we run this function for every contract, and each time the function walks the same import chains again:
solidity/libsolidity/interface/CompilerStack.cpp, lines 1675 to 1676 in b7b10c8:
```cpp
for (auto const sourceUnit: _contract.contract->sourceUnit().referencedSourceUnits(true))
	referencedSources.insert(*sourceUnit->annotation().path);
```
We could avoid that by storing the list of referenced imports in `SourceUnit`. Not sure if it's worth it, though. Looking at the benchmarks, probably not.
What is the error you're getting there (you can see the content of stderr when the script finishes)? For me these two pass without errors.
The previous implementation of the method `SourceUnit::referencedSourceUnits` contained a subtle performance bug. Because the skip list was passed by value into the recursive call, the import dependency graph was effectively traversed as if it were expanded into a full tree, instead of as a DAG (directed acyclic graph).

An example to illustrate that the same source was previously visited more than once: suppose `A.sol` imports `B.sol` and `C.sol`, and both of these import `D.sol`. Previously, the method would process `A` by first recursing into `B` and then into `C`. When processing `B`, the source `D` is processed and then added to the skip list. When the recursion returns from processing `B`, any changes made to the skip list there are discarded, so during the processing of `C`, the source `D` is not found in the skip list and is processed again.

Now, in most cases the import/dependency graph is probably shallow or does not contain such diamond-like subgraphs, and performance is not affected. However, for a deeper dependency graph with multiple layers of diamond-like subgraphs, this quickly leads to very bad performance, because every source unit is visited as many times as there are paths by which it is reachable from the root source unit.

This change seems to shave off *tens* of seconds on **both** the legacy and IR pipelines for the `sablier-v2-1.2.0` project.
ed9b53f to 48d40d5
Though in the end, this particular project is not that relevant here (it was mostly useful for benchmarking across a wide range of solc versions), so it's fine to ignore it if you don't want to dig into it.
And here are my benchmarks:
The relative difference for sablier is much smaller, but it's still there, so there must be something to it. Also, the huge discrepancy in memory usage is interesting. Apparently sablier uses 15 GB on my machine, which I'd normally ascribe to the recent bump to 1.2, but for you it's much smaller. I wonder how that's possible. And the difference for
I will check if I can fix this on my end. Thanks for the pointers!
The memory usage for uniswap in my data is irrelevant, because it ends prematurely with an error. Regarding the times, indeed for you the difference in
The 2024 one does not.
You are right, I missed that! I don't have an answer for that.