Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

AST: More efficient way to collect referenced source units #15579

Merged
merged 1 commit into from
Nov 21, 2024

Conversation

blishko
Copy link
Contributor

@blishko blishko commented Nov 21, 2024

Previous implementation of the method SourceUnit::referencedSourceUnits contained a subtle performance bug. Because the skip list was passed by value into the recursive call, the dependency graph of the imports were effectively traversed as if expanded into a full tree, instead of as a DAG (directed acyclic graph).

An example to illustrate that previously the same source was visited more than once: Suppose A.sol imports B.sol and C.sol and both of these import D.sol. Previosuly, the method would process A by first recursing into B and then C. When processing B, the source D is processed and then added to the skip list. When the recursion returns from processing B, any changes made to the skip list there were discarded, so that during processing C, the source D is not find in the skip list and processed again.

Now, in most cases the import/dependency graph is probably shallow or does not contain such diamond-like subgraphs, and the performance is not affected.
However, for a deeper dependency graph with multiple layers of diamond-like subgraphs this quickly leads to very bad performance, because every source unit is visited a number of times equal to the number of paths by which the source unit is reachable from the root source unit.

This change seems to shave off tens of seconds on both legacy and ir pipeline for sablier-v2-1.2.0 project.

@blishko
Copy link
Contributor Author

blishko commented Nov 21, 2024

Here are the results I measured on my laptop.

Before:

File Pipeline Time Memory (peak) Exit code
uniswap-v4-2022-06-16 legacy 0 s 108 MiB 1
uniswap-v4-2022-06-16 ir 0 s 106 MiB 1
openzeppelin-5.0.2 legacy 4 s 457 MiB 0
openzeppelin-5.0.2 ir 13 s 525 MiB 0
openzeppelin-4.9.0 legacy 5 s 550 MiB 0
openzeppelin-4.9.0 ir 16 s 616 MiB 0
liquity-2024-10-30 legacy 5 s 536 MiB 0
liquity-2024-10-30 ir 17 s 747 MiB 0
openzeppelin-4.7.0 legacy 8 s 573 MiB 0
openzeppelin-4.7.0 ir 23 s 597 MiB 0
openzeppelin-4.8.0 legacy 10 s 715 MiB 0
openzeppelin-4.8.0 ir 26 s 724 MiB 0
uniswap-v4-2024-06-06 legacy 49 s 1649 MiB 0
uniswap-v4-2024-06-06 ir 49 s 1825 MiB 0
eigenlayer-0.3.0 legacy 36 s 2419 MiB 0
eigenlayer-0.3.0 ir 205 s 5218 MiB 0
sablier-v2-1.2.0 legacy 120 s 5582 MiB 0
sablier-v2-1.2.0 ir 580 s 6834 MiB 0
seaport-1.6 legacy 75 s 3677 MiB 0
seaport-1.6 ir 6 s 916 MiB 1
farcaster-3.1.0 legacy 11 s 866 MiB 0
farcaster-3.1.0 ir 48 s 1673 MiB 1

After:

File Pipeline Time Memory (peak) Exit code
uniswap-v4-2022-06-16 legacy 0 s 108 MiB 1
uniswap-v4-2022-06-16 ir 0 s 106 MiB 1
openzeppelin-5.0.2 legacy 5 s 447 MiB 0
openzeppelin-5.0.2 ir 14 s 511 MiB 0
openzeppelin-4.9.0 legacy 6 s 525 MiB 0
openzeppelin-4.9.0 ir 17 s 597 MiB 0
liquity-2024-10-30 legacy 5 s 518 MiB 0
liquity-2024-10-30 ir 18 s 758 MiB 0
openzeppelin-4.7.0 legacy 9 s 563 MiB 0
openzeppelin-4.7.0 ir 24 s 603 MiB 0
openzeppelin-4.8.0 legacy 10 s 711 MiB 0
openzeppelin-4.8.0 ir 27 s 731 MiB 0
uniswap-v4-2024-06-06 legacy 49 s 1853 MiB 0
uniswap-v4-2024-06-06 ir 50 s 1695 MiB 0
eigenlayer-0.3.0 legacy 36 s 2443 MiB 0
eigenlayer-0.3.0 ir 207 s 4487 MiB 0
sablier-v2-1.2.0 legacy 69 s 5071 MiB 0
sablier-v2-1.2.0 ir 543 s 6608 MiB 0
seaport-1.6 legacy 74 s 3536 MiB 0
seaport-1.6 ir 4 s 962 MiB 1
farcaster-3.1.0 legacy 11 s 860 MiB 0
farcaster-3.1.0 ir 48 s 1648 MiB 1

There is a big difference for sablier-v2-1.2.0 on both legacy and ir. Other projects do not seem to be affected.

cameel
cameel previously approved these changes Nov 21, 2024
Copy link
Member

@cameel cameel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm approving since the change looks reasonable, even if it does not improve performance significantly (I'm still running benchmarks to see if the sablier case is a fluke or not) but I have some comments.

libsolidity/ast/AST.cpp Outdated Show resolved Hide resolved
if (_recurse)
sourceUnits += sourceUnit->referencedSourceUnits(true, _skipList);
sourceUnit->referencedSourceUnits(_referencedSourceUnits, true, _skipList);
Copy link
Member

@cameel cameel Nov 21, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note that passing _skipList by reference technically changes behavior here - the items added in recursive calls now affect the next loop cycle, previously they were discarded.

Not sure if this was intentional, but fortunately looks like it works to our advantage here, because _skipList can only grow and we avoid revisiting the same unit multiple times that way. Could have caused issues if that wasn't the cause though.

EDIT: Just noticed the PR description and it looks like it was fully intentional :)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Even with that there's still some redundancy here: we run this function for every contract and each time the function walks the same import chains again:

for (auto const sourceUnit: _contract.contract->sourceUnit().referencedSourceUnits(true))
referencedSources.insert(*sourceUnit->annotation().path);

We could avoid that by storing the list of referenced imports in SourceUnit. Not sure if it's worth it though. Looking at benchmarks, probably not.

@cameel
Copy link
Member

cameel commented Nov 21, 2024

File Pipeline Time Memory (peak) Exit code
uniswap-v4-2022-06-16 legacy 0 s 108 MiB 1
uniswap-v4-2022-06-16 ir 0 s 106 MiB 1

What is the error you're getting there (you can see the content of stderr when the script finishes)? For me these two pass without errors.

@blishko
Copy link
Contributor Author

blishko commented Nov 21, 2024

What is the error you're getting there (you can see the content of stderr when the script finishes)? For me these two pass without errors.

==================================
stderr for uniswap-v4-2022-06-16 via legacy
==================================
Warning: Found unknown config section in foundry.toml: [ci]
This notation for profiles has been deprecated and may result in the profile not being registered in future versions.
Please use [profile.ci] instead or run `forge config --fix`.
Warning: Found unknown config section in foundry.toml: [default]
This notation for profiles has been deprecated and may result in the profile not being registered in future versions.
Please use [profile.default] instead or run `forge config --fix`.
Error: Compiler run failed:
Warning (3805): This is a pre-release compiler version, please do not use it in production.

Error (7792): Function has override specified but does not override anything.
  --> contracts/test/MockContract.sol:35:41:
   |
35 |     function _beforeFallback() internal override {
   |                                         ^^^^^^^^

Warning (3628): This contract has a payable fallback function, but no receive ether function. Consider adding a receive ether function.
 --> contracts/test/MockContract.sol:8:1:
  |
8 | contract MockContract is Proxy {
  | ^ (Relevant source part starts here and spans across multiple lines).
Note: The payable fallback function is defined here.
  --> node_modules/@openzeppelin/contracts/proxy/Proxy.sol:66:5:
   |
66 |     fallback() external payable virtual {
   |     ^ (Relevant source part starts here and spans across multiple lines).

Previous implementation of the method SourceUnit::referencedSourceUnits
contained a subtle performance bug. Because the skip list was passed by
value into the recursive call, the dependency graph of the imports were
effectively traversed as if expanded into a full tree, instead of as a
DAG (directed acyclic graph).

An example to illustrate that previously the same source was visited
more than once: Suppose `A.sol` imports `B.sol` and `C.sol` and both of these
import `D.sol`. Previosuly, the method would process `A` by first
recursing into `B` and then `C`. When processing `B`, the source `D` is
processed and then added to the skip list. When the recursion returns
from processing `B`, any changes made to the skip list there were
discarded, so that during processing `C`, the source `D` is not find in
the skip list and processed again.

Now, in most cases the import/dependency graph is probably shallow or does
not contain such diamond-like subgraphs, and the performance is not
affected.
However, for a deeper dependency graph with multiple layers of
diamond-like subgraphs this quickly leads to very bad performance,
because every source unit is visited a number of times equal to the
number of paths by which the source unit is reachable from the root
source unit.

This change seems to shave off *tens* of seconds on **both** legacy and
ir pipeline for `sablier-v2-1.2.0` project.
@cameel
Copy link
Member

cameel commented Nov 21, 2024

Error (7792): Function has override specified but does not override anything.
  --> contracts/test/MockContract.sol:35:41:
Warning (3628): This contract has a payable fallback function, but no receive ether function. > Consider adding a receive ether function.
 --> contracts/test/MockContract.sol:8:1:

MockContract.sol imports @openzeppelin/contracts/proxy/Proxy.sol, which for this older Uniswap version still had to be installed via npm. The errors look as if it either does not get installed or a wrong version gets installed. Check what you have in uniswap-v4-2022-06-16/node_modules/@openzeppelin/contracts/proxy/Proxy.sol. Maybe external-setup.sh needs an adjustment. Or maybe you do not have npm?

Though in the end this particular project is not that relevant here (it was mostly useful for benchmarking across a wide range of solc versions) so it's fine to also ignore it if you don't want to dig into it.

@cameel
Copy link
Member

cameel commented Nov 21, 2024

And here are my benchmarks:

File Pipeline Time (before) Time (after) Memory (before) Memory (after)
uniswap-v4-2022-06-16 legacy 5 s 5 s 245 MiB 244 MiB
uniswap-v4-2022-06-16 ir 19 s 20 s 298 MiB 300 MiB
openzeppelin-5.0.2 legacy 10 s 10 s 450 MiB 450 MiB
openzeppelin-5.0.2 ir 34 s 30 s 492 MiB 492 MiB
openzeppelin-4.9.0 legacy 12 s 13 s 516 MiB 516 MiB
openzeppelin-4.9.0 ir 39 s 37 s 575 MiB 574 MiB
liquity-2024-10-30 legacy 10 s 10 s 506 MiB 504 MiB
liquity-2024-10-30 ir 41 s 39 s 640 MiB 641 MiB
openzeppelin-4.7.0 legacy 19 s 18 s 605 MiB 606 MiB
openzeppelin-4.7.0 ir 52 s 51 s 578 MiB 579 MiB
openzeppelin-4.8.0 legacy 21 s 21 s 759 MiB 761 MiB
openzeppelin-4.8.0 ir 61 s 59 s 729 MiB 729 MiB
uniswap-v4-2024-06-06 legacy 21 s 21 s 896 MiB 898 MiB
uniswap-v4-2024-06-06 ir 111 s 109 s 1492 MiB 1484 MiB
eigenlayer-0.3.0 legacy 79 s 72 s 2402 MiB 2405 MiB
eigenlayer-0.3.0 ir 502 s 485 s 4461 MiB 4457 MiB
sablier-v2-1.2.0 legacy 196 s 150 s 6345 MiB 6350 MiB
sablier-v2-1.2.0 ir 1267 s 1258 s 14513 MiB 14546 MiB

The relative difference for sablier is much smaller but it's still there, so there must be something to it.

Also, the huge discrepancy in memory usage is interesting. Apparently sablier uses 15 GB on my machine, which I'd normally ascribe to the recent bump to 1.2 but for you it's much smaller - I wonder how that's possible? And the difference for uniswap-v4-2024-06-06 is also quite big.

@blishko blishko merged commit 72795fb into develop Nov 21, 2024
73 checks passed
@blishko blishko deleted the optimize-referenced-sources branch November 21, 2024 16:45
@blishko
Copy link
Contributor Author

blishko commented Nov 21, 2024

Error (7792): Function has override specified but does not override anything.
  --> contracts/test/MockContract.sol:35:41:
Warning (3628): This contract has a payable fallback function, but no receive ether function. > Consider adding a receive ether function.
 --> contracts/test/MockContract.sol:8:1:

MockContract.sol imports @openzeppelin/contracts/proxy/Proxy.sol, which for this older Uniswap version still had to be installed via npm. The errors look as if it either does not get installed or a wrong version gets installed. Check what you have in uniswap-v4-2022-06-16/node_modules/@openzeppelin/contracts/proxy/Proxy.sol. Maybe external-setup.sh needs an adjustment. Or maybe you do not have npm?

Though in the end this particular project is not that relevant here (it was mostly useful for benchmarking across a wide range of solc versions) so it's fine to also ignore it if you don't want to dig into it.

I will check if I can fix this on my end. Thanks for the pointers!

@blishko
Copy link
Contributor Author

blishko commented Nov 21, 2024

The relative difference for sablier is much smaller but it's still there, so there must be something to it.

Also, the huge discrepancy in memory usage is interesting. Apparently sablier uses 15 GB on my machine, which I'd normally ascribe to the recent bump to 1.2 but for you it's much smaller - I wonder how that's possible? And the difference for uniswap-v4-2024-06-06 is also quite big.

The memory usage for uniswap in my data is irrelevant, because it ends prematurely with an error.
For sablier, I would say the memory usage reported by the script is inaccurate. When I was monitoring the process live, it was definitely using up to 12GB. Maybe this has something to do how the gnu time command behaves on MacOS.

Regarding the times, indeed for you the difference in sabier via ir is much smaller, but it is visible, and for legacy it is comparable to mine. Also, for you, eigenlayer also shows some improvement, so that's good.

@cameel
Copy link
Member

cameel commented Nov 21, 2024

The memory usage for uniswap in my data is irrelevant, because it ends prematurely with an error.

The 2024 one does not.

@blishko
Copy link
Contributor Author

blishko commented Nov 21, 2024

You are right, I missed that! I don't have an answer for that.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants