sigspec: decrease size with tagged union of bits and chunks #4490
base: main
 void RTLIL::SigSpec::pack() const
 {
 	RTLIL::SigSpec *that = (RTLIL::SigSpec*)this;

-	if (that->bits_.empty())
+	if (packed_ || that->bits_.empty())
Shouldn't we be removing the `bits_.empty()` check? We need to guarantee that accesses to `chunks_` are valid once `pack()` returns, so we need to do the switch in full regardless of whether `bits_` is empty.
 	check();
 }

 void RTLIL::SigSpec::unpack() const
 {
 	RTLIL::SigSpec *that = (RTLIL::SigSpec*)this;

-	if (that->chunks_.empty())
+	if (!packed_ || that->chunks_.empty())
Analogously here
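To make the two review comments above concrete, here is a minimal sketch of a tagged union whose `pack()` and `unpack()` always switch the active member, even when the source vector is empty. It is not the PR's code: `Bit`, `Chunk`, `Sig`, `is_packed()`, `num_chunks()` and `num_bits()` are hypothetical stand-ins, the methods are non-const for simplicity, and the chunk-merging rule is invented for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <new>
#include <utility>
#include <vector>

// Hypothetical stand-ins for RTLIL::SigBit and RTLIL::SigChunk.
struct Bit { int id; };
struct Chunk { int first_id; int width; };

class Sig {
    bool packed_ = true;  // tag: true means chunks_ is the active member
    union {
        std::vector<Chunk> chunks_;
        std::vector<Bit> bits_;
    };

public:
    Sig() { new (&chunks_) std::vector<Chunk>(); }
    ~Sig() {
        if (packed_) chunks_.~vector();
        else bits_.~vector();
    }
    Sig(const Sig &) = delete;
    Sig &operator=(const Sig &) = delete;

    // The only early return is on the tag. There is no bits_.empty() shortcut,
    // so chunks_ is guaranteed to be the active member afterwards.
    void pack() {
        if (packed_) return;
        std::vector<Bit> old = std::move(bits_);
        bits_.~vector();                      // end lifetime of the old member
        new (&chunks_) std::vector<Chunk>();  // start lifetime of the new one
        packed_ = true;
        for (Bit b : old) {
            // Illustrative merge rule: consecutive ids extend the last chunk.
            if (!chunks_.empty() &&
                chunks_.back().first_id + chunks_.back().width == b.id)
                chunks_.back().width++;
            else
                chunks_.push_back(Chunk{b.id, 1});
        }
    }

    // Mirror image of pack(): afterwards bits_ is the active member.
    void unpack() {
        if (!packed_) return;
        std::vector<Chunk> old = std::move(chunks_);
        chunks_.~vector();
        new (&bits_) std::vector<Bit>();
        packed_ = false;
        for (const Chunk &c : old)
            for (int i = 0; i < c.width; i++)
                bits_.push_back(Bit{c.first_id + i});
    }

    void append(Bit b) { unpack(); bits_.push_back(b); }
    bool is_packed() const { return packed_; }
    std::size_t num_chunks() { pack(); return chunks_.size(); }
    std::size_t num_bits() { unpack(); return bits_.size(); }
};
```

With a plain pair of vectors, skipping the conversion for an empty source is harmless because both members always exist. With a union, the early return would leave the wrong member active, which is the hazard the reviewer points out.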
std::vector<RTLIL::SigBit> bits_; // LSB at index 0
union {
	std::vector<RTLIL::SigChunk> chunks_; // LSB at index 0
	std::vector<RTLIL::SigBit> bits_; // LSB at index 0
Could we do this with `std::variant`?
Good idea, this is a good small but hot case to try it on
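As a rough illustration of the `std::variant` suggestion, the sketch below keeps the two representations in a single variant and lets the variant track which one is active. This is only one possible shape, not what the PR implements: `Bit`, `Chunk`, `SigV` and its methods are hypothetical names, and the chunk-merging rule is made up for the example.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <variant>
#include <vector>

// Hypothetical stand-ins for RTLIL::SigBit and RTLIL::SigChunk.
struct Bit { int id; };
struct Chunk { int first_id; int width; };

class SigV {
    using Chunks = std::vector<Chunk>;
    using Bits = std::vector<Bit>;

    // A default-constructed variant holds its first alternative (Chunks),
    // so a fresh SigV starts out packed.
    std::variant<Chunks, Bits> repr_;

public:
    bool is_packed() const { return std::holds_alternative<Chunks>(repr_); }

    void unpack() {
        if (!is_packed()) return;
        Chunks old = std::move(std::get<Chunks>(repr_));
        Bits &bits = repr_.emplace<Bits>();  // destroys Chunks, constructs Bits
        for (const Chunk &c : old)
            for (int i = 0; i < c.width; i++)
                bits.push_back(Bit{c.first_id + i});
    }

    void pack() {
        if (is_packed()) return;
        Bits old = std::move(std::get<Bits>(repr_));
        Chunks &chunks = repr_.emplace<Chunks>();
        for (Bit b : old) {
            // Illustrative merge rule: consecutive ids extend the last chunk.
            if (!chunks.empty() &&
                chunks.back().first_id + chunks.back().width == b.id)
                chunks.back().width++;
            else
                chunks.push_back(Chunk{b.id, 1});
        }
    }

    void append(Bit b) { unpack(); std::get<Bits>(repr_).push_back(b); }
    std::size_t num_chunks() { pack(); return std::get<Chunks>(repr_).size(); }
    std::size_t num_bits() { unpack(); return std::get<Bits>(repr_).size(); }
};
```

The variant removes the manual placement-new and destructor calls and makes a wrong-member access throw `std::bad_variant_access` instead of being undefined behaviour. It also stores its own discriminator, so its size may exceed that of a bare union whose `bool` tag shares padding with neighbouring fields; that would need measuring before drawing conclusions.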
kernel/rtlil.cc
Outdated
	width_ = 0;
	hash_ = 0;
	append(SigBit(bit));
	check();
}

void RTLIL::SigSpec::switch_to_packed() const
From the naming alone it would appear this does the same thing as `pack()`. I suggest we name it `switch_union_to_chunks` or similar.
Turns out this gives a 5-10% performance penalty across the board, even with thin LTO. Profiling should show where this is coming from. Could be constructor/destructor or swap calls on pack/unpack. Time to read what STL methods compile down to, I guess.
We can add 24 bytes of padding to the struct to check whether the penalty is due to cache alignment effects.
I tried that earlier - adding
@widlarizer Disregarding any switch to a union, please see what kind of a penalty we get from switching to
(Note the swap in place of a clear on That should inform what we do next with this PR.
@widlarizer reports that the
What are the reasons/motivation for this change?
small is fast
Explain how this is achieved.
This PR reduces `sizeof(SigSpec)` from 64 bytes to 40 bytes. A SigSpec is always strictly either in packed mode (`bits_` is empty) or in unpacked mode (`chunks_` is empty). A vector is three pointers, i.e. 24 bytes on a 64-bit target. This PR puts `bits_` and `chunks_` into a union tagged with `bool packed_`. Union UB outside of rtlil.cc and rtlil.h is avoided because the current implementation of SigSpec already has `bits_` and `chunks_` as private members. This lowers runtime memory usage and is intended to improve performance through better cache locality.
If applicable, please suggest to reviewers how they can test the change.
Manually verify that all direct `chunks_` accesses happen only in branches where `packed_` is true, and all direct `bits_` accesses only in branches where it is false. Run benchmarks, build with sanitizers, etc.