Towards Handling Big LHS #361
Conversation
Force-pushed from 76e2ab9 to 15b974e.
Solving the conflicts.
… and enable the ctpop test case
Force-pushed from 15b974e to 5e80c57.
findCands(LHS, Inputs, /*WidthMustMatch=*/false, /*FilterVars=*/false, MaxLHSCands);
std::vector<Inst *> Vars;
findVars(LHS, Vars);
we should talk about this part. why do you think it's a good idea? do you have data supporting that it does a better job?
I did this because findCands() stops once it reaches MaxLHSCands (set to 15 by default). However, this can miss input variables declared at the top of the program. Because of that limitation, the synthesizer cannot produce the correct ctpop result, since the input %x is never harvested as a synthesis component.
Exhaustive synthesis already calls findVars() after issuing the big query, so hoisting this call up has little impact on compilation time compared with the potential benefit of better synthesis results.
I admit this approach is crude: it traverses the LHS twice. I can write a better version of findCands() that runs only once.
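For the record, a single-pass variant might look roughly like this. This is a sketch only: the function name findCandsAndVars is hypothetical, though Inst, Inst::Var, K, and Ops follow Souper's instruction representation.

```cpp
#include <set>
#include <vector>

// Hypothetical single-pass traversal: harvest every input variable while
// still hard-bounding the number of non-variable candidates, so the extra
// findVars() walk becomes unnecessary. souper::Inst comes from Souper's
// headers; this function is illustrative, not part of the patch.
void findCandsAndVars(souper::Inst *Root, std::set<souper::Inst *> &Cands,
                      std::vector<souper::Inst *> &Vars, unsigned MaxCands) {
  std::vector<souper::Inst *> Stack{Root};
  std::set<souper::Inst *> Visited;
  while (!Stack.empty()) {
    souper::Inst *I = Stack.back();
    Stack.pop_back();
    if (!Visited.insert(I).second)
      continue;                       // already visited this node
    if (I->K == souper::Inst::Var)
      Vars.push_back(I);              // variables are always collected
    else if (Cands.size() < MaxCands)
      Cands.insert(I);                // non-vars still obey the hard bound
    for (souper::Inst *Op : I->Ops)
      Stack.push_back(Op);
  }
}
```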
This code basically guarantees that all the program inputs are collected as components for synthesis.
I agree that tuning MaxLHSCands will be useful, but just harvesting the vars doesn't sound like the right solution.
for one thing, what if there are 500 vars? then synthesis of just one select instruction is going to involve evaluating 500x500x500 guesses
the idea behind the hard bound is that it prevents these degenerate cases, but your patch will just open us up to them again
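To make the blow-up concrete, here is a toy calculation of the guess space for a single 3-operand instruction (plain arithmetic, nothing Souper-specific):

```cpp
#include <cstdio>
#include <initializer_list>

// For a 3-operand instruction such as select, the synthesizer's guess
// space grows cubically with the number of harvested components.
int main() {
  for (unsigned long long N : {15ULL, 100ULL, 500ULL})
    std::printf("%4llu components -> %llu select operand tuples\n",
                N, N * N * N);
  // 15 -> 3375, 100 -> 1000000, 500 -> 125000000
  return 0;
}
```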
Okay, this makes sense. But instead of stopping at a hard limit inside findCands(), can we traverse the whole tree and then bound the result by keeping candidates in descending order of their "benefit"? There is already a benefit-tagging algorithm implemented in findCands().
The current "benefit" tagging algorithm is simple: the deeper the traversal goes, the higher the benefit a candidate gets. We could further customize this benefit function to fit our needs. Say some heuristic can tell that certain kinds of candidates are more likely to be used as components in synthesis; then we could increase the benefit of those candidates to reduce synthesis time.
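A sketch of the proposed pruning, assuming each harvested candidate carries a numeric benefit score (CandidateInfo and its fields are placeholders for whatever findCands() actually records, not Souper's real representation):

```cpp
#include <algorithm>
#include <vector>

// Placeholder for however findCands() tags candidates today.
struct CandidateInfo {
  souper::Inst *I;
  unsigned Benefit;   // currently: deeper in the traversal -> higher benefit
};

// Traverse the whole LHS first, then keep only the MaxLHSCands candidates
// with the highest benefit, instead of cutting the traversal short.
void pruneByBenefit(std::vector<CandidateInfo> &Cands, unsigned MaxLHSCands) {
  std::sort(Cands.begin(), Cands.end(),
            [](const CandidateInfo &A, const CandidateInfo &B) {
              return A.Benefit > B.Benefit;   // descending order of benefit
            });
  if (Cands.size() > MaxLHSCands)
    Cands.resize(MaxLHSCands);
}
```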
well, the benefit is a property of the RHS cost vs. LHS cost, not a property of the choice of inputs, which is what we're talking about here.
my observation is that most rewrites are going to occur at the bottom of a souper LHS, which is why I wrote the code to get inputs using DFS on the LHS.
if we pick the ones likely to lead to the highest benefit, you're saying to pick inputs from the top of the LHS. I'm not opposed to this, but I think it is a mistake to make the change before we have data showing that the change is a good one.
anyway I'm saying:
- this is really two patches, please split them
- for each patch, I want you to justify it using data instead of making guesses about what will work best
Sure, I will split this PR and collect data.
@@ -32,6 +32,9 @@ namespace {
"The larger the number is, the more fine-grained debug "
"information will be printed"),
cl::init(0));
static cl::opt<bool> NoBigQuery("souper-exhaustive-synthesis-no-big-query",
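For readers of the diff, a flag like this would presumably be consumed along these lines inside the synthesis routine. This is a sketch under assumptions: synthesizeSketch and solveBigQuery are made-up names standing in for the real synthesis entry point and the big-query solver call, and the NoBigQuery cl::opt from the diff above is assumed to be in scope.

```cpp
#include <system_error>
#include <vector>

// Hypothetical stand-in for building the single disjunctive query over all
// guesses and asking the solver once; returns false on UNSAT.
bool solveBigQuery(const std::vector<souper::Inst *> &Guesses);

std::error_code synthesizeSketch(const std::vector<souper::Inst *> &Guesses) {
  // With the big query enabled, one large solver call can prove that no
  // guess works and let us bail out before any per-guess verification.
  if (!NoBigQuery && !solveBigQuery(Guesses))
    return std::error_code();
  for (souper::Inst *Guess : Guesses) {
    (void)Guess;
    // ... verify this guess with an individual solver query ...
  }
  return std::error_code();
}
```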
let's get some data about this also -- maybe run synthesis with and without it and see how it changes the CPU time used
I think you should separate the bigquery part of this PR from the changes to how LHSs are chosen
for bigquery, we should choose the default to be the one that runs faster -- but to make that decision we need data
Always feed getGuesses() with the Vars in the LHS: the original findCands() call that generates synthesis candidates uses MaxCand = 15, which may miss input variables declared at the beginning of a large program. This patch fixes that.
Add a switch for the big query: in some cases the huge big query may cause Souper to abort.
Enable the ctpop test case.