
Figure out how match flattening affects us #113

Open

goldfirere opened this issue Apr 9, 2015 · 6 comments

Comments

@goldfirere
Owner

In the master branch, all pattern matches are "flattened" before further processing. This is done from scLetDec in squashWildcards, called from singTopLevelDecs. This is a drastic change: after flattening, no patterns overlap anymore. How is this change affecting us? Is the code bloat truly terrible? How are compilation times? Pattern-match flattening allows some nice things -- in particular, it allows singletonization of functions with overlapping patterns. But is the cost too high? If it is, is there a way to detect when match flattening is a good idea? Should we ask the user for assistance?
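
For readers unfamiliar with the transformation, here is a hand-written sketch of what flattening does to a match (the names zip' and zipFlat are invented for illustration; this is not the library's actual output):

```haskell
-- Overlapping clauses: the wildcard clause catches everything the first missed.
zip' :: [a] -> [b] -> [(a, b)]
zip' (x:xs) (y:ys) = (x, y) : zip' xs ys
zip' _      _      = []

-- After flattening, each scrutinee is matched one constructor at a time,
-- so no branches overlap -- but the fallback alternative gets duplicated.
zipFlat :: [a] -> [b] -> [(a, b)]
zipFlat xs0 ys0 =
  case xs0 of
    []     -> []
    x : xs -> case ys0 of
      []     -> []
      y : ys -> (x, y) : zipFlat xs ys
```

The duplicated fallback is where the bloat comes from: with more scrutinees and more constructors per type, the flattened case tree can grow multiplicatively.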

@jstolarek
Collaborator

It looks like match flattening was the cause of the huge memory usage during compilation of the Data.Singletons.Prelude.List module. See #96.

@jstolarek
Collaborator

It also seems that there might have been a bug in the implementation of match flattening, which caused the failure of the Singletons/Nat test. See #97.

goldfirere pushed a commit that referenced this issue Sep 16, 2015
This fails because takeWhile can't be singletonized,
which in turn is because of #113 and friends. Oh well.
@goldfirere
Owner Author

There is lots of good data on this issue in #96. But it's not exactly the data that we need.

The big question I have about the performance blowup is where, precisely, it's happening.

First, we need to know what the trigger is. The data in #96 suggests that the blowup happens with both zipWithN and unzipN. unzipN is simpler, so let's focus on that. Question 1: why unzipN and not zipN?
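
To make the question concrete, here is a plausible hand-written shape for the simpler of the two (a sketch only; the actual definitions in the library may differ). Note that unzip-style functions match a tuple pattern inside the list pattern and scrutinize the recursive result, which flattening decomposes into a cascade of single-constructor case expressions:

```haskell
-- Hypothetical unzip3, written with explicit pattern matching.
unzip3' :: [(a, b, c)] -> ([a], [b], [c])
unzip3' []                 = ([], [], [])
unzip3' ((x, y, z) : rest) =
  case unzip3' rest of
    (xs, ys, zs) -> (x : xs, y : ys, z : zs)
```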

Question 2: Where, precisely, is the blowup happening? Here are some possibilities:

  1. th-desugar takes a long time to process the code, but its output is reasonable.
  2. th-desugar produces lots and lots of code. Way more than it needs to. A good basis of comparison here is to see what GHC produces when it translates normal (that is, with no TH silliness) definitions of unzipN to Core. The match-flattening algorithm is copied directly out of GHC, so the output should be similar.
  3. singletons takes a very long time processing the output of th-desugar. By "very long", I mean worse than linear in the length of the th-desugar output (do not compare against the N in unzipN -- we need to look at the size of the AST that th-desugar produces).
  4. singletons produces lots and lots of code during promotion. Way more than it needs to. A good basis of comparison is to take the output of th-desugar and promote it by hand (a rough sketch of hand promotion follows this list). Is the trickery that singletons is doing to get promotion working producing the blowup? Or is it doing a reasonably efficient thing?
  5. singletons produces lots and lots of code during singletonization. Way more than it needs to. A good basis of comparison is to take the output of th-desugar and singletonize by hand. Is the trickery that singletons is doing to deal with binders producing the blowup? Or is it doing a reasonably efficient thing?
  6. GHC's typechecker is choking on the output from singletons. That is, GHC's performance is worse than linear in the size of the code passed to it, but the output of the typechecker (as seen by -ddump-ds) is reasonable. Does it matter whether you write the code by hand vs. by TH? If you were to write unzipN in a natural way (without all the silliness that singletons does), is GHC still slow? Is the slowness in dealing with the promoted code or the singleton code? Can we isolate the possibility that the problem is simply type family reduction? (Type family reduction has been a bottleneck in the past.)
  7. GHC's desugaring to Core is choking. That is, the size of the -ddump-ds output is growing too fast -- way faster than it should.
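
To make item 4 concrete, here is a rough, hand-written guess at what hand-promoting an unzip-like function to type families could look like. This is a sketch only: the names Unzip3 and Cons3 are invented for illustration, and singletons' real generated code (with its defunctionalization machinery) will look different.

```haskell
{-# LANGUAGE DataKinds, PolyKinds, TypeFamilies, TypeOperators #-}

-- Promote the value-level recursion to a closed type family.
type family Unzip3 (ps :: [(a, b, c)]) :: ([a], [b], [c]) where
  Unzip3 '[]                  = '( '[], '[], '[] )
  Unzip3 ('(x, y, z) ': rest) = Cons3 x y z (Unzip3 rest)

-- Helper standing in for the case expression on the recursive result.
type family Cons3 (x :: a) (y :: b) (z :: c)
                  (r :: ([a], [b], [c])) :: ([a], [b], [c]) where
  Cons3 x y z '(xs, ys, zs) = '(x ': xs, y ': ys, z ': zs)
```

Comparing the size of code like this against what singletons actually emits for the same input would show whether promotion itself is the source of the blowup.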

Perhaps the blowup is happening at multiple places! But we need to discover which pass is causing problems before we can start addressing them. I don't think it's terribly hard to do this, but it would take some time and attention to detail. Once we're sure that unzipN causes the problem, it may just be easiest to work with unzip7 and observe what happens manually (a minimal measurement setup is sketched below) -- I'm not convinced that finding the growth rate will unlock this problem as much as a higher-level understanding of what's wrong.
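
For anyone picking this up, a minimal measurement setup might look like the following. Everything here is an assumption-laden sketch: the module name UnzipProbe and the unzip3' definition are invented, and the definition is assumed to go through singletonization (if it doesn't, any unzipN from the library's own code will do). The OPTIONS_GHC flags are standard GHC dump flags: -ddump-splices shows the code TH generates, -ddump-ds shows the desugared Core, -ddump-to-file writes both to files so their sizes can be compared pass by pass, and -dshow-passes reports each compiler pass as it runs.

```haskell
{-# LANGUAGE TemplateHaskell, DataKinds, PolyKinds, TypeFamilies,
             GADTs, KindSignatures, ScopedTypeVariables,
             UndecidableInstances #-}
-- Dump the TH splices and the desugared Core to files, and report
-- compiler passes, so the blowup can be localized to a single stage.
{-# OPTIONS_GHC -ddump-splices -ddump-ds -ddump-to-file -dshow-passes #-}
module UnzipProbe where

import Data.Singletons.TH

$(singletons [d|
  unzip3' :: [(a, b, c)] -> ([a], [b], [c])
  unzip3' []                 = ([], [], [])
  unzip3' ((x, y, z) : rest) = case unzip3' rest of
    (xs, ys, zs) -> (x : xs, y : ys, z : zs)
  |])
```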

And then there's question 3: even when things aren't terrible, how bad is match flattening in the average case? Does it increase all compilation times by 20%? Code size? Is it worth the cost?

If you want to explore this, you'll want to re-enable match flattening. Take a look at commit 87aa901: restore the Data.Singletons.Single.Squash module and the few lines of change in Data.Singletons.Single. There's a lot of other stuff in that commit, but don't worry about it -- none of it should matter for testing purposes.

Do let me know if you're taking this on! :)

@jstolarek
Collaborator

> Do let me know if you're taking this on! :)

Not in the next few weeks -- I'm totally swamped with teaching, and once that's over (in two weeks) I plan to take on generalized injectivity.

@goldfirere
Owner Author

To be clear: my "you" was addressed to any reader out there in cyberspace. Not you, Janek, specifically. :)

@jstolarek
Collaborator

I admit that this possibility crossed my mind when writing my comment :-)
