Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fast three way join algorithm #265

Open
jmatsushita opened this issue Sep 4, 2020 · 8 comments
Open

Fast three way join algorithm #265

jmatsushita opened this issue Sep 4, 2020 · 8 comments

Comments

@jmatsushita
Copy link
Contributor

Hi there, I just watched this talk https://youtu.be/FudEqfutW2A?t=1888 from MSFP by @henglein and @mkm about how to use module theory to get a very fast implementation of three way joins which probably has more implications with regards to database performance (with a haskell implementation which I think isn't public yet).

In the q&a section there are questions about backporting this approach to RDMS, and generally make this result more visible to the database community, and I thought this might be an idea that would be a great fit for project-m36.

Hope this is interesting 😄

@agentm
Copy link
Owner

agentm commented Sep 5, 2020

Interesting, indeed! Thanks for the link. I have contacted the authors of the paper for more details.

@agentm
Copy link
Owner

agentm commented Sep 6, 2020

The paper can be found here.

The paper also mentions "Factorized Databases", a tree-based representations of joins which purports to provide an redundancy-free representation.

@agentm
Copy link
Owner

agentm commented Sep 6, 2020

Project:M36 already has a notion similar to factorized joins in nested relations. It may be interesting to implement a special form of join which generates nested relation results.

@agentm
Copy link
Owner

agentm commented Sep 7, 2020

After reviewing the paper and the video, it's not immediately clear but strongly implied that this optimization applies only to the "triangle join" query which finds triples that reference each other circularly. The video and paper also do not conclude whether this optimization applies to n-ary loop searches.

I'm still waiting to hear back from the authors, if only to receive the haskell code used in the benchmark.

@jmatsushita
Copy link
Contributor Author

What I understand is that the triangle join is the only experimental result so far, but that its possible that the intermediary representation would optimise other type of queries. I suspect that it hinges on the representation of negative intermediary values which allow terms to be eliminated without having to compute them, so I conjecture that it would apply to n-ary loops, though like you I havent seen it in the paper (which you will have noted is an extended abstract, so I imagine the authors are still working on it).

@mkm
Copy link

mkm commented Sep 7, 2020

It's great to see that our work is generating interest! As you have already surmised, it is indeed a work in progress, but I hope to have a publicly available prototype Soon™.

The current epistemic status is as follows:

  • We are sure that certain classes of queries, including cyclic joins, are done optimally.
  • We are confident, and in the process of proving, that all conjunctive queries are done optimally.
  • We speculate that even functional aggregate queries can be done optimally with suitable modifications.
    Here "optimal" refers to worst-case output optimality.

Note that it's not just a join algorithm, it's a query evaluation approach that happens to deal with join situations (whether explicit joins or cartesian products followed by filtering) efficiently. Of course, you can use it as a special-purpose method to deal with, say, three-way joins, but you won't reap the full benefits and the resulting query evaluator will probably still have some blind spots. Exactly how much of the module-theoretic approach can be grafted onto a relational algebra model remains to be seen, so I'm curious to see how far this goes. Depending on which flavour of relational algebra you are using, you might actually be working with semi-modules already!

@henglein
Copy link

henglein commented Sep 11, 2020 via email

@mkm
Copy link

mkm commented Oct 5, 2020

I now have a prototype publicly available at https://github.com/mkm/module-theory. There are examples of three-way joins as well as more complicated joins.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants