Skip to content
This repository has been archived by the owner on Jul 23, 2023. It is now read-only.

Designing an universal & relational query engine #9

Open
billy1624 opened this issue May 6, 2022 · 0 comments
Open

Designing an universal & relational query engine #9

billy1624 opened this issue May 6, 2022 · 0 comments

Comments

@billy1624
Copy link
Member

Introduction

We want to design a query engine that can accept any representation of SQL query statement, including but not limited to GraphQL and SQL. Then, the engine will parse the input query and convert it into a subquery tree that can be executed in parallel. Finally, execute those subquery and consolidate the result of each to produce the final output.

Architecture

Query Parser

It will parse any types that implemented QueryParserTrait into the internal type QueryAST, which represents the query syntax tree.

By default, QueryParserTrait for GraphQL and SQL will be implemented. Of cause, QueryParserTrait can be implemented by any other foreign types as well.

QueryAST is a struct that have all the necessary information to reconstruct query statement on the target database or data storage, including...

  • SELECT
  • FROM
  • JOIN
  • WHERE
  • ORDER BY
  • GROUP BY
  • LIMIT
  • OFFSET

Query Planner

Split the QueryAST into a non-relational query tree that can be executed in parallel. And, come up with a strategy to consolidate those query results to form the final result.

If a query have subquery in it, those subquery would be split off and execute as a independent query. There are three place we could find subquery in a query, including...

  1. SELECT
  2. FROM
  3. WHERE

Also, we have to consider whether the subquery is correlated subquery or not.

  • Correlated subquery: the correlated part, in the following case, p1.productline, should be substituted with a concrete value and fetch one AVG(buyprice) for each products. Or, group by productline and match the list of AVG(buyprice) accordingly.
    SELECT productname,
        buyprice
    FROM products p1 --p1 was used in the subquery
    WHERE buyprice > (SELECT AVG(buyprice)
                    FROM products
                    WHERE productline = p1.productline)
  • Independent subquery: fetch the result of subquery and apply the filter or match it against parent query
    SELECT productname,
        buyprice
    FROM products
    WHERE buyprice > (SELECT AVG(buyprice) FROM products)

Query Executor

Execute the non-relational query tree in parallel and sort the result in a way that is easy to be consolidated as the final result.

How are we going to sort and aggregate the result efficiently? I think it's better to ask db engine to sort the result for us.

Brainstorming

  • Are we going to support cross database querying? E.g. query aggregated data from multiple databases. It can be homogeneous: between multiple database of the same kind, such as between two MySQL. Or, it can be heterogeneous: between multiple database of different kind, such as between MySQL and PostgreSQL.

CC @tyt2y3 @shpun817

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
Status: Triage
Development

No branches or pull requests

1 participant