[Feature] Dynamic Tables #706
-
DescriptionSnowflake has a special table called Dynamic Tables: https://docs.snowflake.com/en/user-guide/dynamic-tables-intro The Dynamic Tables is much like Materialized View but in Snowflake, A main difference is that Materialized View could be used to Rewrite Query in Snowflake, but Dynamic tables can't. And Materialized View could only has one simple table in Snowflake, that is exactly the matview candidate we could use for AQUMV currently(But we are planning to support complex querys have multiple tables or joins). Dynamic Tables are very useful for users in some cases. And if we have Dynamic Tables in CBDB, unlike Snowflake, I think it could be used to AQUMV too. Use case/motivationNo response Related issuesNo response Are you willing to submit a PR?
|
Beta Was this translation helpful? Give feedback.
Replies: 6 comments 2 replies
-
It's an interesting topic. |
Beta Was this translation helpful? Give feedback.
-
I think we can leverage the combination of these techniques, including materialized views for external tables, dynamic tables (auto refreshing materialized views) and AQUMV to solve the problem often raised by customers who are big fans of a lakehouse architecture: how can we run queries on external tables as fast as internal tables? A good example that comes to mind is this: we create a Foreign Table pointing to Apache Hive using FDW, and then create a Dynamic Table based on this Foreign Table that can be refreshed automatically (either periodically based on a time interval or triggered by listening to HMS CDC events). When a user queries the Foreign Table, we use AQUMV technology to replace the Foreign Table with the corresponding Dynamic Table, significantly improving query time. What's more, we should support setting distribution keys for dynamic tables, while foreign tables are commonly thought of as having a random distribution. More details need to be figured out, and I think this is a promising direction. |
Beta Was this translation helpful? Give feedback.
-
Another feature of Dynamic Tables in Snowflake is: users could specify the query used to transform the data from one or more base objects or dynamic tables. https://docs.snowflake.com/en/user-guide/dynamic-tables-about All of that could be supported, and we could do more: Besides base tables, foreign tables , dynamic tables, Dynamic Tables could be composed from materialized views as well. |
Beta Was this translation helpful? Give feedback.
-
I have been coding some functions, here are design details: DDL:Provide a base table CREATE TABLE t1(a int, b int, c int) distributed by (b);
INSERT INTO t1 SELECT i, i + 1, i + 2 FROM GENERATE_SERIES(1, 10) i;
INSERT INTO t1 SELECT i, i + 1, i + 2 FROM GENERATE_SERIES(1, 5) i; Create Dynamic Table:CREATE DYNAMIC TABLE dt0 SCHEDULE '5 * * * *' AS
SELECT a, b, sum(c) FROM t1 GROUP BY a, b WITH NO DATA DISTRIBUTED BY(b);
CREATE DYNAMIC TABLE
\d
List of relations
Schema | Name | Type | Owner | Storage
--------+------+---------------+---------+---------
public | dt0 | dynamic table | gpadmin | heap
public | t1 | table | gpadmin | heap
(2 rows) CREATE DYNAMIC TABLE xxx AS SCHEDULE:A string used to schedule background job which auto-refreshes the dynamic table.
You can also use '[1-59] seconds' to schedule a job based on an interval. User don't need to consider the auto-refresh job, however query on pg_task catalog if we want to see the task: SELECT * FROM pg_task WHERE jobname LIKE 'gp_dynamic_table_refresh%' AND command LIKE '%dt0';
jobid | schedule | command | nodename | nodeport | database | username | active |
jobname
-------+-----------+----------------------------------+-----------+----------+----------+----------+--------+---------
-----------------------
17398 | 5 * * * * | REFRESH DYNAMIC TABLE public.dt0 | 127.0.0.1 | 9000 | gpadmin | gpadmin | t | gp_dynam
ic_table_refresh_17394
(1 row) And a function pg_get_dynamic_table_schedule is provided for users to see the SCHEDULE info easily: CREATE DYNAMIC TABLE dt_schedule SCHEDULE '1 2 3 4 5' AS SELECT * FROM t2;
SELECT pg_catalog.pg_get_dynamic_table_schedule('dt_schedule'::regclass::oid);
pg_get_dynamic_table_schedule
-------------------------------
1 2 3 4 5
(1 row) As Snowflake, Dynamic Tables should always have a auto-refresh process. WITH NO DATA:Same as Materialized View, will create an empty Dynamic Table if specified. DISTRIBUTED BY:Same as normal tables in CBDB, Dynamic Tables could support distribution keys as materialized views. \d+ dt0;
Dynamic table "public.dt0"
Column | Type | Collation | Nullable | Default | Storage | Compression | Stats target | Description
--------+---------+-----------+----------+---------+---------+-------------+--------------+-------------
a | integer | | | | plain | | |
b | integer | | | | plain | | |
sum | bigint | | | | plain | | |
View definition:
SELECT t1.a,
t1.b,
sum(t1.c) AS sum
FROM t1
GROUP BY t1.a, t1.b;
Distributed by: (b)
Access method: heap Refresh Dynamic TableAs seen in pg_task, we put a command to auto-refresh dynamic tables. REFRESH DYNAMIC TABLE dt0;
REFRESH DYNAMIC TABLE REFRESH WITH NO DATA;Same as Materialized Views, Refresh with no data will truncate the Dynamic Table and make it unpopulated status. REFRESH DYNAMIC TABLE dt0 WITH NO DATA;
REFRESH DYNAMIC TABLE Drop Dynamic Table:DROP DYNAMIC TABLE dt0;
DROP DYNAMIC TABLE Drop a Dynamic Table will drop its scheduler job automatically. SELECT * FROM pg_task WHERE jobname LIKE 'gp_dynamic_table_refresh%' AND command LIKE '%dt0';
jobid | schedule | command | nodename | nodeport | database | username | active | jobname
-------+----------+---------+----------+----------+----------+----------+--------+---------
(0 rows) PrivilegesSame as Materialized Views in CBDB: \z
Access privileges
Schema | Name | Type | Access privileges | Column privileges | Policies
--------+------+---------------+-------------------+-------------------+----------
public | dt1 | dynamic table | | |
public | t1 | table | | |
(2 rows) Use Dynamic Tables to answer queryLike Materialized Views, Dynamic Tables could be used to answer query too: CREATE DYNAMIC TABLE dt1 AS
SELECT * FROM t1 WHERE a = 1 DISTRIBUTED BY(b);
ANALYZE dt1;
SELECT 2 SET enable_answer_query_using_materialized_views = ON;
EXPLAIN(COSTS OFF, VERBOSE)
SELECT * FROM t1 WHERE a = 1;
SELECT * FROM t1 WHERE a = 1;
SET
QUERY PLAN
----------------------------------------------------------------------------------
Gather Motion 3:1 (slice1; segments: 3)
Output: a, b, c
-> Seq Scan on public.dt1
Output: a, b, c
Settings: enable_answer_query_using_materialized_views = 'on', optimizer = 'off'
Optimizer: Postgres query optimizer
(6 rows)
a | b | c
---+---+---
1 | 2 | 3
1 | 2 | 3
(2 rows)
We will enable that after #702 is merged, with Dynamic Tables feature. |
Beta Was this translation helpful? Give feedback.
-
Another thing is: do we need a ALTER DYNAMIC TABLE command? It may be used to: SUSPEND/RESUME(Snowflake grammar, pg_cron has ACTIVE/NOT ACTIVE instead) the background auto-refreshing job of Dynamic Table However, users could manage the task directly using |
Beta Was this translation helpful? Give feedback.
-
See codes in PR #725 |
Beta Was this translation helpful? Give feedback.
See codes in PR #725