
[Feature] Support partition pushdown in Flink connector #196

Open · 1 of 2 tasks
Labels: component=flink, feature (New feature or request)

wuchong (Member) opened this issue Dec 16, 2024 · 6 comments

Search before asking

  • I searched in the issues and found nothing similar.

Motivation

Partition pushdown is a performance optimization technique that allows the query engine to filter out unnecessary data early in the query processing pipeline. By pushing down partition filters, we can significantly reduce the amount of data transferred and processed, leading to improved query performance and resource efficiency.

Consider a scenario where a user queries a large dataset partitioned by region and date. Without partition pushdown, the entire dataset needs to be scanned, which is inefficient. With partition pushdown, only the relevant partitions (e.g., data for a specific region and date range) are scanned, resulting in faster query execution and reduced resource usage.

Solution

Anything else?

No response

Willingness to contribute

  • I'm willing to submit a PR!
wuchong added the feature (New feature or request) and component=flink labels on Dec 16, 2024
wuchong added this to the v0.6 milestone on Dec 16, 2024
wuchong changed the title from "[Feature] Support partition pushdown for Flink connector" to "[Feature] Support partition pushdown in Flink connector" on Dec 16, 2024
Alibaba-HZY commented

I would like to contribute to this issue. Could you assign it to me?

Alibaba-HZY commented

Currently, flussAdmin.listPartitionInfos only returns the partition values and is missing the partition keys, but Flink requires a map of partition keys to values. So this must be blocked by #195.
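For reference, this is the shape of the Flink ability interface the connector would implement; the signatures below mirror Flink's org.apache.flink.table.connector.source.abilities.SupportsPartitionPushDown, and the example partition map in the comments is only illustrative:

```java
import java.util.List;
import java.util.Map;
import java.util.Optional;

// Flink's partition pushdown ability: the planner asks the source for all of its
// partitions as key->value maps (e.g. {"ds" -> "2024-12-16"}), prunes them with the
// query's partition predicates, and hands the surviving partitions back to the source.
public interface SupportsPartitionPushDown {

    // All partitions known to the source, each as a partition-key -> value map.
    Optional<List<Map<String, String>>> listPartitions();

    // The partitions that remain after the planner's pruning.
    void applyPartitions(List<Map<String, String>> remainingPartitions);
}
```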

wuchong (Member, Author) commented Dec 16, 2024

@Alibaba-HZY you can get the partition keys from Table#getDescriptor(). You can implement partition pushdown for a single partition key first, so it is not blocked by #195. See the sketch below.

For pushdown with multiple partition keys, yes, we need to extend Admin#listPartitionInfos to return a map of partition keys and values. That can be done in or after #195.
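A minimal sketch of the single-partition-key approach, assuming the key name is read from the table descriptor and combined with the value-only result of listPartitionInfos. The helper method, its inputs, and the accessor names mentioned in the comments are assumptions for illustration, not verified Fluss API:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;

final class SinglePartitionKeyExample {

    // partitionKey: e.g. "ds", assumed to come from Table#getDescriptor() (e.g. a
    //   getPartitionKeys() accessor), taking the first and only key for now.
    // partitionValues: e.g. ["2024-12-15", "2024-12-16"], the value-only listing
    //   currently returned by Admin#listPartitionInfos.
    static List<Map<String, String>> toFlinkPartitions(
            String partitionKey, List<String> partitionValues) {
        List<Map<String, String>> partitions = new ArrayList<>();
        for (String value : partitionValues) {
            // Build the key -> value map that SupportsPartitionPushDown#listPartitions expects.
            partitions.add(Collections.singletonMap(partitionKey, value));
        }
        return partitions;
    }
}
```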

Alibaba-HZY commented

@wuchong is partition pushdown only executed in batch mode? If so, batch mode currently only supports datalake-enabled tables or point queries on primary keys.

Alibaba-HZY commented

After discussing with @luoyuxia:

In stream mode, we cannot determine the correct set of partitions up front. For example, if a table has three partitions ds=11, ds=12, ds=13 and the SQL is select * from table where ds > 10, then applyPartitions receives ds=11, ds=12, ds=13 and the SourceEnumerator only reads those; but the user might later write ds=14, and the SourceEnumerator will never pick it up.

In batch mode, we only support datalake-enabled tables or point queries on primary keys, and #40 is not closed yet.

So I think partition pushdown will not take effect.

wuchong (Member, Author) commented Dec 20, 2024

@Alibaba-HZY yes. For batch mode, we have to wait for #40, and for streaming mode, we need to leverage SupportsFilterPushDown instead of SupportsPartitionPushDown. For example, if there is a where ds > 10 filter, then you don't need to read partitions 0 to 10.
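A rough sketch of how the streaming path could adopt Flink's SupportsFilterPushDown: the source accepts predicates on partition columns (such as ds > 10) and evaluates them against partitions as they are discovered, while returning all other predicates to the planner. FlussFilterPushDownSketch and isPartitionColumnFilter are hypothetical names; only the Flink interface and Result.of are existing API.

```java
import java.util.ArrayList;
import java.util.List;

import org.apache.flink.table.connector.source.abilities.SupportsFilterPushDown;
import org.apache.flink.table.expressions.ResolvedExpression;

class FlussFilterPushDownSketch implements SupportsFilterPushDown {

    // Filters on partition columns that the source evaluates itself, e.g. "ds > 10",
    // so partitions 0..10 are never subscribed and a partition like ds=14 that
    // appears later can still be matched when it is discovered.
    private final List<ResolvedExpression> acceptedPartitionFilters = new ArrayList<>();

    @Override
    public Result applyFilters(List<ResolvedExpression> filters) {
        List<ResolvedExpression> remaining = new ArrayList<>();
        for (ResolvedExpression filter : filters) {
            if (isPartitionColumnFilter(filter)) { // hypothetical helper
                acceptedPartitionFilters.add(filter);
            } else {
                remaining.add(filter);
            }
        }
        // Accepted filters are applied by the source; remaining ones stay in the plan.
        return Result.of(acceptedPartitionFilters, remaining);
    }

    private boolean isPartitionColumnFilter(ResolvedExpression filter) {
        // Stubbed for illustration: a real implementation would check whether the
        // expression references only the table's partition columns.
        return false;
    }
}
```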
