Support for Iceberg Partition Transforms #333

yaroslav-ost · 2024-02-05T14:16:02Z

Hi guys,

It seems we are not able to use Iceberg Partition Transforms inside dbt models.

Table Config:
{{ config(
materialized='incremental',
incremental_strategy='append',
file_format='iceberg',
partition_by=['months(my_date)']
) }}

Output:
CREATE TABLE glue_catalog.test.test_table
USING iceberg
PARTITIONED BY (months(my_date))
LOCATION 's3://sandbox-source/client/my/test/test_table'
AS SELECT * FROM tmp_test_table ORDER BY (months(my_date))

Error: AnalysisException: Undefined function: months. This function is neither a built-in/temporary function, nor a persistent function that is qualified as spark_catalog.default.months.

I guess ORDER BY is the root cause since iceberg doesn't support such kind of partition expressions in ORDER BY clause.

dbt-glue/dbt/adapters/glue/impl.py

Lines 874 to 882 in a52fcc4

    
                       query = f""" 
        
                                   CREATE TABLE glue_catalog.{target_relation.schema}.{target_relation.name} 
        
                                   USING iceberg 
        
                                   PARTITIONED BY {partition_by} 
        
                                   LOCATION '{location}' 
        
                                   {table_properties} 
        
                                   AS SELECT * FROM tmp_{target_relation.name} ORDER BY {partition_by} 
        
                               """ 
        
                   return query

aiss93 · 2024-02-13T08:03:51Z

Hi,
I think the function you want to use is "month" not "months".

yaroslav-ost · 2024-02-13T15:32:32Z

Hi @aiss93,

It must be "months". See https://iceberg.apache.org/docs/latest/spark-ddl/#partitioned-by:~:text=Supported%20transformations%20are%3A

aiss93 · 2024-02-16T10:31:42Z

These names are used in the older versions of iceberg. Check your iceberg version. The error message you're getting says that the function you're using wasn't found on the catalog.
I tried with "month" and I get the error : Transform is not supported

eshetben · 2024-06-04T10:02:38Z

from my experience, it seems that the problem is with pyspark -

there's the hour() function used for extracting the hour component - that one should be used within sql queries, including the order by clause.
https://spark.apache.org/docs/3.2.1/api/python/reference/api/pyspark.sql.functions.hour.html
then there's the hours() function - used solely for partition transforms. should only be used in partition by obviously.
https://spark.apache.org/docs/3.2.1/api/python/reference/api/pyspark.sql.functions.hours.html

dbt-glue does not support this since it uses the same function in both order by and partition by.

aiss93 · 2024-09-28T08:47:22Z

Hi guys,

I managed to fix this by including extra jars in the glue session in order to use an advanced version of iceberg.
Basically, glue 4.0 comes with iceberg 1.0.0. The partitioning transform functions are implemented in iceberg 1.2.0 and above (you can check this in the iceberg repository by searching in each tag)

You'll also need to include the aws bundle associated with the iceberg release you want to use.

Finally in order to respect some iceberg expectations, you'll need to cluster your transformation by the field you're using in the partirion_by clause.

So to summarize you need to change :

your_model.sql : add a CLUSTER BY my_date
profile.yaml file : replace datalake-formats by extra_jars: "_s3_path_to_your_iceberg_jar,s3_path_to_your_aws_bundle_"

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Support for Iceberg Partition Transforms #333

Support for Iceberg Partition Transforms #333

yaroslav-ost commented Feb 5, 2024

aiss93 commented Feb 13, 2024

yaroslav-ost commented Feb 13, 2024

aiss93 commented Feb 16, 2024

eshetben commented Jun 4, 2024

aiss93 commented Sep 28, 2024 •

edited

Loading

Support for Iceberg Partition Transforms #333

Support for Iceberg Partition Transforms #333

Comments

yaroslav-ost commented Feb 5, 2024

aiss93 commented Feb 13, 2024

yaroslav-ost commented Feb 13, 2024

aiss93 commented Feb 16, 2024

eshetben commented Jun 4, 2024

aiss93 commented Sep 28, 2024 • edited Loading

aiss93 commented Sep 28, 2024 •

edited

Loading