
Missing catalog name in CTAS #356

Open
dev-goyal opened this issue Mar 12, 2024 · 3 comments · May be fixed by #355
Labels
bug Something isn't working

Comments


dev-goyal commented Mar 12, 2024

Describe the bug

We have a simple dbt model as follows:

# model.yaml

version: 2

models:
  - name: matches
    description: RS
    config:
      materialized: 'table'
      incremental_strategy: 'insert_overwrite'
      file_format: iceberg
      glue_version: '4.0'
      glue_session_reuse: True
      iceberg_expire_snapshots: True
      custom_location: '{{target.location}}/matches'
    columns:
      - name: id
        type: varchar
      - name: rating
        type: integer

And the very simple matches.sql model:

SELECT * FROM {{ source('matches_source', 'sourcetest') }}

However, it fails with:

23:55:06  Completed with 1 error and 0 warnings:
23:55:06
23:55:06    Database Error in model matches (dbt_models/matches/matches.sql)
  Glue cursor returned `error` for statement None for code SqlWrapper2.execute('''/* {"app": "dbt", "dbt_version": "1.7.9", "profile_name": "ml_iceberg", "target_name": "dev", "node_id": "model.ml_dating_outcomes.matches"} */


      create table dating_outcomes.matches



      using iceberg




      location 's3://<<path>>/iceberg-tables/dating_outcomes/matches'

        as
        SELECT * FROM ml_features.sourcetest
    ''', use_arrow=False, location='s3://<<path>>/iceberg-tables/dating_outcomes'), Py4JJavaError: An error occurred while calling o81.sql.
  : org.apache.spark.SparkException: Table implementation does not support writes: dating_outcomes.matches
        at org.apache.spark.sql.errors.QueryExecutionErrors$.unsupportedTableWritesError(QueryExecutionErrors.scala:761)
        at org.apache.spark.sql.execution.datasources.v2.TableWriteExecHelper.$anonfun$writeToTable$1(WriteToDataSourceV2Exec.scala:515)
        at org.apache.spark.util.Utils$.tryWithSafeFinallyAndFailureCallbacks(Utils.scala:1550)
        at org.apache.spark.sql.execution.datasources.v2.TableWriteExecHelper.writeToTable(WriteToDataSourceV2Exec.scala:491)
        at org.apache.spark.sql.execution.datasources.v2.TableWriteExecHelper.writeToTable$(WriteToDataSourceV2Exec.scala:486)
        at org.apache.spark.sql.execution.datasources.v2.CreateTableAsSelectExec.writeToTable(WriteToDataSourceV2Exec.scala:68)
        at org.apache.spark.sql.execution.datasources.v2.CreateTableAsSelectExec.run(WriteToDataSourceV2Exec.scala:92)
        at org.apache.spark.sql.execution.datasources.v2.V2CommandExec.result$lzycompute(V2CommandExec.scala:43)
        at org.apache.spark.sql.execution.datasources.v2.V2CommandExec.result(V2CommandExec.scala:43)
        at org.apache.spark.sql.execution.datasources.v2.V2CommandExec.executeCollect(V2CommandExec.scala:49)
        at org.apache.spark.sql.execution.QueryExecution$$anonfun$eagerlyExecuteCommands$1.$anonfun$applyOrElse$1(QueryExecution.scala:103)
        at org.apache.spark.sql.catalyst.QueryPlanningTracker$.withTracker(QueryPlanningTracker.scala:107)
        at org.apache.spark.sql.execution.SQLExecution$.withTracker(SQLExecution.scala:224)
        at org.apache.spark.sql.execution.SQLExecution$.executeQuery$1(SQLExecution.scala:114)
        at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$7(SQLExecution.scala:139)
        at org.apache.spark.sql.catalyst.QueryPlanningTracker$.withTracker(QueryPlanningTracker.scala:107)
        at org.apache.spark.sql.execution.SQLExecution$.withTracker(SQLExecution.scala:224)
        at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$6(SQLExecution.scala:139)
        at org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:245)
        at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$1(SQLExecution.scala:138)
        at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:779)
        at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:68)
        at org.apache.spark.sql.execution.QueryExecution$$anonfun$eagerlyExecuteCommands$1.applyOrElse(QueryExecution.scala:100)
        at org.apache.spark.sql.execution.QueryExecution$$anonfun$eagerlyExecuteCommands$1.applyOrElse(QueryExecution.scala:96)
        at org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$transformDownWithPruning$1(TreeNode.scala:615)
        at org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:177)
        at org.apache.spark.sql.catalyst.trees.TreeNode.transformDownWithPruning(TreeNode.scala:615)
        at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.org$apache$spark$sql$catalyst$plans$logical$AnalysisHelper$$super$transformDownWithPruning(LogicalPlan.scala:30)
        at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.transformDownWithPruning(AnalysisHelper.scala:267)
        at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.transformDownWithPruning$(AnalysisHelper.scala:263)
        at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.transformDownWithPruning(LogicalPlan.scala:30)
        at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.transformDownWithPruning(LogicalPlan.scala:30)
        at org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:591)
        at org.apache.spark.sql.execution.QueryExecution.eagerlyExecuteCommands(QueryExecution.scala:96)
        at org.apache.spark.sql.execution.QueryExecution.commandExecuted$lzycompute(QueryExecution.scala:83)
...

I believe this is because the CTAS statement is missing the catalog name (instead of dating_outcomes.matches, it should be glue_catalog.dating_outcomes.matches). I have confirmed that the above works when the catalog name is added by overriding the macro, as sketched below.
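
For illustration, here is a minimal sketch of such an override, assuming the adapter dispatches create_table_as to glue__create_table_as in the standard dbt way; the file name is arbitrary, and the real adapter macro also handles options (file format, temporary tables, etc.) that are hard-coded or omitted here:

{# macros/glue_overrides.sql -- hypothetical project-level override that
   prefixes the relation with the glue_catalog catalog name #}
{% macro glue__create_table_as(temporary, relation, sql) -%}
    create table glue_catalog.{{ relation }}
    using iceberg
    location '{{ config.get("custom_location") }}'
    as
    {{ sql }}
{%- endmacro %}

Placing this file under the project's macros/ directory makes dbt resolve the dispatched macro to the override instead of the adapter's built-in one.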

Expected behavior

The table should have been created.


System information

dbt profile:

config:
  partial_parse: False
  send_anonymous_usage_stats: False
  use_colors: False

ml_iceberg:
  target: dev # default target
  outputs:
    dev:
      type: glue
      query-comment: Glue on the dev account
      role_arn: <<role_arn>>
      region: us-east-1
      workers: 2
      worker_type: G.1X
      idle_timeout: 10
      schema: <<schema>>
      database: <<schema>>
      session_provisioning_timeout_in_seconds: 120
      location: s3://<<bucket>>/iceberg-tables/<<schema>>
      glue_version: "4.0"
      glue_session_id: dbt
      datalake_formats: iceberg
      conf: |
        spark.sql.catalog.glue_catalog=org.apache.iceberg.spark.SparkCatalog
        --conf spark.sql.catalog.glue_catalog.warehouse=s3://<<location>>
        --conf spark.sql.catalog.glue_catalog.catalog-impl=org.apache.iceberg.aws.glue.GlueCatalog
        --conf spark.sql.catalog.glue_catalog.io-impl=org.apache.iceberg.aws.s3.S3FileIO
        --conf spark.sql.extensions=org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions

Glue version: 4.0
dbt version: 1.7.9


@dev-goyal dev-goyal added the bug Something isn't working label Mar 12, 2024
@dev-goyal dev-goyal linked a pull request Mar 12, 2024 that will close this issue
@vitoravancini

Is dbt-glue with Iceberg not working at the moment? I was trying to configure it with Iceberg and ran into the same problem.
Is there a workaround?


hsirah commented Apr 2, 2024

@vitoravancini I have worked around this by adding the catalog to the query, like so: SELECT * FROM glue_catalog.{{ source('matches_source', 'sourcetest') }}


eshetben commented Jun 4, 2024

Adding --conf spark.sql.defaultCatalog=glue_catalog to the Spark session configuration should help.

See #359.
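
Applied to the profile above, the conf block would then look something like this (a sketch based on the profile in this issue; the warehouse path is a placeholder):

conf: |
  spark.sql.defaultCatalog=glue_catalog
  --conf spark.sql.catalog.glue_catalog=org.apache.iceberg.spark.SparkCatalog
  --conf spark.sql.catalog.glue_catalog.warehouse=s3://<<location>>
  --conf spark.sql.catalog.glue_catalog.catalog-impl=org.apache.iceberg.aws.glue.GlueCatalog
  --conf spark.sql.catalog.glue_catalog.io-impl=org.apache.iceberg.aws.s3.S3FileIO
  --conf spark.sql.extensions=org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions

With spark.sql.defaultCatalog set, unqualified names such as dating_outcomes.matches resolve against glue_catalog, so the generated CTAS no longer needs an explicit catalog prefix.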
