Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add backquotes to columns when executing dbt unit tests #868

Open
tshiomoto opened this issue Dec 10, 2024 · 6 comments
Open

Add backquotes to columns when executing dbt unit tests #868

tshiomoto opened this issue Dec 10, 2024 · 6 comments
Labels
bug Something isn't working

Comments

@tshiomoto
Copy link

Describe the bug

I created using code like the following (Note that I used Japanese, that is non-ASCII character, for column names):

SELECT
  `Japanese column name1`,
  `Japanese column name2`,
  ...
FROM
  {{ source(schema_name, source_table_name) }}

I defined the schema like the following, and created the csv file for this unit test:

version: 2

sources:
  - name: schema_name
    tables:
      - name: table name

models:
  - name: target_table_name

unit_tests:
  - name: test_name
    model: target_table_name
    given:
      - input: source(schema_name, source_table_name)
        format: csv
        fixture: source_table_input
    expect: 
      format: csv
      fixture: target_table_output

After that, when I executed dbt run command, dbt runs failed with this error:

[INVALID_IDENTIFIER] The unquoted identifier __dbt__cte__target_table_name is invalid and must be back quoted as: `__dbt__cte__target_table_name`.
    Unquoted identifiers can only contain ASCII letters ('a' - 'z', 'A' - 'Z'), digits ('0' - '9'), and underbar ('_').
    Unquoted identifiers must also not start with a digit.
    Different data sources and meta stores may impose additional restrictions on valid identifiers. SQLSTATE: 42602 (line 7, pos 13)

I would appreciate you when you could tell me how I can solve this problem. However, I suppose that it's easiest way to change column names to english but I can't do it that way.

System information

The output of dbt --version:

Core:
  - installed: 1.8.6   
  - latest:    1.9.0 - Update available!

  Your version of dbt-core is out of date!
  You can find instructions for upgrading here:
  https://docs.getdbt.com/docs/installation

Plugins:
  - databricks: 1.8.6 - Update available!
  - spark:      1.8.0 - Up to date!

  At least one plugin is out of date or incompatible with dbt-core.
  You can find instructions for upgrading here:
  https://docs.getdbt.com/docs/installation

The operating system you're using :windows 10 (22H2)

The output of python --version:python 3.9.19

Additional context

I found the similar issue. (#859)

@tshiomoto tshiomoto added the bug Something isn't working label Dec 10, 2024
@benc-db
Copy link
Collaborator

benc-db commented Dec 10, 2024

So, I think technically you are supposed to set this on the columns: https://docs.getdbt.com/reference/resource-properties/quote

However, I'm more and more feeling like we should just always backtick column names. @mikealfare, what are your thoughts on this?

@tshiomoto
Copy link
Author

Oh, thank you for giving me that URL! I didn't know that page existed. After setting this config, I try to execute test again.

@tshiomoto
Copy link
Author

I tried to execute test after setting it, but the same error occurred.
I fixed schema.yml like the following:

version: 2

sources:
  - name: schema_name
    tables:
      - name: table name
        columns:
        - name: Japanese column name1
          quote: true
        - name: Japanese column name2
          quote: true
        ...

models:
  - name: target_table_name

unit_tests:
  - name: test_name
    model: target_table_name
    given:
      - input: source(schema_name, source_table_name)
        format: csv
        fixture: source_table_input
    expect: 
      format: csv
      fixture: target_table_output

@benc-db
Copy link
Collaborator

benc-db commented Dec 11, 2024

Thanks for reporting and following up. I'm thinking I'll move towards just always quoting, since there are places where I may have access to column names but not the 'quote' information anymore. I'll add a behavior flag for this for 1.9.1.

@tshiomoto
Copy link
Author

Thank you for your response. I hope that a fixed version will be released.

@benc-db
Copy link
Collaborator

benc-db commented Dec 13, 2024

It doesn't look like I'll have this finished in time for 1.9.1, as my attempts to always backtick columns broke a bunch of tests. I'm still prioritizing this, but I don't think it will be ready until January. Apologies for any inconvenience.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

No branches or pull requests

2 participants