Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[TD to BQ] DVT is not able to validate DECIMAL/NUMERIC fields #1372

Open
karcot1 opened this issue Dec 16, 2024 · 2 comments
Open

[TD to BQ] DVT is not able to validate DECIMAL/NUMERIC fields #1372

karcot1 opened this issue Dec 16, 2024 · 2 comments

Comments

@karcot1
Copy link

karcot1 commented Dec 16, 2024

Hello team,

We recently discovered that decimal/numeric fields between TD and BQ present and behave quite differently in their respective systems, which impacts DVT's ability to validate the data successfully.

The primary issue is in how TD and BQ store decimal values, specifically any value with a "0" in it.

Example 1: value 1.80

in TD this is stored as "1.80"
In BQ this is stored as "1.8"

When DVT performs a row hash, it converts both values to strings before generating a hash over the value. The hash for "1.8" and the hash for "1.80" are different, leading to a row hash fail 100% of the time.

Similar scenarios:

"0.5000" --> stored in TD as ".5000" and in BQ as "0.5"
"0" --> stored in TD as ".0" and in BQ as "0"

The situation becomes even more complex when these numeric fields are primary keys. In this scenario, we have identified a bug where not only does the row hash fail, but the comparison-fields validation also fails for every single column (regardless of whether it is a decimal column or not).

The only workaround we have identified so far to resolve this is quite messy - it involves manually casting every decimal value to a re-formatted string that matches exactly with how BQ handles numeric fields, and casting the BQ numeric to strings as well.

Example query in TD:

"""
SELECT
CASE
WHEN decimal_col=0 THEN '0'
WHEN decimal_col < 1 THEN '0' || trim(trailing '0' from cast(decimal_col as VARCHAR(100)))
WHEN decimal_col >= 1 and right(trim(trailing '0' from cast(decimal_col as VARCHAR(100))),1) = '.' THEN rtrim(trim(trailing '0' from cast(decimal_col as VARCHAR(100))),'.')
ELSE trim(trailing '0' from cast(decimal_col as VARCHAR(100)))
END AS decimal_col
"""

Example query in BQ:

"""
SELECT CAST(decimal_col) AS STRING AS decimal_col
"""

This case statement needs to be applied to every primary key in order for comparison-fields validations to work as expected, and every single decimal column in the table for row hash validations to work as expected.

Any ways to streamline this from DVT would be greatly appreciated! It's a bit of an edge case, but there are some tables that contain a large number of decimal columns. Performing this workaround for every single one can quickly become time intensive.

@sundar-mudupalli-work
Copy link
Collaborator

Kash,

How about using custom queries to format the decimal in Teradata use TO_CHAR doc here

data-validation query -c teradata -q 'Select TO_CHAR(0.5, '"'"'S000.000'"'"') '

and BQ use CAST doc here.

data-validation query -c bq -q 'Select cast(0.5 as STRING FORMAT '"'"'S000.000'"'"') '

This format will work for all numbers between -999 and 999 with up to 3 digits of precision. You can choose the format that works for your use case.

Sundar Mudupalli

@nj1973
Copy link
Contributor

nj1973 commented Dec 18, 2024

I'm interested in the cause of this because I believe we already have tests to cover this and the tests pass.

Our test table dvt_core_types has the following column in Teradata:

,   col_dec_10_2    NUMBER(10,2)

And has value 123.3 inserted for id 3.

The BigQuery table has this column with the same value for id 3:

,   col_dec_10_2    NUMERIC(10,2)

If I run a Teradata/BigQuery validation as below then the validation succeeds:

data-validation validate row -sc teradata -tc bq -tbls=pso_data_validator.dvt_core_types --concat='id,col_dec_10_2'
...
╒═══════════════════╤═══════════════════╤═════════════════════╤══════════════════════╤════════════════════╤════════════════════╤══════════════════╤═════════════════════╤══════════════════════════════════════╕
│ validation_name   │ validation_type   │ source_table_name   │ source_column_name   │   source_agg_value │   target_agg_value │ pct_difference   │ validation_status   │ run_id                               │
╞═══════════════════╪═══════════════════╪═════════════════════╪══════════════════════╪════════════════════╪════════════════════╪══════════════════╪═════════════════════╪══════════════════════════════════════╡
│ concat__all       │ Row               │ udf.dvt_core_types  │ concat__all          │            3123.3  │            3123.3  │                  │ success             │ 9ce951cb-4b56-463f-9ea6-244614a184e8 │
├───────────────────┼───────────────────┼─────────────────────┼──────────────────────┼────────────────────┼────────────────────┼──────────────────┼─────────────────────┼──────────────────────────────────────┤
│ concat__all       │ Row               │ udf.dvt_core_types  │ concat__all          │            1123.11 │            1123.11 │                  │ success             │ 9ce951cb-4b56-463f-9ea6-244614a184e8 │
├───────────────────┼───────────────────┼─────────────────────┼──────────────────────┼────────────────────┼────────────────────┼──────────────────┼─────────────────────┼──────────────────────────────────────┤
│ concat__all       │ Row               │ udf.dvt_core_types  │ concat__all          │            2123.22 │            2123.22 │                  │ success             │ 9ce951cb-4b56-463f-9ea6-244614a184e8 │
╘═══════════════════╧═══════════════════╧═════════════════════╧══════════════════════╧════════════════════╧════════════════════╧══════════════════╧═════════════════════╧══════════════════════════════════════╛

Notice above the line for id 3:

│ concat__all          │            3123.3  │            3123.3  │

So I'm interested to know how our test case differs from your test case.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants