[TD to BQ] DVT is not able to validate DECIMAL/NUMERIC fields #1372
Comments
Kash, how about using custom queries to format the decimal: in Teradata use TO_CHAR (doc here) and in BQ use CAST (doc here). This format will work for all numbers between -999 and 999 with up to 3 digits of precision. You can choose the format that works for your use case. Sundar Mudupalli
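Something like the following (just a sketch: the table name is made up and the TO_CHAR format model is an assumption based on the -999..999 range above, so verify it against your data):
"""
-- Teradata: render the decimal with an explicit format model;
-- FM strips the blank padding TO_CHAR would otherwise add
SELECT TO_CHAR(decimal_col, 'FM999.999') AS decimal_col
FROM my_table;

-- BigQuery: a plain cast already produces the canonical short form
SELECT CAST(decimal_col AS STRING) AS decimal_col
FROM my_table;
"""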
I'm interested in the cause of this because I believe we already have tests to cover this and the tests pass. Our Teradata test table has a decimal column, and the BigQuery table has the same value in that column for id 3. If I run a Teradata/BigQuery validation, the validation succeeds, including the row for id 3. So I'm interested to know how our test case differs from your test case.
Hello team,
We recently discovered that decimal/numeric fields are represented quite differently in TD and BQ, which impacts DVT's ability to validate the data successfully.
The primary issue is how TD and BQ render decimal values when cast to strings, specifically any value with a leading or trailing "0".
Example 1: value 1.80
In TD this is stored as "1.80"
In BQ this is stored as "1.8"
When DVT performs a row hash, it converts both values to strings before generating the hash. The hash of "1.8" and the hash of "1.80" are different, so the row hash validation fails 100% of the time.
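To make the mechanism concrete, here is a BigQuery snippet illustrating the effect (assuming a SHA-256-style row hash; the exact hash function DVT uses may differ):
"""
SELECT
  TO_HEX(SHA256('1.8'))  AS bq_side,  -- hash of BQ's rendering
  TO_HEX(SHA256('1.80')) AS td_side;  -- hash of TD's rendering
-- the two digests differ, so the row is reported as a mismatch
"""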
Similar scenarios:
"0.5000" --> stored in TD as ".5000" and in BQ as "0.5"
"0" --> stored in TD as ".0" and in BQ as "0"
The situation becomes even more complex when these numeric fields are primary keys. In this scenario, we have identified a bug where not only does the row hash fail, but the comparison-fields validation also fails for every single column (regardless of whether it is a decimal column or not).
The only workaround we have identified so far is quite messy: it involves manually casting every decimal value in TD to a reformatted string that exactly matches how BQ renders numeric fields, and casting the BQ numerics to strings as well.
Example query in TD:
"""
SELECT
CASE
WHEN decimal_col=0 THEN '0'
WHEN decimal_col < 1 THEN '0' || trim(trailing '0' from cast(decimal_col as VARCHAR(100)))
WHEN decimal_col >= 1 and right(trim(trailing '0' from cast(decimal_col as VARCHAR(100))),1) = '.' THEN rtrim(trim(trailing '0' from cast(decimal_col as VARCHAR(100))),'.')
ELSE trim(trailing '0' from cast(decimal_col as VARCHAR(100)))
END AS decimal_col
"""
Example query in BQ:
"""
SELECT CAST(decimal_col AS STRING) AS decimal_col
"""
This CASE expression needs to be applied to every decimal primary key for comparison-fields validations to work as expected, and to every decimal column in the table for row hash validations to work as expected.
Any way to streamline this from DVT would be greatly appreciated! It's a bit of an edge case, but some tables contain a large number of decimal columns, and applying this workaround to every single one quickly becomes time-intensive.
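For reference, this is roughly how we wire those two queries in today (connection names and file paths are made up, and the flags are from memory of DVT's custom-query row validation, so treat it as a sketch):
"""
data-validation validate custom-query row \
  -sc my_td_conn -tc my_bq_conn \
  --source-query-file td_decimal_cast.sql \
  --target-query-file bq_decimal_cast.sql \
  --primary-keys id \
  --hash '*'
"""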