-
Notifications
You must be signed in to change notification settings - Fork 116
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix: Prevent Oracle LOB columns from causing exceptions in Column and Row validation #1339
Conversation
…onb-data-type-support
# Oracle LOB columns are invalid for use with SQL COUNT function. | ||
elif agg_type == "count" and self._is_oracle_lob(column): | ||
# TODO Eventually we need to allow COUNT(Oracle LOB) by using a CASE expression: | ||
# COUNT(CASE WHEN clob_col IS NULL THEN NULL ELSE 1 END) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why skip this column when you can change the query builder here , using the ibis.cases method. Isn't that a simpler and better solution ?
Thanks.
Sundar Mudupalli
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think I can skip it in the query builder because I don't have data type information for both the source and target. I believe ConfigManager is where I have all of the information required to exclude a column from both source and target queries.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi! Based on your discussion during the sync call, is this comment going to be resolved as-is or there will be code changes related to it?
All good from my side and only @sundar-mudupalli-work's comment is left to be resolved. /gcbrun |
@nj1973 - CLA check is failing because of my former Google email not being found anymore, sorry about that. I will need to go through the commits I did with the corp email and remove it from the authors |
Co-authored-by: Helen Cristina <[email protected]>
Co-authored-by: Helen Cristina <[email protected]>
* tests: Check Oracle integration tests with Python 3.11
…1340) * feat: Support customer defined api endpoint for BigQuery and Spanner * feat: Support customer defined api endpoint for BigQuery result handler * Update data_validation/result_handlers/bigquery.py Co-authored-by: Helen Cristina <[email protected]> * chore: PR comments * chore: Reformat --------- Co-authored-by: Helen Cristina <[email protected]>
* Create SECURITY.md * chore: Switched SECURITY.md to Google recommended content --------- Co-authored-by: nj1973 <[email protected]>
Co-authored-by: Helen Cristina <[email protected]>
* Create CODE_OF_CONDUCT.md * Update CODE_OF_CONDUCT.md
* chore: Add sample files using DVT for object reconcilliation * Update samples/object_comparisons/README.md Co-authored-by: Helen Cristina <[email protected]> * Update samples/object_comparisons/reconcile.sh Co-authored-by: Helen Cristina <[email protected]> --------- Co-authored-by: Helen Cristina <[email protected]>
* docs: Refine connection secret support section
0fe64c4
to
b38dc87
Compare
/gcbrun |
Neil, This PR is a bit confusing. I thought it was ensuring that exceptions were not thrown when CLOBs and BLOBs are row and column validated. However I find that column validation with clobs and blobs works well. Column validations correctly count the number of non null columns I believe. This PR removes (count) column validations of LOB and BLOB columns. I believe min, max etc are also supported. These should likely stay as they are today. The issue is with row validations, which throw an error. It would be nice to fix this and put out an info message. This PR fixes this and only puts out an INFO message if the verbose option is selected. We also have another case where an INFO message is output (when verbose option is selected) when a column is removed. In both cases, I believe an INFO message should be output - not just when verbose option is selected. I also opened another Issue to remove the verbose option entirely. My general philosophy on validations (please feel free to disagree) is that the customer is trusting DVT to tell them when two tables are different. If we are excluding items from the validation - we should tell the customer explicitly that these are being excluded. We should not quietly exclude things and only disclose information when the verbose option is used. Thank you for working on this. Sundar Mudupalli |
I retested the test case in the issue (#1335) and I still get the same exception:
BLOB columns will not have thrown an exception in the past because they are skilled due to this code (which I have replaced in this PR):
|
/gcbrun |
Closing this PR because it is based on a misunderstanding of current behaviour. I'll start over and maybe cherry pick any parts of these changes I need for second attempt. |
Oracle LOB columns are pretty restrictive to use, for example:
In DVT we lose the true data type and work with Ibis canonical types, for example a CLOB column becomes
dt.String
, a BLOB column becomesdt.Binary
. This PR adds code to fetch the raw data types for Oracle tables/custom queries and excludes BLOB, CLOB and NCLOB from Column and Row (hash/concat) validations.I only added the
raw_metadata()
for Oracle because that is the only engines we currently need it for. If we run into similar issues for other engines where we need to differentiate between specific raw data types then we can extend this technique further.Ideally we would be able to add expressions to truly validate these types. I see that as a feature request rather than an exception avoidance issue which is what this one is.