feat: Prevent column validation exceptions caused by Oracle CLOB JSON columns #1365

Merged 8 commits on Dec 12, 2024


nj1973

@nj1973 nj1973 commented Dec 4, 2024

This PR:

  • Adds Oracle/PostgreSQL JSON columns to the dvt_ora2pg_types test table.
  • Piggybacks on the existing string column validation code path for JSON columns, i.e. it validates the string length. This is not ideal, but neither is comparing strings in this way. Min/max aggregations in particular should be executed on the raw column values for non-LOB strings, but that is covered by issue #758 (validate column: Character columns are skipped for min/max validations).
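
As a rough illustration of the "reuse the string length path" idea, here is a minimal sketch. It is not DVT's actual code: the function name `aggregate_alias` and the simplified type names are hypothetical, and the alias format is only modelled on the validation names that appear in the results later in this thread (for example `count__length__cast_string__col_json`).

```python
# Hypothetical sketch; names and type labels are assumptions, not DVT's API.
STRING_TYPES = {"string"}
JSON_TYPES = {"json"}


def aggregate_alias(agg_type: str, column: str, source_type: str, target_type: str) -> str:
    """Build a validation name for an aggregate over one column.

    String and JSON columns are compared via their string length. If either
    side is JSON, the value is cast to string before taking the length.
    """
    types = {source_type, target_type}
    if types & JSON_TYPES:
        return f"{agg_type}__length__cast_string__{column}"
    if types & STRING_TYPES:
        return f"{agg_type}__length__{column}"
    # Any other type is aggregated directly.
    return f"{agg_type}__{column}"


# Oracle CLOB (treated as a string) against a Postgres json column.
print(aggregate_alias("count", "col_json", "string", "json"))
# Oracle CLOB against a Postgres character column.
print(aggregate_alias("count", "col_json", "string", "string"))
```

The point of the sketch is only that JSON columns take the same length-based route as strings, with an extra cast when one side is a native JSON type.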

I've created a new issue (#1364) for row validation problems, because we get exceptions for pure CLOB validations irrespective of JSON.

@nj1973 nj1973 marked this pull request as ready for review December 4, 2024 16:30
@nj1973 nj1973 requested a review from a team as a code owner December 4, 2024 16:30
@helensilva14

/gcbrun


@helensilva14 helensilva14 left a comment


Added suggestions and a question!

5 resolved review threads on data_validation/config_manager.py (3 outdated)
@nj1973

nj1973 commented Dec 10, 2024

/gcbrun

@sundar-mudupalli-work

Neil,

I did some research on JSON columns in Oracle and Postgres. Oracle has multiple options for storing JSON data, including a native JSON type or a constraint added to a character type or BLOB. Postgres has the json and jsonb data types in addition to character types.

Summary: The proposed feature allows count validation of columns of any character type in Oracle against Postgres JSON columns. It does not support BLOB, which Oracle says can also be used to store JSON. Validations other than count are not supported, which makes sense because JSON objects cannot be aggregated with min, max, avg, etc.

I created four tables in Oracle: pso_data_validator.issue1365, pso_data_validator.issue1365_1, pso_data_validator.issue1365_2 and pso_data_validator.issue1365_3. The first contains a CLOB column, the second a JSON column, the third a VARCHAR column and the fourth a BLOB column. I created three tables in Postgres: pso_data_validator.issue1365, pso_data_validator.issue1365_1 and pso_data_validator.issue1365_2. The first holds JSON data in a character data type, the second has json and jsonb columns, and the third has columns with the same names but an int data type, for testing.

Here is what I found:

  1. The Oracle JSON data type is not supported by the version of SQLAlchemy we use (1.4.49); even a simple query fails.
  2. Schema validation flags character and JSON data types as incompatible between Oracle and PG.
  3. Column validation between Oracle CLOB and PG VARCHAR was successful on the develop branch.
  4. Column validation between Oracle CLOB and PG jsonb threw an incompatible type error on the develop branch.
  5. Column validation between Oracle CLOB/VARCHAR and PG jsonb was successful on the 1335-prevent-exceptions-caused-by-clob-json branch.
  6. Column validation between Oracle BLOB and PG jsonb threw an error on the 1335-prevent-exceptions-caused-by-clob-json branch.
  7. When column validation between Oracle CLOB and PG jsonb also included an aggregation other than count (sum, min, etc.) on the 1335-prevent-exceptions-caused-by-clob-json branch, that aggregation was silently ignored.
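
Finding 7 can be pictured with a small sketch. This is only an illustration of the observed behaviour under stated assumptions, not the code in this PR: `split_aggregations` and the `"json"` type label are hypothetical names. It separates the requested aggregations into those that are kept and those that are dropped when the target column is a JSON type, so that a caller could report the dropped ones instead of ignoring them silently.

```python
# Hypothetical sketch of the behaviour described in finding 7.
def split_aggregations(requested, target_type):
    """Split (agg_type, column) pairs into kept and skipped lists.

    For JSON target columns only count is kept; any other aggregation
    (sum, min, max, avg, ...) is skipped.
    """
    kept, skipped = [], []
    for agg_type, column in requested:
        if target_type == "json" and agg_type != "count":
            skipped.append((agg_type, column))
        else:
            kept.append((agg_type, column))
    return kept, skipped


requested = [("count", "col_jsonb"), ("sum", "col_jsonb")]
kept, skipped = split_aggregations(requested, "json")
print("kept:", kept)
print("skipped:", skipped)
```

Returning the skipped list rather than discarding it is a design choice of this sketch; the branch tested above simply omits those rows from the result.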

Develop branch examples:

data-validation validate column -sc oracle21 -tc postgres -count='col_json,col_jsonb' -sum='col_json,col_jsonb,col_baa' -tbls=pso_data_validator.issue1365=pso_data_validator.issue1365
/home/user/professional-services-data-validator/env/lib/python3.12/site-packages/snowflake/sqlalchemy/base.py:1068: SAWarning: The GenericFunction 'flatten' is already registered and is going to be overridden.
  functions.register_function("flatten", flatten)
╒══════════════════════════╤═══════════════════╤══════════════════════════════╤══════════════════════╤════════════════════╤════════════════════╤══════════════════╤═════════════════════╤══════════════════════════════════════╕
│ validation_name          │ validation_type   │ source_table_name            │ source_column_name   │   source_agg_value │   target_agg_value │   pct_difference │ validation_status   │ run_id                               │
╞══════════════════════════╪═══════════════════╪══════════════════════════════╪══════════════════════╪════════════════════╪════════════════════╪══════════════════╪═════════════════════╪══════════════════════════════════════╡
│ sum__length__col_jsonb   │ Column            │ pso_data_validator.issue1365 │ length__col_jsonb    │                 87 │                  3 │         -96.5517 │ fail                │ 6a779c66-07b6-4f2c-993b-0c945f9dda9f │
├──────────────────────────┼───────────────────┼──────────────────────────────┼──────────────────────┼────────────────────┼────────────────────┼──────────────────┼─────────────────────┼──────────────────────────────────────┤
│ count__length__col_jsonb │ Column            │ pso_data_validator.issue1365 │ length__col_jsonb    │                  3 │                  3 │           0      │ success             │ 6a779c66-07b6-4f2c-993b-0c945f9dda9f │
├──────────────────────────┼───────────────────┼──────────────────────────────┼──────────────────────┼────────────────────┼────────────────────┼──────────────────┼─────────────────────┼──────────────────────────────────────┤
│ sum__length__col_json    │ Column            │ pso_data_validator.issue1365 │ length__col_json     │                 87 │                  3 │         -96.5517 │ fail                │ 6a779c66-07b6-4f2c-993b-0c945f9dda9f │
├──────────────────────────┼───────────────────┼──────────────────────────────┼──────────────────────┼────────────────────┼────────────────────┼──────────────────┼─────────────────────┼──────────────────────────────────────┤
│ count__length__col_json  │ Column            │ pso_data_validator.issue1365 │ length__col_json     │                  3 │                  3 │           0      │ success             │ 6a779c66-07b6-4f2c-993b-0c945f9dda9f │
├──────────────────────────┼───────────────────┼──────────────────────────────┼──────────────────────┼────────────────────┼────────────────────┼──────────────────┼─────────────────────┼──────────────────────────────────────┤
│ count                    │ Column            │ pso_data_validator.issue1365 │                      │                  3 │                  3 │           0      │ success             │ 6a779c66-07b6-4f2c-993b-0c945f9dda9f │
╘══════════════════════════╧═══════════════════╧══════════════════════════════╧══════════════════════╧════════════════════╧════════════════════╧══════════════════╧═════════════════════╧══════════════════════════════════════╛
data-validation validate column -sc oracle21 -tc postgres -count='col_json,col_jsonb' -sum='col_json,col_jsonb,col_baa' -tbls=pso_data_validator.issue1365=pso_data_validator.issue1365_1
/home/user/professional-services-data-validator/env/lib/python3.12/site-packages/snowflake/sqlalchemy/base.py:1068: SAWarning: The GenericFunction 'flatten' is already registered and is going to be overridden.
  functions.register_function("flatten", flatten)
Traceback (most recent call last):
...
sqlalchemy.exc.DatabaseError: (cx_Oracle.DatabaseError) ORA-00932: inconsistent datatypes: expected - got CLOB
Help: https://docs.oracle.com/error-help/db/ora-00932/
[SQL: SELECT count(*) AS count, count(t0.col_json) AS count__col_json, count(t0.col_jsonb) AS count__col_jsonb 
FROM pso_data_validator.issue1365 t0]
data-validation validate column -sc oracle21 -tc postgres -count='col_json,col_jsonb' -sum='col_json,col_jsonb,col_baa' -tbls=pso_data_validator.issue1365=pso_data_validator.issue1365_2
/home/user/professional-services-data-validator/env/lib/python3.12/site-packages/snowflake/sqlalchemy/base.py:1068: SAWarning: The GenericFunction 'flatten' is already registered and is going to be overridden.
  functions.register_function("flatten", flatten)
Traceback (most recent call last):
...
  File "/home/user/professional-services-data-validator/env/lib/python3.12/site-packages/ibis/common/validators.py", line 312, in any_of
    raise IbisTypeError(
ibis.common.exceptions.IbisTypeError: argument passes none of the following rules: value(Int64(nullable=True),), value(Float64(nullable=True),), value(<class 'ibis.expr.datatypes.core.Decimal'>,), value(Boolean(nullable=True),)
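
For reading the pct_difference column in the result tables, the following sketch reproduces the value shown for the failing sum rows. The formula is an assumption inferred from the numbers in the output (source 87, target 3, difference -96.5517), and `pct_difference` is a hypothetical helper name rather than a reference to the tool's internals.

```python
# Hypothetical helper; the formula is inferred from the table values above.
def pct_difference(source_value: float, target_value: float) -> float:
    """Percentage difference of the target relative to the source."""
    return (target_value - source_value) / source_value * 100


print(round(pct_difference(87, 3), 4))  # sum__length__col_jsonb row
print(pct_difference(3, 3))             # count rows
```

A source value of zero would need separate handling, which this sketch leaves out.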

1335-prevent-exceptions-caused-by-clob-json branch examples:

data-validation validate column -sc oracle21 -tc postgres -count='col_json,col_jsonb' -sum='col_json,col_jsonb,col_baa' -tbls=pso_data_validator.issue1365=pso_data_validator.issue1365
/home/user/professional-services-data-validator/env/lib/python3.12/site-packages/snowflake/sqlalchemy/base.py:1068: SAWarning: The GenericFunction 'flatten' is already registered and is going to be overridden.
  functions.register_function("flatten", flatten)
╒══════════════════════════╤═══════════════════╤══════════════════════════════╤══════════════════════╤════════════════════╤════════════════════╤══════════════════╤═════════════════════╤══════════════════════════════════════╕
│ validation_name          │ validation_type   │ source_table_name            │ source_column_name   │   source_agg_value │   target_agg_value │   pct_difference │ validation_status   │ run_id                               │
╞══════════════════════════╪═══════════════════╪══════════════════════════════╪══════════════════════╪════════════════════╪════════════════════╪══════════════════╪═════════════════════╪══════════════════════════════════════╡
│ sum__length__col_jsonb   │ Column            │ pso_data_validator.issue1365 │ length__col_jsonb    │                 87 │                  3 │         -96.5517 │ fail                │ 883a9d9a-dfae-44e2-9a38-5b729f01221b │
├──────────────────────────┼───────────────────┼──────────────────────────────┼──────────────────────┼────────────────────┼────────────────────┼──────────────────┼─────────────────────┼──────────────────────────────────────┤
│ count                    │ Column            │ pso_data_validator.issue1365 │                      │                  3 │                  3 │           0      │ success             │ 883a9d9a-dfae-44e2-9a38-5b729f01221b │
├──────────────────────────┼───────────────────┼──────────────────────────────┼──────────────────────┼────────────────────┼────────────────────┼──────────────────┼─────────────────────┼──────────────────────────────────────┤
│ count__length__col_jsonb │ Column            │ pso_data_validator.issue1365 │ length__col_jsonb    │                  3 │                  3 │           0      │ success             │ 883a9d9a-dfae-44e2-9a38-5b729f01221b │
├──────────────────────────┼───────────────────┼──────────────────────────────┼──────────────────────┼────────────────────┼────────────────────┼──────────────────┼─────────────────────┼──────────────────────────────────────┤
│ sum__length__col_json    │ Column            │ pso_data_validator.issue1365 │ length__col_json     │                 87 │                  3 │         -96.5517 │ fail                │ 883a9d9a-dfae-44e2-9a38-5b729f01221b │
├──────────────────────────┼───────────────────┼──────────────────────────────┼──────────────────────┼────────────────────┼────────────────────┼──────────────────┼─────────────────────┼──────────────────────────────────────┤
│ count__length__col_json  │ Column            │ pso_data_validator.issue1365 │ length__col_json     │                  3 │                  3 │           0      │ success             │ 883a9d9a-dfae-44e2-9a38-5b729f01221b │
╘══════════════════════════╧═══════════════════╧══════════════════════════════╧══════════════════════╧════════════════════╧════════════════════╧══════════════════╧═════════════════════╧══════════════════════════════════════╛
data-validation validate column -sc oracle21 -tc postgres -count='col_json,col_jsonb' -sum='col_json,col_jsonb,col_baa' -tbls=pso_data_validator.issue1365=pso_data_validator.issue1365_1
/home/user/professional-services-data-validator/env/lib/python3.12/site-packages/snowflake/sqlalchemy/base.py:1068: SAWarning: The GenericFunction 'flatten' is already registered and is going to be overridden.
  functions.register_function("flatten", flatten)
╒═══════════════════════════════════════╤═══════════════════╤══════════════════════════════╤════════════════════════════════╤════════════════════╤════════════════════╤══════════════════╤═════════════════════╤══════════════════════════════════════╕
│ validation_name                       │ validation_type   │ source_table_name            │ source_column_name             │   source_agg_value │   target_agg_value │   pct_difference │ validation_status   │ run_id                               │
╞═══════════════════════════════════════╪═══════════════════╪══════════════════════════════╪════════════════════════════════╪════════════════════╪════════════════════╪══════════════════╪═════════════════════╪══════════════════════════════════════╡
│ count                                 │ Column            │ pso_data_validator.issue1365 │                                │                  3 │                  3 │                0 │ success             │ 04531c4c-2c3e-4b5c-b923-0c43e7e465b3 │
├───────────────────────────────────────┼───────────────────┼──────────────────────────────┼────────────────────────────────┼────────────────────┼────────────────────┼──────────────────┼─────────────────────┼──────────────────────────────────────┤
│ count__length__cast_string__col_jsonb │ Column            │ pso_data_validator.issue1365 │ length__cast_string__col_jsonb │                  3 │                  3 │                0 │ success             │ 04531c4c-2c3e-4b5c-b923-0c43e7e465b3 │
├───────────────────────────────────────┼───────────────────┼──────────────────────────────┼────────────────────────────────┼────────────────────┼────────────────────┼──────────────────┼─────────────────────┼──────────────────────────────────────┤
│ count__length__cast_string__col_json  │ Column            │ pso_data_validator.issue1365 │ length__cast_string__col_json  │                  3 │                  3 │                0 │ success             │ 04531c4c-2c3e-4b5c-b923-0c43e7e465b3 │
╘═══════════════════════════════════════╧═══════════════════╧══════════════════════════════╧════════════════════════════════╧════════════════════╧════════════════════╧══════════════════╧═════════════════════╧══════════════════════════════════════╛
data-validation validate column -sc oracle21 -tc postgres -count='col_json,col_jsonb' -sum='col_json,col_jsonb,col_baa' -tbls=pso_data_validator.issue1365_2=pso_data_validator.issue1365_1
/home/user/professional-services-data-validator/env/lib/python3.12/site-packages/snowflake/sqlalchemy/base.py:1068: SAWarning: The GenericFunction 'flatten' is already registered and is going to be overridden.
  functions.register_function("flatten", flatten)
╒═══════════════════════════════════════╤═══════════════════╤════════════════════════════════╤════════════════════════════════╤════════════════════╤════════════════════╤══════════════════╤═════════════════════╤══════════════════════════════════════╕
│ validation_name                       │ validation_type   │ source_table_name              │ source_column_name             │   source_agg_value │   target_agg_value │   pct_difference │ validation_status   │ run_id                               │
╞═══════════════════════════════════════╪═══════════════════╪════════════════════════════════╪════════════════════════════════╪════════════════════╪════════════════════╪══════════════════╪═════════════════════╪══════════════════════════════════════╡
│ count__length__cast_string__col_json  │ Column            │ pso_data_validator.issue1365_2 │ length__cast_string__col_json  │                  3 │                  3 │                0 │ success             │ 913e8893-0960-468d-b893-7b4c4bca104d │
├───────────────────────────────────────┼───────────────────┼────────────────────────────────┼────────────────────────────────┼────────────────────┼────────────────────┼──────────────────┼─────────────────────┼──────────────────────────────────────┤
│ count__length__cast_string__col_jsonb │ Column            │ pso_data_validator.issue1365_2 │ length__cast_string__col_jsonb │                  3 │                  3 │                0 │ success             │ 913e8893-0960-468d-b893-7b4c4bca104d │
├───────────────────────────────────────┼───────────────────┼────────────────────────────────┼────────────────────────────────┼────────────────────┼────────────────────┼──────────────────┼─────────────────────┼──────────────────────────────────────┤
│ count                                 │ Column            │ pso_data_validator.issue1365_2 │                                │                  3 │                  3 │                0 │ success             │ 913e8893-0960-468d-b893-7b4c4bca104d │
╘═══════════════════════════════════════╧═══════════════════╧════════════════════════════════╧════════════════════════════════╧════════════════════╧════════════════════╧══════════════════╧═════════════════════╧══════════════════════════════════════╛
data-validation validate column -sc oracle21 -tc postgres -count='col_json,col_jsonb' -sum='col_json,col_jsonb,col_baa' -tbls=pso_data_validator.issue1365_3=pso_data_validator.issue1365_1
/home/user/professional-services-data-validator/env/lib/python3.12/site-packages/snowflake/sqlalchemy/base.py:1068: SAWarning: The GenericFunction 'flatten' is already registered and is going to be overridden.
  functions.register_function("flatten", flatten)
Traceback (most recent call last):
...
sqlalchemy.exc.DatabaseError: (cx_Oracle.DatabaseError) ORA-00932: inconsistent datatypes: expected - got BLOB
...
data-validation validate column -sc oracle21 -tc postgres -count='col_json,col_jsonb' -sum='col_json,col_jsonb,col_baa' -tbls=pso_data_validator.issue1365=pso_data_validator.issue1365_2
/home/user/professional-services-data-validator/env/lib/python3.12/site-packages/snowflake/sqlalchemy/base.py:1068: SAWarning: The GenericFunction 'flatten' is already registered and is going to be overridden.
  functions.register_function("flatten", flatten)
Traceback (most recent call last):
...
ibis.common.exceptions.IbisTypeError: argument passes none of the following rules: value(Int64(nullable=True),), value(Float64(nullable=True),), value(<class 'ibis.expr.datatypes.core.Decimal'>,), value(Boolean(nullable=True),)


@sundar-mudupalli-work sundar-mudupalli-work left a comment


LGTM

@nj1973 nj1973 merged commit b20b4dd into develop Dec 12, 2024
5 checks passed
@nj1973 nj1973 deleted the 1335-prevent-exceptions-caused-by-clob-json branch December 12, 2024 18:02
Successfully merging this pull request may close these issues.

Prevent column validation exceptions caused by Oracle CLOB/NCLOB columns used for JSON
3 participants