Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Seamlessly propagate schema changes made after pipelines starts running for Spanner Change Streams to BigQuery template #1678

Open
wants to merge 4 commits into
base: main
Choose a base branch
from

Conversation

ChangyuLi28
Copy link
Contributor

This pr supports schema updates after pipeline starts running.

  • Before you add a new column to an existing tracked Cloud Spanner table, first add the column to the BigQuery changelog table. The new column must be NULLABLE. Then add the column to the Cloud Spanner table. The new column is automatically populated when the pipeline receives a new record with that column. It's recommended to wait for a couple of minutes after making schema updates in BigQuery to start the load including the new column because of this issue.
  • To add a new table, add the table in the Cloud Spanner database first. The table is automatically created when the pipeline receives a record for the new table.
  • The template doesn't drop tables or columns from BigQuery. If a column is dropped from the Cloud Spanner table, then null values are populated to these changelog columns for records generated after the columns are dropped from the Cloud Spanner table, unless you manually drop the column from BigQuery.
  • The template doesn't support column type updates or column nullable mode updates.

Follow this guide to deploy pipelines with this template. More specific instructions.

Copy link

google-cla bot commented Jun 21, 2024

Thanks for your pull request! It looks like this may be your first contribution to a Google open source project. Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA).

View this failed invocation of the CLA check for more information.

For the most up to date status, view the checks section at the bottom of the pull request.

@ChangyuLi28
Copy link
Contributor Author

ChangyuLi28 commented Jun 21, 2024

Integration tests will be added in a future PR. Do not merge now.

@ChangyuLi28 ChangyuLi28 marked this pull request as draft June 21, 2024 17:35
@ShuranZhang ShuranZhang force-pushed the main branch 2 times, most recently from ac239fc to c8c5336 Compare July 16, 2024 01:51
@ChangyuLi28 ChangyuLi28 marked this pull request as ready for review July 17, 2024 17:17
Copy link

codecov bot commented Jul 18, 2024

Codecov Report

Attention: Patch coverage is 51.20275% with 142 lines in your changes missing coverage. Please review.

Project coverage is 42.92%. Comparing base (6b094c2) to head (1e87547).
Report is 15 commits behind head on main.

Files with missing lines Patch % Lines
...tobigquery/schemautils/SpannerToBigQueryUtils.java 53.52% 24 Missing and 9 partials ⚠️
...eamstobigquery/SpannerChangeStreamsToBigQuery.java 0.00% 28 Missing ⚠️
...reamstobigquery/schemautils/SchemaUpdateUtils.java 60.00% 16 Missing and 6 partials ⚠️
...hangestreamstobigquery/schemautils/TypesUtils.java 58.33% 17 Missing and 3 partials ⚠️
...erchangestreamstobigquery/model/ModColumnType.java 33.33% 18 Missing ⚠️
...bigquery/FailsafeModJsonToTableRowTransformer.java 0.00% 12 Missing ⚠️
...streamstobigquery/BigQueryDynamicDestinations.java 0.00% 4 Missing ⚠️
...ates/spannerchangestreamstobigquery/model/Mod.java 62.50% 2 Missing and 1 partial ⚠️
...gestreamstobigquery/model/TrackedSpannerTable.java 83.33% 1 Missing and 1 partial ⚠️
Additional details and impacted files
@@             Coverage Diff              @@
##               main    #1678      +/-   ##
============================================
+ Coverage     42.83%   42.92%   +0.09%     
- Complexity     3424     3453      +29     
============================================
  Files           824      827       +3     
  Lines         48147    48275     +128     
  Branches       5168     5196      +28     
============================================
+ Hits          20624    20724     +100     
- Misses        25847    25864      +17     
- Partials       1676     1687      +11     
Components Coverage Δ
spanner-templates 62.81% <ø> (-0.01%) ⬇️
spanner-import-export 63.88% <ø> (-0.03%) ⬇️
spanner-live-forward-migration 75.05% <ø> (ø)
spanner-live-reverse-replication 67.63% <ø> (ø)
spanner-bulk-migration 83.75% <ø> (ø)
Files with missing lines Coverage Δ
...ngestreamstobigquery/schemautils/OptionsUtils.java 61.90% <ø> (-1.74%) ⬇️
...igquery/schemautils/SpannerChangeStreamsUtils.java 80.08% <100.00%> (+9.18%) ⬆️
...gestreamstobigquery/model/TrackedSpannerTable.java 61.53% <83.33%> (+13.39%) ⬆️
...ates/spannerchangestreamstobigquery/model/Mod.java 48.07% <62.50%> (+48.07%) ⬆️
...streamstobigquery/BigQueryDynamicDestinations.java 0.00% <0.00%> (ø)
...bigquery/FailsafeModJsonToTableRowTransformer.java 0.00% <0.00%> (ø)
...erchangestreamstobigquery/model/ModColumnType.java 33.33% <33.33%> (ø)
...hangestreamstobigquery/schemautils/TypesUtils.java 58.33% <58.33%> (ø)
...reamstobigquery/schemautils/SchemaUpdateUtils.java 60.00% <60.00%> (ø)
...eamstobigquery/SpannerChangeStreamsToBigQuery.java 0.00% <0.00%> (ø)
... and 1 more

... and 2 files with indirect coverage changes


// For "INSERT" mod, we can get all columns from mod.
if (mod.getModType() == ModType.INSERT) {
// For "DELETE" mod, we can simply populate null for all tracking columns.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"For "DELETE" mod, we only set the key columns. For all non-key columns, we already populated "null".

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed the comment.

@@ -181,7 +183,7 @@ public void setUp() {
"Caught exception when setting up FailsafeModJsonToTableRowFn, message: %s,"
+ " cause: %s",
Optional.ofNullable(e.getMessage()), e.getCause()));
seenException = true;

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why was this removed?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because we still want the first error message to be printed at later stage when process the element here.

@nancyxu123
Copy link

@ShuranZhang could you also add thiago as reviewer?

Copy link
Contributor

@thiagotnunes thiagotnunes left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, few suggestions

@@ -453,7 +476,7 @@ private static DeadLetterQueueManager buildDlqManager(
? tempLocation + "dlq/"
: options.getDeadLetterQueueDirectory();

LOG.info("Dead letter queue directory: {}", dlqDirectory);
LOG.info("Dead letter queue directory: {}" + dlqDirectory);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should be as it was ({} is replaced by the parameter)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

this.modType = modType;
this.valueCaptureType = valueCaptureType;
this.numberOfRecordsInTransaction = numberOfRecordsInTransaction;
this.numberOfPartitionsInTransaction = numberOfPartitionsInTransaction;
this.rowTypeAsMap = rowType == null ? Collections.emptyMap() : new HashMap<>();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: can't we always assign a new HashMap<>? Or do you want it to be unmodifiable when rowType is null?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am also not sure on the intention as I was not the original author here.
@ChangyuLi28 Could you please take a look here? Thanks!

@@ -0,0 +1,43 @@
/*
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I say we would need more tests here to improve the coverage of TypesUtils since it does a lot of String manipulation

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done. I added tests to cover all string manipulation branches inTypesUtils.java .

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants