Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

bypass cassandra streaming #837

Open
wants to merge 1 commit into
base: 3.x
Choose a base branch
from

Conversation

arunagrawal84
Copy link
Contributor

Build the instance from backups by using the restore process in case of an instance replacement. Note that we prefer this when data size is HUGE. C* streaming is super slow, and for instances with big data size can lead to C* streaming for multiple days. Note that this is a little bit dangerous as you "will" lose writes accepted by the old instance but not uploaded to the backup file system. Also, we do not plan to run a local repair on the replaced instance, so data will be stale. We hope that the repair service will take care of the inconsistency. Clusters with LOCAL_QUORUM for reads and writes may see little to no impact. If the restore fails, then we fall back to use "streaming".

@codecov
Copy link

codecov bot commented Nov 12, 2019

Codecov Report

Merging #837 into 3.x will decrease coverage by 0.05%.
The diff coverage is 7.69%.

Impacted file tree graph

@@             Coverage Diff              @@
##                3.x     #837      +/-   ##
============================================
- Coverage     46.67%   46.62%   -0.06%     
+ Complexity     1054     1053       -1     
============================================
  Files           167      167              
  Lines          7315     7325      +10     
  Branches        746      748       +2     
============================================
+ Hits           3414     3415       +1     
- Misses         3650     3661      +11     
+ Partials        251      249       -2
Impacted Files Coverage Δ Complexity Δ
...c/main/java/com/netflix/priam/restore/Restore.java 83.33% <ø> (+11.9%) 3 <0> (ø) ⬇️
...etflix/priam/restore/EncryptedRestoreStrategy.java 0% <ø> (ø) 0 <0> (ø) ⬇️
...re/AwsCrossAccountCryptographyRestoreStrategy.java 0% <ø> (ø) 0 <0> (ø) ⬇️
...iam/restore/GoogleCryptographyRestoreStrategy.java 0% <ø> (ø) 0 <0> (ø) ⬇️
...a/com/netflix/priam/identity/InstanceIdentity.java 80.27% <0%> (-0.55%) 22 <0> (ø)
...java/com/netflix/priam/restore/RestoreContext.java 0% <0%> (ø) 0 <0> (ø) ⬇️
...m/src/main/java/com/netflix/priam/PriamServer.java 1.56% <0%> (-0.53%) 1 <0> (ø)
...ain/java/com/netflix/priam/tuner/dse/DseTuner.java 0% <0%> (ø) 0 <0> (ø) ⬇️
...com/netflix/priam/config/IBackupRestoreConfig.java 14.28% <0%> (-2.39%) 1 <0> (ø)
...ava/com/netflix/priam/restore/AbstractRestore.java 50% <100%> (+1.82%) 9 <0> (-1) ⬇️
... and 2 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 4323b20...c000eaf. Read the comment docs.

Copy link
Contributor

@hashbrowncipher hashbrowncipher left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A few questions, at least some of them due to my unfamiliarity with Priam's codebase.

// no restores needed
logger.info("No restore needed, task not scheduled");
shouldStartCassandra = true;
}
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How can we exit this block with shouldStartCassandra being false?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the original requested restore was a failure (that is when Priam starts in restore mode during weekly restore refresh). `` // Start cassandra only if restore is successful.
shouldStartCassandra = true;```

But yes with the recent refactoring of restore we will throw exception there and thus we don't need that variable. good catch.

shouldStartCassandra = true;
} else {
if (instanceIdentity.isReplace()
&& backupRestoreConfig.enableBypassCassandraStreaming()) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How does Priam determine whether Cassandra hasn't successfully bootstrapped? I'm looked for an existing check, but I didn't see one.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TokenRetrieverUtils.inferTokenOwnerFromGossip is used to fetch the instance identity. That method should tell correctly if Cassandra had already bootstrapped successfully.

* Note that we prefer this when data size is HUGE. C* streaming is super slow and for instances
* with big data size can lead to C* streaming for multiple days. Note that this is a little bit
* dangerous as you "will" some of the writes accepted by old instance but not uploaded to
* backup file system. Also we do not plan to run local repair on the replaced instance, so data
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree that not running repair is acceptable for a first iteration. Hypothetically though, how would we do it?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ideally, we should be deferring that task to the repair service. Where that repair service sits, how it gets executed is a different conversation though.

if (!Restore.isRestoreEnabled(config, instanceInfo)) {
map.put("auto_bootstrap", config.getAutoBoostrap());
} else {
if (instanceState.getRestoreStatus() != null
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't see the purpose of this check. Why not just pass the auto_bootstrap setting through from the config 100% of the time?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Idea is - if we are doing a restore then we need to set auto_bootstrap to false else use the provided value in the configuration provider (like when creating a new cluster with false or true for most of the cases). Since we can choose to restore to bypass Cassandra streaming we need to override the configured value. I wanted to keep that logic in tuner instead of putting in a configuration provider.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can't we leave auto_bootstrap=true, even when we restore?

} else {
if (instanceIdentity.isReplace()
&& backupRestoreConfig.enableBypassCassandraStreaming()) {
logger.info("Trying to download data instead of streaming from Cassandra.");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: Add "from backup", as in "Trying to download data instead of streaming from Cassandra"

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants