-
Notifications
You must be signed in to change notification settings - Fork 294
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
bypass cassandra streaming #837
base: 3.x
Are you sure you want to change the base?
Conversation
Codecov Report
@@ Coverage Diff @@
## 3.x #837 +/- ##
============================================
- Coverage 46.67% 46.62% -0.06%
+ Complexity 1054 1053 -1
============================================
Files 167 167
Lines 7315 7325 +10
Branches 746 748 +2
============================================
+ Hits 3414 3415 +1
- Misses 3650 3661 +11
+ Partials 251 249 -2
Continue to review full report at Codecov.
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A few questions, at least some of them due to my unfamiliarity with Priam's codebase.
// no restores needed | ||
logger.info("No restore needed, task not scheduled"); | ||
shouldStartCassandra = true; | ||
} | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How can we exit this block with shouldStartCassandra being false?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If the original requested restore
was a failure (that is when Priam starts in restore mode during weekly restore refresh). ``
// Start cassandra only if restore is successful.
shouldStartCassandra = true;```
But yes with the recent refactoring of restore we will throw exception there and thus we don't need that variable. good catch.
shouldStartCassandra = true; | ||
} else { | ||
if (instanceIdentity.isReplace() | ||
&& backupRestoreConfig.enableBypassCassandraStreaming()) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How does Priam determine whether Cassandra hasn't successfully bootstrapped? I'm looked for an existing check, but I didn't see one.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
TokenRetrieverUtils.inferTokenOwnerFromGossip is used to fetch the instance identity. That method should tell correctly if Cassandra had already bootstrapped successfully.
* Note that we prefer this when data size is HUGE. C* streaming is super slow and for instances | ||
* with big data size can lead to C* streaming for multiple days. Note that this is a little bit | ||
* dangerous as you "will" some of the writes accepted by old instance but not uploaded to | ||
* backup file system. Also we do not plan to run local repair on the replaced instance, so data |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree that not running repair is acceptable for a first iteration. Hypothetically though, how would we do it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ideally, we should be deferring that task to the repair service. Where that repair service sits, how it gets executed is a different conversation though.
if (!Restore.isRestoreEnabled(config, instanceInfo)) { | ||
map.put("auto_bootstrap", config.getAutoBoostrap()); | ||
} else { | ||
if (instanceState.getRestoreStatus() != null |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't see the purpose of this check. Why not just pass the auto_bootstrap
setting through from the config 100% of the time?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Idea is - if we are doing a restore then we need to set auto_bootstrap to false
else use the provided value in the configuration provider (like when creating a new cluster with false
or true
for most of the cases). Since we can choose to restore to bypass Cassandra streaming we need to override the configured value. I wanted to keep that logic in tuner instead of putting in a configuration provider.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can't we leave auto_bootstrap=true
, even when we restore?
} else { | ||
if (instanceIdentity.isReplace() | ||
&& backupRestoreConfig.enableBypassCassandraStreaming()) { | ||
logger.info("Trying to download data instead of streaming from Cassandra."); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: Add "from backup", as in "Trying to download data instead of streaming from Cassandra"
Build the instance from backups by using the restore process in case of an instance replacement. Note that we prefer this when data size is HUGE. C* streaming is super slow, and for instances with big data size can lead to C* streaming for multiple days. Note that this is a little bit dangerous as you "will" lose writes accepted by the old instance but not uploaded to the backup file system. Also, we do not plan to run a local repair on the replaced instance, so data will be stale. We hope that the repair service will take care of the inconsistency. Clusters with LOCAL_QUORUM for reads and writes may see little to no impact. If the restore fails, then we fall back to use "streaming".