bypass cassandra streaming #837

arunagrawal84 · 2019-11-12T22:00:21Z

Build the instance from backups, using the restore process, when an instance is replaced. We prefer this when the data size is huge: C* streaming is slow, and for instances with a lot of data it can run for multiple days. Note that this is a little bit dangerous, as you will lose any writes that the old instance accepted but had not yet uploaded to the backup file system. We also do not plan to run a local repair on the replaced instance, so its data will be stale. We hope that the repair service will take care of the inconsistency. Clusters that use LOCAL_QUORUM for both reads and writes may see little to no impact. If the restore fails, we fall back to streaming.
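The control flow described above can be sketched roughly as follows. This is only an illustration, not the code in this PR: `BypassStreamingSketch`, `BootstrapPath`, `RestoreAction`, and `choosePath` are hypothetical names, and the two booleans stand in for the `instanceIdentity.isReplace()` and `backupRestoreConfig.enableBypassCassandraStreaming()` checks quoted in the review below.

```java
/** Hypothetical sketch of the bypass-streaming decision; not Priam's actual code. */
final class BypassStreamingSketch {

    /** How the replacement node ends up getting its data. */
    enum BootstrapPath { RESTORE_FROM_BACKUP, CASSANDRA_STREAMING }

    /** Stand-in for the restore process; assumed to throw when the restore fails. */
    interface RestoreAction {
        void run() throws Exception;
    }

    static BootstrapPath choosePath(
            boolean isReplace, boolean bypassEnabled, RestoreAction restore) {
        if (isReplace && bypassEnabled) {
            try {
                restore.run();
                return BootstrapPath.RESTORE_FROM_BACKUP;
            } catch (Exception e) {
                // Restore failed: fall through and let Cassandra stream the data.
                System.err.println("Restore failed, falling back to streaming: " + e.getMessage());
            }
        }
        return BootstrapPath.CASSANDRA_STREAMING;
    }

    public static void main(String[] args) {
        // Replacement with the flag on and a restore that succeeds.
        System.out.println(choosePath(true, true, () -> {}));
        // Replacement with the flag on, but the restore throws.
        System.out.println(choosePath(true, true, () -> {
            throw new IllegalStateException("backup unavailable");
        }));
        // Not a replacement: the restore is never attempted.
        System.out.println(choosePath(false, true, () -> {}));
    }
}
```

Treating a restore failure as a caught exception keeps the fallback to streaming in one place, which matches the "if the restore fails, fall back" behaviour described here.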

codecov · 2019-11-12T22:15:25Z

Codecov Report

Merging #837 into 3.x will decrease coverage by 0.05%.
The diff coverage is 7.69%.

@@             Coverage Diff              @@
##                3.x     #837      +/-   ##
============================================
- Coverage     46.67%   46.62%   -0.06%     
+ Complexity     1054     1053       -1     
============================================
  Files           167      167              
  Lines          7315     7325      +10     
  Branches        746      748       +2     
============================================
+ Hits           3414     3415       +1     
- Misses         3650     3661      +11     
+ Partials        251      249       -2

| Impacted Files | Coverage Δ | Complexity Δ |
|---|---|---|
| ...c/main/java/com/netflix/priam/restore/Restore.java | `83.33% <ø> (+11.9%)` | `3 <0> (ø)` |
| ...etflix/priam/restore/EncryptedRestoreStrategy.java | `0% <ø> (ø)` | `0 <0> (ø)` |
| ...re/AwsCrossAccountCryptographyRestoreStrategy.java | `0% <ø> (ø)` | `0 <0> (ø)` |
| ...iam/restore/GoogleCryptographyRestoreStrategy.java | `0% <ø> (ø)` | `0 <0> (ø)` |
| ...a/com/netflix/priam/identity/InstanceIdentity.java | `80.27% <0%> (-0.55%)` | `22 <0> (ø)` |
| ...java/com/netflix/priam/restore/RestoreContext.java | `0% <0%> (ø)` | `0 <0> (ø)` |
| ...m/src/main/java/com/netflix/priam/PriamServer.java | `1.56% <0%> (-0.53%)` | `1 <0> (ø)` |
| ...ain/java/com/netflix/priam/tuner/dse/DseTuner.java | `0% <0%> (ø)` | `0 <0> (ø)` |
| ...com/netflix/priam/config/IBackupRestoreConfig.java | `14.28% <0%> (-2.39%)` | `1 <0> (ø)` |
| ...ava/com/netflix/priam/restore/AbstractRestore.java | `50% <100%> (+1.82%)` | `9 <0> (-1)` |

... and 2 more

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 4323b20...c000eaf.

hashbrowncipher

A few questions, at least some of them due to my unfamiliarity with Priam's codebase.

hashbrowncipher · 2019-11-13T15:43:36Z

priam/src/main/java/com/netflix/priam/PriamServer.java

+                // no restores needed
+                logger.info("No restore needed, task not scheduled");
+                shouldStartCassandra = true;
+            }
        }


How can we exit this block with shouldStartCassandra being false?

arunagrawal84 · 2019-11-13

If the originally requested restore failed (that is, when Priam starts in restore mode during the weekly restore refresh). The flag was only set on success:

```
// Start cassandra only if restore is successful.
shouldStartCassandra = true;
```

But yes, with the recent refactoring of restore we throw an exception there, so we don't need that variable. Good catch.
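To illustrate why the flag becomes redundant once a failed restore throws, here is a minimal sketch. All names (`StartGateSketch`, `RestoreFailedException`, `Restore`, `restoreThenStart`) are hypothetical and not taken from Priam; the point is only that the "start Cassandra" step is unreachable when the restore throws.

```java
/** Hypothetical sketch: an exception replaces the shouldStartCassandra flag. */
final class StartGateSketch {

    /** Assumed failure signal for a restore; not a Priam class. */
    static final class RestoreFailedException extends Exception {
        RestoreFailedException(String message) {
            super(message);
        }
    }

    /** Stand-in for the restore step. */
    interface Restore {
        void run() throws RestoreFailedException;
    }

    /** Returns true only if the start step was reached. */
    static boolean restoreThenStart(Restore restore) throws RestoreFailedException {
        restore.run();   // throws on failure, so nothing below executes
        return true;     // stands in for "start Cassandra"
    }

    public static void main(String[] args) {
        try {
            System.out.println("started: " + restoreThenStart(() -> {}));
            restoreThenStart(() -> {
                throw new RestoreFailedException("restore failed");
            });
        } catch (RestoreFailedException e) {
            System.out.println("not started: " + e.getMessage());
        }
    }
}
```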

hashbrowncipher · 2019-11-13T15:47:42Z

priam/src/main/java/com/netflix/priam/PriamServer.java

+            shouldStartCassandra = true;
+        } else {
+            if (instanceIdentity.isReplace()
+                    && backupRestoreConfig.enableBypassCassandraStreaming()) {


How does Priam determine whether Cassandra has not yet successfully bootstrapped? I looked for an existing check, but I didn't see one.

arunagrawal84 · 2019-11-13

TokenRetrieverUtils.inferTokenOwnerFromGossip is used to fetch the instance identity. That method should correctly tell whether Cassandra has already bootstrapped successfully.

hashbrowncipher · 2019-11-13T15:50:02Z

priam/src/main/java/com/netflix/priam/config/IBackupRestoreConfig.java

+     * Note that we prefer this when data size is HUGE. C* streaming is super slow and for instances
+     * with big data size can lead to C* streaming for multiple days. Note that this is a little bit
+     * dangerous as you "will" some of the writes accepted by old instance but not uploaded to
+     * backup file system. Also we do not plan to run local repair on the replaced instance, so data


I agree that not running repair is acceptable for a first iteration. Hypothetically though, how would we do it?

arunagrawal84 · 2019-11-13

Ideally, we should defer that task to the repair service. Where that repair service sits and how it gets executed is a different conversation, though.

hashbrowncipher · 2019-11-13T15:52:00Z

priam/src/main/java/com/netflix/priam/tuner/StandardTuner.java

-        if (!Restore.isRestoreEnabled(config, instanceInfo)) {
-            map.put("auto_bootstrap", config.getAutoBoostrap());
-        } else {
+        if (instanceState.getRestoreStatus() != null


I don't see the purpose of this check. Why not just pass the auto_bootstrap setting through from the config 100% of the time?

arunagrawal84 · 2019-11-13

The idea is: if we are doing a restore, we need to set auto_bootstrap to false; otherwise we use the value from the configuration provider (for example, false when creating a new cluster, or true in most other cases). Since we can now choose to restore in order to bypass Cassandra streaming, we need to override the configured value. I wanted to keep that logic in the tuner instead of putting it in a configuration provider.

hashbrowncipher · 2019-11-13

Can't we leave auto_bootstrap=true, even when we restore?
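For reference, the override discussed in this thread can be sketched as below. This is an assumption-laden illustration rather than the StandardTuner change itself: the diff above truncates the real condition after `instanceState.getRestoreStatus() != null`, so the `restoreInProgress` boolean, the `AutoBootstrapSketch` class, and the `tune` method are hypothetical stand-ins.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch of the tuner-side auto_bootstrap override. */
final class AutoBootstrapSketch {

    /**
     * Builds the fragment of the yaml map that carries auto_bootstrap.
     *
     * @param restoreInProgress assumed summary of the restore-status check in the tuner
     * @param configuredAutoBootstrap value supplied by the configuration provider
     */
    static Map<String, Object> tune(boolean restoreInProgress, boolean configuredAutoBootstrap) {
        Map<String, Object> yaml = new HashMap<>();
        // A restore forces auto_bootstrap to false; otherwise the configured value is used.
        yaml.put("auto_bootstrap", !restoreInProgress && configuredAutoBootstrap);
        return yaml;
    }

    public static void main(String[] args) {
        System.out.println(tune(true, true));   // restore overrides the configured value
        System.out.println(tune(false, true));  // configured value passes through
        System.out.println(tune(false, false)); // e.g. creating a new cluster
    }
}
```

Keeping the override in one method mirrors the stated preference for holding this logic in the tuner rather than in a configuration provider.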

sumanth-pasupuleti · 2019-11-25T02:47:44Z

priam/src/main/java/com/netflix/priam/PriamServer.java

+        } else {
+            if (instanceIdentity.isReplace()
+                    && backupRestoreConfig.enableBypassCassandraStreaming()) {
+                logger.info("Trying to download data instead of streaming from Cassandra.");


nit: Add "from backup", as in "Trying to download data from backup instead of streaming from Cassandra."

bypass cassandra streaming

c000eaf

arunagrawal84 requested review from hashbrowncipher, jolynch, vinaykumarchella and sumanth-pasupuleti November 12, 2019 22:00

hashbrowncipher reviewed Nov 13, 2019

sumanth-pasupuleti approved these changes Nov 25, 2019


