Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

CASSANDRA-19744 Accord migration and interop correctness #3395

Open
wants to merge 8 commits into
base: cep-15-accord
Choose a base branch
from

Conversation

aweisberg
Copy link
Contributor

@aweisberg aweisberg commented Jul 2, 2024

CASSANDRA-19744

There are several different fixes/changes here.

  • Batch log replay splits mutations based on whether they should be Accord/non-Accord routed
  • Failed batchlog replay of Accord txns are sent to hints as a special host ID and hints replays them and if necessary converts them back to per host hints, per host hints are never converted to Accord hints so the amplification is bounded in the rare case this occurs.
  • Hints split and retry Accord/non-Accord and handle Accord tens with no specific endpoint just for the batchlog
  • Batch log is skipped if the entire batch is routed to Accord, if the part of the batch is non-Accord then he entire batch goes through the batchlog
  • Barriers now return the precise Accord owned ranges that were barriered which may be a subset of what was requested
  • Repair now drives progress for migration based on what was actually barriered
  • Read repair routes the repair mutation based on whether the token is Accord managed, but does not retry if it fails. We just fail the read since this should be rare.
  • Regular mutations will loop splitting and retrying until timeout or until the routing is correct.
  • Mutations now have a flag that indicates whether they bypass validation for a conflict with a transaction system, Accord always sets this flag since it doesn't conflict with itself. Everything else doesn't and a CFS will reject a mutation that should have been managed by Accord. Counter and virtual tables don't use this since they don't work with Accord right now.
  • Timestamps of non-transactional writes are preserved in that the timestamp that is used by Accord to write data is the minimum of the Accord timestamp and the original coordinator generated timestamp to avoid data resurrection from batch and hints applied at a later time. CAS, Accord transactions, and Read Repair don't preserve timestamps.
  • Fixed handling of non-existent tables and tables that are system tables where we would NPE because not all schema sources contain non-distributed tables.
  • Fixed registration race conditions in TestChangeListener pausing enactment of epochs
  • Fixed cep-15-accord specific breakage of handling CoordinatorBehind responses
  • CoordinatorBehind errors no longer fail operations, the coordinator will retry

TODO CoordinatorBehind retry could retry operations that are not idempotent, but may have already succeeded
TODO Check that barriers of keys check the key was actually barriered
TODO The batchlog failed routing test doesn't actually experience failed routing it's also a timeout, but it should be possible to trigger a failed routing by doing a transaction on the key and then allowing the out of sync coordinator to catch up and be forced to execute the transaction in the new epoch.

…ally split and reroute if necessary

Mostly complete batch + hint support

checkpoint here

checkpoint again

accord branch

more checkpoint

More checkpoint

Checkpoint

checkpoint

checkpoint

fixes

Update .gitmodules, make CASMultiDCTest loopable
@@ -122,7 +122,7 @@ public long preAcceptTimeout()
@Override
public Txn emptyTxn(Kind kind, Seekables<?, ?> seekables)
{
return new Txn.InMemory(kind, seekables, TxnRead.EMPTY, TxnQuery.EMPTY, null);
return new Txn.InMemory(kind, seekables, TxnRead.EMPTY, TxnQuery.UNSAFE_EMPTY, null);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

could we rename this method to emptySystemTxn or emptyUnsafeTxn or something so people know that it doesn't offer all the normal guardrails

@@ -89,7 +89,7 @@ private CommitLogPosition addToCommitLog(Mutation mutation)
if (update.metadata().params.memtable.factory().writesShouldSkipCommitLog())
ids.add(update.metadata().id);
}
mutation = mutation.without(ids);
mutation = mutation.filter(ids::contains);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This doesn't seem right. If I'm reading this right then when we encounter mixed writesShouldSkipCommitLog mutations we'll only write commit log entries for tables which aren't supposed to be written to the commit log?

* like Accord that can't safely read data that is written non-transactionally.
*
*/
default boolean allowPotentialTransactionConflicts()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I had a hard time parsing this comment
TODO: propose something easier to understand

@@ -140,7 +140,7 @@ public boolean isWorthMergingForRangeQuery(ReplicaCollection merged, ReplicaColl
}
Thread.sleep(1500);

Statement stmt = new SimpleStatement(String.format("SELECT DISTINCT token(id, id2), id, id2 FROM %s", table));
Statement stmt = new SimpleStatement(String.format("SELECT DISTINCT token(id, id2), id, id2 FROM %s WHERE token(id, id2) >= -9223372036854775808", table));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what's up with the token bit you added here?

}

private int addMutations(int version, List<ByteBuffer> serializedMutations) throws IOException
private int addMutations(List<Mutation> unsplitMutations, int version, List<ByteBuffer> serializedMutations) throws IOException
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this (and addMutation) can be made static if you pass writtenAt in

Mutation mutation = hint.mutation;
// Also may need to apply locally because it's possible this is from the batchlog
// and we never applied it locally
// TODO (review): Additional error handling necessary? Hints are lossy
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

agreed, they're best effort

this.accordTxnStartNanos = accordTxnStartNanos;
this.accordTxnResult = accordTxnResult;
if (accordTxnResult != null)
accordTxnResult.right.addListener(this, ImmediateExecutor.INSTANCE);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

shouldn't we register as a result/throwable consumer so we are notified if the txn fails unexpectedly?

@@ -112,6 +112,10 @@ public ConsistencyLevel commitCLForStrategy(ConsistencyLevel consistencyLevel)
return consistencyLevel;
}

// TODO (required): This won't work for migration directly from none to full because there is no safe system to read from
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What a migration from none to full effectively ran accord in mixed_reads mode during migration?

@@ -294,8 +329,20 @@ public void onFailure(InetAddressAndPort from, RequestFailure failure)
if (blockFor() + n > candidateReplicaCount())
signal();

if (hintOnFailure != null && StorageProxy.shouldHint(replicaPlan.lookup(from)))
StorageProxy.submitHint(hintOnFailure.get(), replicaPlan.lookup(from), null);
// If the failure was RETRY_ON_DIFFERENT_TRANSACTION_SYSTEM then we only want to hint once
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

wouldn't we want to retry with accord instead of writing a hint?

throw new RetryOnDifferentSystemException();
}

public static boolean tokenShouldBeWrittenThroughAccord(@Nonnull ClusterMetadata cm, @Nonnull TableMetadata tm, @Nonnull Token token)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should be passing in the TableId and using it to retrieve the table metadata from the provided ClusterMetadata. Passing in table metadata directly like this makes it possible to read table params from a different epoch than the cm passed in, which I think is possible in invocation in BlockingReadRepair

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
2 participants