Consistent hashing: consistently retry another host #119
Conversation
This implements a retry policy to consistently retry another host within the SRV set if we get a failure talking to the original host. We still use the original fallback strategy if this retry fails. (Should we?)
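Roughly, the intended flow looks like this. This is a sketch only: the package name consistenthash, the fetchWithRetry wrapper, and the forward/fallback helpers are placeholders for illustration, not code in this PR.

// Sketch of the retry policy described above, assuming the extracted
// consistent-hashing package is imported as "consistenthash".
func fetchWithRetry(
	key string,
	hosts []string, // hosts resolved from the SRV set (illustrative)
	forward func(host string) ([]byte, error),
	fallback func(key string, origErr error) ([]byte, error),
) ([]byte, error) {
	bucket, err := consistenthash.HashBucket(key, len(hosts))
	if err != nil {
		return fallback(key, err)
	}
	body, origErr := forward(hosts[bucket])
	if origErr == nil {
		return body, nil
	}
	// The first host failed: consistently retry a different host in the
	// SRV set, avoiding the bucket we already tried.
	retryBucket, err := consistenthash.HashBucket(key, len(hosts), bucket)
	if err != nil {
		return fallback(key, origErr)
	}
	body, err = forward(hosts[retryBucket])
	if err != nil {
		// The retry failed too: use the original fallback strategy
		// (the open question above).
		return fallback(key, origErr)
	}
	return body, nil
}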
Approve with the caveat that I have not tested the code directly in the validation environment. It looks sane. I largely agree with @nickstenning's comments.
This extracts the consistent hashing code into a new package and writes more tests for it. This removes a bunch of duplication from ConsistentHashingMode. As a consequence this jumbles the consistent hashing algorithm (because I've embedded the Key into a top-level CacheKey struct which includes Attempt for retry support). At this stage this is safe because nothing live is using it.
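For reference, the wrapper key looks roughly like this (field names are taken from the diff below; the exact definition isn't shown in this excerpt, so treat it as a sketch):

// cacheKey wraps the caller's key together with the retry attempt number so
// that attempts 0, 1, 2, ... hash to different, but stable, buckets.
type cacheKey struct {
	Key     any
	Attempt int
}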
LGTM! I still have not tested in the validation environment, but I am willing to land this and iterate in follow-up releases as needed for minor adjustments or critical oversights.
func HashBucket(key any, buckets int, previousBuckets ...int) (int, error) {
	if len(previousBuckets) >= buckets {
		return -1, fmt.Errorf("no more buckets left: %d buckets available but %d already attempted", buckets, len(previousBuckets))
	}
	// we set IgnoreZeroValue so that we can add fields to the hash key
	// later without breaking things.
	// note that it's not safe to share a HashOptions so we create a fresh one each time.
	hashopts := &hashstructure.HashOptions{IgnoreZeroValue: true}
	hash, err := hashstructure.Hash(cacheKey{Key: key, Attempt: len(previousBuckets)}, hashstructure.FormatV2, hashopts)
	if err != nil {
		return -1, fmt.Errorf("error calculating hash of key: %w", err)
	}

	// jump is an implementation of Google's Jump Consistent Hash.
	//
	// See http://arxiv.org/abs/1406.2294 for details.
	bucket := int(jump.Hash(hash, buckets-len(previousBuckets)))
	slices.Sort(previousBuckets)
	for _, prev := range previousBuckets {
		if bucket >= prev {
			bucket++
		}
	}
	return bucket, nil
}
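To make the adjustment loop concrete (my own example, not from the PR): with buckets = 5 and previousBuckets = [1, 3], jump hashes into the 3 remaining buckets, giving a value in {0, 1, 2}; the loop then shifts that value up past each already-attempted bucket, mapping 0 → 0, 1 → 2, and 2 → 4, so the result is always a bucket that has not been tried yet.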
This is so nice to have it extracted!
	if err != nil {
		// return origErr so that we can use our regular fallback strategy
		return nil, origErr
I think we're going to want this for the foreseeable future. Since reliability is the real goal, I want us to be durable to "the cache disappears" while still serving inference.
That said, I hear the earlier comments about "do we need this".
// retry, you can pass previousBuckets, which indicates buckets which must be
// avoided in the output. HashBucket will modify the previousBuckets slice by
// sorting it.
func HashBucket(key any, buckets int, previousBuckets ...int) (int, error) {
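A minimal usage sketch follows; the import path is a placeholder, since the real package path isn't named in this excerpt.

package main

import (
	"fmt"
	"log"

	"example.com/yourmodule/consistenthash" // placeholder import path
)

func main() {
	// First attempt: hash "object-key" over 5 buckets.
	first, err := consistenthash.HashBucket("object-key", 5)
	if err != nil {
		log.Fatal(err)
	}

	// Retry: same key and bucket count, but avoid the bucket already tried.
	// The result is deterministic, so every caller retrying this key picks
	// the same next bucket. (If you pass your own slice of previous buckets,
	// note that it will be sorted in place.)
	second, err := consistenthash.HashBucket("object-key", 5, first)
	if err != nil {
		log.Fatal(err)
	}
	fmt.Printf("first attempt: bucket %d; retry: bucket %d\n", first, second)
}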
This is really nice. I like the derived/wrapped key a lot.