-
Notifications
You must be signed in to change notification settings - Fork 174
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fixing behavior where we bail on all batches if one fails #1019
base: master
Are you sure you want to change the base?
Fixing behavior where we bail on all batches if one fails #1019
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
minor changes. Understand the change now and old comments were not relevant.
@@ -59,6 +59,7 @@ type CortexMetricSink struct { | |||
} | |||
|
|||
var _ sinks.MetricSink = (*CortexMetricSink)(nil) | |||
var DoIfNotDoneError = errors.New("context finished before completing metrics flush") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
New also records the stack trace at the point it was called
errors.New should not be cached for usages
If error checking is necessary, we should introduce an error type and use errors.Is
s.logger.Error(err) | ||
droppedMetrics += len(metrics) - flushedMetrics | ||
// Any other type of error will happen inside of doIfNotDone | ||
// so we only need ot handle context cancellations |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
// so we only need ot handle context cancellations | |
// so we only need to handle context cancellations |
#1022 does the ?same? |
Summary
No longer erring out if one batch fails
Motivation
We want as many metrics to try and get written, errors may be related to things like writing duplicate time series which shouldn't stop the entire loop.
Test plan
Rollout/monitoring/revert plan