This adds retry support for submitting metrics. Failures are rare, but at scale most users have probably encountered some failures to send metrics (although they might not have noticed them in their logs). Fixes #108. This is the last major item for v0.13.0!
There are a bunch of ways to approach retries, but they fall into two main categories:
I took approach 1 here, since it was much simpler to implement in the current architecture. For long-running services (e.g. a web server), approach 2 is probably slightly better, but after some brief experimentation, it felt like implementing it required a lot of coordination work between the reporter and logger about which failures are retryable, plus new tooling for measuring the size of a metric object to manage the queue size. We might still want to do it in the future, but I wanted to keep things a little simpler for now.
The underlying `@datadog/datadog-api-client` library added built-in retry support in v1.17.0, but it turns out to only retry on HTTP errors (when the server responds with a >= 400 status code) and not on network errors (e.g. broken connections). In my experience, network errors make up a lot of the actual failures, so those are important to cover. So this implementation includes a complete (but simple) retry and backoff algorithm instead of relying on the builtins. 🤷
You can configure retries through the `retries` (how many) and `retryBackoff` (how long to delay) options.
Still not sure about: (Update: made this a breaking change.) This changes the constructor for `DatadogReporter` to use an options object instead of positional arguments, since it has so many options now. I’ve deprecated the old positional arguments signature, but I doubt direct usage of that class is common, so it might be fine to just make a breaking change (the only example I could find in real code on GitHub only uses it to support the `DD_API_KEY` environment variable, which I made work automatically in #135). Will sleep on it before merging.
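For anyone skimming, here is a rough sketch of the general shape of a retry-and-backoff loop like the one described above. It is not the code in this PR: `sendWithRetry` and `sleep` are hypothetical names, the doubling delay and the defaults are assumptions made for illustration, and treating `retryBackoff` as seconds is also an assumption. The point it shows is that wrapping the whole send call in `try`/`catch` covers network errors (which reject without any HTTP status) as well as HTTP error responses.

```javascript
// Hypothetical helper: resolves after the given number of milliseconds.
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Illustrative sketch only. `send` is any async function that submits metrics
// and rejects on failure, whether from an HTTP error or a broken connection.
// `retries` is how many extra attempts to make; `retryBackoff` is the base
// delay, assumed here to be in seconds.
async function sendWithRetry(send, { retries = 2, retryBackoff = 1 } = {}) {
    let attempt = 0;
    while (true) {
        try {
            return await send();
        } catch (error) {
            // Out of retries: surface the last error to the caller.
            if (attempt >= retries) {
                throw error;
            }
            // Assumed policy: double the delay on each successive retry.
            await sleep(retryBackoff * 1000 * 2 ** attempt);
            attempt += 1;
        }
    }
}
```

With `retries: 2`, a submission that fails twice and then succeeds is attempted three times in total, and a submission that never succeeds rejects with the error from the third attempt.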