Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(serializers.prometheusremotewrite): Log metric conversion errors #15893

Merged
merged 7 commits into from
Oct 1, 2024

Conversation

hagen1778
Copy link
Contributor

@hagen1778 hagen1778 commented Sep 16, 2024

With this change, prometheusremotewrite will log the last recorded conversion error in SerializeBatch

Summary

If the configured input contains bad data, then user might not be aware of the parsing errors as telegraf won't emit any logs or error messages. The error might be helpful for user to understand why some of the series were dropped during processing. In the same time, logging only the last error should prevent logs from pollution if too many conversion errors are taking place.

Checklist

  • No AI generated code was used in this PR

Related issues

Resolves #15782

With this change, prometheusremotewrite will log the last recorded
conversion error in `Serialize` call, if any errors at all.
The error might be helpful for user to understand why some of the
series were dropped during processing.
In the same time, logging only the last error should prevent logs from
pollution if too many conversion errors are taking place.

See influxdata#15782
@telegraf-tiger telegraf-tiger bot added the feat Improvement on an existing feature such as adding a new setting/mode to an existing plugin label Sep 16, 2024
Copy link
Member

@srebhan srebhan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@hagen1778 thanks a lot for your contribution! I have two comments in the code. Furthermore, I'm interested to learn why you do not log all errors but instead keep only the last error? What is the reasoning behind this?

@srebhan srebhan self-assigned this Sep 16, 2024
* rm unnecessary Logger check for nil
* use CaptureLogger instead of custom logger in tests
@hagen1778
Copy link
Contributor Author

hagen1778 commented Sep 17, 2024

Thanks for the quick review!

I'm interested to learn why you do not log all errors but instead keep only the last error? What is the reasoning behind this?

It is expected that metrics batch can contain errors in each series. So depending on the batch size and processing, telegraf could log a lot of errors and confuse the user.
On the other hand, printing one error log per batch still gives a hint to the user that something is wrong and gives enough information to act on. It should work effectively if metrics batch contains only one bad metric and many bad metrics. In both cases, error message can be used in order to get insights about invalid data.

Copy link
Member

@srebhan srebhan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @hagen1778 for your update! I understand that you've chosen the last error as this simplifies the code. Could you please add this as a comment to the code so that the next reader doesn't wonder again? ;-)

Furthermore, would it be an option to log all errors with the Trace log level for people that want to see what is failing? Otherwise they will need to reproduce the scenario very often if there are a number of metrics failing...

@srebhan srebhan changed the title feat: log conversion errors in prometheusremotewrite plugin feat(serializers.prometheusremotewrite): Log metric conversion errors Sep 19, 2024
@hagen1778
Copy link
Contributor Author

@srebhan I've updated PR with your recommendations. I'd appreciate you taking another look.

Copy link
Member

@srebhan srebhan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @hagen1778! I will adapt the wording in the README a bit but otherwise the PR looks good.

plugins/serializers/prometheusremotewrite/README.md Outdated Show resolved Hide resolved
@srebhan srebhan added the ready for final review This pull request has been reviewed and/or tested by multiple users and is ready for a final review. label Sep 30, 2024
@srebhan srebhan assigned DStrand1 and unassigned srebhan Sep 30, 2024
@telegraf-tiger
Copy link
Contributor

telegraf-tiger bot commented Oct 1, 2024

@DStrand1 DStrand1 merged commit 7df29b0 into influxdata:master Oct 1, 2024
26 of 27 checks passed
@github-actions github-actions bot added this to the v1.33.0 milestone Oct 1, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/prometheus feat Improvement on an existing feature such as adding a new setting/mode to an existing plugin plugin/serializer ready for final review This pull request has been reviewed and/or tested by multiple users and is ready for a final review.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

prometheusremotewrite: Untyped values are dropped silently
3 participants