[BUG] Processing A Nested List as Individual Log Events #5015
Comments
Hey there, we also stumbled upon this issue and found it odd that you cannot split simple JSON arrays into multiple documents. I worked around this with the following hack: split the events, then process the split events with a chained pipeline (a preprocess_pipeline feeding a message_process_pipeline). I am on mobile, so my apologies for the formatting. We asked the AWS service team to support splitting JSON arrays into multiple docs, but this hack seems to work for us for now.
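For readers trying to follow the workaround, the chained-pipeline layout might look roughly like the sketch below. This is a hypothetical reconstruction, not the commenter's exact config: the pipeline names come from the comment, but the `s3` source settings and the `parse_json` processor choice are assumptions, and the step that actually splits the array is omitted, since built-in support for that is exactly what this issue requests.

```yaml
# Hypothetical sketch of the chained-pipeline workaround described above.
# Pipeline names come from the comment; everything else is an assumption.
preprocess_pipeline:
  source:
    s3:
      # Bucket/queue details are placeholders.
      notification_type: "sqs"
      sqs:
        queue_url: "https://sqs.us-east-1.amazonaws.com/123456789012/example-queue"
      codec:
        newline:
  processor:
    # Parse the raw record so "logEvents" becomes a real array field.
    - parse_json:
        source: "message"
  sink:
    # Hand each (split) event to the second pipeline for processing.
    - pipeline:
        name: "message_process_pipeline"

message_process_pipeline:
  source:
    pipeline:
      name: "preprocess_pipeline"
  # Per-event cleanup/enrichment processors would go here.
  sink:
    - opensearch:
        hosts: ["https://opensearch.example.com:9200"]
        index: "cloudwatch-logs"
```

Chaining pipelines this way (a `pipeline` sink feeding a `pipeline` source) is a standard Data Prepper pattern; only the splitting step itself is the missing piece.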
@JunChatani Nice workaround! Thanks!
I agree. I haven't opened an issue yet; perhaps this one can be used to track it.
To clarify: would you want each entry in the logEvents array emitted as its own output document?
You got it. This is particularly useful when log messages are sent from CloudWatch -> Firehose -> S3; the object in S3 is stored like I initially described. You can easily emulate this by subscribing a Firehose to a CloudWatch Logs group and sending the results to S3. Implementing this enhancement would actually resolve the limitation noted in the CloudWatch log ingestion documentation today (which suggests streaming logs directly to OpenSearch instead of using Firehose). See here: "Currently, Firehose does not support the delivery of CloudWatch Logs to Amazon OpenSearch Service destination because Amazon CloudWatch combines multiple log events into one Firehose record and Amazon OpenSearch Service cannot accept multiple log events in one record."
Did you want to do this because OpenSearch Dashboards cannot visualize nested fields?
Describe the bug
It could potentially be possible to do this; however, I have not been able to find anything in the documentation that covers it.
If you have CloudWatch Logs -> Data Firehose -> S3 and want to pull that into Data Prepper, it ingests the whole multi-line event as a single record.
The structure seems to be like so:
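Per the documented CloudWatch Logs subscription format, a record delivered through Firehose generally has this shape (all values below are placeholders):

```yaml
# General shape of a CloudWatch Logs subscription record delivered via
# Firehose. Values are placeholders, not real data.
messageType: "DATA_MESSAGE"
owner: "123456789012"
logGroup: "/aws/lambda/example-function"
logStream: "2024/01/01/[$LATEST]abcdef0123456789"
subscriptionFilters:
  - "example-filter"
logEvents:
  - id: "01234567890123456789"
    timestamp: 1700000000000
    message: "first log line"
  - id: "01234567890123456790"
    timestamp: 1700000000001
    message: "second log line"
```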
What I was hoping to do was use Data Prepper to read in the log message from S3 (structured like the above), then parse out the "logEvents" and treat each entry as an individual log message to publish to S3 and OpenSearch alike.
S3, because it will allow me to create a neat prefix structure of accountid/log-group/YYYY/MM/DD/HH.
However, I am not sure it is possible to extract the logEvents field, which contains a list of objects, and treat each entry as a separate event.
To Reproduce
Steps to reproduce the behavior:
Expected behavior
I was expecting a feature within Data Prepper to support something like so:
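Such a feature might be expressed as a processor option along the following lines. This is purely illustrative: the `split_list` processor and its options shown here are hypothetical, sketched to convey the desired behavior rather than an existing Data Prepper plugin.

```yaml
# Hypothetical pipeline — "split_list" is NOT an existing Data Prepper
# plugin; it sketches the requested behavior: fan one incoming record
# out into N events, one per entry in the "logEvents" array.
cloudwatch-pipeline:
  source:
    s3:
      # Placeholder source settings.
      notification_type: "sqs"
      sqs:
        queue_url: "https://sqs.us-east-1.amazonaws.com/123456789012/example-queue"
  processor:
    - parse_json:
        source: "message"
    - split_list:                 # hypothetical processor name
        source: "logEvents"       # array field whose entries become events
        keep_parent_fields: true  # copy logGroup/logStream/owner onto each
  sink:
    - opensearch:
        hosts: ["https://opensearch.example.com:9200"]
        index: "cloudwatch-logs"
```

With `keep_parent_fields` enabled, fields such as `owner` and `logGroup` would remain available on every split event, which is what makes the accountid/log-group/YYYY/MM/DD/HH prefix layout described above possible.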
Environment (please complete the following information):
Additional context
AWS Managed OpenSearch and AWS Managed OSIS are being used. I set up a local container deployment to expedite testing and still see the same issue.