optimize execution of workflow consisting of bucket-level followed by doc-level monitors #1729

sbcd90 · 2024-11-13T23:10:24Z

Description

This PR proposes the following changes to optimize the execution of workflows consisting of a bucket-level monitor followed by match-all doc-level monitors.

When the flag monitor.ignoreFindingsAndAlerts is set, the doc-level monitor skips storing findings and alerts and does not trigger the subsequent publishFinding calls. https://github.com/opensearch-project/alerting/pull/1729/files#diff-64dadab7578092d0871a6d87833637fcd3c3e56dae448222ec57476921c9707eR300
When the flag monitor.ignoreFindingsAndAlerts is set, the doc-level monitor only matches the trigger condition, generates a single alert, and then triggers the notification. https://github.com/opensearch-project/alerting/pull/1729/files#diff-64dadab7578092d0871a6d87833637fcd3c3e56dae448222ec57476921c9707eR366
The following code considers all documents when indexExecutionContext.docIds is empty:

if (!docIds.isNullOrEmpty()) {
    boolQueryBuilder.filter(QueryBuilders.termsQuery("_id", docIds))
}

https://github.com/opensearch-project/alerting/pull/1729/files#diff-64dadab7578092d0871a6d87833637fcd3c3e56dae448222ec57476921c9707eR948
This PR addresses the issue for workflows in which the bucket-level monitor sends an empty indexExecutionContext.docIds list.
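
To make the empty-list behavior concrete, here is a minimal, self-contained model of an optional `_id` filter. It is a sketch under stated assumptions, not the plugin's code: `Doc` and `selectDocs` are hypothetical names, and an in-memory list stands in for the index.

```kotlin
// Hypothetical stand-in for an indexed document; not a type from the alerting plugin.
data class Doc(val id: String, val seqNo: Long)

// Models an optional "_id" terms filter: the restriction is applied only when
// docIds is non-empty, so a null or empty list leaves every document in scope.
fun selectDocs(all: List<Doc>, docIds: List<String>?): List<Doc> =
    if (!docIds.isNullOrEmpty()) all.filter { it.id in docIds } else all
```

With three documents `a`, `b`, and `c`, `selectDocs(docs, listOf("b"))` keeps only `b`, while `selectDocs(docs, emptyList())` and `selectDocs(docs, null)` both keep all three, which is the case the description refers to.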

Doc-level monitors advance their index sequence numbers after documents are processed. The bucket-level monitor, however, sends 10 arbitrary documents as part of indexExecutionContext.docIds. This results in inconsistent alert generation and notification triggering. https://github.com/opensearch-project/alerting/pull/1729/files#diff-756df2dfd50dd9cc484313c8d6db1a052ab18e7fc962829ad56e982c0e6a5c56R483
This PR addresses the issue by sorting the matching documents by sequence number in descending order.

Related Issues

Resolves #[Issue number to be closed when this PR is merged]

Check List

New functionality includes testing.
New functionality has been documented.
API changes companion pull request created.
Commits are signed per the DCO using --signoff.
Public documentation issue/PR created.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.


eirsep · 2024-11-18T15:25:19Z

Please update the description to explain what the optimization is and what the PR changes.

eirsep · 2024-11-18T15:29:34Z

...ng/src/main/kotlin/org/opensearch/alerting/transport/TransportDocLevelMonitorFanOutAction.kt

@@ -349,6 +363,50 @@ class TransportDocLevelMonitorFanOutAction
        }
    }

+    private suspend fun runForEachDocTriggerIgnoringFindingsAndAlerts(


Please add code comments.


eirsep · 2024-11-19T06:21:08Z

alerting/src/main/kotlin/org/opensearch/alerting/BucketLevelMonitorRunner.kt

@@ -479,7 +480,7 @@ object BucketLevelMonitorRunner : MonitorRunner() {
                            val queryBuilder = if (input.query.query() == null) BoolQueryBuilder()
                            else QueryBuilders.boolQuery().must(source.query())
                            queryBuilder.filter(QueryBuilders.termsQuery(fieldName, bucketValues))
-                            sr.source().query(queryBuilder)
+                            sr.source().query(queryBuilder).sort("_seq_no", SortOrder.DESC)


Why are we sorting by _seq_no?

There is already a range query based on the period_end variable.

This seems incorrect.

sbcd90 · 2024-11-19

We need the sort because, without it, we get 10 arbitrary docs from the query period (the last 15 minutes by default) for a workflow that runs every minute, while the aggregation may have grouped 1000 docs.
So the 10 docs returned out of those 1000 may not be the latest ones. We pass these 10 docs to the delegated doc-level monitor, which has already moved its seq_no past them and hence does not generate an alert.

Sorting by seq_no ensures we always get the latest 10 of the 1000 docs considered for the aggregation. Thus, the next time the doc-level monitor runs, it gets the latest 10 docs and goes on to generate an alert.
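
The reasoning in this reply can be modeled with a small, self-contained sketch. It is a simplification under stated assumptions, not the plugin's code: `Doc`, `firstTen`, `latestTen`, and `unprocessed` are hypothetical names, an in-memory list stands in for the index, and taking the first 10 entries stands in for the arbitrary hits an unsorted search may return.

```kotlin
// Hypothetical stand-in for an indexed document; not a type from the alerting plugin.
data class Doc(val id: String, val seqNo: Long)

// Without an explicit sort, which 10 hits come back is not tied to recency;
// taking the first 10 entries stands in for that arbitrary choice here.
fun firstTen(docs: List<Doc>): List<Doc> = docs.take(10)

// With a descending sort on the sequence number, the 10 hits are the most
// recently indexed documents.
fun latestTen(docs: List<Doc>): List<Doc> =
    docs.sortedByDescending { it.seqNo }.take(10)

// Simplified model of the delegated doc-level monitor: it only evaluates
// documents whose sequence number is beyond the checkpoint from its last run.
fun unprocessed(candidates: List<Doc>, lastProcessedSeqNo: Long): List<Doc> =
    candidates.filter { it.seqNo > lastProcessedSeqNo }
```

For 1000 documents with sequence numbers 0 to 999 and a checkpoint of 994, `unprocessed(firstTen(docs), 994)` is empty, so no alert would be generated, whereas `unprocessed(latestTen(docs), 994)` contains the five documents with sequence numbers 995 to 999.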

optimize execution of workflow consisting of bucket-level followed by…

bc90d8f

… doc-level monitors Signed-off-by: Subhobrata Dey <[email protected]>

sbcd90 requested review from lezzago, AWSHurneyt, eirsep, getsaurabh02, praveensameneni, bowenlan-amzn, rishabhmaurya, engechas, riysaxen-amzn, jowg-amazon, amsiglan and goyamegh as code owners November 13, 2024 23:10

eirsep reviewed Nov 18, 2024


sort matching docs by seq_no

14f9371

Signed-off-by: Subhobrata Dey <[email protected]>

eirsep reviewed Nov 19, 2024

