Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(anomaly detection): Cron to cleanup disabled alerts #1455

Merged
merged 5 commits into from
Nov 21, 2024

Conversation

aayush-se
Copy link
Member

@aayush-se aayush-se commented Nov 19, 2024

  • Currently, cleanup only happens upon detection call leaving stale timestamps for disabled alerts to accumulate
  • Created cron which runs weekly to cleanup (delete) disabled alerts based on when they were last queued for detection (28 days) or if last_queued is null then deletes old alerts
  • Cleaning alerts as a byproduct cleans the timeseries that are associated with them because of the parent-child relationship

@aayush-se aayush-se requested a review from a team as a code owner November 19, 2024 18:43
@sentry_sdk.trace
def cleanup_disabled_alerts():

logger.info("Cleaning up timeseries data for alerts that have been inactive for over 28 days")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

By inactive we mean that anomaly detection was not called for the alert, right? Do we want to make that explicit here?

# Get all alerts that haven't been queued in the last 28 days indicating that they are disabled and are safe to cleanup
alerts = (
session.query(DbDynamicAlert)
.filter(DbDynamicAlert.last_queued_at < date_threshold)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need to handle the edge case where the last_queued_at is null, as will be the case until the first detection happens?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The default value for last_queued_at is the current time so that value should be filled in upon creation of the alert regardless of detection

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, I will include a null check in here because of the old alerts that were created before the data pruning logic was implemented likely resulting in some null values


deleted_count = 0
for alert in alerts:
deleted_count += delete_old_timeseries_points(alert, date_threshold.timestamp())
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So we just delete the time series points but the actual entry in the alert table will still remain? Should we delete that as well?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I left it in case they re-enable the alert. For example, if they go from dynamic --> static --> dynamic, then the alert entry should still be in the table right? But if deleting the alert then re-creating it is fine, then I can also delete the alert itself within this cron

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes it will recreate the alert if revived. So we should delete it as well.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Directly deleted the alerts which in turn also deletes the associated timeseries data from DbDynamicAlertTimeSeries

.one_or_none()
)
assert alert is not None
alert.last_queued_at = datetime.now() - timedelta(days=29)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should add a unit that includes one alert with last_queued_at as null

@aayush-se aayush-se changed the title feat(anomaly detection): Cron to cleanup timeseries from disabled alerts feat(anomaly detection): Cron to cleanup disabled alerts Nov 21, 2024
def cleanup_disabled_alerts():

logger.info(
"Cleaning up timeseries data for alerts that have been inactive (detection has not been run) for over 28 days"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor nit - add date_threshold in this info message

@aayush-se aayush-se merged commit f0f7c7f into main Nov 21, 2024
11 checks passed
@aayush-se aayush-se deleted the anomaly-detection/cron-disabled-alerts branch November 21, 2024 21:21
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants