-
-
Notifications
You must be signed in to change notification settings - Fork 0
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat(anomaly detection): Cron to cleanup disabled alerts #1455
Conversation
aayush-se
commented
Nov 19, 2024
•
edited
Loading
edited
- Currently, cleanup only happens upon detection call leaving stale timestamps for disabled alerts to accumulate
- Created cron which runs weekly to cleanup (delete) disabled alerts based on when they were last queued for detection (28 days) or if last_queued is null then deletes old alerts
- Cleaning alerts as a byproduct cleans the timeseries that are associated with them because of the parent-child relationship
src/seer/anomaly_detection/tasks.py
Outdated
@sentry_sdk.trace | ||
def cleanup_disabled_alerts(): | ||
|
||
logger.info("Cleaning up timeseries data for alerts that have been inactive for over 28 days") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
By inactive we mean that anomaly detection was not called for the alert, right? Do we want to make that explicit here?
src/seer/anomaly_detection/tasks.py
Outdated
# Get all alerts that haven't been queued in the last 28 days indicating that they are disabled and are safe to cleanup | ||
alerts = ( | ||
session.query(DbDynamicAlert) | ||
.filter(DbDynamicAlert.last_queued_at < date_threshold) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we need to handle the edge case where the last_queued_at is null, as will be the case until the first detection happens?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The default value for last_queued_at
is the current time so that value should be filled in upon creation of the alert regardless of detection
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually, I will include a null check in here because of the old alerts that were created before the data pruning logic was implemented likely resulting in some null values
src/seer/anomaly_detection/tasks.py
Outdated
|
||
deleted_count = 0 | ||
for alert in alerts: | ||
deleted_count += delete_old_timeseries_points(alert, date_threshold.timestamp()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So we just delete the time series points but the actual entry in the alert table will still remain? Should we delete that as well?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I left it in case they re-enable the alert. For example, if they go from dynamic --> static --> dynamic, then the alert entry should still be in the table right? But if deleting the alert then re-creating it is fine, then I can also delete the alert itself within this cron
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes it will recreate the alert if revived. So we should delete it as well.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Directly deleted the alerts which in turn also deletes the associated timeseries data from DbDynamicAlertTimeSeries
.one_or_none() | ||
) | ||
assert alert is not None | ||
alert.last_queued_at = datetime.now() - timedelta(days=29) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should add a unit that includes one alert with last_queued_at as null
src/seer/anomaly_detection/tasks.py
Outdated
def cleanup_disabled_alerts(): | ||
|
||
logger.info( | ||
"Cleaning up timeseries data for alerts that have been inactive (detection has not been run) for over 28 days" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Minor nit - add date_threshold
in this info message