The personalize_monitor.py Lambda function is invoked every 5 minutes by a CloudWatch scheduled event rule. It generates the CloudWatch metrics that populate the line graph widgets on the Personalize Monitor dashboard and that trigger the CloudWatch alarms for low recommender/campaign utilization and idle recommender/campaign detection (if configured). In addition, if the AutoDeleteOrStopIdleResources deployment parameter is Yes and a monitored campaign has been idle for more than IdleThresholdHours hours, this function publishes a DeletePersonalizeCampaign event to EventBridge, which is handled by the personalize_delete_campaign function. An idle campaign is one that has not received any GetRecommendations or GetPersonalizedRanking calls in the last IdleThresholdHours hours. Finally, this function will adjust a campaign's minProvisionedTPS (down only) if the AutoAdjustMinTPS deployment parameter is Yes.
The function first determines which Personalize campaigns should be monitored based on the CloudFormation template parameters you specify when you install the application.
The following custom CloudWatch metrics are generated by this function at 5-minute intervals. You can find these metrics in the AWS console under CloudWatch and then Metrics, or you can query them using the CloudWatch API.
Namespace | MetricName | Dimensions | Unit | Description |
---|---|---|---|---|
PersonalizeMonitor | monitoredResourceCount | | Count | Number of recommenders and campaigns currently being monitored at interval |
PersonalizeMonitor | minRecommendationRequestsPerSecond | RecommenderArn | Count/Second | minRecommendationRequestsPerSecond value for the recommender at interval |
PersonalizeMonitor | averageRPS | RecommenderArn | Count/Second | Average RPS for the recommender at interval |
PersonalizeMonitor | recommenderUtilization | RecommenderArn | Percent | Utilization percentage of averageRPS vs minRecommendationRequestsPerSecond at interval |
PersonalizeMonitor | minProvisionedTPS | CampaignArn | Count/Second | minProvisionedTPS value for the campaign at interval |
PersonalizeMonitor | averageTPS | CampaignArn | Count/Second | Average TPS for the campaign at interval |
PersonalizeMonitor | campaignUtilization | CampaignArn | Percent | Utilization percentage of averageTPS vs minProvisionedTPS at interval |
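
As an illustration of querying one of these metrics through the CloudWatch API, the sketch below builds the request parameters for the `get_metric_statistics` call. The helper name `build_metric_query`, the example ARN, and the choice of the `Average` statistic are assumptions made for this example; only the namespace, metric names, and dimension names come from the table above.

```python
from datetime import datetime, timedelta, timezone


def build_metric_query(metric_name, dimension_name, resource_arn, end_time, hours=1):
    """Build keyword arguments for CloudWatch get_metric_statistics.

    Hypothetical helper: it only assembles the request, it does not call AWS.
    """
    return {
        "Namespace": "PersonalizeMonitor",
        "MetricName": metric_name,
        "Dimensions": [{"Name": dimension_name, "Value": resource_arn}],
        "StartTime": end_time - timedelta(hours=hours),
        "EndTime": end_time,
        "Period": 300,  # metrics are published every 5 minutes
        "Statistics": ["Average"],  # assumed statistic for this sketch
    }


query = build_metric_query(
    "campaignUtilization",
    "CampaignArn",
    "arn:aws:personalize:us-east-1:123456789012:campaign/example",  # placeholder ARN
    datetime.now(timezone.utc),
)

# With boto3 installed and AWS credentials configured, the query could be sent as:
#   import boto3
#   response = boto3.client("cloudwatch").get_metric_statistics(**query)
```

The client should be created in the region where the monitored resource lives.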
The averageRPS and averageTPS metric values for each monitored recommender and campaign are calculated by taking the number of requests made to the recommender or campaign during the 5-minute interval and dividing by 300 (the number of seconds in 5 minutes). The number of requests is pulled from the GetRecommendations or GetPersonalizedRanking metric (depending on the underlying recipe) for the recommender/campaign in the AWS/Personalize namespace. The request count metric is updated automatically by Personalize itself.
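
The arithmetic above can be sketched as follows. The function names are hypothetical, and the utilization formula (average rate divided by the configured minimum, as a percentage) is an assumption based on the metric descriptions in the table rather than a copy of the function's code.

```python
INTERVAL_SECONDS = 300  # 5 minutes


def average_per_second(request_count, interval_seconds=INTERVAL_SECONDS):
    """Average requests per second over one monitoring interval."""
    return request_count / interval_seconds


def utilization_percent(average_rate, provisioned_minimum):
    """Assumed formula: average rate as a percentage of the configured minimum."""
    return average_rate / provisioned_minimum * 100.0


# Example: 1500 requests in a 5-minute interval against a minProvisionedTPS of 10.
average_tps = average_per_second(1500)
print(average_tps)                           # 5.0
print(utilization_percent(average_tps, 10))  # 50.0
```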
You can optionally have CloudWatch alarms dynamically created for monitored recommenders/campaigns for low utilization and idle recommenders/campaigns.
If you set the AutoCreateUtilizationAlarms CloudFormation template parameter to Yes when you installed this application, this function will automatically create a CloudWatch alarm for every recommender and campaign that it monitors. The alarm will trigger when the recommenderUtilization or campaignUtilization custom metric described above drops below the UtilizationThresholdAlarmLowerBound installation parameter for 9 out of 12 evaluation periods. Since the intervals are 5 minutes, 9 of the 12 five-minute evaluations over a 60-minute span must be below the threshold for the alarm to enter the ALARM state. The same rule applies to the transition from ALARM back to OK. The alarm will be created in the region where the recommender/campaign was created. An SNS topic created by this application will be used for the alarm and OK actions, and the NotificationEndpoint (email address) deployment parameter will be set up as a subscriber to the topic. Be sure to confirm the subscription email sent when this application creates SNS topics and subscribes the email address you provided. You will receive a confirmation email for a topic created in each region where resources are monitored.
The alarm will have its actions disabled when minRecommendationRequestsPerSecond or minProvisionedTPS is 1 and enabled when minRecommendationRequestsPerSecond or minProvisionedTPS is greater than 1, so that notifications are only sent when utilization can be improved by adjusting minRecommendationRequestsPerSecond/minProvisionedTPS.
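
To make the utilization alarm settings concrete, here is a minimal sketch of the parameters such an alarm might be created with through the CloudWatch `put_metric_alarm` API. The helper name, the alarm naming scheme, the `Average` statistic, and the placeholder ARNs are assumptions for this example; the 9 of 12 evaluation rule, the 5-minute period, and the actions-enabled rule come from the description above.

```python
def build_utilization_alarm_params(resource_arn, dimension_name, metric_name,
                                   lower_bound, provisioned_minimum, topic_arn):
    """Assemble put_metric_alarm keyword arguments (hypothetical helper)."""
    return {
        "AlarmName": "PersonalizeMonitor-Utilization-" + resource_arn.split("/")[-1],  # assumed naming
        "Namespace": "PersonalizeMonitor",
        "MetricName": metric_name,
        "Dimensions": [{"Name": dimension_name, "Value": resource_arn}],
        "Statistic": "Average",  # assumed statistic
        "Period": 300,           # one evaluation period per 5-minute interval
        "EvaluationPeriods": 12, # look at the last 60 minutes
        "DatapointsToAlarm": 9,  # 9 of 12 periods must breach
        "Threshold": lower_bound,
        "ComparisonOperator": "LessThanThreshold",
        # Notify only when lowering the configured minimum could improve utilization.
        "ActionsEnabled": provisioned_minimum > 1,
        "AlarmActions": [topic_arn],
        "OKActions": [topic_arn],
    }


params = build_utilization_alarm_params(
    "arn:aws:personalize:us-east-1:123456789012:campaign/example",  # placeholder ARN
    "CampaignArn",
    "campaignUtilization",
    lower_bound=30.0,
    provisioned_minimum=5,
    topic_arn="arn:aws:sns:us-east-1:123456789012:example-topic",  # placeholder ARN
)

# With boto3 and AWS credentials available:
#   import boto3
#   boto3.client("cloudwatch").put_metric_alarm(**params)
```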
If you set the AutoCreateIdleAlarms CloudFormation template parameter to Yes when you installed this application, this function will automatically create a CloudWatch alarm for every monitored recommender/campaign that triggers when the resource has been idle for at least IdleThresholdHours hours. The actions for the alarm will be enabled only after the recommender/campaign has also existed for at least IdleThresholdHours hours. The GetRecommendations or GetPersonalizedRanking metric (depending on the resource's recipe) will be used to assess the resource's idle state. The alarm will be created in the region where the recommender/campaign was created. An SNS topic created by this application will be used for the alarm and OK actions, and the NotificationEndpoint (email address) deployment parameter will be set up as a subscriber to the topic. Be sure to confirm the subscription email sent when this application creates SNS topics and subscribes the email address you provided. You will receive a confirmation email for a topic created in each region where resources are monitored.
Automatically adjusting minRecommendationRequestsPerSecond (recommenders) and minProvisionedTPS (campaigns) (optional)
If the AutoAdjustMinTPS deployment parameter is Yes, this function will compare the actual hourly RPS/TPS over the last 14 days against the currently configured minRecommendationRequestsPerSecond/minProvisionedTPS and look for opportunities to reduce that value to optimize utilization and reduce costs. It does this by examining the recommender's or campaign's request volume for the previous 14 days at hourly intervals and finding the hour with the lowest average RPS/TPS (the low watermark). If the low watermark is less than minRecommendationRequestsPerSecond/minProvisionedTPS and the recommender/campaign is more than 1 day old, the function reduces minRecommendationRequestsPerSecond/minProvisionedTPS by 25%. This process repeats each hour until minRecommendationRequestsPerSecond/minProvisionedTPS either meets the low watermark RPS/TPS or reaches 1 (the lowest allowed value). This function will NOT increase minRecommendationRequestsPerSecond/minProvisionedTPS. Instead, it relies on Personalize to auto-scale recommenders/campaigns up to meet demand and back down to minRecommendationRequestsPerSecond/minProvisionedTPS.
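
One hourly step of this adjustment could look like the sketch below. The function name and inputs are hypothetical, and the rounding behavior (rounding the reduced value down, and never going below the low watermark rounded up) is an assumption; the source only states the 25% reduction, the 1-day age requirement, and the floor of 1.

```python
import math


def next_min_tps(current_min_tps, hourly_average_tps, age_hours):
    """Return the configured minimum after one hourly adjustment step.

    hourly_average_tps: average RPS/TPS for each hour of the last 14 days.
    """
    if not hourly_average_tps or age_hours <= 24 or current_min_tps <= 1:
        return current_min_tps  # nothing to do, too new, or already at the floor

    low_watermark = min(hourly_average_tps)
    if low_watermark >= current_min_tps:
        return current_min_tps  # even the quietest hour uses the configured minimum

    reduced = math.floor(current_min_tps * 0.75)   # drop by 25% (rounding down is assumed)
    lower_limit = max(1, math.ceil(low_watermark)) # never go below the watermark or 1
    return max(reduced, lower_limit)


# Example: minimum of 20, quietest hour averaged 3.2 TPS, resource is 2 days old.
print(next_min_tps(20, [3.2, 8.0, 15.5], age_hours=48))  # 15
```

Calling the function again each hour with its previous result steps the value down (20, 15, 11, 8, ...) until it reaches the low watermark or 1. It never returns a value higher than the one passed in.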
Since it can take several minutes for a recommender/campaign to redeploy after its minRecommendationRequestsPerSecond/minProvisionedTPS is updated, you will receive the notification when the redeploy starts. The recommender/campaign will continue to respond to GetRecommendations/GetPersonalizedRanking API requests while it is redeploying, so there will be no interruption of service.
See the personalize_update_tps function for details on the update function.
If the AutoDeleteOrStopIdleResources deployment parameter is Yes, this function will perform additional checks once per hour for each monitored recommender/campaign to see if it has been idle for more than IdleThresholdHours hours. The purpose of this feature is to prevent abandoned recommenders/campaigns from continuing to incur inference costs when they are no longer being used. Recommender/campaign checks are distributed across each hour in 10-minute blocks to spread out the API calls needed to check and update recommenders/campaigns.
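
The source does not describe how resources are assigned to 10-minute blocks. Purely as an illustration, one way to spread hourly checks would be to hash each resource ARN into one of six blocks and run the check on the first 5-minute invocation that falls in that block. Every name below is hypothetical, and the hashing scheme is an assumption, not the function's actual method.

```python
import zlib

BLOCK_MINUTES = 10
BLOCKS_PER_HOUR = 60 // BLOCK_MINUTES  # 6 blocks


def assigned_block(resource_arn):
    """Map an ARN to a stable block index in [0, 5] using a CRC32 hash."""
    return zlib.crc32(resource_arn.encode("utf-8")) % BLOCKS_PER_HOUR


def is_due_for_idle_check(resource_arn, minute_of_hour, tick_minutes=5):
    """True on the single scheduled tick per hour that lands at the start of the block."""
    block_start = assigned_block(resource_arn) * BLOCK_MINUTES
    return block_start <= minute_of_hour < block_start + tick_minutes


arn = "arn:aws:personalize:us-east-1:123456789012:campaign/example"  # placeholder ARN
due_ticks = [m for m in range(0, 60, 5) if is_due_for_idle_check(arn, m)]
print(len(due_ticks))  # 1
```

A deterministic hash such as CRC32 keeps the assignment stable across invocations, which Python's built-in `hash()` does not guarantee for strings.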
To avoid stopping recommenders or deleting campaigns too aggressively, new recommenders/campaigns that are not more than IdleThresholdHours hours old are exempt from being stopped/deleted. Similarly, a recommender/campaign that has been updated within the last IdleThresholdHours hours is also exempt from being automatically stopped/deleted. The idea is that new or actively updated recommenders/campaigns are likely not safe to delete.
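
The exemption rules can be expressed as a small predicate. This is a sketch under stated assumptions: the function name and its arguments are hypothetical, and timestamps are assumed to be timezone-aware `datetime` values, with `last_request_at` set to `None` when no requests were found.

```python
from datetime import datetime, timedelta, timezone


def is_eligible_for_stop_or_delete(now, created_at, last_updated_at,
                                   last_request_at, idle_threshold_hours):
    """Return True only if the resource is old enough, not recently updated, and idle."""
    threshold = timedelta(hours=idle_threshold_hours)
    if now - created_at <= threshold:
        return False  # new resources are exempt
    if now - last_updated_at <= threshold:
        return False  # recently updated resources are exempt
    if last_request_at is not None and now - last_request_at <= threshold:
        return False  # still receiving requests, so not idle
    return True


now = datetime(2024, 1, 10, tzinfo=timezone.utc)
print(is_eligible_for_stop_or_delete(
    now,
    created_at=now - timedelta(days=30),
    last_updated_at=now - timedelta(days=20),
    last_request_at=now - timedelta(days=5),
    idle_threshold_hours=24,
))  # True
```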