Amazon Personalize Monitor - Core Monitor Function

The personalize_monitor.py Lambda function is invoked every 5 minutes by a CloudWatch scheduled event rule. It generates the CloudWatch metrics that populate the line graph widgets on the Personalize Monitor dashboard and that trigger the CloudWatch alarms for low recommender/campaign utilization and idle recommender/campaign detection (if configured). In addition, if the AutoDeleteOrStopIdleResources deployment parameter is Yes and a monitored campaign has been idle for more than IdleThresholdHours hours, this function publishes a DeletePersonalizeCampaign event to EventBridge, which is handled by the personalize_delete_campaign function. An idle campaign is one that has not received any GetRecommendations or GetPersonalizedRanking calls in the last IdleThresholdHours hours. Finally, if the AutoAdjustMinTPS deployment parameter is Yes, this function adjusts a campaign's minProvisionedTPS (downward only).

How it works

The function first determines which Personalize campaigns should be monitored based on the CloudFormation template parameters you specify when you install the application.

CloudWatch Metrics

The following custom CloudWatch metrics are generated by this function at 5-minute intervals. You can find these metrics in the AWS console under CloudWatch and then Metrics, or you can query them using the CloudWatch API.

| Namespace | MetricName | Dimensions | Unit | Description |
| --- | --- | --- | --- | --- |
| PersonalizeMonitor | monitoredResourceCount | | Count | Number of recommenders and campaigns currently being monitored at interval |
| PersonalizeMonitor | minRecommendationRequestsPerSecond | RecommenderArn | Count/Second | minRecommendationRequestsPerSecond value for the recommender at interval |
| PersonalizeMonitor | averageRPS | RecommenderArn | Count/Second | Average RPS for the recommender at interval |
| PersonalizeMonitor | recommenderUtilization | RecommenderArn | Percent | Utilization percentage of averageRPS vs minRecommendationRequestsPerSecond at interval |
| PersonalizeMonitor | minProvisionedTPS | CampaignArn | Count/Second | minProvisionedTPS value for the campaign at interval |
| PersonalizeMonitor | averageTPS | CampaignArn | Count/Second | Average TPS for the campaign at interval |
| PersonalizeMonitor | campaignUtilization | CampaignArn | Percent | Utilization percentage of averageTPS vs minProvisionedTPS at interval |
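As a rough illustration of querying these metrics through the CloudWatch API, the sketch below builds the keyword arguments for the boto3 `get_metric_statistics` call. The helper name `build_metric_query`, the one-hour default window, and the choice of the `Average` statistic are assumptions made for this example and are not taken from the function's source.

```python
from datetime import datetime, timedelta, timezone


def build_metric_query(metric_name, dimension_name=None, arn=None, hours=1, end=None):
    """Build kwargs for CloudWatch get_metric_statistics (hypothetical helper)."""
    end = end or datetime.now(timezone.utc)
    params = {
        "Namespace": "PersonalizeMonitor",  # namespace from the table above
        "MetricName": metric_name,
        "StartTime": end - timedelta(hours=hours),
        "EndTime": end,
        "Period": 300,  # metrics are published every 5 minutes
        "Statistics": ["Average"],  # assumption: any supported statistic could be used
    }
    # monitoredResourceCount has no dimensions; per-resource metrics need the ARN.
    if dimension_name and arn:
        params["Dimensions"] = [{"Name": dimension_name, "Value": arn}]
    return params


# Usage (requires boto3 and AWS credentials; the ARN is a placeholder):
# import boto3
# cloudwatch = boto3.client("cloudwatch")
# query = build_metric_query("campaignUtilization", "CampaignArn", "<campaign ARN>")
# datapoints = cloudwatch.get_metric_statistics(**query)["Datapoints"]
```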

How is averageRPS/averageTPS calculated?

The averageRPS and averageTPS metric values for each monitored recommender and campaign are calculated by first determining the number of requests made to the recommender or campaign during the 5-minute interval and then dividing by 300 (the number of seconds in 5 minutes). The number of requests is pulled from the GetRecommendations or GetPersonalizedRanking metric (depending on the underlying recipe) for the recommender/campaign in the AWS/Personalize namespace. The request count metric is automatically updated by Personalize itself.
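The arithmetic above can be sketched as follows. The function names are hypothetical, and the utilization formula (average rate divided by the configured minimum, expressed as a percentage) is an interpretation of the metric descriptions in the table rather than code copied from the function.

```python
INTERVAL_SECONDS = 300  # 5 minutes


def average_per_second(request_count, interval_seconds=INTERVAL_SECONDS):
    """Average requests per second over one monitoring interval."""
    return request_count / interval_seconds


def utilization_percent(average_rate, provisioned_minimum):
    """Average rate as a percentage of the configured minimum RPS/TPS."""
    return average_rate / provisioned_minimum * 100.0


# Example: 1,500 requests in one interval against a minProvisionedTPS of 10.
average_tps = average_per_second(1500)           # 5.0
utilization = utilization_percent(average_tps, 10)  # 50.0
```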

CloudWatch Alarms (optional)

You can optionally have CloudWatch alarms dynamically created for monitored recommenders/campaigns for low utilization and idle recommenders/campaigns.

Low Recommender/Campaign Utilization Alarm

If you set the AutoCreateUtilizationAlarms CloudFormation template parameter to Yes when you installed this application, this function will automatically create a CloudWatch alarm for every recommender and campaign that it monitors. The alarm will trigger when the recommenderUtilization or campaignUtilization custom metric described above drops below the UtilizationThresholdAlarmLowerBound installation parameter for 9 out of 12 evaluation periods. Since the intervals are 5 minutes, 9 of the 12 five-minute evaluations over a 60-minute span must be below the threshold for the alarm to enter the ALARM state. The same rule applies to the transition from ALARM back to OK. The alarm will be created in the region where the recommender/campaign was created. An SNS topic created by this application will be used for the alarm and OK actions, and the NotificationEndpoint (email address) deployment parameter will be set up as a subscriber to the topic. Be sure to confirm the subscription email sent when this application creates SNS topics and subscribes the email address you provided. You will receive a confirmation email for the topic created in each region where resources are monitored.

The alarm will have its actions disabled when minRecommendationRequestsPerSecond or minProvisionedTPS is 1 and enabled when minRecommendationRequestsPerSecond or minProvisionedTPS is greater than 1, so that notifications are only sent when utilization can be improved by adjusting minRecommendationRequestsPerSecond/minProvisionedTPS.
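A minimal sketch of how the low utilization alarm settings described above might map onto the parameters of the boto3 CloudWatch `put_metric_alarm` call. The helper name, the alarm naming scheme, and the `Average` statistic are assumptions for illustration. The 9-of-12 evaluation rule, the 5-minute period, and the actions-enabled rule come from the description above.

```python
def build_utilization_alarm(resource_arn, is_campaign, min_tps, threshold, topic_arn):
    """Build kwargs for put_metric_alarm (hypothetical helper, not the real code)."""
    metric = "campaignUtilization" if is_campaign else "recommenderUtilization"
    dimension = "CampaignArn" if is_campaign else "RecommenderArn"
    resource_name = resource_arn.split("/")[-1]
    return {
        # Assumption: the real alarm naming scheme may differ.
        "AlarmName": f"PersonalizeMonitor-LowUtilization-{resource_name}",
        "Namespace": "PersonalizeMonitor",
        "MetricName": metric,
        "Dimensions": [{"Name": dimension, "Value": resource_arn}],
        "Statistic": "Average",  # assumption
        "Period": 300,  # 5-minute intervals
        "EvaluationPeriods": 12,  # 60-minute span
        "DatapointsToAlarm": 9,  # 9 of 12 must breach
        "Threshold": float(threshold),  # UtilizationThresholdAlarmLowerBound
        "ComparisonOperator": "LessThanThreshold",
        "AlarmActions": [topic_arn],
        "OKActions": [topic_arn],
        # Notifications only matter when the minimum can still be lowered.
        "ActionsEnabled": min_tps > 1,
    }


# Usage (requires boto3 and AWS credentials):
# import boto3
# boto3.client("cloudwatch").put_metric_alarm(**build_utilization_alarm(...))
```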

Idle Recommender/Campaign Alarm

If you set the AutoCreateIdleAlarms CloudFormation template parameter to Yes when you installed this application, this function will automatically create a CloudWatch alarm for every monitored recommender/campaign that triggers when the resource has been idle for at least IdleThresholdHours hours. The actions for the alarm will be enabled only after the recommender/campaign has existed for at least IdleThresholdHours hours. The GetRecommendations or GetPersonalizedRanking metric (depending on the resource's recipe) will be used to assess the resource's idle state. The alarm will be created in the region where the recommender/campaign was created. An SNS topic created by this application will be used for the alarm and OK actions, and the NotificationEndpoint (email address) deployment parameter will be set up as a subscriber to the topic. Be sure to confirm the subscription email sent when this application creates SNS topics and subscribes the email address you provided. You will receive a confirmation email for the topic created in each region where resources are monitored.

Automatically adjusting minRecommendationRequestsPerSecond (recommenders) and minProvisionedTPS (campaigns) (optional)

If the AutoAdjustMinTPS deployment parameter is Yes, this function will compare the actual hourly RPS/TPS over the last 14 days against the currently configured minRecommendationRequestsPerSecond/minProvisionedTPS and look for opportunities to reduce the minRecommendationRequestsPerSecond/minProvisionedTPS to optimize utilization and reduce costs. It does this by checking the recommender's or campaign's request volume for the previous 14 days at hourly intervals and finding the hour with the lowest average RPS/TPS (the low watermark). If the low watermark average is less than minRecommendationRequestsPerSecond/minProvisionedTPS AND the recommender/campaign is more than 1 day old, it will drop the minRecommendationRequestsPerSecond/minProvisionedTPS by 25%. This process will be repeated each hour until either the minRecommendationRequestsPerSecond/minProvisionedTPS meets the low watermark RPS/TPS or the minRecommendationRequestsPerSecond/minProvisionedTPS reaches 1 (the lowest allowed value). This function will NOT increase the minRecommendationRequestsPerSecond/minProvisionedTPS. Instead, it relies on Personalize to auto-scale recommenders/campaigns up to meet demand and back down to minRecommendationRequestsPerSecond/minProvisionedTPS.
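The adjustment rule above can be sketched roughly as follows. The function names are hypothetical, and rounding the reduced value down to a whole number is an assumption, since the description only states that the value is dropped by 25% and never goes below 1. This sketch also does not clamp the result to the low watermark.

```python
import math


def low_watermark(hourly_request_counts):
    """Lowest hourly average requests per second in the lookback window."""
    return min(count / 3600 for count in hourly_request_counts)


def next_min_tps(current_min, watermark, age_days):
    """Return the next minimum RPS/TPS after one hourly adjustment pass."""
    # Resources one day old or younger are left alone, as are resources whose
    # configured minimum already meets the low watermark. Never adjust upward.
    if age_days <= 1 or watermark >= current_min:
        return current_min
    # Drop by 25%, rounding down (assumption), but never below 1.
    return max(1, math.floor(current_min * 0.75))


# Example: a 5-day-old campaign with minProvisionedTPS of 100 whose quietest
# hour in the window averaged 10 TPS steps down to 75 on this pass.
print(next_min_tps(100, 10.0, age_days=5))
```

Repeated hourly, the value keeps stepping down until it is no longer above the low watermark or it reaches 1.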

Since it can take several minutes for a recommender/campaign to redeploy after its minRecommendationRequestsPerSecond/minProvisionedTPS is updated, you will receive a notification when the redeploy starts. The recommender/campaign will continue to respond to GetRecommendations/GetPersonalizedRanking API requests while it is redeploying, so there will be no interruption of service.

See the personalize_update_tps function for details on the update function.

Automatically stopping recommenders and deleting idle campaigns (optional)

If the AutoDeleteOrStopIdleResources deployment parameter is Yes, this function will perform additional checks once per hour for each monitored recommender/campaign to see if it has been idle for more than IdleThresholdHours hours. The purpose of this feature is to prevent abandoned recommenders/campaigns from continuing to incur inference costs when they are no longer being used. Recommender/campaign checks are distributed across each hour in 10-minute blocks to spread out the API calls needed to check and update recommenders/campaigns.

To avoid stopping recommenders or deleting campaigns too aggressively, new recommenders/campaigns that are not more than IdleThresholdHours hours old are exempt from being stopped/deleted. Similarly, if a recommender/campaign has been updated within the last IdleThresholdHours hours, it will also be exempt from being automatically stopped/deleted. The idea is that new or actively updated recommenders/campaigns are likely not safe to stop or delete.
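The exemption rules above can be sketched as a single eligibility check. The function name and its parameters are hypothetical. In particular, `last_request_at` stands in for whatever the function derives from the GetRecommendations/GetPersonalizedRanking request metrics and is a simplification for this example.

```python
from datetime import datetime, timedelta, timezone


def is_eligible_for_cleanup(created_at, last_updated_at, last_request_at,
                            idle_threshold_hours, now=None):
    """Return True if a resource may be stopped/deleted as idle (sketch only)."""
    now = now or datetime.now(timezone.utc)
    threshold = timedelta(hours=idle_threshold_hours)
    # New resources (not more than the threshold old) are exempt.
    if now - created_at <= threshold:
        return False
    # Recently updated resources are exempt as well.
    if now - last_updated_at <= threshold:
        return False
    # A request within the threshold window means the resource is not idle.
    # None is treated as "no requests observed" (assumption).
    if last_request_at is not None and now - last_request_at <= threshold:
        return False
    return True
```

Passing `now` explicitly keeps the check deterministic, which makes it easier to exercise the boundary cases around IdleThresholdHours.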