CLI to help automate MCA work
- Clone repository
- Run
npm link
- Watch and compile on change
npm run start
- Compile TypeScript to JavaScript with
npm run build
- Run the built mca CLI with
./dist/bin/mca.js
- src (Contains source files)
- bin (Starting point for cli app)
- cmd (Command line configs using yargs commandDir)
- lib (Code for the commands)
- assets (Assets required for the command line)
- dist (Build folder, same as src but with js files)
- Lint code with
npm run lint
- Fix linter errors with
npm run lint:fix
- Run tests with
npm run test
Unit tests should live in the same location as the code they test, with an added .spec.ts extension. Larger integration tests should go in a separate test folder.
Run npm run release
to bump the version, add tags, and update the CHANGELOG automatically.
mca-cli and mca-monitoring are used together to set up monitoring for resources in an AWS environment.
The mca monitoring
command searches for resources in an AWS environment. It generates a Node project inside your main project folder and creates a config.yml, which lists all the resources, along with some default alarms.
Monitoring here is a combination of CloudWatch metrics and CloudWatch alarms.
A metric is a statistic. For example: AWS Lambda has an Invocations metric, which counts the number of times a function is invoked.
An alarm observes a single metric and initiates actions when a specified condition is met. The action could be sending a notification to an SNS topic.
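The relationship between a metric and an alarm can be sketched in a few lines. This is a simplified model, not the CloudWatch or mca-monitoring API: the names `AlarmRule` and `evaluateAlarm` are hypothetical, and the sketch assumes an alarm fires only when every datapoint in the most recent evaluation periods breaches the threshold. Missing-data handling is left out.

```typescript
// Hypothetical types for illustration only; not the CloudWatch or mca-monitoring API.
type ComparisonOperator =
  | 'GREATER_THAN_OR_EQUAL_TO_THRESHOLD'
  | 'GREATER_THAN_THRESHOLD'
  | 'LESS_THAN_THRESHOLD'
  | 'LESS_THAN_OR_EQUAL_TO_THRESHOLD';

type AlarmState = 'OK' | 'ALARM' | 'INSUFFICIENT_DATA';

interface AlarmRule {
  comparisonOperator: ComparisonOperator;
  threshold: number;
  evaluationPeriods: number;
}

// One comparison function per operator.
const comparators: Record<ComparisonOperator, (value: number, threshold: number) => boolean> = {
  GREATER_THAN_OR_EQUAL_TO_THRESHOLD: (v, t) => v >= t,
  GREATER_THAN_THRESHOLD: (v, t) => v > t,
  LESS_THAN_THRESHOLD: (v, t) => v < t,
  LESS_THAN_OR_EQUAL_TO_THRESHOLD: (v, t) => v <= t,
};

// datapoints holds one aggregated metric value per period, oldest first.
function evaluateAlarm(datapoints: number[], rule: AlarmRule): AlarmState {
  if (datapoints.length < rule.evaluationPeriods) return 'INSUFFICIENT_DATA';
  const recent = datapoints.slice(-rule.evaluationPeriods);
  const breaching = recent.every((v) => comparators[rule.comparisonOperator](v, rule.threshold));
  return breaching ? 'ALARM' : 'OK';
}

// Example: alarm when a Lambda Errors metric is >= 1 for two periods in a row.
const errorsRule: AlarmRule = {
  comparisonOperator: 'GREATER_THAN_OR_EQUAL_TO_THRESHOLD',
  threshold: 1,
  evaluationPeriods: 2,
};
console.log(evaluateAlarm([0, 2, 1], errorsRule)); // ALARM
console.log(evaluateAlarm([2, 0], errorsRule)); // OK
```

In the real service the alarm state change is what triggers the action, such as publishing to an SNS topic.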
In the root project folder run
npx mca monitoring init -p <aws profile> -r <aws region> -o monitoring
Optional flags
--service
: A space-separated list of service names to include in the search for resources. By default all of the following services are included:
- lambda
- dynamodb
- ecs
- apigateway
- cloudfront
- rds
- eks
- loggroup
- appsync
- sqs
--include
: A list of regex patterns of resource names (or IDs) to include in the monitoring. By default all resources are included. Resources are identified by:
- (lambda) function name
- (dynamodb) table name
- (ecs) cluster name
- (apigateway) api name
- (cloudfront) distribution id or alias
- (rds) db instance identifier
- (eks) cluster name
- (appsync) api name
- (sqs) queue name
--exclude
: Same as above, but resources are excluded.
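As a rough illustration of how include and exclude patterns could combine, here is a minimal sketch. The helper `filterResources` is hypothetical and mca-cli's real matching logic may differ. It assumes that an empty include list means "everything" and that an exclude match always wins over an include match.

```typescript
// Hypothetical helper for illustration; not part of mca-cli.
function filterResources(names: string[], include: string[] = [], exclude: string[] = []): string[] {
  const includeRe = include.map((p) => new RegExp(p));
  const excludeRe = exclude.map((p) => new RegExp(p));
  return names.filter((name) => {
    // Assumption: no include patterns means every resource is a candidate.
    const included = includeRe.length === 0 || includeRe.some((re) => re.test(name));
    // Assumption: a resource matching any exclude pattern is always dropped.
    const excluded = excludeRe.some((re) => re.test(name));
    return included && !excluded;
  });
}

// Made-up Lambda function names.
const functionNames = ['orders-prod-create', 'orders-dev-create', 'orders-prod-warmup'];
console.log(filterResources(functionNames, ['-prod-'], ['warmup$']));
// [ 'orders-prod-create' ]
```

Because the patterns are compiled with `new RegExp`, a glob-style value such as `*prod*` would throw in this sketch; `.*prod.*` or simply `prod` is the regex equivalent.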
--help
: Show all options.
In the monitoring folder run npm install
In the monitoring folder run npm run deploy
Read more about CloudWatch concepts
Custom configurations should be listed in config.yml, under custom > default.
- enabled Boolean. Whether to create an alarm for this metric.
- autoResolve Boolean (optional, default: false). Whether the alarm should automatically enter the "OK" state.
- alarm
  - critical
    - comparisonOperator String (optional, default: GREATER_THAN_OR_EQUAL_TO_THRESHOLD). Comparison to use to check if the metric is breaching. (available values)
    - threshold Number (required). The value against which the specified statistic is compared.
    - evaluationPeriods Number (required). The number of periods over which data is compared to the specified threshold.
    - evaluateLowSampleCountPercentile Percentile (optional). Used only for alarms that are based on percentiles. Specifies whether to evaluate the data and potentially change the alarm state if there are too few data points to be statistically significant.
    - treatMissingData String (optional, default: NOT_BREACHING). Sets how this alarm handles missing data points. (available values)
- metric
  - period (optional, default: 5 minutes). The period over which the specified statistic is applied. Can have one of the following sub-properties:
    - milliseconds Number
    - seconds Number
    - minutes Number
    - hours Number
    - days Number
    - isoString ISO 8601
  - statistic String (required, one of: Minimum, Maximum, Average, Sum, SampleCount, pNN.NN). What function to use for aggregating.
  - unit String (optional, default: undefined). Unit used to filter the metric stream. Only useful when datums are being emitted to the same metric stream under different units.
- [custom metric name]
  - filter
    - pattern String (required). Filter pattern syntax. When using quotes for exact matches (e.g. "[ERROR]"), wrap the double quotes in single quotes (e.g. '"[ERROR]"'), or mca-monitoring will end up with a regex (e.g. [ERROR]).
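To make the filter pattern option more concrete, the sketch below approximates how a simple CloudWatch Logs term pattern behaves: every plain term must appear in the log message, a term prefixed with `-` must not appear, and a double-quoted term is taken literally. The function `matchesPattern` is hypothetical and covers only this subset. It uses a case-sensitive substring check as a simplification, so treat it as a way to reason about patterns, not as a reimplementation of the service.

```typescript
// Hypothetical approximation of simple term-based filter patterns.
function matchesPattern(pattern: string, message: string): boolean {
  // Split into terms, keeping double-quoted phrases (optionally negated) together.
  const tokens = pattern.match(/-?"[^"]*"|\S+/g) ?? [];
  return tokens.every((token) => {
    const negated = token.startsWith('-');
    const raw = negated ? token.slice(1) : token;
    const term = raw.replace(/^"|"$/g, ''); // strip the surrounding double quotes
    // Plain terms must be present; negated terms must be absent.
    return message.includes(term) !== negated;
  });
}

console.log(matchesPattern('ERROR -400 -Timeout', 'ERROR failed to save order')); // true
console.log(matchesPattern('ERROR -400 -Timeout', 'ERROR 400 bad request')); // false
console.log(matchesPattern('"[ERROR]"', '[ERROR] something broke')); // true
```

The last call shows why the quoting advice matters: the pattern string that reaches CloudWatch needs to contain the double quotes, which in YAML means wrapping them in single quotes.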
cli:
  version: 1
  services: # mca-cli will search for resources from these services
    - lambda
    - dynamodb
    - apigateway
    - cloudfront
    - loggroup
  includes: [] # Regex patterns (resource names) to include in monitoring. See "Optional flags" in "Setting up default monitoring" section above.
  excludes: # Exclude these resource patterns from monitoring.
    - '*ee*'
    - '*rapsiapp*'
    - '*dev*'
    - '*marketprice*'
    - '*warmup*'
    - '*error-handler*'
  profile: nc-personal-user # The AWS profile used when searching for resources with mca-cli and when deploying with mca-monitoring.
custom:
  default:
    lambda: # Config type. See "Config types and metrics" below.
      Errors: # Metric name
        enabled: true # Whether to create an alarm for this metric.
        autoresolve: false # Should the alarm automatically enter "OK" state.
        alarm:
          critical:
            comparisonOperator: GREATER_THAN_OR_EQUAL_TO_THRESHOLD # Comparison to use to check if metric is breaching.
            threshold: 1 # The value against which the specified statistic is compared.
            evaluationPeriods: 1 # The number of periods over which data is compared to the specified threshold.
        metric:
          period: # The period over which the specified statistic is applied.
            minutes: 15
          statistic: Minimum # What function to use for aggregating.
    cloudfront: # Config type
      4XXErrorRate: # Metric name
        enabled: false # Monitoring for this metric is disabled.
    logGroup:
      RuntimeErrors: # Custom metric name
        enabled: true
        alarm:
          critical:
            threshold: 1
            evaluationPeriods: 1
        metric:
          period:
            minutes: 5
          unit: Count
          statistic: Sum
        filter:
          pattern: ERROR -400 -401 -403 -404 -Timeout -DeprecationWarning
  snsTopic: # This topic is created by default and is used by all alarms.
    critical:
      name: Topic for mca monitoring alarms
      id: avena-alerts-alarm
      endpoints:
        - >-
          https://events.pagerduty.com/integration/58287e69892c4406aa88db8619721142/enqueue
      emails: []
lambdas: # Lambdas to be monitored.
  myTestLambda: {}
distributions: # CloudFront distributions to be monitored.
  E2K3LH1G46OF18: {}
  E3ADB61RBHAPW9: {}
  E35IJ0HST9PMZQ: {}
logGroups:
  /aws/lambda/avenakauppa-fi-analysis-prod-get-analysis: {}
  /aws/lambda/avenakauppa-fi-analysis-prod-post-analysis: {}
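One way to reason about a metric entry in this config is to apply the documented defaults to it. The following is a minimal sketch under stated assumptions: the types `Period`, `MetricConfig` and `ResolvedAlarm` and the functions `periodToSeconds` and `withDefaults` are hypothetical and are not mca-monitoring's own typings. The default values are the ones listed in the option descriptions above, and `isoString` periods are left out for brevity.

```typescript
// Hypothetical types mirroring the documented options; not mca-monitoring's real typings.
interface Period {
  milliseconds?: number;
  seconds?: number;
  minutes?: number;
  hours?: number;
  days?: number;
}

interface MetricConfig {
  enabled: boolean;
  autoResolve?: boolean;
  alarm?: {
    critical?: {
      comparisonOperator?: string;
      threshold: number;
      evaluationPeriods: number;
      treatMissingData?: string;
    };
  };
  metric?: { period?: Period; statistic?: string; unit?: string };
}

interface ResolvedAlarm {
  enabled: boolean;
  autoResolve: boolean;
  comparisonOperator: string;
  threshold: number;
  evaluationPeriods: number;
  treatMissingData: string;
  periodSeconds: number;
  statistic: string;
}

// The docs expect a single sub-property; summing also covers that case.
// Falls back to the documented default of 5 minutes when no period is given.
function periodToSeconds(period: Period = { minutes: 5 }): number {
  return (
    (period.milliseconds ?? 0) / 1000 +
    (period.seconds ?? 0) +
    (period.minutes ?? 0) * 60 +
    (period.hours ?? 0) * 3600 +
    (period.days ?? 0) * 86400
  );
}

// Returns undefined when a required field (threshold, evaluationPeriods, statistic) is missing.
function withDefaults(config: MetricConfig): ResolvedAlarm | undefined {
  const critical = config.alarm?.critical;
  const statistic = config.metric?.statistic;
  if (!critical || !statistic) return undefined;
  return {
    enabled: config.enabled,
    autoResolve: config.autoResolve ?? false,
    comparisonOperator: critical.comparisonOperator ?? 'GREATER_THAN_OR_EQUAL_TO_THRESHOLD',
    threshold: critical.threshold,
    evaluationPeriods: critical.evaluationPeriods,
    treatMissingData: critical.treatMissingData ?? 'NOT_BREACHING',
    periodSeconds: periodToSeconds(config.metric?.period),
    statistic,
  };
}

// The lambda > Errors entry from the example config, written as an object.
const lambdaErrors: MetricConfig = {
  enabled: true,
  alarm: { critical: { threshold: 1, evaluationPeriods: 1 } },
  metric: { period: { minutes: 15 }, statistic: 'Minimum' },
};
console.log(withDefaults(lambdaErrors));
```

Keeping the defaults in one function makes it easy to see which values an entry like `enabled: false` leaves unspecified.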
lambda
- Invocations
- Errors
- DeadLetterErrors
- DestinationDeliveryFailures
- Throttles
- ProvisionedConcurrencyInvocations
- ProvisionedConcurrencySpilloverInvocations
- Duration
- IteratorAge
- ConcurrentExecutions
- ProvisionedConcurrentExecutions
- ProvisionedConcurrencyUtilization
- UnreservedConcurrentExecutions
table
- ConditionalCheckFailedRequests
- ConsumedReadCapacityUnits
- ConsumedWriteCapacityUnits
- MaxProvisionedTableReadCapacityUtilization
- MaxProvisionedTableWriteCapacityUtilization
- OnlineIndexConsumedWriteCapacity
- OnlineIndexPercentageProgress
- OnlineIndexThrottleEvents
- PendingReplicationCount
- ProvisionedReadCapacity
- ProvisionedWriteCapacity
- ReadThrottleEvents
- ReplicationLatency
- ReturnedBytes
- ReturnedItemCount
- ReturnedRecordsCount
- SystemErrors
- TimeToLiveDeletedItemCount
- ThrottledRequests
- TransactionConflict
- WriteThrottleEvents
account
This is part of the AWS/DynamoDB namespace
- UserErrors
clusters
- CPUReservation
- CPUUtilization
- MemoryReservation
- MemoryUtilization
- GPUReservation
apiGateway
- 4XXError
- 5XXError
- CacheHitCount
- CacheMissCount
- Count
- IntegrationLatency
- Latency
cloudfront
- 4XXErrorRate
- 5XXErrorRate
- 401ErrorRate
- 403ErrorRate
- 404ErrorRate
- 502ErrorRate
- 503ErrorRate
- 504ErrorRate
- BytesDownloaded
- BytesUploaded
- CacheHitRate
- OriginLatency
- Requests
- TotalErrorRate
rds
- BinLogDiskUsage
- BurstBalance
- CPUUtilization
- CPUCreditUsage
- CPUCreditBalance
- DatabaseConnections
- DiskQueueDepth
- FailedSQLServerAgentJobsCount
- FreeableMemory
- FreeStorageSpace
- MaximumUsedTransactionIDs
- NetworkReceiveThroughput
- NetworkTransmitThroughput
- OldestReplicationSlotLag
- ReadIOPS
- ReadLatency
- ReadThroughput
- ReplicaLag
- ReplicationSlotDiskUsage
- SwapUsage
- TransactionLogsDiskUsage
- TransactionLogsGeneration
- WriteIOPS
- WriteLatency
- WriteThroughput
eks
- cluster_failed_node_count
- cluster_node_count
- namespace_number_of_running_pods
- node_cpu_limit
- node_cpu_reserved_capacity
- node_cpu_usage_total
- node_cpu_utilization
- node_filesystem_utilization
- node_memory_limit
- node_memory_reserved_capacity
- node_memory_utilization
- node_memory_working_set
- node_network_total_bytes
- node_number_of_running_containers
- node_number_of_running_pods
- pod_cpu_reserved_capacity
- pod_cpu_utilization
- pod_cpu_utilization_over_pod_limit
- pod_memory_reserved_capacity
- pod_memory_utilization
- pod_memory_utilization_over_pod_limit
- pod_number_of_container_restarts
- pod_network_rx_bytes
- pod_network_tx_bytes
- service_number_of_running_pods
appSyncApi
- 4XXError
- 5XXError
- Latency
- ConnectSuccess
- ConnectClientError
- ConnectServerError
- DisconnectSuccess
- DisconnectClientError
- DisconnectServerError
- SubscribeSuccess
- SubscribeClientError
- SubscribeServerError
- UnsubscribeSuccess
- UnsubscribeClientError
- UnsubscribeServerError
- PublishDataMessageSuccess
- PublishDataMessageClientError
- PublishDataMessageServerError
- PublishDataMessageSize
- ActiveConnection
- ActiveSubscription
- ConnectionDuration
sqs
- ApproximateAgeOfOldestMessage
- ApproximateNumberOfMessagesDelayed
- ApproximateNumberOfMessagesNotVisible
- ApproximateNumberOfMessagesVisible
- NumberOfEmptyReceives
- NumberOfMessagesDeleted
- NumberOfMessagesReceived
- NumberOfMessagesSent
- SentMessageSize