2 hour timeout on AWS::APS::RuleGroupsNamespace Creation #10

Open
pplu opened this issue Oct 18, 2023 · 0 comments

Hello,

If you deploy this template to CloudFormation:

AWSTemplateFormatVersion: "2010-09-09"
Description: 'Demonstrate APS RuleGroupsNamespace bug'
Resources:
  APSDatabase:
    Type: 'AWS::APS::Workspace'
  AlertRules:
    Type: AWS::APS::RuleGroupsNamespace
    Properties:
      Workspace: !GetAtt APSDatabase.Arn
      Name: !Sub '${AWS::StackName}-rules'
      Data: |
        groups:
        - name: rules
          rules:
          - alert: xxx
            expr: probe_success{job='xxx'} == 0
            for: 10m
            labels:
              severity: page
            annotations:
              summary: summary
              description: "\nSomething is wrong with XXX\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"

The stack creation will time out after 2 hours.

This seems to happen because an AWS::APS::RuleGroupsNamespace creation error is not handled properly. In APS, we can see that the RuleGroupsNamespace was created, but with an error:

aws amp --region eu-west-1 describe-rule-groups-namespace --workspace-id ws-XXXXXXXXXXXXXXXXXX --name hangtest-rules
{
    "ruleGroupsNamespace": {
        "arn": "arn:aws:aps:eu-west-1:XXXXX:rulegroupsnamespace/ws-XXXXXXXXXXXXXXXXXXXXXXXX/hangtest-rules",
        "createdAt": "2023-10-18T16:05:00.029000+02:00",
        "modifiedAt": "2023-10-18T16:08:22.238000+02:00",
        "name": "hangtest-rules",
        "status": {
            "statusCode": "CREATION_FAILED",
            "statusReason": "Annotations not equal: inputted annotations: {description=\"\\nSomething is wrong with XXX\\n  VALUE = {{ $value }}\\n  LABELS = {{ $labels }}\", summary=\"summary\"}, loaded annotations: {description=\"Something is wrong with XXX\\n  VALUE = {{ $value }}\\n  LABELS = {{ $labels }}\", summary=\"summary\"}."
        },
        "tags": {
            "aws:cloudformation:stack-id": "arn:aws:cloudformation:eu-west-1:XXXXXXXXXX:stack/hangtest/XXXXXXXXXXXXXX",
            "aws:cloudformation:stack-name": "hangtest",
            "aws:cloudformation:logical-id": "AlertRules"
        }
    }
}

So it seems that this resource provider does not tell CloudFormation whether the resource was created successfully, which makes the resource creation time out. IMHO, CloudFormation should receive a resource failure message as soon as CREATION_FAILED is observed in the status of the RuleGroupsNamespace.
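
To illustrate the behaviour I would expect, here is a minimal sketch in Python of how a stabilization check could map the APS status to a handler outcome. This is not the provider's actual code: the function name `progress_from_describe` and the returned strings are my own assumptions, and the input is simply the parsed JSON from `describe-rule-groups-namespace` shown above.

```python
from typing import Optional, Tuple

# Status codes APS reports for a rule groups namespace that matter during
# create/update. The grouping below is an assumption made for this sketch.
FAILED_CODES = {"CREATION_FAILED", "UPDATE_FAILED"}
IN_FLIGHT_CODES = {"CREATING", "UPDATING"}


def progress_from_describe(response: dict) -> Tuple[str, Optional[str]]:
    """Map a describe-rule-groups-namespace response to a handler outcome.

    Returns (operation_status, message), where operation_status is one of
    "SUCCESS", "FAILED" or "IN_PROGRESS" (hypothetical names for this sketch).
    """
    status = response["ruleGroupsNamespace"]["status"]
    code = status["statusCode"]
    if code == "ACTIVE":
        return "SUCCESS", None
    if code in FAILED_CODES:
        # Fail fast and pass the APS reason on instead of waiting for a timeout.
        return "FAILED", status.get("statusReason", code)
    if code in IN_FLIGHT_CODES:
        return "IN_PROGRESS", None
    # Unknown code: report a failure rather than polling until the timeout.
    return "FAILED", f"Unexpected status code: {code}"
```

With the response above, this would return "FAILED" together with the "Annotations not equal: ..." reason, which is the message I would like to see in the stack events.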

The current behaviour makes it much harder to diagnose an ill-formed RuleGroupsNamespace and leads to frustratingly long feedback cycles. In this case, the only thing wrong with the definition is the leading \n in the description of the rule; deleting it yields a valid definition.
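
Until this is fixed, a pre-deploy check can catch this particular mistake. The sketch below is only a workaround idea: it assumes the rule data has already been parsed into a Python dict, and that trimming of leading whitespace (as seen in the statusReason above) is the only normalization that matters. The function name is hypothetical.

```python
from typing import Dict, List


def annotations_with_leading_whitespace(rule_groups: Dict) -> List[str]:
    """Return "group/rule/annotation" paths whose value starts with whitespace."""
    problems = []
    for group in rule_groups.get("groups", []):
        for rule in group.get("rules", []):
            rule_name = rule.get("alert") or rule.get("record") or "<unnamed>"
            for key, value in (rule.get("annotations") or {}).items():
                # A value that changes under lstrip() has leading whitespace,
                # which is what APS appeared to trim in this issue.
                if isinstance(value, str) and value != value.lstrip():
                    problems.append(f"{group.get('name')}/{rule_name}/{key}")
    return problems


# The rule data from the template above, as it looks after YAML parsing.
data = {
    "groups": [{
        "name": "rules",
        "rules": [{
            "alert": "xxx",
            "expr": "probe_success{job='xxx'} == 0",
            "for": "10m",
            "labels": {"severity": "page"},
            "annotations": {
                "summary": "summary",
                "description": "\nSomething is wrong with XXX\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}",
            },
        }],
    }]
}

print(annotations_with_leading_whitespace(data))  # ['rules/xxx/description']
```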
