
Application Insights feature block hanging and Failure Anomalies Still Auto-Generated #18026

Open
mindlessroman opened this issue Aug 17, 2022 · 23 comments · Fixed by observeinc/terraform-azure-collection#44
Labels
bug service/application-insights upstream/microsoft Indicates that there's an upstream issue blocking this issue/PR v/3.x

Comments

@mindlessroman

mindlessroman commented Aug 17, 2022

Is there an existing issue for this?

  • I have searched the existing issues

Community Note

  • Please vote on this issue by adding a 👍 reaction to the original issue to help the community and maintainers prioritize this request
  • Please do not leave "+1" or "me too" comments, they generate extra noise for issue followers and do not help prioritize the request
  • If you are interested in working on this issue or have submitted a pull request, please leave a comment

Terraform Version

1.2.7

AzureRM Provider Version

3.11.0

Affected Resource(s)/Data Source(s)

provider azurerm

Terraform Configuration Files

# main.tf
terraform {
  backend "azurerm" {
  }
}

provider "azurerm" {
  features {
    application_insights {
      disable_generated_rule = true
    }
  }
}

# appinsights.tf
resource "azurerm_application_insights" "appinsights" {
  name                = "${local.name}-ai"
  location            = azurerm_resource_group.main_resource_group.location
  resource_group_name = azurerm_resource_group.main_resource_group.name
  application_type    = "web"
}

Debug Output/Panic Output

(see below)

Expected Behaviour

I would expect that, if disable_generated_rule is set to true, the auto-created Smart Detector rule would not be generated and/or the auto-created Failure Anomalies smart detector alert rule would also be turned off. Creating an Application Insights resource should take about 30 seconds at most, and the ability to destroy the resource group should not be impeded.

Actual Behaviour

In the terraform apply step of our pipeline, the Application Insights resource seemingly hits a 10-minute timeout. The resource has already been created and is visible in the Azure portal, but the pipeline still reports it as creating, which feels unnecessary. We end up waiting for this step to complete when it has already completed... but Terraform doesn't seem to get that message?

azurerm_application_insights.appinsights: Still creating... [10m0s elapsed]
azurerm_application_insights.appinsights: Still creating... [10m10s elapsed]
azurerm_application_insights.appinsights: Still creating... [10m20s elapsed]
azurerm_application_insights.appinsights: Still creating... [10m30s elapsed]
azurerm_application_insights.appinsights: Creation complete after 10m39s

The Failure Anomalies - {{name of App insights resource}} rule is still created (as a hidden resource).

This then causes our terraform destroy step to fail:

... # other resources destroyed, uneventfully
azurerm_application_insights.appinsights: Destroying... [id=...]
azurerm_application_insights.appinsights: Destruction complete after 2s
... # other resources destroyed, uneventfully
azurerm_resource_group.main_resource_group: Destroying... [id=...]
azurerm_resource_group.main_resource_group: Still destroying... [id=..., 10s elapsed]
azurerm_resource_group.main_resource_group: Still destroying... [id=..., 9m50s elapsed]
... # a different unrelated warning
│ Error: deleting Resource Group "...": the Resource Group still contains Resources.
│ 
│ Terraform is configured to check for Resources within the Resource Group when deleting the Resource Group - and
│ raise an error if nested Resources still exist to avoid unintentionally deleting these Resources.
│ 
│ Terraform has detected that the following Resources still exist within the Resource Group:
│ 
│ * `/subscriptions/.../resourceGroups/.../providers/microsoft.alertsmanagement/smartDetectorAlertRules/Failure Anomalies - {{app insights resource name}}`
│ 
│ This feature is intended to avoid the unintentional destruction of nested Resources provisioned through some
│ other means (for example, an ARM Template Deployment) - as such you must either remove these Resources, or
│ disable this behaviour using the feature flag `prevent_deletion_if_contains_resources` within the `features`
│ block when configuring the Provider, for example:
│ 
│ provider "azurerm" {
│   features {
│     resource_group {
│       prevent_deletion_if_contains_resources = false
│     }
│   }
│ }
│ 
│ When that feature flag is set, Terraform will skip checking for any Resources within the Resource Group and
│ delete this using the Azure API directly (which will clear up any nested resources).
##[error]Error: The process '/opt/hostedtoolcache/terraform/1.2.7/x64/terraform' failed with exit code 1

My theory is that the 10 minutes Application Insights sat waiting gave the auto-generated, hidden alert rule enough time to come online.

First issue: setting that feature flag makes the build take up to 10 minutes as it waits, even though the resource has in fact finished being created.

If we instead explicitly declare a smart detection rule (disabled) and remove the feature block:

# appinsights.tf
resource "azurerm_application_insights_smart_detection_rule" "smart_detection_rule" {
  name                    = "Slow server response time"
  application_insights_id = azurerm_application_insights.appinsights.id
  enabled                 = false
}

# main.tf
terraform {
  backend "azurerm" {
  }
}

provider "azurerm" {
  features {
  }
}

Then in the terraform apply stage:

azurerm_application_insights.appinsights: Creating...
azurerm_application_insights.appinsights: Creation complete after 2s [id=...]
azurerm_application_insights_smart_detection_rule.smart_detection_rule: Creating...
azurerm_application_insights_smart_detection_rule.smart_detection_rule: Creation complete after 1s [id=...]

Application Insights does not hang, and we can usually delete the resource group before the Failure Anomalies rule gets generated.

Second issue: Failure Anomalies rules are "Smart Detector Alert Rules", not "Smart Detection Rules", and seemingly fall outside the purview of the disable_generated_rule flag. See the note in this documentation section:

This Azure Resource Manager template is unique to the Failure Anomalies alert rule and is different from the other classic Smart Detection rules described in this article. If you want to manage Failure Anomalies manually this is done in Azure Monitor Alerts whereas all other Smart Detection rules are managed in the Smart Detection pane of the UI.

The Request / The Ask

  1. Fix the hanging when declaring that feature flag
  2. Once that is fixed, include the Failure Anomalies as either:
    • an included rule that's turned off when that disable_generated_rule flag is true
    • OR, have more explicit ways in the Azure provider to disable Smart Detection Alert Rules

This documentation describes creating it explicitly. However, it feels counterintuitive to explicitly create a resource in Terraform (one we aren't even told exists, because it's hidden) just so we can control deleting it. We never declare this hidden resource in our builds in the first place, so we have no means to explicitly destroy it.
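A minimal sketch of what that explicit declaration could look like, assuming the names from the configuration above (`local.name`, `azurerm_resource_group.main_resource_group`, `azurerm_application_insights.appinsights`). The action group resource and its name/short name are hypothetical; one is included only because `azurerm_monitor_smart_detector_alert_rule` requires an `action_group` block. Setting `enabled = false` keeps the rule in state (so Terraform destroys it before the resource group) while switching it off. Whether Azure then skips creating its own hidden copy is not confirmed here, and if the hidden rule already exists, the create may fail with an "already exists" error.

```hcl
# Hypothetical action group: the alert rule below requires an action_group block.
resource "azurerm_monitor_action_group" "failure_anomalies" {
  name                = "${local.name}-failure-anomalies-ag"
  resource_group_name = azurerm_resource_group.main_resource_group.name
  short_name          = "failanom" # must be 12 characters or fewer
}

# Declare the Failure Anomalies rule explicitly so it lives in Terraform state.
# The name mirrors the auto-generated "Failure Anomalies - <name>" rule.
resource "azurerm_monitor_smart_detector_alert_rule" "failure_anomalies" {
  name                = "Failure Anomalies - ${azurerm_application_insights.appinsights.name}"
  resource_group_name = azurerm_resource_group.main_resource_group.name
  detector_type       = "FailureAnomaliesDetector"
  scope_resource_ids  = [azurerm_application_insights.appinsights.id]
  severity            = "Sev3"
  frequency           = "PT1M"
  enabled             = false # keep the rule under management, but switched off

  action_group {
    ids = [azurerm_monitor_action_group.failure_anomalies.id]
  }
}
```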

All this may stem from a recent change under the hood in Azure, but if the Terraform equivalents could match, that would be great.

Steps to Reproduce

  1. terraform apply
  2. terraform destroy

Important Factoids

No response

References

PR #16170

On Azure's end, I'm trying to figure out whether some functionality changed under the hood recently that caused this to pop up, or whether it is now controlled by something else.

@DanLauerman

This seems like a major oversight on Microsoft's part. Even if an Application Insights resource is deleted in the Portal, the automatically created Smart Detector alerts are not removed.

Link to feedback provided to Azure for upvoting on the Azure side: https://feedback.azure.com/d365community/idea/cdb1fc68-bb4f-ed11-a81b-000d3adfeb99

@egorshulga

egorshulga commented Jan 17, 2023

I wonder why AzureRM creates the Failure Anomalies alert rule in the first place 🤔
I just checked: when we create Application Insights from the Azure Portal, no hidden alert is created.
Sorry, am I missing something here?

@JohnRAristizabal

We are using this to mitigate the issue:

provider "azurerm" {
  features {
    resource_group {
      # This flag is set to mitigate an open bug in Terraform. As soon as this is fixed, we should remove this.
      prevent_deletion_if_contains_resources = false
    }
  }
}

@egorshulga

egorshulga commented Jan 23, 2023

Update: this answer turned out to be wrong. It seems we also managed to find a workaround for the issue by declaring the resource explicitly:
resource "azurerm_monitor_smart_detector_alert_rule" "failureAnomalies" {
  count               = var.isProd ? 1 : 0
  name                = "Failure Anomalies"
  resource_group_name = azurerm_resource_group.resourceGroup.name
  detector_type       = "FailureAnomaliesDetector"
  scope_resource_ids  = [azurerm_application_insights.appInsights.id]
  severity            = "Sev3"
  frequency           = "PT1M"
  action_group {
    ids = [one(azurerm_monitor_action_group.actionGroup).id]
  }
}

The funny thing is that, as you can see, this alert is conditional, so it is provisioned for prod only, but somehow this declaration fixes the non-prod environments as well.

@pgagliano5

Even with "prevent_deletion_if_contains_resources = false", the destroy fails.

@IgorZhavoronok

We have the same problem too. When I run the destroy pipeline, it creates an "Application Insights Smart Detection" resource and sometimes "Failure Anomalies", which blocks resource group destruction. That really looks like a bug.

@ameyaagashe

I have the same issue. Even if you add "prevent_deletion_if_contains_resources = false", destroy fails. Indeed, a bug. Hoping Microsoft resolves this soon. Until then, this resorts to "Click Ops", where one has to manually delete the resource and then rerun terraform to destroy the resource group.

@catriona-m catriona-m self-assigned this May 31, 2023
@miniemi

miniemi commented Jun 5, 2023

I had the same issue as well.

"prevent_deletion_if_contains_resources = false" works for me. With this flag set to false, destroy deletes all the nested resources and the resource group, as the documentation says, even if some resources are not in the tf state.

"When that feature flag is set, Terraform will skip checking for any Resources within the Resource Group and delete this using the Azure API directly (which will clear up any nested resources)."

After I added this flag to my tf code, I manually deleted the old state and all the resources in Azure and redeployed everything. After this, destroy runs without any errors.

@GlibMartynenko

GlibMartynenko commented Jul 7, 2023

To prevent the creation of the "Application Insights smart detection rules" and action group, I added this code to my observability package:
resource "azurerm_application_insights_smart_detection_rule" "example" {
  name                    = "Slow server response time"
  application_insights_id = azurerm_application_insights.example.id
  enabled                 = false
}

With that in place, I have my custom action group, and Azure does not create its own action group and rule:
https://registry.terraform.io/providers/hashicorp/azurerm/latest/docs/resources/application_insights_smart_detection_rule

In the documentation, I didn't find any information about this approach but it works for me.

Update:
Sorry, but it looks like it was a bug on Azure's side :(
Even with the code above, the "Application Insights Smart Detection" action group and the "Failure Anomalies" smart detector alert rule are still created :(

Does anyone know how to prevent the creation of these two resources?


@rcskosir rcskosir added the upstream/microsoft Indicates that there's an upstream issue blocking this issue/PR label Jul 20, 2023
@ezequielan

Hello,
any news about this?

Thanks.

@DenisBalan

A workaround for this:

resource "azurerm_application_insights" "application_insights" {
  name                                  = local.name
  resource_group_name                   = var.resource_group_name
   ...
}


# This resource sits here just to have it imported in the state
resource "azurerm_monitor_action_group" "this" {
  name                = join("-", ["amag", local.name])
  resource_group_name = var.resource_group_name
  short_name          = "amag" # used only for sms
}

resource "azurerm_monitor_smart_detector_alert_rule" "failure_anomalies" {
  name                = "Failure Anomalies - ${local.name}"
  resource_group_name = var.resource_group_name
  detector_type       = "FailureAnomaliesDetector"
  scope_resource_ids  = [azurerm_application_insights.application_insights.id]
  severity            = "Sev0"
  frequency           = "PT1M"
  action_group {
    ids = [azurerm_monitor_action_group.this.id]
  }
}

This way we have it in state, and when destroying, it gets destroyed automatically before the resource group is.
Give it a try; at least for us, it's working fine.

@danpetitt

@DenisBalan This does not work for me; the apply fails because the rules already exist. There is nothing that can be done except to remove the protections that stop resources being deleted when the resource group is deleted, and those protections are good to have in place.
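Where the rule already exists, one option worth trying is to adopt it into state rather than create it. This is a sketch only, assuming Terraform 1.5 or later (which added `import` blocks; the issue above was on 1.2.7) and the `azurerm_monitor_smart_detector_alert_rule.failure_anomalies` resource from the previous comment. The subscription ID, resource group and Application Insights name are placeholders. Because Azure creates the rule asynchronously, an import block that points at a rule which does not exist yet makes the plan fail, so this only suits environments where the rule is already present.

```hcl
# Requires Terraform >= 1.5. Adopts the rule Azure already created
# instead of trying to create a second rule with the same name.
import {
  to = azurerm_monitor_smart_detector_alert_rule.failure_anomalies

  # Placeholder ID: substitute your own subscription, resource group and
  # Application Insights name. The error message earlier in this issue
  # prints the provider segment in lowercase; the casing the azurerm
  # provider accepts on import may differ, so check its docs.
  id = "/subscriptions/<subscription-id>/resourceGroups/<resource-group>/providers/Microsoft.AlertsManagement/smartDetectorAlertRules/Failure Anomalies - <app-insights-name>"
}
```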

@stas-sultanov

It looks like there is some kind of policy that automatically creates a Failure Anomalies alert rule for newly created Application Insights instances.

I face this issue when creating Application Insights with Bicep/ARM.

@danpetitt

@stas-sultanov I tried using the portal and the Azure CLI, and neither auto-creates these alerts, so that's a bit weird.

@stas-sultanov

@danpetitt there is some kind of glitch in Azure.
I still have the issue with auto-creation of the Failure Anomalies detector.
I have raised a question with MS, with no luck...
Are we the only two who face this issue?

@sboulema

Nope! I also have this issue, lurking around hoping for a fix...

@stas-sultanov

Unfortunately, I do not have a support plan with MS to raise an issue via the Azure portal...

I just wonder how poorly qualified the people at Microsoft who implemented this automatic provisioning of the Failure Anomalies Detector were...

@danpetitt

@stas-sultanov I have a support plan; I will write up some clear repro steps, log a ticket and see what they say... probably not a lot, but we can hope.

I can understand that, as a first-time experience, it's useful to have this happen by default, but we should at least be able to opt out, especially those of us using IaC solutions rather than the portal.

I will report back if they say anything

@stas-sultanov

@danpetitt , thank you very much!
I guess you could include the activity log from Azure Monitor, which clearly shows the system is doing this on its own...

rliberoff added a commit to rliberoff/aihub that referenced this issue Jun 12, 2024
cmendible pushed a commit to Azure/aihub that referenced this issue Jun 12, 2024
… with Terraform. (#63)

* Update APIM type to use api version `2023-03-01-preview` which does not have the issue when deleting the APIM.
* Added a dependency (`depends_on`) on `azurerm_api_management_named_value.tenant_id` for `azurerm_api_management_api_policy.policy`, which is required when deleting the APIM due to an indirect dependency on the Tenant ID value.
* Add `prevent_deletion_if_contains_resources` flag as `false` to mitigate an open bug in Terraform. For instance, the Resource Group is not deleted when a `Failure Anomalies` resource is present. Reference: hashicorp/terraform-provider-azurerm#18026
@mdsharpe

Still an issue for me

@agullotti

agullotti commented Sep 25, 2024

Still an issue for me

Yes, same. Why would this be marked solved? Multiple people here have stated that the proposed fix in that merge does not work.

@stas-sultanov

The problem is that Microsoft states in the documentation that this behavior is by design,
which is 100500% extremely stupid, as it breaks the whole idea of IaC via declarative programming.

@Patrik-Berglund

Patrik-Berglund commented Oct 8, 2024

@danpetitt how did it go with the support ticket?
