diff --git a/README.md b/README.md
index 6ede3e9a..8d3aeb00 100644
--- a/README.md
+++ b/README.md
@@ -30,7 +30,7 @@ Apart from the above arrangement, the following system modules/pods are part of

This section describes the nested topology design implemented by this solution.

-![alt text](architecture/nested-topology-hld-envoy.png "Nested Toplogy")
+![alt text](architecture/nested-topology-hld.png "Nested Topology")

At the core of the nested topology design, we have reverse proxies which broker the connections between each hypothetical ISA-95 level (Level 2,3,4 in this instance). These proxies prevent workloads and Arc agents running at lower levels from connecting to the outside world directly, allowing the traffic to be managed or controlled via proxy configuration at each level. Currently, data plane is traversing layers directly between brokers, and we are evaluating an improvement to force this communication to pass through the proxy transparently. Proxying of allowed URI calls from the lower L2 and L3 levels for the AKS host nodes (kubelet, containerd) is implemented using a DNS Server override in each lower Virtual Network.

@@ -68,6 +68,10 @@ Workloads exchange messages locally on the same network layer and Kubernetes clu

For more information about MQTT broker choice and comparison, please see [MQTT Broker for Data Communication Between Workloads and Between Network Layers](/docs/mqttbroker.md).

+### Monitoring and Observability
+
+Gathering diverse signals from sources such as operating systems, data components, custom workloads, and the Kubernetes platform itself, as well as analyzing those signals, is discussed in a separate document: [Observability for Distributed Edge](./docs/observability.md).
+
## Solution Deployment

![alt text](architecture/deployment-hld.png "Deployment Strategy")

diff --git a/architecture/nested-topology-hld.png b/architecture/nested-topology-hld.png
index 8b4ab6b4..123fd080 100644
Binary files a/architecture/nested-topology-hld.png and b/architecture/nested-topology-hld.png differ
diff --git a/architecture/observability-hld-slide.png b/architecture/observability-hld-slide.png
new file mode 100644
index 00000000..761776b0
Binary files /dev/null and b/architecture/observability-hld-slide.png differ
diff --git a/architecture/observability-stacked.png b/architecture/observability-stacked.png
new file mode 100644
index 00000000..a85e887b
Binary files /dev/null and b/architecture/observability-stacked.png differ
diff --git a/deployment/bicep/core-infra-vnet.bicep b/deployment/bicep/core-infra-base.bicep
similarity index 88%
rename from deployment/bicep/core-infra-vnet.bicep
rename to deployment/bicep/core-infra-base.bicep
index 685aaf1d..170f5dc5 100644
--- a/deployment/bicep/core-infra-vnet.bicep
+++ b/deployment/bicep/core-infra-base.bicep
@@ -81,6 +81,9 @@ param aksObjectId string

@description('Wether to close down outbound internet access')
param closeOutboundInternetAccess bool = false

+@description('Provision monitoring')
+param provisionMonitoring bool = false
+
var applicationNameWithoutDashes = replace(applicationName, '-', '')
var aksName = take('aks-${applicationNameWithoutDashes}', 20)
var resourceGroupName = applicationName
@@ -131,6 +134,16 @@ module downstreamvnetpeering 'modules/vnetpeering.bicep' = if (!empty(remoteVnet
  ]
}

+module monitoring 'modules/loganalytics.bicep' = if (provisionMonitoring) {
+  scope: resourceGroup(rg.name)
+  name: 'monitoringDeployment'
+  params: {
+    workspaceAccountName: applicationName
+    monitorAccountLocation: location
+  }
+}
+
output aksName string = aksName
output aksResourceGroup string = resourceGroupName
output subnetId string = vnet.outputs.subnetId
+output appInsightsInstrumentationKey string = provisionMonitoring ? monitoring.outputs.instrumentationKey : ''
diff --git a/deployment/bicep/modules/azurestorage.bicep b/deployment/bicep/modules/azurestorage.bicep
index 60541816..175e2f64 100644
--- a/deployment/bicep/modules/azurestorage.bicep
+++ b/deployment/bicep/modules/azurestorage.bicep
@@ -4,7 +4,7 @@
// ------------------------------------------------------------
@minLength(3)
@maxLength(24)
-@description('Azure Stroage Account name which is not already in use.')
+@description('Azure Storage Account name which is not already in use.')
param storageAccountName string

@description('Storage account location')
diff --git a/deployment/bicep/modules/loganalytics.bicep b/deployment/bicep/modules/loganalytics.bicep
new file mode 100644
index 00000000..12ea129d
--- /dev/null
+++ b/deployment/bicep/modules/loganalytics.bicep
@@ -0,0 +1,50 @@
+// ------------------------------------------------------------
+// Copyright (c) Microsoft Corporation. All rights reserved.
+// Licensed under the MIT License (MIT). See License.txt in the repo root for license information.
+// ------------------------------------------------------------
+@minLength(3)
+@maxLength(24)
+@description('Azure Log Analytics and App Insights Account name which is not already in use.')
+param workspaceAccountName string
+
+@description('Azure Log Analytics account location')
+@maxLength(20)
+param monitorAccountLocation string = resourceGroup().location
+
+resource logAnalytics 'Microsoft.OperationalInsights/workspaces@2022-10-01' = {
+  name: workspaceAccountName
+  location: monitorAccountLocation
+  properties: {
+    sku: {
+      name: 'PerGB2018'
+    }
+    retentionInDays: 30
+    features: {
+      enableLogAccessUsingOnlyResourcePermissions: true
+    }
+    workspaceCapping: {
+      dailyQuotaGb: -1
+    }
+    publicNetworkAccessForIngestion: 'Enabled'
+    publicNetworkAccessForQuery: 'Enabled'
+  }
+}
+
+resource appInsightsComponent 'microsoft.insights/components@2020-02-02' = {
+  name: workspaceAccountName
+  location: monitorAccountLocation
+  kind: 'web'
+  properties: {
+    Application_Type: 'web'
+    Flow_Type: 'Bluefield'
+    Request_Source: 'rest'
+    RetentionInDays: 90
+    WorkspaceResourceId: logAnalytics.id
+    IngestionMode: 'LogAnalytics'
+    publicNetworkAccessForIngestion: 'Enabled'
+    publicNetworkAccessForQuery: 'Enabled'
+  }
+}
+
+output instrumentationKey string = appInsightsComponent.properties.InstrumentationKey
+
diff --git a/deployment/bicep/modules/vnet.bicep b/deployment/bicep/modules/vnet.bicep
index 2f2df2a3..13c45fc4 100644
--- a/deployment/bicep/modules/vnet.bicep
+++ b/deployment/bicep/modules/vnet.bicep
@@ -31,8 +31,8 @@ param closeOutboundInternetAccess bool = false
var subnetName = aksName
var subnetNsgName = aksName

-var arrayBasicRules = [ allowProxyInboundSecurityRule, allowMqttSslInboundSecurityRule ]
-var arrayBaseAndLockRules = [ allowProxyInboundSecurityRule, allowMqttSslInboundSecurityRule, allowK8ApiHTTPSOutbound, allowK8ApiUdpOutbound, allowTagAks9000Outbound, allowTagFrontDoorFirstParty, allowTagMcr, denyOutboundInternetAccessSecurityRule ]
+var arrayBasicRules = [ allowProxyInboundSecurityRule, allowMqttSslInboundSecurityRule, allowOtelGrpcInboundSecurityRule ]
+var arrayBaseAndLockRules = [ allowProxyInboundSecurityRule, allowMqttSslInboundSecurityRule, allowOtelGrpcInboundSecurityRule, allowK8ApiHTTPSOutbound, allowK8ApiUdpOutbound, allowTagAks9000Outbound, allowTagFrontDoorFirstParty, allowTagMcr, denyOutboundInternetAccessSecurityRule ]
// TODO: We need to do this is nested manner e.g. use parent vnet/subnet if this is nested vnet/subnet creation.
var allowProxyInboundSecurityRule = {
@@ -41,7 +41,7 @@
    priority: 1010
    access: 'Allow'
    direction: 'Inbound'
-    destinationPortRange: '443'
+    destinationPortRanges: ['443', '8084']
    protocol: 'Tcp'
    sourcePortRange: '*'
    sourceAddressPrefix: 'VirtualNetwork'
  }
}
@@ -49,7 +49,6 @@
-// TODO: potentially remove this if going through proxy, for now setup for testing MQTT bridging
var allowMqttSslInboundSecurityRule = {
  name: 'AllowMqttSsl'
  properties: {
@@ -64,6 +63,20 @@
  }
}
+var allowOtelGrpcInboundSecurityRule = {
+  name: 'AllowOtelGrpc'
+  properties: {
+    priority: 1030
+    access: 'Allow'
+    direction: 'Inbound'
+    destinationPortRange: '4318'
+    protocol: 'Tcp'
+    sourcePortRange: '*'
+    sourceAddressPrefix: 'VirtualNetwork'
+    destinationAddressPrefix: 'VirtualNetwork'
+  }
+}
+
var allowK8ApiHTTPSOutbound = {
  name: 'AllowK8ApiHTTPSOutbound'
  properties: {
diff --git a/deployment/build-and-deploy-images.ps1 b/deployment/build-and-deploy-images.ps1
index eb6a480d..1b095b78 100644
--- a/deployment/build-and-deploy-images.ps1
+++ b/deployment/build-and-deploy-images.ps1
@@ -14,7 +14,11 @@ Param(
    # leave empty if both workloads are deployed on single cluster L4
    [string]
    [Parameter(mandatory=$false)]
-    $L2ResourceGroupName
+    $L2ResourceGroupName,
+
+    [Parameter(Mandatory = $false)]
+    [bool]
+    $SetupObservability = $true
)

if(!$env:RESOURCEGROUPNAME -and !$AppResourceGroupName)
@@ -56,6 +60,8 @@ Set-Location -Path $deploymentDir
Write-Title("Upgrade/Install Pod/Containers with Helm charts in Cluster L4")
$datagatewaymoduleimage = $acrName + ".azurecr.io/datagatewaymodule:" + $deploymentId

+$observabilityString = ($SetupObservability -eq $true) ? "true" : "false"
+$samplingRate = ($SetupObservability -eq $true) ? "1" : "0" # in development we set to 1, in prod should be 0.0001 or similar, 0 turns off observability
"1" : "0" # in development we set to 1, in prod should be 0.0001 or similar, 0 turns off observability # ----- Get Cluster Credentials for L4 layer Write-Title("Get AKS Credentials L4 Layer") @@ -68,6 +74,8 @@ az aks get-credentials ` helm upgrade iot-edge-l4 ./helm/iot-edge-l4 ` --set-string images.datagatewaymodule="$datagatewaymoduleimage" ` + --set-string observability.samplingRate="$samplingRate" ` + --set observability.enabled=$observabilityString ` --namespace $appKubernetesNamespace ` --reuse-values ` --install @@ -96,6 +104,8 @@ helm upgrade iot-edge-l2 ./helm/iot-edge-l2 ` --set-string images.simulatedtemperaturesensormodule="$simtempimage" ` --set-string images.opcplcmodule="$opcplcimage" ` --set-string images.opcpublishermodule="$opcpublisherimage" ` + --set observability.enabled=$observabilityString ` + --set-string observability.samplingRate="$samplingRate" ` --reuse-values ` --namespace $appKubernetesNamespace ` --install diff --git a/deployment/deploy-az-demo-bootstrapper.ps1 b/deployment/deploy-az-demo-bootstrapper.ps1 index 4d71a2a7..0538b85d 100644 --- a/deployment/deploy-az-demo-bootstrapper.ps1 +++ b/deployment/deploy-az-demo-bootstrapper.ps1 @@ -13,7 +13,11 @@ Param( [string] [Parameter(mandatory=$false)] - $Location = 'westeurope' + $Location = 'westeurope', + + [Parameter(Mandatory = $false)] + [bool] + $SetupObservability = $true ) mkdir -p modules @@ -38,18 +42,19 @@ Invoke-WebRequest -Uri "$baseLocation/deployment/deploy-app-l4.ps1" -OutFile "de mkdir -p bicep/modules Invoke-WebRequest -Uri "$baseLocation/deployment/bicep/core-infra-aks.bicep" -OutFile "./bicep/core-infra-aks.bicep" -Invoke-WebRequest -Uri "$baseLocation/deployment/bicep/core-infra-vnet.bicep" -OutFile "./bicep/core-infra-vnet.bicep" +Invoke-WebRequest -Uri "$baseLocation/deployment/bicep/core-infra-base.bicep" -OutFile "./bicep/core-infra-base.bicep" Invoke-WebRequest -Uri "$baseLocation/deployment/bicep/iiot-app.bicep" -OutFile "./bicep/iiot-app.bicep" Invoke-WebRequest -Uri "$baseLocation/deployment/bicep/modules/acr.bicep" -OutFile "./bicep/modules/acr.bicep" Invoke-WebRequest -Uri "$baseLocation/deployment/bicep/modules/azurestorage.bicep" -OutFile "./bicep/modules/azurestorage.bicep" Invoke-WebRequest -Uri "$baseLocation/deployment/bicep/modules/eventhub.bicep" -OutFile "./bicep/modules/eventhub.bicep" Invoke-WebRequest -Uri "$baseLocation/deployment/bicep/modules/vnet.bicep" -OutFile "./bicep/modules/vnet.bicep" Invoke-WebRequest -Uri "$baseLocation/deployment/bicep/modules/vnetpeering.bicep" -OutFile "./bicep/modules/vnetpeering.bicep" +Invoke-WebRequest -Uri "$baseLocation/deployment/bicep/modules/loganalytics.bicep" -OutFile "./bicep/modules/loganalytics.bicep" # Deploy 3 core infrastructure layers i.e. L4, L3, L2, replicating 3 levels of Purdue network topology. 
-$l4LevelCoreInfra = ./deploy-core-infrastructure.ps1 -ApplicationName ($ApplicationName + "L4") -VnetAddressPrefix "172.16.0.0/16" -SubnetAddressPrefix "172.16.0.0/18" -SetupArc $true -Location $Location
-$l3LevelCoreInfra = ./deploy-core-infrastructure.ps1 -ParentConfig $l4LevelCoreInfra -ApplicationName ($ApplicationName + "L3") -VnetAddressPrefix "172.18.0.0/16" -SubnetAddressPrefix "172.18.0.0/18" -SetupArc $true -Location $Location
-$l2LevelCoreInfra = ./deploy-core-infrastructure.ps1 -ParentConfig $l3LevelCoreInfra -ApplicationName ($ApplicationName + "L2") -VnetAddressPrefix "172.20.0.0/16" -SubnetAddressPrefix "172.20.0.0/18" -SetupArc $true -Location $Location
+$l4LevelCoreInfra = ./deploy-core-infrastructure.ps1 -ApplicationName ($ApplicationName + "L4") -VnetAddressPrefix "172.16.0.0/16" -SubnetAddressPrefix "172.16.0.0/18" -SetupArc $true -Location $Location -SetupObservability $SetupObservability
+$l3LevelCoreInfra = ./deploy-core-infrastructure.ps1 -ParentConfig $l4LevelCoreInfra -ApplicationName ($ApplicationName + "L3") -VnetAddressPrefix "172.18.0.0/16" -SubnetAddressPrefix "172.18.0.0/18" -SetupArc $true -Location $Location -SetupObservability $SetupObservability
+$l2LevelCoreInfra = ./deploy-core-infrastructure.ps1 -ParentConfig $l3LevelCoreInfra -ApplicationName ($ApplicationName + "L2") -VnetAddressPrefix "172.20.0.0/16" -SubnetAddressPrefix "172.20.0.0/18" -SetupArc $true -Location $Location -SetupObservability $SetupObservability

# Deploy core platform layer (Dapr on L4 and L2, Mosquitto broker bridging on L2, L3 and L4).
$l4CorePlatform = ./deploy-core-platform.ps1 -AksClusterName $l4LevelCoreInfra.AksClusterName -AksClusterResourceGroupName $l4LevelCoreInfra.AksClusterResourceGroupName -DeployDapr $true -MosquittoParentConfig $null
diff --git a/deployment/deploy-az-dev-bootstrapper.ps1 b/deployment/deploy-az-dev-bootstrapper.ps1
index 6dcfed0e..56f31db2 100644
--- a/deployment/deploy-az-dev-bootstrapper.ps1
+++ b/deployment/deploy-az-dev-bootstrapper.ps1
@@ -9,7 +9,15 @@ Param(
    [string]
    [Parameter(mandatory=$false)]
-    $Location = 'westeurope'
+    $Location = 'westeurope',
+
+    [Parameter(Mandatory = $false)]
+    [bool]
+    $SetupObservability = $true,
+
+    [Parameter(Mandatory = $false)]
+    [bool]
+    $SetupArc = $false
)

# Import text utilities module.
@@ -20,7 +28,7 @@ Import-Module -Name ./modules/process-utils.psm1
Write-Title("Start Deploying")
$startTime = Get-Date
$ApplicationName = $ApplicationName.ToLower()
-
+$samplingRate = ($SetupObservability -eq $true) ? "1" : "0" # in development we set to 1, in prod should be 0.0001 or 0, 0 turns off observability
# --- Ensure Location is set to short name
$Location = Get-AzShortRegion($Location)

@@ -28,48 +36,52 @@ $Location = Get-AzShortRegion($Location)
# 1. Deploy core infrastructure (AKS clusters, VNET)
-$l4LevelCoreInfra = ./deploy-core-infrastructure.ps1 -ApplicationName ($ApplicationName + "L4") -VnetAddressPrefix "172.16.0.0/16" -SubnetAddressPrefix "172.16.0.0/18" -SetupArc $false -Location $Location
-$l3LevelCoreInfra = ./deploy-core-infrastructure.ps1 -ParentConfig $l4LevelCoreInfra -ApplicationName ($ApplicationName + "L3") -VnetAddressPrefix "172.18.0.0/16" -SubnetAddressPrefix "172.18.0.0/18" -SetupArc $false -Location $Location
-$l2LevelCoreInfra = ./deploy-core-infrastructure.ps1 -ParentConfig $l3LevelCoreInfra -ApplicationName ($ApplicationName + "L2") -VnetAddressPrefix "172.20.0.0/16" -SubnetAddressPrefix "172.20.0.0/18" -SetupArc $false -Location $Location
+$l4LevelCoreInfra = ./deploy-core-infrastructure.ps1 -ApplicationName ($ApplicationName + "L4") -VnetAddressPrefix "172.16.0.0/16" -SubnetAddressPrefix "172.16.0.0/18" -SetupArc $SetupArc -Location $Location -SetupObservability $SetupObservability
+$l3LevelCoreInfra = ./deploy-core-infrastructure.ps1 -ParentConfig $l4LevelCoreInfra -ApplicationName ($ApplicationName + "L3") -VnetAddressPrefix "172.18.0.0/16" -SubnetAddressPrefix "172.18.0.0/18" -SetupArc $SetupArc -Location $Location -SetupObservability $SetupObservability
+$l2LevelCoreInfra = ./deploy-core-infrastructure.ps1 -ParentConfig $l3LevelCoreInfra -ApplicationName ($ApplicationName + "L2") -VnetAddressPrefix "172.20.0.0/16" -SubnetAddressPrefix "172.20.0.0/18" -SetupArc $SetupArc -Location $Location -SetupObservability $SetupObservability

-# # 2. Deploy core platform in each layer (Dapr, Mosquitto and bridging).
-$l4CorePlatform = ./deploy-core-platform.ps1 -AksClusterName $l4LevelCoreInfra.AksClusterName -AksClusterResourceGroupName $l4LevelCoreInfra.AksClusterResourceGroupName -DeployDapr $true -MosquittoParentConfig $null -ArcEnabled $false
-$l3CorePlatform = ./deploy-core-platform.ps1 -AksClusterName $l3LevelCoreInfra.AksClusterName -AksClusterResourceGroupName $l3LevelCoreInfra.AksClusterResourceGroupName -MosquittoParentConfig $l4CorePlatform -ArcEnabled $false
-./deploy-core-platform.ps1 -AksClusterName $l2LevelCoreInfra.AksClusterName -AksClusterResourceGroupName $l2LevelCoreInfra.AksClusterResourceGroupName -DeployDapr $true -MosquittoParentConfig $l3CorePlatform -ArcEnabled $false
+# 2. Deploy core platform in each layer (Dapr, Mosquitto and bridging).
+$l4CorePlatform = ./deploy-core-platform.ps1 -AksClusterName $l4LevelCoreInfra.AksClusterName -AksClusterResourceGroupName $l4LevelCoreInfra.AksClusterResourceGroupName -DeployDapr $true -MosquittoParentConfig $null -ArcEnabled $SetupArc
+$l3CorePlatform = ./deploy-core-platform.ps1 -AksClusterName $l3LevelCoreInfra.AksClusterName -AksClusterResourceGroupName $l3LevelCoreInfra.AksClusterResourceGroupName -MosquittoParentConfig $l4CorePlatform -ArcEnabled $SetupArc
+./deploy-core-platform.ps1 -AksClusterName $l2LevelCoreInfra.AksClusterName -AksClusterResourceGroupName $l2LevelCoreInfra.AksClusterResourceGroupName -DeployDapr $true -MosquittoParentConfig $l3CorePlatform -ArcEnabled $SetupArc

# 3. Deploy app resources in Azure, build images and deploy helm on level L4 and L2.
$l4AppConfig = ./deploy-dev-app-l4.ps1 -ApplicationName $ApplicationName `
    -AksClusterResourceGroupName $l4LevelCoreInfra.AksClusterResourceGroupName `
    -AksClusterName $l4LevelCoreInfra.AksClusterName -AksServicePrincipalName ($ApplicationName + "L4") `
-    -Location $Location
+    -Location $Location `
+    -SetupObservability $SetupObservability `
+    -SamplingRate $samplingRate

# Note currently for developer flow we need Azure Container Registry deployed by L4 (via L4AppConfig).
./deploy-dev-app-l2.ps1 -ApplicationName $ApplicationName `
    -AksClusterName $l2LevelCoreInfra.AksClusterName `
    -AksClusterResourceGroupName $l2LevelCoreInfra.AksClusterResourceGroupName `
    -AksServicePrincipalName ($ApplicationName + "L2") `
-    -L4AppConfig $l4AppConfig
+    -L4AppConfig $l4AppConfig `
+    -SetupObservability $SetupObservability `
+    -SamplingRate $samplingRate

# # --- Deploying just a single layer: comment above block and uncomment below:
-# $l4LevelCoreInfra = ./deploy-core-infrastructure.ps1 -ApplicationName ($ApplicationName + "L4") -VnetAddressPrefix "172.16.0.0/16" -SubnetAddressPrefix "172.16.0.0/18" -SetupArc $false -Location $Location
+# $l4LevelCoreInfra = ./deploy-core-infrastructure.ps1 -ApplicationName ($ApplicationName + "L4") -VnetAddressPrefix "172.16.0.0/16" -SubnetAddressPrefix "172.16.0.0/18" -SetupArc $false -Location $Location -SetupObservability $SetupObservability
# ./deploy-core-platform.ps1 -AksClusterName $l4LevelCoreInfra.AksClusterName `
#    -AksClusterResourceGroupName $l4LevelCoreInfra.AksClusterResourceGroupName `
-#    -DeployDapr $true -MosquittoParentConfig $null -ArcEnabled $false
+#    -DeployDapr $true -MosquittoParentConfig $null -ArcEnabled $SetupArc
# $l4AppConfig = ./deploy-dev-app-l4.ps1 -ApplicationName $ApplicationName `
#    -AksClusterResourceGroupName $l4LevelCoreInfra.AksClusterResourceGroupName `
#    -AksClusterName $l4LevelCoreInfra.AksClusterName `
#    -AksServicePrincipalName ($ApplicationName + "L4") `
-#    -Location $Location
+#    -Location $Location -SetupObservability $SetupObservability
# # when deploying L2 workload on single cluster in L4, passing in parameters pointing to L4 is intentional
# ./deploy-dev-app-l2.ps1 -ApplicationName $ApplicationName `
#    -AksClusterName $l4LevelCoreInfra.AksClusterName `
#    -AksClusterResourceGroupName $l4LevelCoreInfra.AksClusterResourceGroupName `
#    -AksServicePrincipalName ($ApplicationName + "L4") `
-#    -L4AppConfig $l4AppConfig
+#    -L4AppConfig $l4AppConfig -SetupObservability $SetupObservability
# #----------------

$runningTime = New-TimeSpan -Start $startTime
diff --git a/deployment/deploy-core-infrastructure.ps1 b/deployment/deploy-core-infrastructure.ps1
index 073b05a1..89066132 100644
--- a/deployment/deploy-core-infrastructure.ps1
+++ b/deployment/deploy-core-infrastructure.ps1
@@ -25,12 +25,16 @@ Param(
    [Parameter(Mandatory = $false)]
    [bool]
-    $SetupArc = $true
+    $SetupArc = $true,
+
+    [Parameter(Mandatory = $false)]
+    [bool]
+    $SetupObservability = $true
)

# Uncomment this if you are testing this script without deploy-az-demo-bootstrapper.ps1
-# Import-Module -Name ./modules/text-utils.psm1
-# Import-Module -Name ./modules/process-utils.psm1
+Import-Module -Name ./modules/text-utils.psm1
+Import-Module -Name ./modules/process-utils.psm1

Write-Title("Install Module powershell-yaml if not yet available")
if ($null -eq (Get-Module -ListAvailable -Name powershell-yaml)) {
@@ -39,7 +43,8 @@
}

class Aks {
-    [PSCustomObject] Prepare ([string]$resourceGroupName, [string]$aksName, [PSCustomObject]$proxyConfig, [bool]$enableArc, [string]$arcLocation) {
+    [PSCustomObject] Prepare ([string]$resourceGroupName, [string]$aksName, [PSCustomObject]$parentProxyConfig,
+        [bool]$enableArc, [string]$arcLocation, [bool]$installObservability, [string]$otelAppInsightsKey) {

        # ----- Get AKS Cluster Credentials
        Write-Title("Get AKS $aksName in $resourceGroupName Credentials")
@@ -51,8 +56,16 @@
            azure_samples_github = "azure-samples.github.io";
            github_com = "github.com";
            ghcr_io = "ghcr.io";
-            dapr_github_io = "dapr.github.io";}
+            dapr_github_io = "dapr.github.io"; } # if you want an empty list, set $customDomainsHash = @{}
+
+        if($installObservability){
+            $customDomainsHash.Add("gcr_$($customDomainsHash.Count)", "gcr.io")
+            $customDomainsHash.Add("jetstack_$($customDomainsHash.Count)", "charts.jetstack.io")
+            $customDomainsHash.Add("otel_$($customDomainsHash.Count)", "open-telemetry.github.io")
+            $customDomainsHash.Add("k8s_$($customDomainsHash.Count)", "registry.k8s.io")
+            $customDomainsHash.Add("k8s_$($customDomainsHash.Count)", "cr.fluentbit.io")
+        }

        # ---- Download service bus domains from URI for chosen Azure region
        $serviceBusDomains = Invoke-WebRequest -Uri "https://guestnotificationservice.azure.com/urls/allowlist?api-version=2020-01-01&location=$arcLocation" -Method Get
@@ -63,14 +76,16 @@
        }
        $customDomainsHelm = $customDomainsHash.GetEnumerator() | ForEach-Object { "customDomains.$($_.Key)=$($_.Value)" }
        $customDomainsHelm = $customDomainsHelm -Join ","
+
+        $observabilityString = ($installObservability -eq $true) ? "true" : "false"

        # ----- Install AKS reverse Proxy
-        helm repo add azdistributededge https://azure-samples.github.io/distributed-az-edge-framework
+        helm repo add azdistributededge https://azure-samples.github.io/distributed-az-edge-framework --force-update
        helm repo update

-        if ($proxyConfig) {
-            $parentProxyIp = $proxyConfig.ProxyIp
-            $parentProxyPort = $proxyConfig.ProxyPort
+        if ($null -ne $parentProxyConfig) {
+            $parentProxyIp = $parentProxyConfig.ProxyIp
+            $parentProxyPort = $parentProxyConfig.ProxyPort

            Write-Title("Install Envoy Reverse Proxy with Parent Ip $parentProxyIp, Port $parentProxyPort")
            helm install envoy azdistributededge/envoy-reverseproxy `
@@ -78,6 +93,8 @@
                --set-string domainRegion="$arcLocation" `
                --set-string parent.proxyIp="$parentProxyIp" `
                --set-string parent.proxyHttpsPort="$parentProxyPort" `
+                --set observability.enablePrometheusScrape=$observabilityString `
+                --set-string arguments.logLevel="info" `
                --set $customDomainsHelm `
                --namespace edge-infra `
                --create-namespace `
@@ -87,6 +104,8 @@
            Write-Title("Install Envoy Reverse Proxy without Parent")
            helm install envoy azdistributededge/envoy-reverseproxy `
                --set-string domainRegion="$arcLocation" `
+                --set observability.enablePrometheusScrape=$observabilityString `
+                --set-string arguments.logLevel="info" `
                --set $customDomainsHelm `
                --namespace edge-infra `
                --create-namespace `
@@ -112,14 +131,109 @@
        # ----- Install DNSMasq Helm chart to host DNS resolution for child cluster
        Write-Title("Installing DNSMasqAks Helm chart for DNS resolution of child cluster")
-        helm install dnsmasq azdistributededge/dnsmasqaks `
-            --set-string proxyDnsServer="$proxyIp" `
-            --namespace edge-infra `
-            --wait
+        if($installObservability){
+            # Additional domains for observability
+            $dnsCustomDomainsHash = @{}
+            $dnsCustomDomainsHash.Add("gcr_$($dnsCustomDomainsHash.Count)", "gcr.io")
+            $dnsCustomDomainsHash.Add("jetstack_$($dnsCustomDomainsHash.Count)", "charts.jetstack.io")
+            $dnsCustomDomainsHash.Add("otel_$($dnsCustomDomainsHash.Count)", "open-telemetry.github.io")
+            $dnsCustomDomainsHash.Add("k8s_$($dnsCustomDomainsHash.Count)", "registry.k8s.io")
+            $dnsCustomDomainsHash.Add("k8s_$($dnsCustomDomainsHash.Count)", "cr.fluentbit.io")
+            $dnsCustomDomainsHelm = $dnsCustomDomainsHash.GetEnumerator() | ForEach-Object { "customDomains.$($_.Key)=$($_.Value)" }
+            $dnsCustomDomainsHelm = $dnsCustomDomainsHelm -Join ","
+
+            helm install dnsmasq azdistributededge/dnsmasqaks `
+                --set-string proxyDnsServer="$proxyIp" `
+                --set $dnsCustomDomainsHelm `
+                --namespace edge-infra `
+                --wait
+        }
+        else {
+            helm install dnsmasq azdistributededge/dnsmasqaks `
+                --set-string proxyDnsServer="$proxyIp" `
+                --namespace edge-infra `
+                --wait
+        }

        $dnsService = kubectl get service dsnmasq-service -n edge-infra -o json | ConvertFrom-Json
        $dnsMasqIp = $dnsService.status.loadBalancer.ingress.ip

+        # ----- Bootstrap monitoring and observability
+        if($installObservability) {
+
+            $metadataNetworkLayer = $aksName.Substring($aksName.Length - 2, 2)
+            Write-Title("Observability - install pre-req cert-manager for AKS $aksName")
+            helm repo add --force-update jetstack https://charts.jetstack.io
+            helm repo update
+
+            helm upgrade cert-manager jetstack/cert-manager `
+                --install `
+                --namespace cert-manager --create-namespace `
+                --set installCRDs=true `
+                --wait
+
+            helm repo add open-telemetry https://open-telemetry.github.io/opentelemetry-helm-charts
+            helm repo update
+
+            helm upgrade opentelemetry-operator open-telemetry/opentelemetry-operator `
+                --install `
+                --version "0.36.0" `
+                --namespace monitoring --create-namespace `
+                --wait
+
+            if($null -ne $parentProxyConfig) {
+                Write-Title("Install Helm chart with OpenTelemetry and FluentBit for AKS $aksName with parent collector")
+
+                $parentProxyIp = $parentProxyConfig.ProxyIp
+                $parentEndpoint = "http://${parentProxyIp}:4318"
+                helm upgrade otelcollection azdistributededge/otelcollection `
+                    --install `
+                    --namespace monitoring `
+                    --set exporters.parentOtlp.enabled=true `
+                    --set-string exporters.parentOtlp.endpoint="$parentEndpoint" `
+                    --set-string metadata.region="$arcLocation" `
+                    --set-string metadata.clusterName="$aksName" `
+                    --set-string metadata.networkLayer="$metadataNetworkLayer" `
+                    --wait
+            }
+            else {
+                Write-Title("Install Observability Helm chart with Grafana as visualization for AKS $aksName in top layer")
+                helm upgrade edgeobservability azdistributededge/edgeobservability `
+                    --install `
+                    --create-namespace --namespace observability --wait
+
+                Write-Title("Install Helm chart with OpenTelemetry and FluentBit for AKS $aksName to Az Monitor")
+                helm upgrade otelcollection azdistributededge/otelcollection `
+                    --install `
+                    --namespace monitoring `
+                    --set exporters.azuremonitor.enabled=true `
+                    --set-string exporters.azuremonitor.instrumentationKey="$otelAppInsightsKey" `
+                    --set-string metadata.region="$arcLocation" `
+                    --set-string metadata.clusterName="$aksName" `
+                    --set-string metadata.networkLayer="$metadataNetworkLayer" `
+                    --set exporters.prometheus.enabled=true `
+                    --set exporters.loki.enabled=true `
+                    --set exporters.jaeger.enabled=true `
+                    --wait
+            }
+
+            # Update the Envoy proxy configuration to include the OpenTelemetry Collector listener and cluster now that it's available
+            Write-Title("Update Envoy proxy configuration to include OpenTelemetry Collector listener and cluster")
+
+            $otelService = kubectl get service otel-collector -n monitoring -o json | ConvertFrom-Json
+            $otelCollectorIp = $otelService.spec.clusterIP
+
+            # temporary hack - set new value for logLevel argument to trigger a restart of the Envoy pod to load configmap changes
+            helm upgrade envoy azdistributededge/envoy-reverseproxy `
+                --reuse-values `
+                --set observability.enableOtelCollector=true `
+                --set-string observability.otelCollectorIp="$otelCollectorIp" `
+                --set-string observability.otelCollectorPort="4318" `
+                --set-string arguments.logLevel="warn" `
+                --namespace edge-infra `
+                --wait
+        }
+
        if ($enableArc) {
            # ----- Before enrolling with Arc: create Service Account, get token and store in temp folder for Arc Cluster Connect in other scripts
            Write-Title("Before enrolling AKS $aksName with Arc: create ServiceAccount and store token on disk")
@@ -274,9 +388,10 @@ Write-Title("Deploy Bicep files - Vnet")
$ParentConfigVnetName = If ($ParentConfig -eq $null) { "" } Else { $ParentConfig.VnetName }
$ParentConfigVnetResourceGroup = If ($ParentConfig -eq $null) { "" } Else { $ParentConfig.VnetResourceGroup }
$closeOutboundInternetAccess = If ($SetupArc -eq $true -and $ParentConfig -ne $null) { $true } Else { $false }
+$provisionAzureMonitoring = If ($SetupObservability -eq $true -and $ParentConfig -eq $null) { $true } Else { $false }

$r = (az deployment sub create --location $Location `
-    --template-file ./bicep/core-infra-vnet.bicep --parameters `
+    --template-file ./bicep/core-infra-base.bicep --parameters `
    applicationName=$ApplicationName `
    remoteVnetName=$ParentConfigVnetName `
    remoteVnetResourceGroupName=$ParentConfigVnetResourceGroup `
@@ -286,6 +401,7 @@
    currentAzUsernameId=$currentAzUsernameId `
    aksObjectId=$aksObjectId `
    closeOutboundInternetAccess=$closeOutboundInternetAccess `
+    provisionMonitoring=$provisionAzureMonitoring `
    location=$Location `
    --name "core-$deploymentId" `
) | ConvertFrom-Json

$vnetSubnetId = $r.properties.outputs.subnetId.value
$aksClusterResourceGroupName = $r.properties.outputs.aksResourceGroup.value
$aksClusterName = $r.properties.outputs.aksName.value
+$appInsightsInstrumentationKey = $provisionAzureMonitoring -eq $true ? $r.properties.outputs.appInsightsInstrumentationKey.value : ""
if ($closeOutboundInternetAccess -eq $true)
{
@@ -326,7 +443,13 @@
# ----- Install core dependencies in AKS cluster
$aks = [Aks]::new()
-$proxyConfig = $aks.Prepare($aksClusterResourceGroupName, $aksClusterName, $ParentConfig, $SetupArc, $Location)
+$proxyConfig = $aks.Prepare($aksClusterResourceGroupName,
+    $aksClusterName,
+    $ParentConfig,
+    $SetupArc,
+    $Location,
+    $SetupObservability,
+    $appInsightsInstrumentationKey)

$config = [PSCustomObject]@{
    AksClusterName = $aksClusterName
diff --git a/deployment/deploy-core-platform.ps1 b/deployment/deploy-core-platform.ps1
index d4d7fae9..00701045 100644
--- a/deployment/deploy-core-platform.ps1
+++ b/deployment/deploy-core-platform.ps1
@@ -75,6 +75,7 @@ if($DeployDapr){
    helm repo update
    helm upgrade --install dapr dapr/dapr `
        --version=1.10 `
+        --set global.logAsJson=true `
        --namespace edge-core `
        --create-namespace `
        --wait `
@@ -127,6 +128,7 @@ if ($null -eq $MosquittoParentConfig){
    # use default mosquitto deployment
    helm install mosquitto azedgefx/mosquitto `
        --namespace edge-core `
+        --set-string logLevel="warning" `
        --set-file certs.ca.crt="$tempCertsFolder/ca.crt" `
        --set-file certs.server.crt="$tempCertsFolder/$AksClusterName.crt" `
        --set-file certs.server.key="$tempCertsFolder/$AksClusterName.key" `
@@ -145,6 +147,7 @@ else {
    helm install mosquitto azedgefx/mosquitto `
        --namespace edge-core `
+        --set-string logLevel="warning" `
        --set-string bridge.enabled="true" `
        --set-string bridge.connectionName="$AksClusterName-parent" `
        --set-string bridge.remotename="$parentCluster" `
diff --git a/deployment/deploy-dev-app-l2.ps1 b/deployment/deploy-dev-app-l2.ps1
index d8325e20..34f51e54 100644
--- a/deployment/deploy-dev-app-l2.ps1
+++ b/deployment/deploy-dev-app-l2.ps1
@@ -25,7 +25,15 @@ Param(
    $L4AppConfig = $null,

    [string]
-    $Location = 'westeurope'
+    $Location = 'westeurope',
+
+    [Parameter(Mandatory = $false)]
+    [bool]
+    $SetupObservability = $true,
+
+    [Parameter(Mandatory = $false)]
+    [int]
+    $SamplingRate = 0
)

# Uncomment this if you are testing this script without deploy-az-demo-bootstrapper.ps1
@@ -84,11 +92,15 @@ Write-Title("Install Pod/Containers with Helm in Cluster L2")
$simtempimage = $acrName + ".azurecr.io/simulatedtemperaturesensormodule:" + $deploymentId
$opcplcimage = "mcr.microsoft.com/iotedge/opc-plc:2.2.0"
$opcpublisherimage = $acrName + ".azurecr.io/$staticBranchName/iotedge/opc-publisher:" + $deploymentId + "-linux-amd64"
+$observabilityString = ($SetupObservability -eq $true) ? "true" : "false"
"true" : "false" + helm install iot-edge-l2 ./helm/iot-edge-l2 ` --set-string images.simulatedtemperaturesensormodule="$simtempimage" ` --set-string images.opcplcmodule="$opcplcimage" ` --set-string images.opcpublishermodule="$opcpublisherimage" ` + --set observability.enabled=$observabilityString ` + --set-string observability.samplingRate="$SamplingRate" ` --namespace $appKubernetesNamespace ` --create-namespace ` --wait diff --git a/deployment/deploy-dev-app-l4.ps1 b/deployment/deploy-dev-app-l4.ps1 index fe46565f..3c301ccf 100644 --- a/deployment/deploy-dev-app-l4.ps1 +++ b/deployment/deploy-dev-app-l4.ps1 @@ -21,7 +21,15 @@ Param( $AksServicePrincipalName, [string] - $Location = 'westeurope' + $Location = 'westeurope', + + [Parameter(Mandatory = $false)] + [bool] + $SetupObservability = $true, + + [Parameter(Mandatory = $false)] + [int] + $SamplingRate = 0 ) # Uncomment this if you are testing this script without deploy-az-dev-bootstrapper.ps1 @@ -72,6 +80,7 @@ az aks get-credentials ` --resource-group $AksClusterResourceGroupName ` --overwrite-existing +$observabilityString = ($SetupObservability -eq $true) ? "true" : "false" # ----- Run Helm Write-Title("Install Pod/Containers with Helm in Cluster") $datagatewaymoduleimage = $acrName + ".azurecr.io/datagatewaymodule:" + $deploymentId @@ -80,6 +89,8 @@ helm install iot-edge-l4 ./helm/iot-edge-l4 ` --set-string dataGatewayModule.eventHubConnectionString="$eventHubConnectionString" ` --set-string dataGatewayModule.storageAccountName="$storageName" ` --set-string dataGatewayModule.storageAccountKey="$storageKey" ` + --set-string observability.samplingRate="$SamplingRate" ` + --set observability.enabled=$observabilityString ` --namespace $appKubernetesNamespace ` --create-namespace ` --wait diff --git a/deployment/deploy-dev.md b/deployment/deploy-dev.md index 4980e7e3..55cbd89f 100644 --- a/deployment/deploy-dev.md +++ b/deployment/deploy-dev.md @@ -34,9 +34,15 @@ By default this script provisions resources in Azure region `West Europe`. To us ## The Main Functions in the Script -1. Deploy infrastructure with Bicep, the script deploys three AKS clusters. +> Note: review the arguments you can pass to the script `./deploy-az-dev-bootstrapper.ps1` so you can control deployment location, whether to deploy observability stack and more. + +1. Deploy infrastructure with Bicep, the script deploys three AKS clusters when set using the default arguments. - AKS - - VNET (and VNET peering) and NSGs + - VNET + - Log Analytics workspace + - App Insights component + - Envoy reverse proxy + - DNSMasq sample DNS server 2. Download AKS credentials. diff --git a/docs/observability.md b/docs/observability.md new file mode 100644 index 00000000..877e0820 --- /dev/null +++ b/docs/observability.md @@ -0,0 +1,155 @@ +# Observability for Distributed Edge + +To understand how an application is behaving, it's key to integrate observability into the application end to end. Observability refers to the ability to gain insights into the internal workings of a system, while monitoring involves collecting and analyzing data to ensure the system's health and performance. + +When we discuss monitoring, we typically refer to signals captured from all running systems, be it OS, databases, custom workloads and Kubernetes platform itself. The types of signals are typically split into: + +- Logs +- Traces +- Metrics + +The following diagram shows a logical view of the tools and components that are used on each network level’s Kubernetes cluster to capture and process signals. 
+
+![logical view on observability components per network layer](../architecture/observability-stacked.png)
+
+The following diagram is an adapted version of the three-layered network topology diagram found in the repo’s [readme](../README.md) document. The `monitoring` namespace, as depicted in detail on the lower level, is deployed in every Kubernetes cluster, while the `observability` namespace is only needed at the top-most layer to provide edge-based visualization. The top-most layer is also where all signals are collected by a centralized OpenTelemetry Collector. These signals are then used by the local visualization tools and sent to the cloud.
+
+![high level nested architecture with observability focus](../architecture/observability-hld-slide.png "Observability")
+
+## Edge-specific Requirements
+
+Given the architecture of this sample, capturing signals happens primarily at the edge. Because the only current cloud resources are a few components like Event Hubs and Azure Container Registry, the bigger portion of the observability solution needs to run on the edge.
+
+Beyond deploying an observability stack to measure, analyze and act on monitoring of the platform, the edge brings additional challenges: a special network topology with restricted access between network layers, occasionally disconnected scenarios, and sometimes the need for a local-only, or at least locally available, copy of the solution to cover longer periods of cloud network loss. Local persistence and queuing of data in case of prolonged connectivity loss adds further complexity; this latter topic is being worked on as a subsequent addition to this sample.
+
+This solution starts by addressing the network topology requirements, building on top of the [nested proxy with Envoy solution](./reverseproxy.md), and aggregates the monitoring data in the cloud. At the edge, an optional local observability stack is deployed to offer edge-based visualization of the signals through Grafana.
+
+## OpenTelemetry
+
+With Microsoft's commitment to embrace OpenTelemetry in its Azure Monitor products - [Azure Monitor Application Insights - OpenTelemetry](https://learn.microsoft.com/en-us/azure/azure-monitor/app/opentelemetry-overview#opentelemetry) - this sample is taking a dependency on this open-source, industry-supported telemetry instrumentation technology.
+
+Two OpenTelemetry building blocks are central to this sample:
+
+- OpenTelemetry Protocol (OTLP)
+- OpenTelemetry Collector
+
+### OpenTelemetry Kubernetes Operator
+
+The OpenTelemetry Operator manages OpenTelemetry Collector instances and allows for auto-instrumentation of code in different languages. When configuring `receivers` that are standard extensions, the operator ensures ClusterIP services are created automatically within the cluster, minimizing custom setup in the Helm-chart-based deployment approach.
+
+Evaluation is still ongoing as to whether using the Operator adds value in this solution, versus managing the Collector directly. The main reason for not using the Operator today is the lack of need for auto-instrumentation injection, as all custom workloads leverage Dapr, which already has its own set of monitoring options integrated.
+
+The OpenTelemetry Collector supports several deployment options:
+
+- Deployment
+- DaemonSet
+- StatefulSet
+- Sidecar
+
+### Azure Monitor Exporter for OpenTelemetry
+
+This exporter is part of the OpenTelemetry Collector contrib repository, and is in beta at the time of writing.
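+
+As a minimal, hypothetical sketch of how OTLP, the Collector and this exporter fit together: the configuration below receives signals over OTLP/HTTP on port 4318 (the port used between the network layers in this sample) and forwards them to Azure Monitor. The instrumentation key value is a placeholder, and the configuration rendered by the actual `otelcollection` Helm chart may differ.
+
+```yaml
+receivers:
+  otlp:
+    protocols:
+      http:
+        endpoint: 0.0.0.0:4318 # HTTP OTLP endpoint, as used between the network layers
+
+exporters:
+  azuremonitor:
+    instrumentation_key: "00000000-0000-0000-0000-000000000000" # placeholder
+
+service:
+  pipelines:
+    traces:
+      receivers: [otlp]
+      exporters: [azuremonitor]
+    metrics:
+      receivers: [otlp]
+      exporters: [azuremonitor]
+    logs:
+      receivers: [otlp]
+      exporters: [azuremonitor]
+```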
+
+Details: [Azure Monitor Exporter](https://github.com/open-telemetry/opentelemetry-collector-contrib/tree/main/exporter/azuremonitorexporter).
+
+Refer to the documentation to get a better understanding of how the mapping of attributes works, and in which tables the traces, logs and metrics are stored in Azure Application Insights.
+
+| OpenTelemetry | Azure App Insights |
+|----------|----------|
+| Traces | dependencies |
+| Metrics | customMetrics |
+| Logs | traces |
+| Exception trace events | exceptions |
+
+### Fluent Bit and OpenTelemetry
+
+Logs from the Kubernetes components and custom workloads are collected using Fluent Bit, installed via a `DaemonSet` on each node of the cluster. It is configured to scrape all logs, and using Fluent Bit's ['OpenTelemetry output plugin'](https://docs.fluentbit.io/manual/pipeline/outputs/opentelemetry), it automatically forwards all logs into the OpenTelemetry Collector's HTTP OTLP endpoint.
+
+### Edge Metadata Enrichment of Logs, Metrics and Traces
+
+When emitting and collecting signals out of the box, without enriching them through an SDK, the collected data contains no specific information about the Kubernetes cluster being monitored. Some tooling injects a little information (Fluent Bit logs, for example), but Prometheus metrics carry no reference to the cluster they pertain to.
+
+In edge computing scenarios you may want to aggregate monitoring data across the three (or more) networking layers, as well as across different edge locations. OpenTelemetry offers a few solutions for enriching signals by leveraging `Processors` like [`Resource`](https://github.com/open-telemetry/opentelemetry-collector-contrib/tree/main/processor/resourceprocessor) and [`Attributes`](https://github.com/open-telemetry/opentelemetry-collector-contrib/blob/v0.81.0/processor/attributesprocessor/README.md).
+
+In this sample we deploy a three-layered network topology and want to understand the aggregated logs in Azure Monitor. To achieve this we leverage these processors to enrich the signals as they get exported per layer.
+Based on the spec, using the `Action` of `insert` ensures the key is only added if it does not yet exist. When a lower-level OpenTelemetry Collector moves its signals through the different levels, existing attributes will not be overwritten, thus ensuring each original signal only receives the new metadata at the time of first export from its initial location.
+
+Today, metadata attributes such as `edge.layer`, `edge.region` and `k8s.cluster.name` are inserted, configured at installation time by providing custom values to the Helm chart.
+There are also other options to automatically add attributes, such as environment variables through the [`Resource Detection Processor`](https://github.com/open-telemetry/opentelemetry-collector-contrib/blob/main/processor/resourcedetectionprocessor/README.md).
+
+```yaml
+processors:
+
+  resource:
+    attributes:
+      - key: edge.layer
+        value: {{ .Values.metadata.networkLayer }}
+        action: insert
+      - key: edge.region
+        value: {{ .Values.metadata.region }}
+        action: insert
+      - key: k8s.cluster.name
+        value: {{ .Values.metadata.clusterName }}
+        action: insert
+```
+
+Updating these values after initial Helm chart installation can be achieved by executing `helm upgrade`.
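+
+For instance, a hypothetical values override along these lines (the values themselves are illustrative) could be passed with `helm upgrade otelcollection azdistributededge/otelcollection --reuse-values -f metadata-values.yaml` to update the inserted attributes:
+
+```yaml
+# metadata-values.yaml - illustrative metadata overrides for the otelcollection Helm chart
+metadata:
+  networkLayer: "L3"
+  region: "westeurope"
+  clusterName: "aks-myapplication-l3"
+```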
+
+Within Azure Application Insights the added attributes are available in the `customDimensions` array and can be used to further query the logs, metrics and traces.
+Here is an example of a log entry in the `traces` table in App Insights, showing its customDimensions:
+
+```json
+{
+    "cpu":"1",
+    "edge.layer":"L4",
+    "edge.region":"northeurope",
+    "k8s.cluster.name":"[-myclusterdeploymentname-]",
+    "http.scheme":"http",
+    "instrumentationlibrary.name":"otelcol/prometheusreceiver",
+    "instrumentationlibrary.version":"0.81.0",
+    "k8s.container.name":"node-exporter",
+    "k8s.daemonset.name":"otelcollection-prometheus-node-exporter",
+    "k8s.namespace.name":"monitoring",
+    "k8s.node.name":"[...]",
+    "k8s.pod.name":"otelcollection-prometheus-node-exporter-xxx",
+    "k8s.pod.uid":"[...]",
+    "net.host.name":"[...]",
+    "net.host.port":"9100",
+    "service.instance.id":"..:9100",
+    "service.name":"node-exporter"
+}
+```
+
+### OpenTelemetry Support in Components of the Solution
+
+This section contains a number of relevant resources on the (initial) built-in OpenTelemetry support in some of the technologies used in this sample.
+
+- **Envoy Proxy**:
+  - Envoy Proxy has built-in support for [OpenTelemetry request tracing](https://www.envoyproxy.io/docs/envoy/latest/intro/arch_overview/observability/tracing#arch-overview-tracing), but since this feature relies on the HTTP Connection Manager and HTTP Filters, which would require TLS termination, it is not being implemented in this sample.
+  - A Prometheus-compatible metrics endpoint is available, but requires some special configuration to expose it under its default `/metrics` URI. This special configuration can be observed in Envoy's Helm chart, under the listener named `envoy-prometheus-http-listener` in the configmap.
+- **Dapr**: out-of-the-box support for OpenTelemetry tracing and Prometheus metrics scraping. The current implementation uses the `Zipkin` endpoint on the OpenTelemetry Collector to push traces from Dapr components to OpenTelemetry.
+- **Mosquitto**: Mosquitto does not have any native support for metrics or traces with OpenTelemetry. Logs, however, can be extracted by integrating with Fluent Bit, which will collect logs automatically based on annotations. Trace context propagation is not available on the broker, contrary to some of the other open-source brokers. There is an initial [draft spec for standardizing W3C Tracing in MQTT](https://w3c.github.io/trace-context-mqtt/).
+- **Kubernetes**: there is extensive support for OpenTelemetry in Kubernetes environments, including the OpenTelemetry Operator, Helm charts for the collector and [extensive documentation](https://opentelemetry.io/docs/kubernetes/).
+
+## Local Observability Visualization
+
+Within the edge solution, on Level 4 of the network topology, a set of components is deployed to offer local visibility into the observability data. This is done by leveraging open-source components such as Grafana, Tempo, Jaeger and Loki.
+
+To explore the data through the Grafana dashboard and data sources, you can use port-forwarding to access the endpoint:
+
+```bash
+# First identify the name of the grafana pod in the observability namespace
+
+kubectl port-forward grafana-xxx-xxx 3000:3000 -n observability
+
+```
+
+After forwarding the port you can open the Grafana dashboard by going to [http://localhost:3000](http://localhost:3000).
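+
+As an illustration of how Grafana ties into these data sources, a datasource provisioning file along the following lines could be used; the service names and ports are assumptions and would need to match the services actually deployed by the `edgeobservability` chart:
+
+```yaml
+# Hypothetical Grafana datasource provisioning for the local stack
+apiVersion: 1
+datasources:
+  - name: Prometheus # metrics
+    type: prometheus
+    url: http://prometheus-server.observability.svc:9090
+  - name: Loki # logs
+    type: loki
+    url: http://loki.observability.svc:3100
+  - name: Jaeger # traces
+    type: jaeger
+    url: http://jaeger-query.observability.svc:16686
+```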
+
+## Future Additions and Explorations
+
+- Remotely configuring log levels
+- A solution to ensure log levels are set back to warning after a specified period of time, especially if the cluster goes offline
+- An edge persistence storage queue between the edge and Azure Monitor, to prevent data loss in case of (prolonged) connectivity issues to the cloud
+- Custom workload observability: evaluate usage of OpenTelemetry auto-instrumentation and adding application-specific tracing and logging instead of the default Dapr observability
+- Add TLS to the OpenTelemetry Collectors for traffic between the network layers (a sketch follows below)
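+
+As a sketch of what the last item could look like, the collector's OTLP/HTTP receiver can terminate TLS (optionally requiring client certificates from the child layers) using certificates issued, for example, by the cert-manager instance this solution already installs. The file paths below are assumptions:
+
+```yaml
+receivers:
+  otlp:
+    protocols:
+      http:
+        endpoint: 0.0.0.0:4318
+        tls:
+          cert_file: /certs/otel-collector.crt # server certificate, e.g. issued by cert-manager
+          key_file: /certs/otel-collector.key
+          client_ca_file: /certs/ca.crt # only needed when requiring mutual TLS
+```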