Bug Report - Diagnostic Policies: Setting "metricsEnabled" to False causes all resources to become invertedly compliant/non-compliant because "existenceCondition" is hardcoded to True
Describe the bug
Background: We want to disable Metrics logging to Log Analytics because of cost
Side Note: Seems like this is overlooked in the Terraform CAF Module Diagnostics Initiative because there are no parameters there to explicitly disable metrics - So metricsEnabled is always True. However the policy definitions themselves have a parameter called metricsEnabled so it should be supported at least
Assuming metricsEnabled is set to False in the policy initiatve (I have added my own parameter for this):
- A resource with the Metrics diagnostic category Disabled will be marked as Non-Compliant - This should be Compliant
- A resource with the Metrics diagnostic category Enabled will be marked as Compliant - This should be Non-Compliant (so we can run a Remediation task and fix it)
Some non-ESLZ policies already have the fixes I have implemented below - Is it by design that this isn't done in ESLZ or am I misunderstanding how it works? Because this affects every Diagnostics policy in ESLZ.
See Configure diagnostic settings for File Services to Log Analytics workspace | existenceCondition for an example where this is correctly implemented
Steps to reproduce
Before making changes to the policy definition
Example with policy_definition_es_deploy_diagnostics_virtualnetwork.json:
Metrics is disabled on the VNET beforehand:

Becomes non-compliant (this should be Compliant because we metricsEnabled set to False):


Notice the numbers:

Which is because of this code, no?

After making changes to the policy definition
Add the parameter to the field:

Notice the compliance is inverted (because we have metricsEnabled set to False):

Let's assume a diagnostic setting has Metrics enabled, which the policy now will mark correctly as non-compliant:

We can now run a Remediation task, and Metrics is removed from existing resources:

Then becomes Compliant:

Screenshots
Hi @NikolaiKleppe. Thanks for sharing your detailed feedback. Our approach has always been to maximize visibility and security for the platform, which includes ingesting all available logs. However, your point is very valid, and this is a topic we are working on internally - cost versus value is a crucial factor. This isn't a bug, it has been by design up until now, but in our next review cycle we will review this topic and hopefully come back with an acceptable solution.
Hi @Springstone thanks for the follow-up. Good to know it's being discussed already
For our scenario wanting to disable Metrics I will probably just add the parameters to all the policies myself in the meantime then, rather simple solution - Then we'll see in the future what you guys come up with :-)
@NikolaiKleppe please note, we've transitioned to the new category based Diagnostic Settings policies that cover most Azure services and provide options to configure log targets. Note, by default ALZ will log "allLogs", because this is supported by almost all the resource providers. You have the option to only configure "auditLogs", however, be aware that this covers a subset of services (~55 vs the 140+ for allLogs - because the resource providers do not yet support the required category).