Enterprise-Scale icon indicating copy to clipboard operation
Enterprise-Scale copied to clipboard

Bug Report - Diagnostic Policies: Setting "metricsEnabled" to False causes all resources to become invertedly compliant/non-compliant because "existenceCondition" is hardcoded to True

Open NikolaiKleppe opened this issue 3 years ago • 2 comments

Describe the bug

Background: We want to disable Metrics logging to Log Analytics because of cost

Side Note: Seems like this is overlooked in the Terraform CAF Module Diagnostics Initiative because there are no parameters there to explicitly disable metrics - So metricsEnabled is always True. However the policy definitions themselves have a parameter called metricsEnabled so it should be supported at least

Assuming metricsEnabled is set to False in the policy initiatve (I have added my own parameter for this):

  • A resource with the Metrics diagnostic category Disabled will be marked as Non-Compliant - This should be Compliant
  • A resource with the Metrics diagnostic category Enabled will be marked as Compliant - This should be Non-Compliant (so we can run a Remediation task and fix it)

Some non-ESLZ policies already have the fixes I have implemented below - Is it by design that this isn't done in ESLZ or am I misunderstanding how it works? Because this affects every Diagnostics policy in ESLZ.

See Configure diagnostic settings for File Services to Log Analytics workspace | existenceCondition for an example where this is correctly implemented

Steps to reproduce

Before making changes to the policy definition

Example with policy_definition_es_deploy_diagnostics_virtualnetwork.json:

Metrics is disabled on the VNET beforehand:

image

Becomes non-compliant (this should be Compliant because we metricsEnabled set to False):

image

image

Notice the numbers: image

Which is because of this code, no?

image

After making changes to the policy definition

Add the parameter to the field:

image

Notice the compliance is inverted (because we have metricsEnabled set to False): image

Let's assume a diagnostic setting has Metrics enabled, which the policy now will mark correctly as non-compliant:

image

We can now run a Remediation task, and Metrics is removed from existing resources: image

Then becomes Compliant:

image

Screenshots

NikolaiKleppe avatar Mar 16 '23 13:03 NikolaiKleppe

Hi @NikolaiKleppe. Thanks for sharing your detailed feedback. Our approach has always been to maximize visibility and security for the platform, which includes ingesting all available logs. However, your point is very valid, and this is a topic we are working on internally - cost versus value is a crucial factor. This isn't a bug, it has been by design up until now, but in our next review cycle we will review this topic and hopefully come back with an acceptable solution.

Springstone avatar Apr 10 '23 14:04 Springstone

Hi @Springstone thanks for the follow-up. Good to know it's being discussed already

For our scenario wanting to disable Metrics I will probably just add the parameters to all the policies myself in the meantime then, rather simple solution - Then we'll see in the future what you guys come up with :-)

NikolaiKleppe avatar Apr 14 '23 09:04 NikolaiKleppe

@NikolaiKleppe please note, we've transitioned to the new category based Diagnostic Settings policies that cover most Azure services and provide options to configure log targets. Note, by default ALZ will log "allLogs", because this is supported by almost all the resource providers. You have the option to only configure "auditLogs", however, be aware that this covers a subset of services (~55 vs the 140+ for allLogs - because the resource providers do not yet support the required category).

Springstone avatar Jul 04 '24 13:07 Springstone