resource log diagnostic settings: retention time should not be set by default
Description
Across the resource implementations, the resource log diagnostic settings are included . Here is a typical code excerpt (from 'arm/Microsoft.KeyVault/vaults/deploy.bicep':
@description('Optional. Specifies the number of days that logs will be kept for; a value of 0 will retain data indefinitely.')
@minValue(0)
@maxValue(365)
param diagnosticLogsRetentionInDays int = 365
There are a couple of problems with this approach:
- the ARM API does not require setting the retention policy (for good reasons)
- setting the retention time in days only makes sense for storage accounts
- and even with storage accounts, you would not set it generally
- Log Analytics and Event Hubs have a different concept of handling the retention time
- f. e. Log Analytics sets a default time for workspace scope, and you can set a custom value for table scope,
- I don't believe this setting has any effect on Log Analytics' handling of retention, but the setting still it is stored in the resource deployment
- have a look at the portal, you will see that you can only set the diagnostic settings retention time for storage accounts (for good reasons)
- 365 days is not a suitable value for the maximum retention time
- a common reason to export logs to storage accounts instead of LA, is to store them for very long at a low cost
- Log Analytics already allows maximum retention time of 730 days. Many users who use storage accounts need to store the logs for precisely n years (instead of either for "forever" or 365 days max)
- a common reason to export logs to storage accounts instead of LA, is to store them for very long at a low cost
- Always setting the value even if you don't have to (or it doesn't even make sense) has detrimental effects on the code you write:
- if you don't set the retention time value (because f. e. Log Analytics does not care about it), your resources are created with a value of "365 days", even if that is not the retention time you actually are required to set
- if you want consistent and correct values for your retention time (after all you have specifications), you need to always set the parameter everywhere, at every resource level, even if it does not make sense, as you should set the default retention time at the Log Analytics workspace level.
- now if you set the retention time everywhere, whenever you would want to update the default retention time, you would need to update every resource everywhere, instead of just setting the value at the log analytics workspace
- note that even if you don't use the parameter of the module, you always get the value set, with the value of 365 - even if that is entirely wrong for your specifications
Resolution
The only real fix I see is NOT setting the "retentionPolicy" per default at all, and thus making it truly "optional". This is the intention of the ARM API by the way. Setting this value by default leads to more unnecessary code in many cases, and worse, to more update consistency problems. There is no upside to setting the default like this, even if you had the requirement to only use storage accounts for log storage, and always for the retention duration of 365 days, you would still set it explicitly in your code, to make your intention visible, and not rely on hidden default values (which may change on some module release).
To break this down, so it gets addressed, you can split this issue into 2 tasks, which can be implemented seperately:
-
change the "diagnosticLogsRetentionInDays" default of 365 days to 0 days (keep forever) -> This is a no-brainer IMHO. When using a storage account to store diagnostic logs, you will often want to keep logs for long, and delete them manually or using lifecycle policies, etc. Also, the default in the portal/API is 0. Please also see the comment in the green box in the docs
-
the remaining issue is that it isn't obvious in the API that the "diagnosticLogsRetentionInDays" setting is only effective for storage accounts -> This is a little-known fact. People will often assume that this setting is effective for Log Analytics or Eventhubs too. The API does not make this obvious. You could f. e. rename the variable ("diagnosticLogsStorageAccountRetentionInDays"), or state this in the documentation explicitely (@description attribute), or create a special object structure for this.
I'm using the following to add the retention policy only when a value is provided for diagnosticStorageAccountId.
var storageAccountRetentionPolicy = {
enabled: true
days: diagnosticLogsRetentionInDays
}
var retentionPolicy = !empty(diagnosticStorageAccountId) ? storageAccountRetentionPolicy : null
var diagnosticsLogsSpecified = [for category in filter(diagnosticLogCategoriesToEnable, item => item != 'allLogs'): {
category: category
enabled: true
retentionPolicy: retentionPolicy
}]
var diagnosticsLogs = contains(diagnosticLogCategoriesToEnable, 'allLogs') ? [
{
categoryGroup: 'allLogs'
enabled: true
retentionPolicy: retentionPolicy
}
] : diagnosticsLogsSpecified
var diagnosticsMetrics = [for metric in diagnosticMetricsToEnable: {
category: metric
timeGrain: null
enabled: true
retentionPolicy: retentionPolicy
}]