
cloud::azure::management::recovery::plugin: feature request for monitoring replication (Site Recovery) health

Open christophe-activiumid opened this issue 3 years ago • 1 comments

Hello, there are 2 features in Azure Recovery: Backup and Site Recovery. Currently the plugin only checks Backup.

Site Recovery is a separate feature that can host replicated VMs from on-premises Hyper-V, or VMs from another Azure region. Here is what it looks like in the Azure console: (screenshot)

I would like to get 2 checks from the recovery plugin: one for the replication health and one for the failover status, and ideally a "list" mode for the replicated items so that the checks can be filtered per item.

All the data is available from this API call: https://docs.microsoft.com/en-us/rest/api/site-recovery/replication-protected-items/list
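For illustration only, here is a minimal Python sketch of how the request URL for that list operation could be built. The path layout follows the linked Microsoft page as far as I know; the function name `protected_items_url` and all argument values are hypothetical, and the `api-version` to use is an assumption that has to be taken from the Microsoft documentation. Authentication (a bearer token in the `Authorization` header) is not shown.

```python
from urllib.parse import quote, urlencode

# Azure Resource Manager endpoint for the public cloud (assumption: other
# clouds use a different host).
ARM_ENDPOINT = "https://management.azure.com"


def protected_items_url(subscription_id, resource_group, vault_name, api_version):
    """Build the URL of the 'Replication Protected Items - List' call.

    Hypothetical helper: the caller supplies api_version, which must be
    looked up on the Microsoft REST API page.
    """
    path = (
        f"/subscriptions/{quote(subscription_id, safe='')}"
        f"/resourceGroups/{quote(resource_group, safe='')}"
        "/providers/Microsoft.RecoveryServices"
        f"/vaults/{quote(vault_name, safe='')}"
        "/replicationProtectedItems"
    )
    query = urlencode({"api-version": api_version})
    return f"{ARM_ENDPOINT}{path}?{query}"
```

The response may be paginated, so a real implementation would probably also need to follow the next-page link if the API returns one.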

In addition to the example on the Microsoft page, here is a snippet from a live subscription with a warning. As in the previous screenshot, the protection state and the replication health are OK, but the failover health is "Warning" because the failover has not been tested yet.

{
    "value": [
        {
            "id": "/Subscriptions/....",
            "name": "......",
            "type": "Microsoft.RecoveryServices/vaults/replicationFabrics/replicationProtectionContainers/replicationProtectedItems",
            "properties": {
                "friendlyName": "**ITEM NAME**",
                "protectedItemType": "HyperVVirtualMachine",
                "protectableItemId": "/Subscriptions/......",
                "recoveryServicesProviderId": "/Subscriptions/......",
                "primaryFabricFriendlyName": "......",
                "primaryFabricProvider": "SingleHostHyperVFabric",
                "recoveryFabricFriendlyName": "Microsoft Azure",
                "recoveryFabricId": "Microsoft Azure",
                "primaryProtectionContainerFriendlyName": ".......",
                "recoveryProtectionContainerFriendlyName": "Microsoft Azure",
                "protectionState": "Protected",
                "protectionStateDescription": "Protected",
                "activeLocation": "Primary",
                "testFailoverState": "None",
                "testFailoverStateDescription": "None",
                "allowedOperations": [
                    "PlannedFailover",
                    "UnplannedFailover",
                    "DisableProtection",
                    "TestFailover"
                ],
                "replicationHealth": "Normal",
                "failoverHealth": "Warning",
                "healthErrors": [
                    {
                        "innerHealthErrors": [],
                        "errorSource": "ReplicationUnitFailoverValidatorError",
                        "errorType": "8010",
                        "errorLevel": "Warning",
                        "errorCategory": "TestFailover",
                        "errorCode": "161011",
                        "summaryMessage": "",
                        "errorMessage": "No successful test failover has been done on the virtual machine 'SRV-MDX-APP01'.",
                        "possibleCauses": "No successful test failover has been done on the virtual machine after it was replicated.",
                        "recommendedAction": "Do a test failover on the virtual machine.",
                        "creationTimeUtc": "2022-06-13T00:55:08.9678249Z",
                        "recoveryProviderErrorMessage": null,
                        "entityId": "0906015f-4932-4a58-9f74-608588ef9cf7",
                        "errorId": "6:8010",
                        "customerResolvability": "NotAllowed"
                    }
                ]
            }
        },
..... repeat for other items
    ]
}

I can provide the full extract privately, and I have other tenants with different use cases, but I think this should be sufficient. Let me know.

I think the fields needed from each ReplicationProtectedItemProperties are:

friendlyName = resource name
replicationHealth
protectionState
failoverHealth
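As a rough sketch of what a "list" mode could extract, the Python below pulls those four fields out of a response shaped like the snippet above and optionally filters on the item name. The function name `list_protected_items` and the `name_filter` parameter are hypothetical, not part of the existing plugin.

```python
import re


def list_protected_items(response, name_filter=None):
    """Return the four fields of interest for each replicated item.

    response is the decoded JSON body of the list call; name_filter is an
    optional regular expression matched against friendlyName.
    """
    pattern = re.compile(name_filter) if name_filter else None
    items = []
    for entry in response.get("value", []):
        props = entry.get("properties", {})
        name = props.get("friendlyName", "")
        if pattern and not pattern.search(name):
            continue  # item excluded by the filter
        items.append({
            "name": name,
            "replicationHealth": props.get("replicationHealth"),
            "protectionState": props.get("protectionState"),
            "failoverHealth": props.get("failoverHealth"),
        })
    return items
```

With the snippet above as input, this would yield one entry whose replicationHealth is "Normal", protectionState is "Protected" and failoverHealth is "Warning".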

Maybe replicationHealth and protectionState should be split into 2 separate checks instead of a single one; that may be more useful.

Please let me know whether you plan to implement this feature. Some of our clients are asking for this specific monitoring.

christophe-activiumid, Jun 13 '22 11:06

Hi,

We've added that to the backlog.

I think we will have everything in a single check to avoid making multiple calls against the API.
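To make the single-check idea concrete, here is a hedged Python sketch (not the plugin's actual implementation, which is written in Perl) that evaluates replication health and failover health for every item from one decoded response. The mapping of "Normal", "Warning" and "Critical" to monitoring statuses, the severity ordering, and the name `check_items` are all assumptions made for illustration.

```python
# Assumed mapping from Azure health values to monitoring statuses.
HEALTH_TO_STATUS = {"Normal": "OK", "Warning": "WARNING", "Critical": "CRITICAL"}
# Assumed ordering used to pick the worst status across items.
RANK = {"OK": 0, "UNKNOWN": 1, "WARNING": 2, "CRITICAL": 3}


def check_items(response):
    """Return (worst_status, messages) for all items in one API response."""
    worst = "OK"
    messages = []
    for entry in response.get("value", []):
        props = entry.get("properties", {})
        name = props.get("friendlyName", "unknown")
        for label, key in (("replication health", "replicationHealth"),
                           ("failover health", "failoverHealth")):
            value = props.get(key)
            # Any value outside the assumed mapping is reported as UNKNOWN.
            status = HEALTH_TO_STATUS.get(value, "UNKNOWN")
            if status != "OK":
                messages.append(f"{name}: {label} is {value}")
            if RANK[status] > RANK[worst]:
                worst = status
    return worst, messages
```

Both health values come from the same response, so a single check of this kind needs only one API call per vault.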

We'll keep you posted about it.

Sims24, Jul 28 '22 14:07