
Create MDS unittest for specific scenario

Open kvanhijf opened this issue 8 years ago • 12 comments

We assume the following scenario can result in bad behavior when running the 'ensure_safety' check for a vdisk:

  • Safety is configured at 2
  • Volume V has master on node1, slave on node2
  • Master node dies, HA kicks in, volume gets moved by volumedriver to node2
  • Volumedriver sends owner_changed event and fwk runs ensure_safety for said volume
  • The logging indicated 2 reasons for reconfiguration and an error:
    • Not enough safety
    • Not enough services in use in primary domain
    • Failed to update the metadata backend configuration
  • Framework eventually configured node3 to be master and node4 to be slave, removing node2 from the config altogether
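
The code below is the relevant excerpt from MDSServiceController: if the config without the ex-master differs from the full config, it is pushed first, after which the full config (which re-includes the ex-master) is pushed as well.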
            try:
                if len(configs_no_ex_master) != len(configs_all):
                    vdisk.storagedriver_client.update_metadata_backend_config(volume_id=str(vdisk.volume_id),
                                                                              metadata_backend_config=MDSMetaDataBackendConfig(configs_no_ex_master),
                                                                              req_timeout_secs=5)
                vdisk.storagedriver_client.update_metadata_backend_config(volume_id=str(vdisk.volume_id),
                                                                          metadata_backend_config=MDSMetaDataBackendConfig(configs_all),
                                                                          req_timeout_secs=5)
            except Exception:
                MDSServiceController._logger.exception('MDS safety: vDisk {0}: Failed to update the metadata backend configuration'.format(vdisk.guid))
                raise Exception('MDS configuration for volume {0} with guid {1} could not be changed'.format(vdisk.name, vdisk.guid))

We assume that the 1st update_metadata_backend_config call was initiated and timed out after 5 seconds, so the 2nd update, which presumably contained the original master node (node2), was never executed. This cannot be verified from the available logging, however. On the voldrv side we did see that the actual updateMetadataBackendConfig succeeded, but it took 386s to complete.
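
A minimal sketch of the requested unittest, assuming only the two-call pattern from the excerpt above: apply_mds_config is a stand-in for the framework code (a real test would go through MDSServiceController / MDSMetaDataBackendConfig), and the node lists are placeholders. It documents that a timeout on the 1st call currently prevents the 2nd call, which would re-add the ex-master, from ever being made.

import unittest
from unittest import mock


def apply_mds_config(storagedriver_client, volume_id, configs_no_ex_master, configs_all):
    # Stand-in mirroring the two-call update in the excerpt above (simplified,
    # MDSMetaDataBackendConfig wrapping omitted).
    if len(configs_no_ex_master) != len(configs_all):
        storagedriver_client.update_metadata_backend_config(volume_id=volume_id,
                                                            metadata_backend_config=configs_no_ex_master,
                                                            req_timeout_secs=5)
    storagedriver_client.update_metadata_backend_config(volume_id=volume_id,
                                                        metadata_backend_config=configs_all,
                                                        req_timeout_secs=5)


class MDSTimeoutScenarioTest(unittest.TestCase):
    def test_second_update_skipped_when_first_times_out(self):
        client = mock.MagicMock()
        # First call raises, mimicking the 5s client-side timeout seen in the logs.
        client.update_metadata_backend_config.side_effect = [RuntimeError('client-side timeout'), None]
        with self.assertRaises(RuntimeError):
            apply_mds_config(client, 'volume-guid',
                             configs_no_ex_master=['node3', 'node4'],
                             configs_all=['node3', 'node4', 'node2'])
        # The second call, which would have re-added the original master (node2),
        # was never made.
        self.assertEqual(client.update_metadata_backend_config.call_count, 1)


if __name__ == '__main__':
    unittest.main()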

kvanhijf avatar Aug 16 '17 14:08 kvanhijf

The scenario was slightly different:

  • volume V has MDS master table on node1, slave on node2
  • node1 is down: node3 takes ownership of V (at this point the framework is notified of owner change)
  • as part of starting V on node3, V promotes the MDS table on node2 to the master role
  • the framework spins up a new MDS slave table for V on node2
  • the framework reconfigures V with MDS [ node2, node4 ], forcing the volume to use the MDS table on node2 as master

WRT updateMetaDataBackendConfig timing out: this should be visible in the volumedriver log in this scenario, as the timeout is a client-side timeout. That is, on the server side the request will still be executed (either with a delay, or it simply takes too long), and when the server attempts to send the response an error will be logged because the connection was dropped by the client side.
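
As a sketch of that pattern (illustrative only; get_metadata_backend_config is a hypothetical read call, the actual verification API may differ), a client-side timeout could be followed by re-reading the effective config instead of being treated as a hard failure:

def update_with_verification(client, volume_id, new_config, timeout_secs=5):
    try:
        client.update_metadata_backend_config(volume_id=volume_id,
                                              metadata_backend_config=new_config,
                                              req_timeout_secs=timeout_secs)
        return True
    except Exception:
        # A client-side timeout only means the caller stopped waiting; the server
        # may still apply the change (it took 386s in this case). Re-read before
        # concluding the update failed.
        return client.get_metadata_backend_config(volume_id=volume_id) == new_config  # hypothetical call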

redlicha avatar Aug 18 '17 16:08 redlicha

@Arne: Should we also wonder why node3 takes ownership in your 2nd bullet point instead of node2? Because node2 is currently a slave and node3 doesn't know anything yet?

kvanhijf avatar Aug 21 '17 06:08 kvanhijf

@kvanhijf it's something I was wondering as well. Voldrv / edge don't use the MDS config at the moment for the HA preferences (this might be worth an FR). @saelbrec mentioned that "the FWK uses the distance maps for its domain implementation, and those are already taken into account for MDS placements", but I suppose that's at another granularity (DC, not individual voldrv's)?

redlicha avatar Aug 21 '17 07:08 redlicha

@redlicha : The distance mapping we use is based on how the primary and secondary domains have been configured by the customer, so the customer can decide which storagerouters belong in which domain/datacenter/...

The distance mapping we calculate is then passed on to the clusterRegistryClient --> set_node_configs and has the following structure:

{
    'vrouter_id': <id>,
    ....
    'node_distance_map': {
        <storagedriver_id_1>: 0,
        <storagedriver_id_2>: 0,
        <storagedriver_id_3>: 10000,
        <storagedriver_id_4>: 20000
    }
}
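
As an illustration only (not the framework's actual implementation), such a node_distance_map could be derived from the domain configuration roughly as follows; the distance constants simply mirror the example values above:

def build_node_distance_map(own_primary_domains, storagedrivers):
    # storagedrivers: dict of storagedriver_id -> {'primary': set(domains), 'secondary': set(domains)}
    distance_map = {}
    for sd_id, domains in storagedrivers.items():
        if domains['primary'] & own_primary_domains:
            distance_map[sd_id] = 0       # shares our primary domain / datacenter
        elif domains['secondary'] & own_primary_domains:
            distance_map[sd_id] = 10000   # reachable through a secondary domain
        else:
            distance_map[sd_id] = 20000   # outside the configured domains
    return distance_map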

kvanhijf avatar Aug 21 '17 12:08 kvanhijf

https://github.com/openvstorage/framework/pull/1785 -->

kvanhijf avatar Sep 22 '17 10:09 kvanhijf

Waiting for the volumedriver to port the changes (related to max tlogs behind) to EE

kvanhijf avatar Sep 25 '17 13:09 kvanhijf

The relevant changes are already present in the ee-6.17.0 release packages (volumedriver-ee-release-ubuntu-16.04 on jenkins). These were not published to the EE apt repos yet due to a fwd compat issue https://github.com/openvstorage/volumedriver/issues/346 but the fix for this will not cause changes to the newly introduced config options / API changes. The corresponding OSE packages (6.17.1) have been published to the OSE apt repos, so these could be used?

redlicha avatar Sep 25 '17 13:09 redlicha

@kvanhijf anything left todo? This is in state_inprogress for 2 months.

wimpers avatar Nov 28 '17 08:11 wimpers

@wimpers: Well, this PR relies on changes in the volumedriver to be able to set the max tlogs behind in the config. As far as I can tell, this has been fixed and published in the OSE release of the volumedriver, but not in the EE release. Since the framework doesn't differentiate between EE and OSE for this specific change, I think we cannot merge this as long as it hasn't been fixed and released on EE too (volumedriver-wise).

kvanhijf avatar Nov 28 '17 08:11 kvanhijf

tlog limit for MDS replay is in ee-6.17.0 (currently not yet on unstable)

wimpers avatar Nov 28 '17 08:11 wimpers

Assigning to @JeffreyDevloo as Kevin is no longer with us.

wimpers avatar Dec 22 '17 13:12 wimpers

Configuring the max tlogs a slave may be behind while still remaining eligible for failover was backported to ee-6.16.21, but only as a per-volumedriver configurable, not as a per-volume configuration. The default value is UINT32_MAX (4294967295), which maintains the current no-limit behaviour. Reference: https://github.com/openvstorage/volumedriver-ee/issues/212
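
A small sketch of those semantics (illustrative, not volumedriver source): a slave stays eligible for failover only while its tlog lag does not exceed the configured maximum, and the UINT32_MAX default effectively keeps the old unlimited behaviour.

UINT32_MAX = 2 ** 32 - 1  # 4294967295, the default: no effective limit


def slave_eligible_for_failover(tlogs_behind, max_tlogs_behind=UINT32_MAX):
    # A slave may still be promoted on failover as long as it does not lag the
    # master by more than max_tlogs_behind tlogs.
    return tlogs_behind <= max_tlogs_behind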

redlicha avatar May 23 '19 07:05 redlicha