
Create MDS unittest for specific scenario

Open kvanhijf opened this issue 8 years ago • 12 comments

We assume the following scenario can result in bad behavior when running the 'ensure_safety' check for a vdisk:

  • Safety is configured at 2
  • Volume V has master on node1, slave on node2
  • Master node dies, HA kicks in, volume gets moved by volumedriver to node2
  • Volumedriver sends owner_changed event and fwk runs ensure_safety for said volume
  • The logging indicated 2 reasons for reconfiguration and an error:
    • Not enough safety
    • Not enough services in use in primary domain
    • Failed to update the metadata backend configuration
  • Framework eventually configured node3 to be master and node4 to be slave, removing node2 from the config altogether
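
The code below is the relevant excerpt from MDSServiceController: if the config without the ex-master differs from the full config, it is pushed first, after which the full config (which re-includes the ex-master) is pushed as well.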
            try:
                if len(configs_no_ex_master) != len(configs_all):
                    vdisk.storagedriver_client.update_metadata_backend_config(volume_id=str(vdisk.volume_id),
                                                                              metadata_backend_config=MDSMetaDataBackendConfig(configs_no_ex_master),
                                                                              req_timeout_secs=5)
                vdisk.storagedriver_client.update_metadata_backend_config(volume_id=str(vdisk.volume_id),
                                                                          metadata_backend_config=MDSMetaDataBackendConfig(configs_all),
                                                                          req_timeout_secs=5)
            except Exception:
                MDSServiceController._logger.exception('MDS safety: vDisk {0}: Failed to update the metadata backend configuration'.format(vdisk.guid))
                raise Exception('MDS configuration for volume {0} with guid {1} could not be changed'.format(vdisk.name, vdisk.guid))

We assume that the 1st update_metadata_backend_config call was initiated and timed out after 5 seconds, so the 2nd update, which presumably contained the original master node (node2), was never executed. This cannot be verified from the available logging, however. On the voldrv side we did see that the actual updateMetadataBackendConfig succeeded, but it took 386s to complete.
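
A minimal sketch of the requested unittest, assuming only the two-call pattern from the excerpt above: apply_mds_config is a stand-in for the framework code (a real test would go through MDSServiceController / MDSMetaDataBackendConfig), and the node lists are placeholders. It documents that a timeout on the 1st call currently prevents the 2nd call, which would re-add the ex-master, from ever being made.

import unittest
from unittest import mock


def apply_mds_config(storagedriver_client, volume_id, configs_no_ex_master, configs_all):
    # Stand-in mirroring the two-call update in the excerpt above (simplified,
    # MDSMetaDataBackendConfig wrapping omitted).
    if len(configs_no_ex_master) != len(configs_all):
        storagedriver_client.update_metadata_backend_config(volume_id=volume_id,
                                                            metadata_backend_config=configs_no_ex_master,
                                                            req_timeout_secs=5)
    storagedriver_client.update_metadata_backend_config(volume_id=volume_id,
                                                        metadata_backend_config=configs_all,
                                                        req_timeout_secs=5)


class MDSTimeoutScenarioTest(unittest.TestCase):
    def test_second_update_skipped_when_first_times_out(self):
        client = mock.MagicMock()
        # First call raises, mimicking the 5s client-side timeout seen in the logs.
        client.update_metadata_backend_config.side_effect = [RuntimeError('client-side timeout'), None]
        with self.assertRaises(RuntimeError):
            apply_mds_config(client, 'volume-guid',
                             configs_no_ex_master=['node3', 'node4'],
                             configs_all=['node3', 'node4', 'node2'])
        # The second call, which would have re-added the original master (node2),
        # was never made.
        self.assertEqual(client.update_metadata_backend_config.call_count, 1)


if __name__ == '__main__':
    unittest.main()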

kvanhijf avatar Aug 16 '17 14:08 kvanhijf

The scenario was slightly different:

  • volume V has MDS master table on node1, slave on node2
  • node1 is down: node3 takes ownership of V (at this point the framework is notified of owner change)
  • as part of starting V on node3, V promotes the MDS table on node2 to the master role
  • the framework spins up a new MDS slave table for V on node2
  • the framework reconfigures V with MDS [ node2, node4 ], forcing the volume to use the MDS table on node2 as master

WRT updateMetaDataBackendConfig timing out: this should be visible in the volumedriver log in this scenario, as the timeout is a client-side timeout. That is, on the server side the request will still be executed (either with a delay, or it simply takes too long), and when the server attempts to send the response an error will be logged because the connection was dropped by the client side.
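
As a sketch of that pattern (illustrative only; get_metadata_backend_config is a hypothetical read call, the actual verification API may differ), a client-side timeout could be followed by re-reading the effective config instead of being treated as a hard failure:

def update_with_verification(client, volume_id, new_config, timeout_secs=5):
    try:
        client.update_metadata_backend_config(volume_id=volume_id,
                                              metadata_backend_config=new_config,
                                              req_timeout_secs=timeout_secs)
        return True
    except Exception:
        # A client-side timeout only means the caller stopped waiting; the server
        # may still apply the change (it took 386s in this case). Re-read before
        # concluding the update failed.
        return client.get_metadata_backend_config(volume_id=volume_id) == new_config  # hypothetical call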

redlicha avatar Aug 18 '17 16:08 redlicha

@Arne: Should we also wonder why node3 takes ownership in your 2nd bullet point instead of node2? Because node2 is currently a slave and node3 doesn't know anything yet?

kvanhijf avatar Aug 21 '17 06:08 kvanhijf

@kvanhijf it's something I was wondering as well. Voldrv / edge don't use the MDS config at the moment for the HA preferences (this might be worth an FR). @saelbrec mentioned that "the FWK uses the distance maps for its domain implementation, and those are already taken into account for MDS placements", but I suppose that's at another granularity (DC, not individual voldrv's)?

redlicha avatar Aug 21 '17 07:08 redlicha

@redlicha : The distance mapping we use is based on how the primary and secondary domains have been configured by the customer, so the customer can decide which storagerouters belong in which domain/datacenter/...

The distance mapping we calculate is then passed on to the clusterRegistryClient --> set_node_configs and has the following structure:

{
    'vrouter_id': <id>,
    ....
    'node_distance_map': {
        <storagedriver_id_1>: 0,
        <storagedriver_id_2>: 0,
        <storagedriver_id_3>: 10000,
        <storagedriver_id_4>: 20000
    }
}
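
As an illustration only (not the framework's actual implementation), such a node_distance_map could be derived from the domain configuration roughly as follows; the distance constants simply mirror the example values above:

def build_node_distance_map(own_primary_domains, storagedrivers):
    # storagedrivers: dict of storagedriver_id -> {'primary': set(domains), 'secondary': set(domains)}
    distance_map = {}
    for sd_id, domains in storagedrivers.items():
        if domains['primary'] & own_primary_domains:
            distance_map[sd_id] = 0       # shares our primary domain / datacenter
        elif domains['secondary'] & own_primary_domains:
            distance_map[sd_id] = 10000   # reachable through a secondary domain
        else:
            distance_map[sd_id] = 20000   # outside the configured domains
    return distance_map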

kvanhijf avatar Aug 21 '17 12:08 kvanhijf

https://github.com/openvstorage/framework/pull/1785 -->

kvanhijf avatar Sep 22 '17 10:09 kvanhijf

Waiting for the volumedriver to port the changes (related to max tlogs behind) to EE

kvanhijf avatar Sep 25 '17 13:09 kvanhijf

The relevant changes are already present in the ee-6.17.0 release packages (volumedriver-ee-release-ubuntu-16.04 on jenkins). These were not published to the EE apt repos yet due to a fwd compat issue https://github.com/openvstorage/volumedriver/issues/346 but the fix for this will not cause changes to the newly introduced config options / API changes. The corresponding OSE packages (6.17.1) have been published to the OSE apt repos, so these could be used?

redlicha avatar Sep 25 '17 13:09 redlicha

@kvanhijf anything left todo? This is in state_inprogress for 2 months.

wimpers avatar Nov 28 '17 08:11 wimpers

@wimpers: Well, this PR relies on changes in the volumedriver to be able to set the max tlogs behind in the config. As far as I can tell, this has been fixed and published in the OSE release of the volumedriver, but not in the EE release. Since the framework doesn't differentiate between EE and OSE for this specific change, I think we cannot merge this as long as it hasn't been fixed and released on EE too (volumedriver-wise).

kvanhijf avatar Nov 28 '17 08:11 kvanhijf

tlog limit for MDS replay is in ee-6.17.0 (currently not yet on unstable)

wimpers avatar Nov 28 '17 08:11 wimpers

Assigning to @JeffreyDevloo as Kevin is no longer with us.

wimpers avatar Dec 22 '17 13:12 wimpers

Configuring the max tlogs a slave may be behind while still remaining eligible for failover was backported to ee-6.16.21, but only as a per-volumedriver configurable, not as a per-volume configuration. The default value is UINT32_MAX (4294967295), which maintains the current no-limit behaviour. Reference: https://github.com/openvstorage/volumedriver-ee/issues/212
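
A small sketch of those semantics (illustrative, not volumedriver source): a slave stays eligible for failover only while its tlog lag does not exceed the configured maximum, and the UINT32_MAX default effectively keeps the old unlimited behaviour.

UINT32_MAX = 2 ** 32 - 1  # 4294967295, the default: no effective limit


def slave_eligible_for_failover(tlogs_behind, max_tlogs_behind=UINT32_MAX):
    # A slave may still be promoted on failover as long as it does not lag the
    # master by more than max_tlogs_behind tlogs.
    return tlogs_behind <= max_tlogs_behind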

redlicha avatar May 23 '19 07:05 redlicha