Management Server - Prepare for Maintenance and Cancel Maintenance improvements
Description
This PR makes the following improvements to Management Server Prepare for Maintenance and Cancel Maintenance:
- Added a new setting, 'management.server.maintenance.ignore.maintenance.hosts', to ignore hosts in maintenance states while preparing a management server for maintenance. This skips the agent transfer and the agent count check for hosts in maintenance, and unblocks a Management Server held in the "preparing for maintenance" state when a host is stuck in "prepare for maintenance" (due to connectivity or race condition issues) or is in another maintenance state.
- Rebalance indirect agents after cancel maintenance: the rebalance parameter in the cancelMaintenance API starts rebalancing of the indirect agents immediately instead of waiting for the 'indirect.agent.lb.check.interval' time to elapse.
- Force maintenance after the maintenance window timeout: the forced parameter in the prepareForMaintenance API covers the case where migrating the agents failed and they can reconnect to another available management node.
- Propagate changes to the 'indirect.agent.lb.check.interval' setting to the host agents.
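
As a rough illustration of how the two new parameters might be passed, here is a minimal Python sketch that only assembles and signs the query parameters; it does not contact a server. The command names and the `rebalance` and `forced` parameters come from the description above. The `managementserverid` parameter name, the placeholder UUID and keys, and the helper names `build_params` and `sign_request` are assumptions made for this example, and the signing step follows the commonly documented CloudStack scheme (sorted, lower-cased query string, HMAC-SHA1, Base64), so treat it as a sketch rather than a reference.

```python
import base64
import hashlib
import hmac
from urllib.parse import quote


def build_params(command, management_server_id, api_key, **flags):
    """Assemble query parameters for a management server maintenance call."""
    params = {
        "command": command,
        # Assumed parameter name for selecting the management server.
        "managementserverid": management_server_id,
        "apikey": api_key,
        "response": "json",
    }
    for name, value in flags.items():
        # Booleans are sent as lowercase strings ("true" / "false").
        params[name] = str(value).lower() if isinstance(value, bool) else str(value)
    return params


def sign_request(params, secret_key):
    """Return a Base64 HMAC-SHA1 signature over the sorted, lower-cased query."""
    query = "&".join(
        f"{key}={quote(str(value), safe='')}"
        for key, value in sorted(params.items(), key=lambda item: item[0].lower())
    )
    digest = hmac.new(secret_key.encode(), query.lower().encode(), hashlib.sha1).digest()
    return base64.b64encode(digest).decode()


# Cancel maintenance and ask for an immediate rebalance of indirect agents.
cancel = build_params("cancelMaintenance", "<management-server-uuid>", "<api-key>", rebalance=True)

# Prepare for maintenance, forcing it once the maintenance window times out.
prepare = build_params("prepareForMaintenance", "<management-server-uuid>", "<api-key>", forced=True)

for request in (cancel, prepare):
    request["signature"] = sign_request(request, "<secret-key>")
    print(request["command"], request)
```

In practice the signed parameters would be URL-encoded and sent to the management server's API endpoint; a client such as CloudMonkey handles the signing automatically.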
Types of changes
- [ ] Breaking change (fix or feature that would cause existing functionality to change)
- [ ] New feature (non-breaking change which adds functionality)
- [ ] Bug fix (non-breaking change which fixes an issue)
- [x] Enhancement (improves an existing feature and functionality)
- [ ] Cleanup (Code refactoring and cleanup, that may add test cases)
- [ ] build/CI
- [ ] test (unit or integration test code)
Feature/Enhancement Scale or Bug Severity
Feature/Enhancement Scale
- [ ] Major
- [x] Minor
Bug Severity
- [ ] BLOCKER
- [ ] Critical
- [ ] Major
- [ ] Minor
- [ ] Trivial
Screenshots (if appropriate):
How Has This Been Tested?
How did you try to break this feature and the system with this change?
@blueorangutan package
@sureshanaparti a [SL] Jenkins job has been kicked to build packages. It will be bundled with KVM, XenServer and VMware SystemVM templates. I'll keep you posted as I make progress.
Codecov Report
Attention: Patch coverage is 26.40000% with 184 lines in your changes missing coverage. Please review.
Project coverage is 16.57%. Comparing base (41b4f0a) to head (fba3483). Report is 40 commits behind head on main.
Additional details and impacted files
@@ Coverage Diff @@
## main #10995 +/- ##
============================================
- Coverage 16.57% 16.57% -0.01%
- Complexity 13870 13972 +102
============================================
Files 5719 5743 +24
Lines 507200 510648 +3448
Branches 61574 62105 +531
============================================
+ Hits 84093 84652 +559
- Misses 413688 416519 +2831
- Partials 9419 9477 +58
| Flag | Coverage Δ | |
|---|---|---|
| uitests | 3.90% <ø> (-0.07%) | :arrow_down: |
| unittests | 17.47% <26.40%> (+0.01%) | :arrow_up: |
Packaging result [SF]: ✔️ el8 ✔️ el9 ✔️ debian ✔️ suse15. SL-JID 13687
@blueorangutan test
@sureshanaparti a [SL] Trillian-Jenkins test job (ol8 mgmt + kvm-ol8) has been kicked to run smoke tests
[SF] Trillian test result (tid-13488)
Environment: kvm-ol8 (x2), Advanced Networking with Mgmt server ol8
Total time taken: 60623 seconds
Marvin logs: https://github.com/blueorangutan/acs-prs/releases/download/trillian/pr10995-t13488-kvm-ol8.zip
Smoke tests completed. 141 look OK, 0 have errors, 0 did not run
Only failed and skipped tests results shown below:
| Test | Result | Time (s) | Test File |
|---|---|---|---|
@sureshanaparti I guess this is ready for review?
@blueorangutan package
@sureshanaparti a [SL] Jenkins job has been kicked to build packages. It will be bundled with KVM, XenServer and VMware SystemVM templates. I'll keep you posted as I make progress.
Packaging result [SF]: ✔️ el8 ✔️ el9 ✔️ debian ✔️ suse15. SL-JID 13856
@blueorangutan package
@sureshanaparti a [SL] Jenkins job has been kicked to build packages. It will be bundled with KVM, XenServer and VMware SystemVM templates. I'll keep you posted as I make progress.
Packaging result [SF]: ✔️ el8 ✔️ el9 ✔️ debian ✔️ suse15. SL-JID 13900
@blueorangutan package
@sureshanaparti a [SL] Jenkins job has been kicked to build packages. It will be bundled with KVM, XenServer and VMware SystemVM templates. I'll keep you posted as I make progress.
Packaging result [SF]: ✔️ el8 ✔️ el9 ✔️ debian ✔️ suse15. SL-JID 13931
[SF] Trillian test result (tid-13620)
Environment: xcpng83 (x2), Advanced Networking with Mgmt server ol8
Total time taken: 73040 seconds
Marvin logs: https://github.com/blueorangutan/acs-prs/releases/download/trillian/pr10995-t13620-xcpng83.zip
Smoke tests completed. 135 look OK, 5 have errors, 1 did not run
Only failed and skipped tests results shown below:
| Test | Result | Time (s) | Test File |
|---|---|---|---|
| test_02_enableHumanReadableLogs | Error | 0.26 | test_human_readable_logs.py |
| test_01_prepare_and_cancel_maintenance | Error | 0.16 | test_ms_maintenance_and_safe_shutdown.py |
| test_oobm_issue_power_cycle | Error | 2.31 | test_outofbandmanagement_nestedplugin.py |
| test_oobm_issue_power_off | Error | 2.31 | test_outofbandmanagement_nestedplugin.py |
| test_oobm_issue_power_on | Error | 2.32 | test_outofbandmanagement_nestedplugin.py |
| test_oobm_issue_power_reset | Error | 2.34 | test_outofbandmanagement_nestedplugin.py |
| test_oobm_issue_power_soft | Error | 2.35 | test_outofbandmanagement_nestedplugin.py |
| test_oobm_issue_power_status | Error | 2.34 | test_outofbandmanagement_nestedplugin.py |
| test_01_primary_storage_iscsi | Error | 1.01 | test_primary_storage.py |
| test_01_webhook_deliveries | Failure | 10.34 | test_webhook_delivery.py |
| all_test_image_store_object_migration | Skipped | --- | test_image_store_object_migration.py |
@blueorangutan package
@sureshanaparti a [SL] Jenkins job has been kicked to build packages. It will be bundled with KVM, XenServer and VMware SystemVM templates. I'll keep you posted as I make progress.
Packaging result [SF]: ✔️ el8 ✔️ el9 ✔️ debian ✔️ suse15. SL-JID 13955
@blueorangutan test
@vladimirpetrov a [SL] Trillian-Jenkins test job (ol8 mgmt + kvm-ol8) has been kicked to run smoke tests
[SF] Trillian test result (tid-13643)
Environment: kvm-ol8 (x2), Advanced Networking with Mgmt server ol8
Total time taken: 99677 seconds
Marvin logs: https://github.com/blueorangutan/acs-prs/releases/download/trillian/pr10995-t13643-kvm-ol8.zip
Smoke tests completed. 101 look OK, 40 have errors, 0 did not run
Only failed and skipped tests results shown below:
| Test | Result | Time (s) | Test File |
|---|---|---|---|
| ContextSuite context=TestClusterDRS>:setup | Error | 0.00 | test_cluster_drs.py |
| test_nic_secondaryip_add_remove | Error | 24.27 | test_multipleips_per_nic.py |
| test_network_acl | Error | 2.37 | test_network_acl.py |
| test_01_verify_ipv6_network | Error | 3.31 | test_network_ipv6.py |
| test_01_verify_ipv6_network | Error | 3.31 | test_network_ipv6.py |
| test_03_network_operations_on_created_vm_of_otheruser | Error | 2.80 | test_network_permissions.py |
| test_03_network_operations_on_created_vm_of_otheruser | Error | 2.80 | test_network_permissions.py |
| test_04_deploy_vm_for_other_user_and_test_vm_operations | Failure | 1.54 | test_network_permissions.py |
| ContextSuite context=TestNetworkPermissions>:teardown | Error | 1.56 | test_network_permissions.py |
| test_delete_account | Error | 23.54 | test_network.py |
| test_delete_network_while_vm_on_it | Error | 2.55 | test_network.py |
| test_delete_network_while_vm_on_it | Error | 2.55 | test_network.py |
| test_deploy_vm_l2network | Error | 2.56 | test_network.py |
| test_deploy_vm_l2network | Error | 2.56 | test_network.py |
| test_l2network_restart | Error | 3.80 | test_network.py |
| test_l2network_restart | Error | 3.80 | test_network.py |
| ContextSuite context=TestL2Networks>:teardown | Error | 4.95 | test_network.py |
| ContextSuite context=TestPortForwarding>:setup | Error | 13.04 | test_network.py |
| ContextSuite context=TestPublicIP>:setup | Error | 13.78 | test_network.py |
| test_reboot_router | Error | 7.87 | test_network.py |
| test_releaseIP | Error | 9.25 | test_network.py |
| test_releaseIP_using_IP | Error | 8.04 | test_network.py |
| ContextSuite context=TestRouterRules>:setup | Error | 16.57 | test_network.py |
| test_01_deployVMInSharedNetwork | Failure | 1.36 | test_network.py |
| test_02_verifyRouterIpAfterNetworkRestart | Failure | 1.12 | test_network.py |
| test_03_destroySharedNetwork | Failure | 1.12 | test_network.py |
| ContextSuite context=TestSharedNetwork>:teardown | Error | 2.30 | test_network.py |
| test_01_deployVMInSharedNetwork | Failure | 1.47 | test_network.py |
| ContextSuite context=TestSharedNetworkWithConfigDrive>:teardown | Error | 2.58 | test_network.py |
| test_01_nic | Error | 56.36 | test_nic.py |
| test_01_non_strict_host_anti_affinity | Error | 2.75 | test_nonstrict_affinity_group.py |
| test_02_non_strict_host_affinity | Error | 2.60 | test_nonstrict_affinity_group.py |
| ContextSuite context=TestIsolatedNetworksPasswdServer>:setup | Error | 0.00 | test_password_server.py |
| test_01_isolated_persistent_network | Error | 0.29 | test_persistent_network.py |
| test_02_L2_persistent_network | Error | 1.31 | test_persistent_network.py |
| test_03_deploy_and_destroy_VM_and_verify_network_resources_persist | Failure | 2.62 | test_persistent_network.py |
| test_03_deploy_and_destroy_VM_and_verify_network_resources_persist | Error | 2.62 | test_persistent_network.py |
| ContextSuite context=TestL2PersistentNetworks>:teardown | Error | 2.70 | test_persistent_network.py |
| test_01_create_delete_portforwarding_fornonvpc | Error | 7.73 | test_portforwardingrules.py |
| test_01_add_primary_storage_disabled_host | Error | 0.27 | test_primary_storage.py |
| test_01_primary_storage_nfs | Error | 0.24 | test_primary_storage.py |
| ContextSuite context=TestStorageTags>:setup | Error | 0.41 | test_primary_storage.py |
| test_01_primary_storage_scope_change | Error | 0.13 | test_primary_storage_scope.py |
| test_01_vpc_privategw_acl | Failure | 8.47 | test_privategw_acl.py |
| test_02_vpc_privategw_static_routes | Failure | 8.74 | test_privategw_acl.py |
| test_03_vpc_privategw_restart_vpc_cleanup | Failure | 9.23 | test_privategw_acl.py |
| test_04_rvpc_privategw_static_routes | Failure | 9.30 | test_privategw_acl.py |
| test_09_project_suspend | Error | 2.62 | test_projects.py |
| test_10_project_activation | Error | 2.47 | test_projects.py |
| test_01_purge_expunged_api_vm_start_date | Error | 3.80 | test_purge_expunged_vms.py |
| test_02_purge_expunged_api_vm_end_date | Error | 3.29 | test_purge_expunged_vms.py |
| test_03_purge_expunged_api_vm_start_end_date | Error | 2.08 | test_purge_expunged_vms.py |
| test_04_purge_expunged_api_vm_no_date | Error | 2.19 | test_purge_expunged_vms.py |
| test_05_purge_expunged_vm_service_offering | Error | 1.76 | test_purge_expunged_vms.py |
| test_06_purge_expunged_vm_background_task | Error | 372.09 | test_purge_expunged_vms.py |
| test_CRUD_operations_userdata | Error | 1525.12 | test_register_userdata.py |
| test_deploy_vm_with_registered_userdata | Error | 8.53 | test_register_userdata.py |
| test_deploy_vm_with_registered_userdata_with_override_policy_allow | Error | 8.10 | test_register_userdata.py |
| test_deploy_vm_with_registered_userdata_with_override_policy_append | Error | 8.54 | test_register_userdata.py |
| test_deploy_vm_with_registered_userdata_with_override_policy_deny | Error | 8.05 | test_register_userdata.py |
| test_deploy_vm_with_registered_userdata_with_params | Error | 7.99 | test_register_userdata.py |
| test_link_and_unlink_userdata_to_template | Error | 8.00 | test_register_userdata.py |
| test_user_userdata_crud | Error | 8.70 | test_register_userdata.py |
| ContextSuite context=TestResetVmOnReboot>:setup | Error | 0.00 | test_reset_vm_on_reboot.py |
| ContextSuite context=TestRAMCPUResourceAccounting>:setup | Error | 0.00 | test_resource_accounting.py |
| ContextSuite context=TestResourceNames>:setup | Error | 0.00 | test_resource_names.py |
| ContextSuite context=TestRestoreVM>:setup | Error | 0.00 | test_restore_vm.py |
| ContextSuite context=TestRouterDHCPHosts>:setup | Error | 0.00 | test_router_dhcphosts.py |
| ContextSuite context=TestRouterDHCPOpts>:setup | Error | 0.00 | test_router_dhcphosts.py |
| ContextSuite context=TestRouterDns>:setup | Error | 0.00 | test_router_dns.py |
| ContextSuite context=TestRouterDnsService>:setup | Error | 0.00 | test_router_dnsservice.py |
| ContextSuite context=TestRouterIpTablesPolicies>:setup | Error | 0.00 | test_routers_iptables_default_policy.py |
| ContextSuite context=TestVPCIpTablesPolicies>:setup | Error | 0.00 | test_routers_iptables_default_policy.py |
| ContextSuite context=TestIsolatedNetworks>:setup | Error | 0.00 | test_routers_network_ops.py |
| ContextSuite context=TestRedundantIsolateNetworks>:setup | Error | 0.00 | test_routers_network_ops.py |
| ContextSuite context=TestRouterServices>:setup | Error | 0.00 | test_routers.py |
| test_01_sys_vm_start | Failure | 0.11 | test_secondary_storage.py |
| ContextSuite context=TestCpuCapServiceOfferings>:setup | Error | 0.00 | test_service_offerings.py |
| ContextSuite context=TestServiceOfferings>:setup | Error | 0.33 | test_service_offerings.py |
| ContextSuite context=TestSetSourceNatIp>:setup | Error | 0.00 | test_set_sourcenat.py |
| test_01_migrate_vm_strict_tags_success | Error | 0.34 | test_vm_strict_host_tags.py |
| test_02_migrate_vm_strict_tags_failure | Error | 0.27 | test_vm_strict_host_tags.py |
| test_01_restore_vm_strict_tags_success | Error | 0.29 | test_vm_strict_host_tags.py |
| test_02_restore_vm_strict_tags_failure | Error | 0.28 | test_vm_strict_host_tags.py |
| test_01_scale_vm_strict_tags_success | Error | 0.31 | test_vm_strict_host_tags.py |
| test_02_scale_vm_strict_tags_failure | Error | 0.42 | test_vm_strict_host_tags.py |
| test_01_deploy_vm_on_specific_host_without_strict_tags | Error | 0.30 | test_vm_strict_host_tags.py |
| test_02_deploy_vm_on_any_host_without_strict_tags | Error | 3.00 | test_vm_strict_host_tags.py |
| test_03_deploy_vm_on_specific_host_with_strict_tags_success | Error | 0.33 | test_vm_strict_host_tags.py |
| test_04_deploy_vm_on_any_host_with_strict_tags_success | Error | 5.95 | test_vm_strict_host_tags.py |
| test_05_deploy_vm_on_specific_host_with_strict_tags_failure | Failure | 0.28 | test_vm_strict_host_tags.py |
| ContextSuite context=TestSharedFSLifecycle>:setup | Error | 0.00 | test_sharedfs_lifecycle.py |
| ContextSuite context=TestSnapshotRootDisk>:setup | Error | 0.00 | test_snapshots.py |
| ContextSuite context=TestSnapshotStandaloneBackup>:setup | Error | 0.00 | test_snapshots.py |
| test_01_list_sec_storage_vm | Failure | 0.05 | test_ssvm.py |
| test_02_list_cpvm_vm | Failure | 0.05 | test_ssvm.py |
| test_03_ssvm_internals | Failure | 0.04 | test_ssvm.py |
| test_04_cpvm_internals | Failure | 0.04 | test_ssvm.py |
| test_05_stop_ssvm | Failure | 0.04 | test_ssvm.py |
| test_06_stop_cpvm | Failure | 0.05 | test_ssvm.py |
| test_07_reboot_ssvm | Failure | 0.04 | test_ssvm.py |
| test_08_reboot_cpvm | Failure | 0.04 | test_ssvm.py |
| test_09_reboot_ssvm_forced | Failure | 0.05 | test_ssvm.py |
| test_10_reboot_cpvm_forced | Failure | 0.05 | test_ssvm.py |
| test_11_destroy_ssvm | Failure | 0.05 | test_ssvm.py |
| test_12_destroy_cpvm | Failure | 0.04 | test_ssvm.py |
| ContextSuite context=TestVMWareStoragePolicies>:setup | Error | 0.00 | test_storage_policy.py |
| test_02_create_template_with_checksum_sha1 | Error | 65.77 | test_templates.py |
| test_03_create_template_with_checksum_sha256 | Error | 65.95 | test_templates.py |
| test_04_create_template_with_checksum_md5 | Error | 65.75 | test_templates.py |
| test_05_create_template_with_no_checksum | Error | 65.72 | test_templates.py |
| test_01_register_template_direct_download_flag | Error | 0.09 | test_templates.py |
| test_02_deploy_vm_from_direct_download_template | Error | 0.00 | test_templates.py |
| test_03_deploy_vm_wrong_checksum | Error | 0.07 | test_templates.py |
| ContextSuite context=TestTemplates>:setup | Error | 18.41 | test_templates.py |
| ContextSuite context=TestISOUsage>:setup | Error | 0.00 | test_usage.py |
| ContextSuite context=TestLBRuleUsage>:setup | Error | 0.00 | test_usage.py |
| ContextSuite context=TestNatRuleUsage>:setup | Error | 0.00 | test_usage.py |
| ContextSuite context=TestPublicIPUsage>:setup | Error | 0.00 | test_usage.py |
| ContextSuite context=TestSnapshotUsage>:setup | Error | 0.00 | test_usage.py |
| ContextSuite context=TestVmUsage>:setup | Error | 0.00 | test_usage.py |
| ContextSuite context=TestVolumeUsage>:setup | Error | 0.00 | test_usage.py |
| ContextSuite context=TestVpnUsage>:setup | Error | 0.00 | test_usage.py |
| test_01_scale_up_verify | Failure | 35.04 | test_vm_autoscaling.py |
| test_02_update_vmprofile_and_vmgroup | Failure | 245.70 | test_vm_autoscaling.py |
| test_03_scale_down_verify | Failure | 304.69 | test_vm_autoscaling.py |
| test_04_stop_remove_vm_in_vmgroup | Failure | 0.03 | test_vm_autoscaling.py |
| test_06_autoscaling_vmgroup_on_project_network | Failure | 42.24 | test_vm_autoscaling.py |
| test_06_autoscaling_vmgroup_on_project_network | Error | 42.25 | test_vm_autoscaling.py |
| test_07_autoscaling_vmgroup_on_vpc_network | Error | 1.29 | test_vm_autoscaling.py |
| ContextSuite context=TestVmAutoScaling>:teardown | Error | 14.68 | test_vm_autoscaling.py |
| test_01_deploy_vm_on_specific_host | Error | 0.12 | test_vm_deployment_planner.py |
| test_02_deploy_vm_on_specific_cluster | Error | 1.51 | test_vm_deployment_planner.py |
| test_03_deploy_vm_on_specific_pod | Error | 1.44 | test_vm_deployment_planner.py |
| test_04_deploy_vm_on_host_override_pod_and_cluster | Error | 0.19 | test_vm_deployment_planner.py |
| test_05_deploy_vm_on_cluster_override_pod | Error | 1.45 | test_vm_deployment_planner.py |
| test_01_migrate_VM_and_root_volume | Error | 123.10 | test_vm_life_cycle.py |
| test_02_migrate_VM_with_two_data_disks | Error | 59.69 | test_vm_life_cycle.py |
| test_01_secure_vm_migration | Error | 92.29 | test_vm_life_cycle.py |
| test_02_unsecure_vm_migration | Error | 229.56 | test_vm_life_cycle.py |
| test_04_nonsecured_to_secured_vm_migration | Error | 157.69 | test_vm_life_cycle.py |
| test_08_migrate_vm | Error | 0.07 | test_vm_life_cycle.py |
@blueorangutan test
@vladimirpetrov a [SL] Trillian-Jenkins test job (ol8 mgmt + kvm-ol8) has been kicked to run smoke tests
[SF] Trillian test result (tid-13659)
Environment: kvm-ol8 (x2), Advanced Networking with Mgmt server ol8
Total time taken: 58384 seconds
Marvin logs: https://github.com/blueorangutan/acs-prs/releases/download/trillian/pr10995-t13659-kvm-ol8.zip
Smoke tests completed. 141 look OK, 0 have errors, 0 did not run
Only failed and skipped tests results shown below:
| Test | Result | Time (s) | Test File |
|---|---|---|---|