Add a retry mechanism for VM migration

Open benj-n opened this issue 2 years ago • 19 comments

Description

This PR adds a retry mechanism to VM migration, controlled by two settings: the number of retries and the wait between retries. The initial VM deployment process already has such retries, which help work around hypervisor issues and race conditions when allocating resources.
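
For readers who want a concrete picture of the behaviour described above, here is a minimal, self-contained sketch of a retry loop driven by a retry count and a wait interval. It is not the code from this PR: the class name `MigrationRetrySketch`, the method `runWithRetries`, and the parameters `maxRetries` and `waitMillis` are hypothetical stand-ins for the two settings, and the "migration" is simulated by a lambda.

```java
import java.util.concurrent.Callable;

// Illustrative sketch only; names are hypothetical and not taken from CloudStack.
public class MigrationRetrySketch {

    /**
     * Runs the operation once, then up to maxRetries more times if it throws,
     * sleeping waitMillis between attempts. With maxRetries = 0 the operation
     * is attempted exactly once, i.e. no retry behaviour at all.
     */
    public static <T> T runWithRetries(Callable<T> operation, int maxRetries, long waitMillis)
            throws Exception {
        if (maxRetries < 0 || waitMillis < 0) {
            throw new IllegalArgumentException("maxRetries and waitMillis must be >= 0");
        }
        Exception lastFailure = null;
        for (int attempt = 0; attempt <= maxRetries; attempt++) {
            try {
                return operation.call();
            } catch (Exception e) {
                lastFailure = e;
                // Wait only if another attempt will follow.
                if (attempt < maxRetries) {
                    Thread.sleep(waitMillis);
                }
            }
        }
        // All attempts failed: surface the last failure to the caller.
        throw lastFailure;
    }

    public static void main(String[] args) throws Exception {
        int[] calls = {0};
        // Simulated migration that fails twice, then succeeds.
        String result = runWithRetries(() -> {
            calls[0]++;
            if (calls[0] < 3) {
                throw new IllegalStateException("simulated migration failure");
            }
            return "migrated";
        }, 2, 10L);
        System.out.println(result + " after " + calls[0] + " attempts");
        // prints "migrated after 3 attempts"
    }
}
```

In this sketch a retry count of 0 means a single attempt, so a default of 0 would leave behaviour unchanged for anyone who does not opt in.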

Types of changes

  • [ ] Breaking change (fix or feature that would cause existing functionality to change)
  • [ ] New feature (non-breaking change which adds functionality)
  • [ ] Bug fix (non-breaking change which fixes an issue)
  • [X] Enhancement (improves an existing feature and functionality)
  • [ ] Cleanup (Code refactoring and cleanup, that may add test cases)

Feature/Enhancement Scale or Bug Severity

This change is trivial.

Feature/Enhancement Scale

  • [ ] Major
  • [X] Minor

Bug Severity

  • [ ] BLOCKER
  • [ ] Critical
  • [ ] Major
  • [ ] Minor
  • [X] Trivial

Screenshots (if appropriate):

How Has This Been Tested?

benj-n avatar Mar 30 '23 20:03 benj-n

Codecov Report

Attention: Patch coverage is 6.45161% with 29 lines in your changes missing coverage. Please review.

Project coverage is 12.70%. Comparing base (2e6100d) to head (226ea94). Report is 755 commits behind head on main.

:exclamation: Current head 226ea94 differs from pull request most recent head 1fa563e

Please upload reports for the commit 1fa563e to get more accurate results.

Files | Patch % | Lines
.../src/main/java/com/cloud/vm/UserVmManagerImpl.java | 6.45% | 29 Missing :warning:
Additional details and impacted files
@@             Coverage Diff              @@
##               main    #7383      +/-   ##
============================================
- Coverage     14.40%   12.70%   -1.70%     
+ Complexity    10111     8691    -1420     
============================================
  Files          2748     2729      -19     
  Lines        259390   256623    -2767     
  Branches      40381    39997     -384     
============================================
- Hits          37365    32607    -4758     
- Misses       217190   219866    +2676     
+ Partials       4835     4150     -685     


codecov[bot] avatar Mar 31 '23 15:03 codecov[bot]

@shwstppr The default value has been changed to respect the original behaviour. VM migration is already tested in the vm_lifecycle test suite. Is there anything else that could prevent this from moving forward?

benj-n avatar Apr 20 '23 04:04 benj-n

> @shwstppr The default value has been changed to respect the original behaviour. VM migration is already tested in the vm_lifecycle test suite. Is there anything else that could prevent this from moving forward?

Thanks for addressing the suggestion. I don't think the VM migration tests in the vm_lifecycle test suite can exercise retries. Have you done any testing where migration failed at first but succeeded on a retry?

shwstppr avatar Apr 20 '23 06:04 shwstppr

@blueorangutan package

shwstppr avatar Apr 21 '23 19:04 shwstppr

@shwstppr a Jenkins job has been kicked to build packages. It will be bundled with KVM, XenServer and VMware SystemVM templates. I'll keep you posted as I make progress.

blueorangutan avatar Apr 21 '23 20:04 blueorangutan

Packaging result: :heavy_check_mark: el7 :heavy_check_mark: el8 :heavy_check_mark: el9 :heavy_check_mark: debian :heavy_check_mark: suse15. SL-JID 5961

blueorangutan avatar Apr 21 '23 20:04 blueorangutan

This pull request has merge conflicts. Dear author, please fix the conflicts and sync your branch with the base branch.

github-actions[bot] avatar Jun 22 '23 18:06 github-actions[bot]

@blueorangutan package

DaanHoogland avatar Oct 31 '23 14:10 DaanHoogland

@DaanHoogland a [SL] Jenkins job has been kicked to build packages. It will be bundled with KVM, XenServer and VMware SystemVM templates. I'll keep you posted as I make progress.

blueorangutan avatar Oct 31 '23 14:10 blueorangutan

Packaging result [SF]: :heavy_check_mark: el7 :heavy_check_mark: el8 :heavy_check_mark: el9 :heavy_check_mark: debian :heavy_check_mark: suse15. SL-JID 7582

blueorangutan avatar Oct 31 '23 15:10 blueorangutan

@blueorangutan test

DaanHoogland avatar Nov 01 '23 09:11 DaanHoogland

@DaanHoogland a [SL] Trillian-Jenkins test job (centos7 mgmt + kvm-centos7) has been kicked to run smoke tests

blueorangutan avatar Nov 01 '23 09:11 blueorangutan

[SF] Trillian test result (tid-8193)
Environment: kvm-centos7 (x2), Advanced Networking with Mgmt server 7
Total time taken: 45280 seconds
Marvin logs: https://github.com/blueorangutan/acs-prs/releases/download/trillian/pr7383-t8193-kvm-centos7.zip
Smoke tests completed. 114 look OK, 1 have errors, 0 did not run
Only failed and skipped tests results shown below:

Test | Result | Time (s) | Test File
test_02_upgrade_kubernetes_cluster | Failure | 592.74 | test_kubernetes_clusters.py

blueorangutan avatar Nov 01 '23 22:11 blueorangutan

This pull request has merge conflicts. Dear author, please fix the conflicts and sync your branch with the base branch.

github-actions[bot] avatar Feb 08 '24 13:02 github-actions[bot]

Hi @benj-n, please check/resolve the conflicts. Can we target this PR for 4.19.1?

sureshanaparti avatar Mar 12 '24 10:03 sureshanaparti

Ping @benj-n, can you check/address the comments? Thanks.

sureshanaparti avatar Jun 23 '24 15:06 sureshanaparti

@benj-n, I moved this to the next major release as there has been little activity on it. cc @JoaoJandre

I think it is almost done though.

DaanHoogland avatar Sep 18 '24 10:09 DaanHoogland

Hi @benj-n, please check and resolve any conflicts in the branch.

sureshanaparti avatar Jun 05 '25 10:06 sureshanaparti

@benj-n, will you still be looking at this?

DaanHoogland avatar Oct 10 '25 07:10 DaanHoogland

@benj-n, closing this one as it has conflicts and is old. Please update and reopen it if you think it is still relevant.

DaanHoogland avatar Dec 12 '25 08:12 DaanHoogland