Fix VGPU available devices listing

Open nvazquez opened this issue 1 year ago • 16 comments

Description

This PR fixes a VGPU deployment failure caused by an invalid SQL statement:

2024-08-22T19:27:18,113 ERROR [c.c.g.d.HostGpuGroupsDaoImpl] (API-Job-Executor-25:[ctx-3747d8e0, job-111, ctx-4f10d82e, FirstFitRoutingAllocator]) (logid:e72eacab) DB Exception on: com.mysql.cj.jdbc.ClientPreparedStatement: SELECT host_gpu_groups.id, host_gpu_groups.group_name, host_gpu_groups.host_id FROM host_gpu_groups  INNER JOIN vgpu_types groupId ON host_gpu_groups.id=groupId.gpu_group_id WHERE host_gpu_groups.host_id = 4  AND host_gpu_groups.group_name = x'47726f7570206f66204e564944494120436f72706f726174696f6e204756313030474c205b5445534c4120563130305d2047505573'  AND  (groupId.vgpu_type = x'706173737468726f756768'  AND groupId.remaining_capacity > 0 ) ORDER BY vgpu_types.remaining_capacity DESC java.sql.SQLSyntaxErrorException: Unknown column 'vgpu_types.remaining_capacity' in 'order clause'
	at com.mysql.cj.jdbc.exceptions.SQLError.createSQLException(SQLError.java:121)
	at com.mysql.cj.jdbc.exceptions.SQLExceptionsMapping.translateException(SQLExceptionsMapping.java:122)
	at com.mysql.cj.jdbc.ClientPreparedStatement.executeInternal(ClientPreparedStatement.java:916)
	at com.mysql.cj.jdbc.ClientPreparedStatement.executeQuery(ClientPreparedStatement.java:972)
	at org.apache.commons.dbcp2.DelegatingPreparedStatement.executeQuery(DelegatingPreparedStatement.java:122)
	at org.apache.commons.dbcp2.DelegatingPreparedStatement.executeQuery(DelegatingPreparedStatement.java:122)
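
In the failing query, `vgpu_types` is joined under the alias `groupId`, but the `ORDER BY` clause still qualifies the column with the table name (`vgpu_types.remaining_capacity`). MySQL only accepts the alias once a table has been aliased, hence the `Unknown column` error. The sketch below illustrates the idea of qualifying the order-by column with the join alias when one exists. It is a minimal, hypothetical example: the class `OrderByAliasSketch` and the method `orderByClause` are assumptions made for illustration and are not CloudStack's actual `Filter` code or the exact change in this PR.

```java
/** Illustrative sketch only; not CloudStack's actual Filter implementation. */
class OrderByAliasSketch {

    /**
     * Builds an ORDER BY clause for a single column.
     * If the table was joined under an alias, the column is qualified with
     * that alias; otherwise it is qualified with the table name.
     */
    static String orderByClause(String table, String joinAlias, String column, boolean ascending) {
        String qualifier = (joinAlias == null || joinAlias.isEmpty()) ? table : joinAlias;
        return " ORDER BY " + qualifier + "." + column + (ascending ? " ASC" : " DESC");
    }

    public static void main(String[] args) {
        // Table joined as "vgpu_types groupId": the alias must be used.
        System.out.println(orderByClause("vgpu_types", "groupId", "remaining_capacity", false));
        // Table without an alias: the table name is used.
        System.out.println(orderByClause("host_gpu_groups", null, "id", true));
    }
}
```

With the alias applied, the clause for the query in the log would read `ORDER BY groupId.remaining_capacity DESC`, which matches the alias used in the `WHERE` conditions of the same statement.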

Fixes: #9483

Types of changes

  • [ ] Breaking change (fix or feature that would cause existing functionality to change)
  • [ ] New feature (non-breaking change which adds functionality)
  • [x] Bug fix (non-breaking change which fixes an issue)
  • [ ] Enhancement (improves an existing feature and functionality)
  • [ ] Cleanup (Code refactoring and cleanup, that may add test cases)
  • [ ] build/CI
  • [ ] test (unit or integration test code)

Feature/Enhancement Scale or Bug Severity

Feature/Enhancement Scale

  • [ ] Major
  • [ ] Minor

Bug Severity

  • [ ] BLOCKER
  • [ ] Critical
  • [ ] Major
  • [ ] Minor
  • [ ] Trivial

Screenshots (if appropriate):

How Has This Been Tested?

Previously, I could hit the SQL error on deployment even on hosts without GPUs. After the fix, the SQL error no longer occurs and the deployment instead hits a no-capacity exception (which is valid in my environment).

How did you try to break this feature and the system with this change?

nvazquez avatar Aug 22 '24 20:08 nvazquez

@blueorangutan package

nvazquez avatar Aug 22 '24 20:08 nvazquez

@nvazquez a [SL] Jenkins job has been kicked to build packages. It will be bundled with KVM, XenServer and VMware SystemVM templates. I'll keep you posted as I make progress.

blueorangutan avatar Aug 22 '24 20:08 blueorangutan

Codecov Report

Attention: Patch coverage is 16.66667% with 10 lines in your changes missing coverage. Please review.

Project coverage is 15.08%. Comparing base (1a403f1) to head (04e849c). Report is 11 commits behind head on 4.19.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| ...rk/db/src/main/java/com/cloud/utils/db/Filter.java | 20.00% | 6 Missing and 2 partials :warning: |
| ...n/java/com/cloud/resource/ResourceManagerImpl.java | 0.00% | 2 Missing :warning: |

Additional details and impacted files
@@            Coverage Diff             @@
##               4.19    #9573    +/-   ##
==========================================
  Coverage     15.08%   15.08%            
- Complexity    11181    11188     +7     
==========================================
  Files          5406     5406            
  Lines        472915   473070   +155     
  Branches      58400    58455    +55     
==========================================
+ Hits          71336    71379    +43     
- Misses       393637   393746   +109     
- Partials       7942     7945     +3     

| Flag | Coverage Δ |
|---|---|
| uitests | 4.30% <ø> (ø) |
| unittests | 15.80% <16.66%> (+<0.01%) :arrow_up: |

codecov[bot] avatar Aug 22 '24 20:08 codecov[bot]

Packaging result [SF]: ✔️ el8 ✔️ el9 ✔️ debian ✔️ suse15. SL-JID 10765

blueorangutan avatar Aug 22 '24 21:08 blueorangutan

@blueorangutan test

nvazquez avatar Aug 22 '24 21:08 nvazquez

@nvazquez a [SL] Trillian-Jenkins test job (ol8 mgmt + kvm-ol8) has been kicked to run smoke tests

blueorangutan avatar Aug 22 '24 21:08 blueorangutan

[SF] Trillian test result (tid-11148)
Environment: kvm-ol8 (x2), Advanced Networking with Mgmt server ol8
Total time taken: 48753 seconds
Marvin logs: https://github.com/blueorangutan/acs-prs/releases/download/trillian/pr9573-t11148-kvm-ol8.zip
Smoke tests completed. 131 look OK, 2 have errors, 0 did not run
Only failed and skipped tests results shown below:

| Test | Result | Time (s) | Test File |
|---|---|---|---|
| test_02_list_cpvm_vm | Failure | 0.05 | test_ssvm.py |
| test_04_cpvm_internals | Failure | 0.06 | test_ssvm.py |
| test_02_unsecure_vm_migration | Error | 452.72 | test_vm_life_cycle.py |
| test_03_secured_to_nonsecured_vm_migration | Error | 375.02 | test_vm_life_cycle.py |
| test_04_nonsecured_to_secured_vm_migration | Error | 402.14 | test_vm_life_cycle.py |

blueorangutan avatar Aug 23 '24 11:08 blueorangutan

Thanks @vishesh92 @DaanHoogland - comment addressed

@blueorangutan package

nvazquez avatar Aug 26 '24 10:08 nvazquez

@nvazquez a [SL] Jenkins job has been kicked to build packages. It will be bundled with KVM, XenServer and VMware SystemVM templates. I'll keep you posted as I make progress.

blueorangutan avatar Aug 26 '24 11:08 blueorangutan

Packaging result [SF]: ✔️ el8 ✔️ el9 ✔️ debian ✔️ suse15. SL-JID 10809

blueorangutan avatar Aug 26 '24 11:08 blueorangutan

@blueorangutan LLtest

DaanHoogland avatar Aug 26 '24 12:08 DaanHoogland

@DaanHoogland a [LL] Trillian-Jenkins test job (centos7 mgmt + kvm-centos7) has been kicked to run smoke tests

blueorangutan avatar Aug 26 '24 12:08 blueorangutan

@vishesh92 does it lgty?

DaanHoogland avatar Aug 28 '24 06:08 DaanHoogland

A user tested the fix and approves: https://github.com/apache/cloudstack/issues/9483#issuecomment-2317704299

DaanHoogland avatar Aug 29 '24 14:08 DaanHoogland

@blueorangutan test

DaanHoogland avatar Aug 29 '24 14:08 DaanHoogland

@DaanHoogland a [SL] Trillian-Jenkins test job (ol8 mgmt + kvm-ol8) has been kicked to run smoke tests

blueorangutan avatar Aug 29 '24 14:08 blueorangutan

[SF] Trillian test result (tid-11238)
Environment: kvm-ol8 (x2), Advanced Networking with Mgmt server ol8
Total time taken: 43305 seconds
Marvin logs: https://github.com/blueorangutan/acs-prs/releases/download/trillian/pr9573-t11238-kvm-ol8.zip
Smoke tests completed. 133 look OK, 0 have errors, 0 did not run
Only failed and skipped tests results shown below:

Test Result Time (s) Test File

blueorangutan avatar Aug 30 '24 02:08 blueorangutan