
[5.15-velinux] Backported KVM phase 2 patches for velinux-5.15 kernel

Open PvsNarasimha opened this issue 9 months ago • 0 comments

Hardware Feature Enablement / Support

These patches add support for new CPU features, CPUID leaves, and hardware capabilities on Intel and AMD platforms.

  • x86/cpu: Enable STIBP on AMD if Automatic IBRS is enabled
  • KVM: x86/cpuid: Add AMD CPUID ExtPerfMonAndDbg leaf 0x80000022
  • KVM: x86/svm/pmu: Add AMD PerfMonV2 support
  • x86/cpu: Support AMD Automatic IBRS
  • x86/cpu, kvm: Add the Null Selector Clears Base feature
  • KVM: x86: add support for CPUID leaf 0x80000021
  • KVM: x86/pmu: Add IA32_PEBS_ENABLE MSR emulation for extended PEBS
  • perf/x86/core: Completely disable guest PEBS via guest's global_ctrl
  • KVM: x86/pmu: Disable guest PEBS temporarily in two rare situations
  • KVM: x86/pmu: Reprogram PEBS event to emulate guest PEBS counter
  • KVM: SVM: include CR3 in initial VMSA state for SEV-ES guests

Performance Improvements

Optimizations to reduce overhead and improve runtime efficiency:

  • KVM: x86/pmu: Rewrite reprogram_counters() to improve performance
  • KVM: x86: Use static calls to reduce kvm_pmu_ops overhead
  • KVM: x86: Copy kvm_pmu_ops by value to eliminate layer of indirection
  • KVM: x86/pmu: Use binary search to check filtered events
  • KVM: x86/pmu: Avoid using PEBS perf_events for normal counters

Refactoring / Code Reorganization

These changes focus on improving maintainability, readability, and structure of the KVM/x86 codebase:

  • KVM: x86/cpuid: Refactor host/guest CPU model consistency check
  • KVM: x86: Introduce __kvm_get_hypervisor_cpuid() helper
  • KVM: VMX: Refactor intel_pmu_{g,}set_msr() to align with other helpers
  • KVM: x86: Move open-coded CPUID leaf 0x80000021 EAX bit propagation code
  • KVM: nVMX: Refactor PMU refresh to avoid referencing kvm_x86_ops.pmu_ops
  • KVM: x86: Move guts of kvm_arch_init() to standalone helper
  • KVM: x86/pmu: Move handling PERF_GLOBAL_CTRL and friends to common x86
  • Move various helpers (e.g., pmc_perf_hw_id())
  • KVM: x86: Use more verbose names for mem encrypt kvm_x86_ops hooks

Feature Enhancements (vPMU / CPUID / MSRs / etc.)

These provide new capabilities, configurable options, and fine-tuned control to user space or the guest:

  • KVM: x86: Provide per VM capability for disabling PMU virtualization
  • KVM: x86/svm: Add module param to control PMU virtualization
  • KVM: x86/pmu: Restrict advanced features based on module enable_pmu
  • KVM: x86/pmu: Advertise PERFCTR_CORE iff the min nr of counters is met
  • KVM: x86/pmu: Expose CPUIDs feature bits PDCM, DS, DTES64
  • KVM: x86: Use actual kvm_cpuid.base for clearing KVM_FEATURE_PV_UNHALT
  • KVM: x86: Snapshot if a vCPU's vendor model is AMD vs. Intel compatible
  • KVM: x86: Move lookup of indexed CPUID leafs to helper

User/Guest Safety / Fault Isolation

These prevent invalid or insecure guest/host interactions:

  • KVM: x86/pmu: Zero out PMU metadata on AMD if PMU is disabled
  • KVM: x86/pmu: Reject userspace attempts to set reserved GLOBAL_STATUS bits
  • KVM: x86/pmu: Prevent the PMU from counting disallowed events
  • KVM: x86/pmu: Limit the maximum number of supported AMD GP counters
  • KVM: x86/pmu: Limit the maximum number of supported Intel GP counters
  • KVM: x86/pmu: WARN and bug the VM if PMU is refreshed after vCPU has run

Bug Fixes

These patches address correctness issues, warnings, and reliability concerns in KVM and vPMU:

  • KVM: x86: Fix errant brace in KVM capability handling
  • KVM: x86/pmu: Fix type length error when reading pmu->fixed_ctr_ctrl
  • KVM: x86/pmu: fix masking logic for MSR_CORE_PERF_GLOBAL_CTRL
  • KVM: x86: Fix pointer mistmatch warning when patching RET0 static calls
  • KVM: x86: Fix clang -Wimplicit-fallthrough in do_host_cpuid()
  • KVM: x86: avoid out of bounds indices for fixed performance counters
  • KVM: x86/pmu: Do not mask LVTPC when handling a PMI on AMD platforms
  • kvm: x86/pmu: Fix the compare function used by the pmu event filter

Documentation or Cleanup / Misc

Cleanup and changes that make the codebase more developer-friendly:

  • docs: kvm: x86: Fix broken field list
  • KVM: x86/pmu: Rename global_ovf_ctrl_mask to global_status_mask
  • KVM: x86/pmu: Rename pmc_is_enabled() to pmc_is_globally_enabled()
  • KVM: x86: Rename kvm_x86_ops pointers to align w/ preferred vendor names
  • KVM: x86: Remove defunct pre_block/post_block kvm_x86_ops hooks
  • KVM: x86: Move CPUID.(EAX=0x12,ECX=1) mangling to __kvm_update_cpuid_runtime()
  • KVM: x86: use static_call_cond for optional callbacks

Run Test cases

$ git clone https://gitlab.com/kvm-unit-tests/kvm-unit-tests.git
$ cd kvm-unit-tests/
$ ./configure
$ make
$ ./run_tests.sh
root@volcano9b5e-os:/home/amd/Linux_Backport/kvm-unit-tests# ./run_tests.sh
PASS apic-split (56 tests)
PASS ioapic-split (19 tests)
PASS x2apic (56 tests)
FAIL xapic (timeout; duration=60)
PASS ioapic (26 tests)
SKIP cmpxchg8b (i386 only)
PASS smptest (1 tests)
PASS smptest3 (1 tests)
PASS vmexit_cpuid
PASS vmexit_vmcall
PASS vmexit_mov_from_cr8
PASS vmexit_mov_to_cr8
PASS vmexit_inl_pmtimer
PASS vmexit_ipi
PASS vmexit_ipi_halt
PASS vmexit_ple_round_robin
PASS vmexit_tscdeadline
PASS vmexit_tscdeadline_immed
PASS vmexit_cr0_wp
PASS vmexit_cr4_pge
PASS access (2 tests)
SKIP access_fep (test marked as manual run only)
SKIP access-reduced-maxphyaddr (/sys/module/kvm_intel/parameters/allow_smaller_maxphyaddr not equal to Y)
PASS smap (18 tests)
PASS pku (7 tests)
SKIP pks (0 tests)
PASS asyncpf (2 tests, 1 skipped)
PASS emulator (140 tests, 2 skipped)
PASS eventinj (13 tests)
PASS hypercall (2 tests)
PASS idt_test (4 tests)
PASS memory (7 tests, 1 skipped)
PASS msr (1836 tests)
SKIP pmu (/proc/sys/kernel/nmi_watchdog not equal to 0)
SKIP pmu_lbr (/proc/sys/kernel/nmi_watchdog not equal to 0)
SKIP pmu_pebs (/proc/sys/kernel/nmi_watchdog not equal to 0)
SKIP vmware_backdoors (/sys/module/kvm/parameters/enable_vmware_backdoor not equal to Y)
PASS realmode
PASS s3
PASS setjmp (10 tests)
PASS sieve
PASS syscall (2 tests)
PASS tsc (6 tests)
PASS tsc_adjust (6 tests)
PASS xsave (17 tests)
PASS rmap_chain
FAIL svm
SKIP svm_pause_filter (1 tests, 1 skipped)
PASS svm_npt (103 tests)
SKIP taskswitch (i386 only)
SKIP taskswitch2 (i386 only)
PASS kvmclock_test
PASS pcid-enabled (2 tests)
PASS pcid-disabled (2 tests)
PASS pcid-asymmetric (2 tests)
PASS rdpru (1 tests)
PASS umip (21 tests)
SKIP la57 (i386 only)
SKIP vmx (0 tests)
SKIP ept (0 tests)
SKIP vmx_eoi_bitmap_ioapic_scan (0 tests)
SKIP vmx_hlt_with_rvi_test (0 tests)
SKIP vmx_apicv_test (0 tests)
SKIP vmx_posted_intr_test (0 tests)
SKIP vmx_apic_passthrough_thread (0 tests)
SKIP vmx_init_signal_test (0 tests)
SKIP vmx_sipi_signal_test (0 tests)
SKIP vmx_apic_passthrough_tpr_threshold_test (0 tests)
SKIP vmx_vmcs_shadow_test (0 tests)
SKIP vmx_pf_exception_test (0 tests)
SKIP vmx_pf_exception_test_fep (test marked as manual run only)
SKIP vmx_pf_vpid_test (test marked as manual run only)
SKIP vmx_pf_invvpid_test (test marked as manual run only)
SKIP vmx_pf_no_vpid_test (test marked as manual run only)
SKIP vmx_pf_exception_test_reduced_maxphyaddr (/sys/module/kvm_intel/parameters/allow_smaller_maxphyaddr not equal to Y)
PASS debug (23 tests)
PASS hyperv_synic (1 tests)
PASS hyperv_connections (7 tests)
PASS hyperv_stimer (12 tests)
PASS hyperv_stimer_direct (8 tests)
PASS hyperv_clock (3 tests)
PASS intel_iommu (11 tests)
SKIP tsx-ctrl (1 tests, 1 skipped)
SKIP intel_cet (0 tests)
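
To triage a run like the one above, the PASS/FAIL/SKIP lines can be tallied from a saved log. The following is a minimal sketch and not part of kvm-unit-tests: `summarize_kut_log` is a hypothetical helper name, and `run.log` is an assumed file holding the captured output of ./run_tests.sh. It strips ANSI colour codes first, in case the captured log contains them.

```shell
#!/bin/bash
# Hypothetical helper (not shipped with kvm-unit-tests): summarize a saved
# run_tests.sh log. Reads the files given as arguments, or stdin if none.
summarize_kut_log() {
    awk '
        # Drop ANSI colour sequences so the status word is a clean first field.
        { gsub(/\033\[[0-9;]*m/, "") }
        # Count each result line by its status.
        $1 == "PASS" || $1 == "FAIL" || $1 == "SKIP" { count[$1]++ }
        # Remember every FAIL and SKIP line, including the reason in parentheses.
        $1 == "FAIL" || $1 == "SKIP" { detail[++n] = $0 }
        END {
            printf "PASS=%d FAIL=%d SKIP=%d\n", count["PASS"], count["FAIL"], count["SKIP"]
            for (i = 1; i <= n; i++) print detail[i]
        }
    ' "$@"
}

# Assumed usage, with run.log as an example file name:
#   ./run_tests.sh | tee run.log
#   summarize_kut_log run.log
```

For the run above, the detail lines would surface the two FAIL entries (the xapic timeout and svm) and show that pmu, pmu_lbr and pmu_pebs were skipped because /proc/sys/kernel/nmi_watchdog was not 0, so the vPMU unit tests did not actually execute in this run. Setting that sysctl to 0 on the host before re-running should at least let the pmu test get past that precondition.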
  • Verified via debug logs
root@volcano9dee-host:/home/amd# dmesg | grep Malathi 
[  29.605656] amd_pmu_v2_enable_all is called : Malathi
[  29.607491] amd_pmu_v2_disable_all is called : Malathi
[  29.618834] amd_pmu_v2_disable_all is called : Malathi
[  29.619491] amd_pmu_v2_enable_all is called : Malathi
[  29.619530] amd_pmu_v2_disable_all is called : Malathi
[  29.620491] amd_pmu_v2_enable_all is called : Malathi
[  29.620493] amd_pmu_v2_disable_all is called : Malathi
[  29.621491] amd_pmu_v2_enable_all is called : Malathi
[  29.621491] amd_pmu_v2_disable_all is called : Malathi
[  29.621491] amd_pmu_v2_enable_all is called : Malathi
[  29.624441] amd_pmu_v2_disable_all is called : Malathi
  • Validated PMU with perf
root@volcano9dee-host:/home/amd/upstream_work/original_kernel_velinux/kernel/tools/perf# ./perf stat -e cycles,instructions a.out sleep 3
Hello,World
 
Performance counter stats for 'a.out sleep 3':
 
          5,74,208      cycles
          6,08,549      instructions              #    1.06  insn per cycle
 
       0.008310204 seconds time elapsed
 
       0.000000000 seconds user
       0.000797000 seconds sys
root@volcano9dee-host:/home/amd/upstream_work/original_kernel_velinux/kernel/tools/perf# ./perf record
^C[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.273 MB perf.data (1560 samples) ]
./perf report | grep pmu
     0.11%  swapper          [kernel.kallsyms]         [k] x86_pmu_disable
     0.02%  perf             [kernel.kallsyms]         [k] amd_pmu_v2_enable_all
  • Booted VM with QEMU
vi vmc_script.sh   # paste the script below into this file

#!/bin/bash

qemu-system-x86_64 \
 -m 2048 \
 -enable-kvm \
 -cpu host \
 -smp 2 \
 -hda /vms/images/velinux_chaithu.qcow2 \
 -cdrom /vms/iso/velinux-2.1-amd64-DVD-1.iso \
 -boot d \
 -netdev user,id=net0,hostfwd=tcp::2222-:22 \
 -device virtio-net-pci,netdev=net0 \
 -vnc :0

Run the script to create a VM

chmod +x vmc_script.sh
./vmc_script.sh
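
Since the script forwards host port 2222 to the guest's SSH port, the vPMU can also be checked from inside the guest once a guest OS is installed and sshd is running. This is only a sketch under assumptions: `guest_pmu_check` is a hypothetical helper, the guest user name is a placeholder, `perf` is assumed to be installed in the guest, and reading dmesg there may need root.

```shell
#!/bin/bash
# Hypothetical helper: run basic PMU checks inside the guest over SSH.
#   $1 = guest user name (placeholder, not given in this issue)
#   $2 = forwarded SSH port on the host (2222 in the QEMU script above)
# The SSH variable lets the ssh binary be swapped out, e.g. for a dry run.
guest_pmu_check() {
    local user="$1" port="${2:-2222}"
    # 1) Show what the guest kernel logged about its perf events driver.
    # 2) Count cycles and instructions for a short command with guest perf.
    "${SSH:-ssh}" -p "$port" "${user}@localhost" \
        'dmesg | grep -i "Performance Events"; perf stat -e cycles,instructions sleep 1'
}

# Assumed usage, with <guest-user> replaced by a real account in the guest:
#   guest_pmu_check <guest-user>
#   SSH=echo guest_pmu_check <guest-user>    # dry run: print the ssh arguments only
```

If the guest reports usable counters for cycles and instructions rather than "<not supported>", that is a reasonable sign the virtual PMU is exposed to the VM.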

PvsNarasimha, May 09 '25 09:05