Consider backporting PMU support for Alder Lake
Is your feature request related to a problem? Please describe.
There is no PMU driver for the Intel Alder Lake platform. Hardware event sampling does not work with VTune or perf.
vtune -collect hotspots -knob sampling-mode=hw -knob sampling-interval=0.5 /home/trym/source/stud/tdt4186/practical2/build/release/webserver /tmp 8889 12 24
vtune: Error: Unable to perform driverless collection on this platform.
vtune: Error: Cannot enable event-based sampling collection: Architectural Performance Monitoring version is 0. Make sure the vPMU feature is enabled in your hypervisor.
root@DESKTOP-CMKEO60:~# dmesg | grep -i pmu
[ 0.177428] Performance Events: unsupported p6 CPU model 151 no PMU driver, software events only.
Describe the solution you'd like
Backport the driver or provide an alternate solution.
Additional context
AFAIK, the only workarounds are to use Hyper-V or dual boot.
https://www.phoronix.com/scan.php?page=news_item&px=Linux-5.13-Perf-Alder-Lake
https://lore.kernel.org/lkml/[email protected]/T/
Reportedly a newer kernel alone is not enough: https://github.com/microsoft/WSL/issues/4678#issuecomment-1138625875
Your cpuid report does not list the features either:
Architecture Performance Monitoring Features (0xa):
version ID = 0x0 (0)
number of counters per logical processor = 0x0 (0)
bit width of counter = 0x0 (0)
length of EBX bit vector = 0x0 (0)
core cycle event not available = false
instruction retired event not available = false
reference cycles event not available = false
last-level cache ref event not available = false
last-level cache miss event not avail = false
branch inst retired event not available = false
branch mispred retired event not avail = false
fixed counter 0 supported = false
...
fixed counter 31 supported = false
number of fixed counters = 0x0 (0)
bit width of fixed counters = 0x0 (0)
anythread deprecation = false
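For readers who want to interpret a dump like the one above themselves, here is a minimal sketch of how the numeric fields of CPUID leaf 0xA are packed into EAX and EDX. The struct and function names are made up for illustration; only the bit positions follow the documented leaf layout. A version ID of 0, as in the report above, means the guest sees no architectural performance monitoring at all.

```c
#include <stdint.h>

/* Numeric fields of CPUID leaf 0xA (names are illustrative, not from any header). */
typedef struct {
    unsigned version_id;         /* EAX[7:0]   */
    unsigned counters_per_lp;    /* EAX[15:8]  */
    unsigned counter_bit_width;  /* EAX[23:16] */
    unsigned ebx_vector_length;  /* EAX[31:24] */
    unsigned fixed_counters;     /* EDX[4:0]   */
    unsigned fixed_bit_width;    /* EDX[12:5]  */
} arch_perfmon_info;

/* Split the raw EAX/EDX register values into their fields. */
arch_perfmon_info decode_leaf_a(uint32_t eax, uint32_t edx)
{
    arch_perfmon_info info;
    info.version_id        = eax & 0xFFu;
    info.counters_per_lp   = (eax >> 8) & 0xFFu;
    info.counter_bit_width = (eax >> 16) & 0xFFu;
    info.ebx_vector_length = (eax >> 24) & 0xFFu;
    info.fixed_counters    = edx & 0x1Fu;
    info.fixed_bit_width   = (edx >> 5) & 0xFFu;
    return info;
}

/* Version 0 means the leaf reports no architectural PMU. */
int has_arch_perfmon(arch_perfmon_info info)
{
    return info.version_id > 0;
}
```

Passing `eax = 0` and `edx = 0` reproduces the all-zero report shown above, for which `has_arch_perfmon` returns 0.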
@benhillis are you aware of this issue and is there maybe even a fix coming? It's very frustrating not to have PMU support for Alderlake… (Especially since Hyper-V supports it, as @trympet pointed out)
I've hit this again today, since I wanted to profile something under Linux. Why is this not being fixed?!? "Plain" Hyper-V already supports PMUs with Alderlake! I hope you can understand that this is really frustrating as a user 😞...
I'm interested in this feature, as well.
WSL2 has been a great experience for me, but it's a real shame not to be able to get HW perf counters on newer CPUs.
This is also an issue for me as well.
This is also an issue for me as well +1.
I've also asked on Twitter: https://twitter.com/clemenswasser/status/1669265762991714304 Seems like we're just being ghosted 💀, which is extremely disappointing since many require performance counters support and they already work when using Hyper-V... @benhillis @craigloewen-msft ping, are you working on this?
I've looked into this once again, and it still hasn't been fixed. In the old issue, I noticed this comment, which seems to document the root of the issue pretty well: https://github.com/microsoft/WSL/issues/4678#issuecomment-1142331647
The problem is that the WSL VM hasn't activated Perfmon (the arch_perfmon feature is missing), which seems to be a hard requirement for performance counters to work on Linux with newer Intel CPUs. Instructions for enabling Perfmon are in the Hyper-V documentation. I could validate this by running the following command:
$ cpuid | grep 'performance monitor'
performance monitor support available = false
performance monitor support available = false
[...]
Sadly, we can't just call Set-VMProcessor MyVMName -Perfmon @("ipt", "pmu", "lbr", "pebs") on the WSL VM, as it seems to be hidden. I only managed to list the VM by running hcsdiag list, but it seems to be absent from all Hyper-V commands.
@benhillis @craigloewen-msft
Since we now know what is missing, could you please activate all Perfmon features for the WSL VMs so that perf and other software that uses performance counters finally works on newer CPUs?
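As a complement to the cpuid check in this comment, the same thing can be checked from the `flags` line of /proc/cpuinfo inside the guest. The sketch below is only an illustration: `has_cpu_flag` is a made-up helper, and reading the file itself is left to the caller. It matches whole tokens so that a flag name embedded in a longer word does not count.

```c
#include <string.h>

/* Returns 1 if `flag` appears as a whole whitespace-separated token in `flags`,
 * e.g. the contents of the "flags" line of /proc/cpuinfo. Returns 0 otherwise. */
int has_cpu_flag(const char *flags, const char *flag)
{
    size_t len = strlen(flag);
    const char *p = flags;

    if (len == 0)
        return 0;

    while ((p = strstr(p, flag)) != NULL) {
        /* The match must start at the beginning or right after whitespace... */
        int starts_token = (p == flags) || p[-1] == ' ' || p[-1] == '\t';
        /* ...and must be followed by whitespace or the end of the string. */
        char end = p[len];
        int ends_token = end == '\0' || end == ' ' || end == '\t' || end == '\n';

        if (starts_token && ends_token)
            return 1;
        p += 1; /* keep searching after this partial match */
    }
    return 0;
}
```

On an affected WSL2 guest, `has_cpu_flag(line, "arch_perfmon")` would be expected to return 0 for the flags line, consistent with the cpuid output above.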
Any update?
Wow, three years already, impressive! ;) Now that WSL is open source, we get to do the job of Microsoft employees who are paid to work full-time on WSL and debug this ourselves 🥳 (fun, isn’t it?). The relevant code can be found here: https://github.com/microsoft/WSL/blob/4b5cb64e795d2f7625e7c296eb100f9c1f75b7ab/src/windows/service/exe/WslCoreVm.cpp#L1640-L1651 To enable the perfmon features, WSL first checks the CPUID for support for emulation from the host CPU. On my system (i7-12700K, Win11 Pro 27783), these bits are all false, so the perfmon features aren’t activated for the WSL VM:
> .\cpuid.exe
ChildPerfmonPmuSupported: 0
ChildPerfmonLbrSupported: 0
ChildPerfmonIptSupported: 0
It would be great to see data from other systems. Please run the script I attached and check whether these bits are set for you. The check seems to be incorrect, or at least prone to false negatives: we've already confirmed that Hyper-V can actually enable (and virtualize?) perfmon features even when the CPUID bits are off (Alderlake and above).
cpuid.c:
// clang -o cpuid.exe cpuid.c
#include <intrin.h>
#include <stdint.h>
#include <stdio.h>

// Layout of the four registers returned by hypervisor CPUID leaf 0x40000006.
typedef struct _HV_X64_HYPERVISOR_HARDWARE_FEATURES {
    //
    // Eax
    //
    uint32_t ApicOverlayAssistInUse : 1;
    uint32_t MsrBitmapsInUse : 1;
    uint32_t ArchitecturalPerformanceCountersInUse : 1;
    uint32_t SecondLevelAddressTranslationInUse : 1;
    uint32_t DmaRemappingInUse : 1;
    uint32_t InterruptRemappingInUse : 1;
    uint32_t MemoryPatrolScrubberPresent : 1;
    uint32_t DmaProtectionInUse : 1;
    uint32_t HpetRequested : 1;
    uint32_t SyntheticTimersVolatile : 1;
    uint32_t HypervisorLevel : 4;
    uint32_t PhysicalDestinationModeRequired : 1;
    uint32_t UseVmfuncForAliasMapSwitch : 1;
    uint32_t HvRegisterForMemoryZeroingSupported : 1;
    uint32_t UnrestrictedGuestSupported : 1;
    uint32_t RdtAFeaturesSupported : 1;
    uint32_t RdtMFeaturesSupported : 1;
    uint32_t ChildPerfmonPmuSupported : 1;
    uint32_t ChildPerfmonLbrSupported : 1;
    uint32_t ChildPerfmonIptSupported : 1;
    uint32_t ApicEmulationSupported : 1;
    uint32_t ChildX2ApicRecommended : 1;
    uint32_t HardwareWatchdogReserved : 1;
    uint32_t DeviceAccessTrackingSupported : 1;
    uint32_t Reserved : 5;
    //
    // Ebx
    //
    uint32_t DeviceDomainInputWidth : 8;
    uint32_t ReservedEbx : 24;
    //
    // Ecx
    //
    uint32_t ReservedEcx;
    //
    // Edx
    //
    uint32_t ReservedEdx;
} HV_X64_HYPERVISOR_HARDWARE_FEATURES, *PHV_X64_HYPERVISOR_HARDWARE_FEATURES;

#define HvCpuIdFunctionMsHvHardwareFeatures 0x40000006

int main() {
    // {0} instead of {} so the file also compiles as C before C23.
    HV_X64_HYPERVISOR_HARDWARE_FEATURES hardwareFeatures = {0};
    // __cpuid writes EAX, EBX, ECX, EDX into four consecutive ints.
    __cpuid((int *)(&hardwareFeatures), HvCpuIdFunctionMsHvHardwareFeatures);

    // The three bits WSL looks at come first.
    printf("ChildPerfmonPmuSupported: %u\n",
           hardwareFeatures.ChildPerfmonPmuSupported);
    printf("ChildPerfmonLbrSupported: %u\n",
           hardwareFeatures.ChildPerfmonLbrSupported);
    printf("ChildPerfmonIptSupported: %u\n",
           hardwareFeatures.ChildPerfmonIptSupported);

    // Remaining fields, for completeness.
    printf("ApicOverlayAssistInUse: %u\n",
           hardwareFeatures.ApicOverlayAssistInUse);
    printf("MsrBitmapsInUse: %u\n", hardwareFeatures.MsrBitmapsInUse);
    printf("ArchitecturalPerformanceCountersInUse: %u\n",
           hardwareFeatures.ArchitecturalPerformanceCountersInUse);
    printf("SecondLevelAddressTranslationInUse: %u\n",
           hardwareFeatures.SecondLevelAddressTranslationInUse);
    printf("DmaRemappingInUse: %u\n", hardwareFeatures.DmaRemappingInUse);
    printf("InterruptRemappingInUse: %u\n",
           hardwareFeatures.InterruptRemappingInUse);
    printf("MemoryPatrolScrubberPresent: %u\n",
           hardwareFeatures.MemoryPatrolScrubberPresent);
    printf("DmaProtectionInUse: %u\n", hardwareFeatures.DmaProtectionInUse);
    printf("HpetRequested: %u\n", hardwareFeatures.HpetRequested);
    printf("SyntheticTimersVolatile: %u\n",
           hardwareFeatures.SyntheticTimersVolatile);
    printf("HypervisorLevel: %u\n", hardwareFeatures.HypervisorLevel);
    printf("PhysicalDestinationModeRequired: %u\n",
           hardwareFeatures.PhysicalDestinationModeRequired);
    printf("UseVmfuncForAliasMapSwitch: %u\n",
           hardwareFeatures.UseVmfuncForAliasMapSwitch);
    printf("HvRegisterForMemoryZeroingSupported: %u\n",
           hardwareFeatures.HvRegisterForMemoryZeroingSupported);
    printf("UnrestrictedGuestSupported: %u\n",
           hardwareFeatures.UnrestrictedGuestSupported);
    printf("RdtAFeaturesSupported: %u\n", hardwareFeatures.RdtAFeaturesSupported);
    printf("RdtMFeaturesSupported: %u\n", hardwareFeatures.RdtMFeaturesSupported);
    printf("ApicEmulationSupported: %u\n",
           hardwareFeatures.ApicEmulationSupported);
    printf("ChildX2ApicRecommended: %u\n",
           hardwareFeatures.ChildX2ApicRecommended);
    printf("HardwareWatchdogReserved: %u\n",
           hardwareFeatures.HardwareWatchdogReserved);
    printf("DeviceAccessTrackingSupported: %u\n",
           hardwareFeatures.DeviceAccessTrackingSupported);
    printf("DeviceDomainInputWidth: %u\n",
           hardwareFeatures.DeviceDomainInputWidth);
    printf("ReservedEbx: %u\n", hardwareFeatures.ReservedEbx);
    printf("ReservedEcx: %u\n", hardwareFeatures.ReservedEcx);
    printf("ReservedEdx: %u\n", hardwareFeatures.ReservedEdx);
    return 0;
}
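A side note on the program above: the order in which C bit-fields are laid out is implementation-defined, so a more portable way to read the three perfmon bits is with explicit shifts on the raw EAX value. The sketch below assumes bit positions 20 to 22, which I obtained by counting the fields in the struct above; they are not taken from an official header, and the macro and function names are made up.

```c
#include <stdint.h>

/* Assumed bit positions in EAX of hypervisor CPUID leaf 0x40000006,
 * derived by counting the bit-fields in the struct posted in this thread. */
#define HV_HW_CHILD_PERFMON_PMU_BIT 20u
#define HV_HW_CHILD_PERFMON_LBR_BIT 21u
#define HV_HW_CHILD_PERFMON_IPT_BIT 22u

/* Returns the value (0 or 1) of a single bit of the raw EAX register. */
int hv_feature_bit(uint32_t eax, unsigned bit)
{
    return (int)((eax >> bit) & 1u);
}
```

With the raw EAX from `__cpuid`, `hv_feature_bit(eax, HV_HW_CHILD_PERFMON_PMU_BIT)` should then report the same value as the `ChildPerfmonPmuSupported` line printed by the program above, provided the assumed positions are right.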
Does that mean we can use Intel PT in WSL? @clemenswasser
WSL/src/windows/service/exe/WslCoreVm.cpp
Lines 1640 to 1651 in 4b5cb64
#ifdef AMD64
    // Enable hardware performance counters if they are supported.
    if (m_vmConfig.EnableHardwarePerformanceCounters)
    {
        HV_X64_HYPERVISOR_HARDWARE_FEATURES hardwareFeatures{};
        __cpuid(reinterpret_cast<int*>(&hardwareFeatures), HvCpuIdFunctionMsHvHardwareFeatures);
        vmSettings.ComputeTopology.Processor.EnablePerfmonPmu = hardwareFeatures.ChildPerfmonPmuSupported != 0;
        vmSettings.ComputeTopology.Processor.EnablePerfmonLbr = hardwareFeatures.ChildPerfmonLbrSupported != 0;
    }
#endif
That is my result:
PS D:\workspace> .\check.exe
ChildPerfmonPmuSupported: 0
ChildPerfmonLbrSupported: 0
ChildPerfmonIptSupported: 0
ApicOverlayAssistInUse: 1
MsrBitmapsInUse: 1
ArchitecturalPerformanceCountersInUse: 1
SecondLevelAddressTranslationInUse: 1
DmaRemappingInUse: 0
InterruptRemappingInUse: 1
MemoryPatrolScrubberPresent: 0
DmaProtectionInUse: 1
HpetRequested: 0
SyntheticTimersVolatile: 0
HypervisorLevel: 0
PhysicalDestinationModeRequired: 0
UseVmfuncForAliasMapSwitch: 0
HvRegisterForMemoryZeroingSupported: 0
UnrestrictedGuestSupported: 1
RdtAFeaturesSupported: 0
RdtMFeaturesSupported: 0
ApicEmulationSupported: 1
ChildX2ApicRecommended: 1
HardwareWatchdogReserved: 0
DeviceAccessTrackingSupported: 0
DeviceDomainInputWidth: 39
ReservedEbx: 0
ReservedEcx: 0
ReservedEdx: 0
and my cpu is 12th Gen Intel(R) Core(TM) i7-12800HX
In fact, wsl.exe runs on the Windows host, which means the __cpuid instruction only returns the hardware features as seen by Windows. Windows itself does not use PerfmonPmu or PerfmonLbr yet, so these bits are bound to be 0.
But the WSL Hyper-V VM can set them to 1 and use these hardware features. That is not a problem, because the Hyper-V VM is separate from the host.
https://github.com/microsoft/WSL/blob/4b5cb64e795d2f7625e7c296eb100f9c1f75b7ab/src/windows/service/exe/WslCoreVm.cpp#L1640-L1651
Yeah, I think you are right. When Hyper-V is enabled, the host Windows (root partition) also runs on top of the Hyper-V virtualization layer. Although the root partition doesn't get these features enabled by default, the WSL2 VM should still be able to use these features from the hypervisor, just like a normal Hyper-V VM does.
We need a better way (or just a config option) to find out which features are available on the hardware CPU, not on the vCPU seen by the host OS.
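To make the "just through configs" idea concrete, here is a rough sketch of what an override could look like: an auto mode that keeps the current CPUID-based check, plus explicit on/off modes. Everything here is hypothetical, including the `perfmon_mode` enum, the function name, and the assumed bit positions 20 and 21; it is not how WSL is implemented and does not correspond to an existing WSL setting. It also only decides what to request. Whether the hypervisor accepts a forced request is a separate question, and reports further down in this thread suggest that on Alder Lake it does not.

```c
#include <stdint.h>

/* Hypothetical user-facing setting; not an existing WSL option. */
typedef enum {
    PERFMON_AUTO,      /* follow the hypervisor CPUID bits (current behaviour) */
    PERFMON_FORCE_ON,  /* request PMU and LBR regardless of the CPUID bits     */
    PERFMON_FORCE_OFF  /* never request them                                   */
} perfmon_mode;

typedef struct {
    int enable_pmu;
    int enable_lbr;
} perfmon_settings;

/* hv_eax is the raw EAX of hypervisor CPUID leaf 0x40000006.
 * Bits 20 (PMU) and 21 (LBR) are assumed from the struct posted earlier. */
perfmon_settings choose_perfmon(uint32_t hv_eax, perfmon_mode mode)
{
    perfmon_settings s = {0, 0};

    switch (mode) {
    case PERFMON_FORCE_ON:
        s.enable_pmu = 1;
        s.enable_lbr = 1;
        break;
    case PERFMON_FORCE_OFF:
        break;
    case PERFMON_AUTO:
    default:
        s.enable_pmu = (int)((hv_eax >> 20) & 1u);
        s.enable_lbr = (int)((hv_eax >> 21) & 1u);
        break;
    }
    return s;
}
```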
https://github.com/microsoft/WSL/blob/4b5cb64e795d2f7625e7c296eb100f9c1f75b7ab/src/windows/service/exe/WslCoreVm.cpp#L1640-L1651
So, for Alder Lake, can we force-enable those two options by setting the following?
vmSettings.ComputeTopology.Processor.EnablePerfmonPmu = true;
vmSettings.ComputeTopology.Processor.EnablePerfmonLbr = true;
Yes, I already tried forcing all of these options to be true (I also tested the other EnablePerfmon options from the json schema) but for every single one of them I get the following error when launching my modified WSL:
> wsl
The hypervisor could not perform the operation because an invalid parameter was specified.
Error code: Wsl/Service/CreateInstance/CreateVm/HCS/0xc0350005
I also tried enabling performance monitoring hardware when using Hyper-V, but this fails with a similar error. Maybe this only works on Windows Enterprise/Server or only with special hardware or drivers? (I can remember that performance monitoring hardware with Hyper-V worked on my work laptop with Alderlake, which has Windows Enterprise on it)
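For anyone trying to make sense of the 0xc0350005 code above, here is a small sketch that splits a 32-bit Windows status value into its parts. It assumes the common layout where the low 16 bits are the error code, the field above them is the facility, and the top two bits carry the severity; the function names are made up. Under that assumption the value decomposes into facility 0x35 and code 5, which fits the hypervisor wording of the error message.

```c
#include <stdint.h>

/* Top two bits: 3 means an error in the assumed layout. */
unsigned status_severity(uint32_t status)
{
    return (unsigned)(status >> 30);
}

/* Facility field directly above the 16-bit code (11 bits assumed). */
unsigned status_facility(uint32_t status)
{
    return (unsigned)((status >> 16) & 0x7FFu);
}

/* Low 16 bits: the error code within the facility. */
unsigned status_code(uint32_t status)
{
    return (unsigned)(status & 0xFFFFu);
}
```

This only helps with reading the number. It does not explain why the hypervisor rejects the parameter.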
@clemenswasser I'm facing the same problem: I can't get vPMU to work in Hyper-V on Alder Lake, and someone else reports the same problem.
So the only workaround is dual-boot.
I just checked my work laptop, and there I get the same error when trying to enable PMU for a Hyper-V VM. So this seems to be a general problem with Windows and Alderlake and above regardless of Windows edition (Pro, Enterprise, etc.)
Which commands etc did you use?
The VTune docs mention "Disable the Credential Guard and Device Guard on Hyper-V."
Maybe that has an impact?
https://www.intel.com/content/www/us/en/docs/vtune-profiler/user-guide/2025-4/profiling-on-hyper-v.html
@Trass3r No, these are already disabled for me. But WSL (via HCS) and Hyper-V error with invalid parameter, when trying to enable any PMU features on Alder Lake.
Indeed Set-VMProcessor VMName -Perfmon @("ipt", "pmu", "lbr", "pebs") does not work in Hyper-V.
Came here after trying and failing to run rr-debugger/rr in a WSL2 guest. Would be glad if support is added!
$ dmesg | grep -F -i 'PMU'
[ 0.164404] Performance Events: unsupported CPU family 6 model 183 no PMU driver, software events only.
[ 3.679313] RAPL PMU: API unit is 2^-32 Joules, 0 fixed counters, 10737418240 ms ovfl timer
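The two dmesg lines in this thread use slightly different wording ("unsupported p6 CPU model 151" at the top, "unsupported CPU family 6 model 183" here), but both carry the CPU model number after the word "model". A minimal sketch for pulling that number out of such a line, with a made-up helper name, could look like this. To my knowledge 151 (0x97) and 183 (0xB7) are the model numbers Linux uses for Alder Lake and Raptor Lake respectively, so both reports would fall into the same hybrid-CPU family.

```c
#include <stdlib.h>
#include <string.h>

/* Returns the decimal number that follows "model " in a dmesg line,
 * or -1 if the line has no such field. */
int parse_cpu_model(const char *line)
{
    const char *p = strstr(line, "model ");
    char *end;
    long value;

    if (p == NULL)
        return -1;

    p += strlen("model ");
    value = strtol(p, &end, 10);
    if (end == p || value < 0)
        return -1; /* no digits after "model " */

    return (int)value;
}
```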