Consider backporting PMU support for Alder Lake

Open • trympet opened this issue 3 years ago • 22 comments

Is your feature request related to a problem? Please describe.
There is no PMU driver for the Intel Alder Lake platform, so hardware event sampling does not work with VTune or perf.

vtune -collect hotspots -knob sampling-mode=hw -knob sampling-interval=0.5 /home/trym/source/stud/tdt4186/practical2/build/release/webserver /tmp 8889 12 24
vtune: Error: Unable to perform driverless collection on this platform.
vtune: Error: Cannot enable event-based sampling collection: Architectural Performance Monitoring version is 0. Make sure the vPMU feature is enabled in your hypervisor.
root@DESKTOP-CMKEO60:~# dmesg | grep -i pmu
[    0.177428] Performance Events: unsupported p6 CPU model 151 no PMU driver, software events only.

Describe the solution you'd like
Backport the driver or provide an alternative solution.

Additional context
AFAIK, the only workarounds are to use Hyper-V or to dual boot.

https://www.phoronix.com/scan.php?page=news_item&px=Linux-5.13-Perf-Alder-Lake
https://lore.kernel.org/lkml/[email protected]/T/

cpuid.txt

trympet, Mar 16 '22

Reportedly, a newer kernel alone is not enough: https://github.com/microsoft/WSL/issues/4678#issuecomment-1138625875
Your cpuid report does not list the features either:

   Architecture Performance Monitoring Features (0xa):
      version ID                               = 0x0 (0)
      number of counters per logical processor = 0x0 (0)
      bit width of counter                     = 0x0 (0)
      length of EBX bit vector                 = 0x0 (0)
      core cycle event not available           = false
      instruction retired event not available  = false
      reference cycles event not available     = false
      last-level cache ref event not available = false
      last-level cache miss event not avail    = false
      branch inst retired event not available  = false
      branch mispred retired event not avail   = false
      fixed counter  0 supported               = false
...
      fixed counter 31 supported               = false
      number of fixed counters                 = 0x0 (0)
      bit width of fixed counters              = 0x0 (0)
      anythread deprecation                    = false

Trass3r, Jun 09 '22

@benhillis, are you aware of this issue, and is there maybe even a fix coming? It's very frustrating not to have PMU support for Alder Lake… (especially since Hyper-V supports it, as @trympet pointed out).

clemenswasser, Dec 05 '22

I hit this again today, since I wanted to profile something under Linux. Why is this not being fixed?!? "Plain" Hyper-V already supports PMUs on Alder Lake! I hope you can understand that this is really frustrating as a user 😞...

clemenswasser, Feb 28 '23

I'm interested in this feature, as well.

WSL2 has been a great experience for me, but it's a real shame not to be able to get HW perf counters on newer CPUs.

topolarity, Apr 04 '23

This is an issue for me as well.

tyler274, Apr 19 '23

+1, this is an issue for me as well.

samlihaha, Jul 03 '23

I've also asked on Twitter: https://twitter.com/clemenswasser/status/1669265762991714304
It seems like we're just being ghosted 💀, which is extremely disappointing, since many people need performance counter support and it already works when using Hyper-V... @benhillis @craigloewen-msft ping, are you working on this?

clemenswasser, Jul 03 '23

I've looked into this once again, and it still hasn't been fixed. In the old issue, I noticed this comment, which seems to document the root cause pretty well: https://github.com/microsoft/WSL/issues/4678#issuecomment-1142331647

The problem is that Perfmon hasn't been enabled for the WSL VM (the arch_perfmon CPU feature is missing), and on Linux that feature seems to be a hard requirement for performance counters to work on newer Intel CPUs. Instructions for enabling Perfmon are in the Hyper-V documentation. I was able to confirm that the feature is missing by running the following command:

$ cpuid | grep 'performance monitor'
      performance monitor support available    = false
      performance monitor support available    = false
[...]

Sadly, we can't just call Set-VMProcessor MyVMName -Perfmon @("ipt", "pmu", "lbr", "pebs") on the WSL VM, as it seems to be hidden. I only managed to list the VM by running hcsdiag list; it seems to be absent from all Hyper-V commands.

@benhillis @craigloewen-msft Since we now know what is missing, could you please enable all Perfmon features for the WSL VMs, so that perf and other software that uses performance counters finally works on newer CPUs?

clemenswasser, Jul 24 '24

Any update?

espkk, Feb 04 '25

Wow, three years already, impressive! ;) Now that WSL is open source, we get to do the job of Microsoft employees who are paid to work full-time on WSL and debug this ourselves 🥳 (fun, isn't it?). The relevant code can be found here: https://github.com/microsoft/WSL/blob/4b5cb64e795d2f7625e7c296eb100f9c1f75b7ab/src/windows/service/exe/WslCoreVm.cpp#L1640-L1651

Before enabling the perfmon features, WSL queries the hypervisor's hardware-features CPUID leaf (0x40000006) on the host and only enables PMU and LBR for the WSL VM if the corresponding ChildPerfmon*Supported bits are set. On my system (i7-12700K, Win11 Pro 27783), these bits are all false, so the perfmon features aren't activated for the WSL VM:

> .\cpuid.exe
ChildPerfmonPmuSupported: 0
ChildPerfmonLbrSupported: 0
ChildPerfmonIptSupported: 0

It would be great to see data from other systems: please check whether these bits are set for you using the program attached below. The check seems to be incorrect, or at least prone to false negatives: we've already confirmed that Hyper-V can enable (and virtualize?) perfmon features even when these CPUID bits are off (Alder Lake and above).

cpuid.c:

// clang -o cpuid.exe cpuid.c
#include <intrin.h>
#include <stdint.h>
#include <stdio.h>


typedef struct _HV_X64_HYPERVISOR_HARDWARE_FEATURES {
  //
  // Eax
  //
  uint32_t ApicOverlayAssistInUse : 1;
  uint32_t MsrBitmapsInUse : 1;
  uint32_t ArchitecturalPerformanceCountersInUse : 1;
  uint32_t SecondLevelAddressTranslationInUse : 1;
  uint32_t DmaRemappingInUse : 1;
  uint32_t InterruptRemappingInUse : 1;
  uint32_t MemoryPatrolScrubberPresent : 1;
  uint32_t DmaProtectionInUse : 1;
  uint32_t HpetRequested : 1;
  uint32_t SyntheticTimersVolatile : 1;
  uint32_t HypervisorLevel : 4;
  uint32_t PhysicalDestinationModeRequired : 1;
  uint32_t UseVmfuncForAliasMapSwitch : 1;
  uint32_t HvRegisterForMemoryZeroingSupported : 1;
  uint32_t UnrestrictedGuestSupported : 1;
  uint32_t RdtAFeaturesSupported : 1;
  uint32_t RdtMFeaturesSupported : 1;
  uint32_t ChildPerfmonPmuSupported : 1;
  uint32_t ChildPerfmonLbrSupported : 1;
  uint32_t ChildPerfmonIptSupported : 1;
  uint32_t ApicEmulationSupported : 1;
  uint32_t ChildX2ApicRecommended : 1;
  uint32_t HardwareWatchdogReserved : 1;
  uint32_t DeviceAccessTrackingSupported : 1;
  uint32_t Reserved : 5;

  //
  // Ebx
  //
  uint32_t DeviceDomainInputWidth : 8;
  uint32_t ReservedEbx : 24;

  //
  // Ecx
  //
  uint32_t ReservedEcx;

  //
  // Edx
  //
  uint32_t ReservedEdx;

} HV_X64_HYPERVISOR_HARDWARE_FEATURES, *PHV_X64_HYPERVISOR_HARDWARE_FEATURES;

#define HvCpuIdFunctionMsHvHardwareFeatures 0x40000006

int main() {
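  // {0} rather than {} keeps the initializer valid in C before C23.
  HV_X64_HYPERVISOR_HARDWARE_FEATURES hardwareFeatures = {0};
  // __cpuid fills EAX, EBX, ECX, EDX in order, matching the struct layout.
  __cpuid((int *)(&hardwareFeatures), HvCpuIdFunctionMsHvHardwareFeatures);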
  printf("ChildPerfmonPmuSupported: %u\n",
         hardwareFeatures.ChildPerfmonPmuSupported);
  printf("ChildPerfmonLbrSupported: %u\n",
         hardwareFeatures.ChildPerfmonLbrSupported);
  printf("ChildPerfmonIptSupported: %u\n",
         hardwareFeatures.ChildPerfmonIptSupported);
  printf("ApicOverlayAssistInUse: %u\n",
         hardwareFeatures.ApicOverlayAssistInUse);
  printf("MsrBitmapsInUse: %u\n", hardwareFeatures.MsrBitmapsInUse);
  printf("ArchitecturalPerformanceCountersInUse: %u\n",
         hardwareFeatures.ArchitecturalPerformanceCountersInUse);
  printf("SecondLevelAddressTranslationInUse: %u\n",
         hardwareFeatures.SecondLevelAddressTranslationInUse);
  printf("DmaRemappingInUse: %u\n", hardwareFeatures.DmaRemappingInUse);
  printf("InterruptRemappingInUse: %u\n",
         hardwareFeatures.InterruptRemappingInUse);
  printf("MemoryPatrolScrubberPresent: %u\n",
         hardwareFeatures.MemoryPatrolScrubberPresent);
  printf("DmaProtectionInUse: %u\n", hardwareFeatures.DmaProtectionInUse);
  printf("HpetRequested: %u\n", hardwareFeatures.HpetRequested);
  printf("SyntheticTimersVolatile: %u\n",
         hardwareFeatures.SyntheticTimersVolatile);
  printf("HypervisorLevel: %u\n", hardwareFeatures.HypervisorLevel);
  printf("PhysicalDestinationModeRequired: %u\n",
         hardwareFeatures.PhysicalDestinationModeRequired);
  printf("UseVmfuncForAliasMapSwitch: %u\n",
         hardwareFeatures.UseVmfuncForAliasMapSwitch);
  printf("HvRegisterForMemoryZeroingSupported: %u\n",
         hardwareFeatures.HvRegisterForMemoryZeroingSupported);
  printf("UnrestrictedGuestSupported: %u\n",
         hardwareFeatures.UnrestrictedGuestSupported);
  printf("RdtAFeaturesSupported: %u\n", hardwareFeatures.RdtAFeaturesSupported);
  printf("RdtMFeaturesSupported: %u\n", hardwareFeatures.RdtMFeaturesSupported);
  printf("ApicEmulationSupported: %u\n",
         hardwareFeatures.ApicEmulationSupported);
  printf("ChildX2ApicRecommended: %u\n",
         hardwareFeatures.ChildX2ApicRecommended);
  printf("HardwareWatchdogReserved: %u\n",
         hardwareFeatures.HardwareWatchdogReserved);
  printf("DeviceAccessTrackingSupported: %u\n",
         hardwareFeatures.DeviceAccessTrackingSupported);
  printf("DeviceDomainInputWidth: %u\n",
         hardwareFeatures.DeviceDomainInputWidth);
  printf("ReservedEbx: %u\n", hardwareFeatures.ReservedEbx);
  printf("ReservedEcx: %u\n", hardwareFeatures.ReservedEcx);
  printf("ReservedEdx: %u\n", hardwareFeatures.ReservedEdx);
  return 0;
}

clemenswasser, May 19 '25

Does that mean we can use Intel PT in WSL, @clemenswasser?

samlihaha, May 20 '25

Quoting @clemenswasser's comment above, the relevant code is in WSL/src/windows/service/exe/WslCoreVm.cpp, lines 1640 to 1651 at commit 4b5cb64:

#ifdef AMD64

    // Enable hardware performance counters if they are supported.
    if (m_vmConfig.EnableHardwarePerformanceCounters)
    {
        HV_X64_HYPERVISOR_HARDWARE_FEATURES hardwareFeatures{};
        __cpuid(reinterpret_cast<int*>(&hardwareFeatures), HvCpuIdFunctionMsHvHardwareFeatures);
        vmSettings.ComputeTopology.Processor.EnablePerfmonPmu = hardwareFeatures.ChildPerfmonPmuSupported != 0;
        vmSettings.ComputeTopology.Processor.EnablePerfmonLbr = hardwareFeatures.ChildPerfmonLbrSupported != 0;
    }

#endif

That is my result:

PS D:\workspace> .\check.exe
ChildPerfmonPmuSupported: 0
ChildPerfmonLbrSupported: 0
ChildPerfmonIptSupported: 0
ApicOverlayAssistInUse: 1
MsrBitmapsInUse: 1
ArchitecturalPerformanceCountersInUse: 1
SecondLevelAddressTranslationInUse: 1
DmaRemappingInUse: 0
InterruptRemappingInUse: 1
MemoryPatrolScrubberPresent: 0
DmaProtectionInUse: 1
HpetRequested: 0
SyntheticTimersVolatile: 0
HypervisorLevel: 0
PhysicalDestinationModeRequired: 0
UseVmfuncForAliasMapSwitch: 0
HvRegisterForMemoryZeroingSupported: 0
UnrestrictedGuestSupported: 1
RdtAFeaturesSupported: 0
RdtMFeaturesSupported: 0
ApicEmulationSupported: 1
ChildX2ApicRecommended: 1
HardwareWatchdogReserved: 0
DeviceAccessTrackingSupported: 0
DeviceDomainInputWidth: 39
ReservedEbx: 0
ReservedEcx: 0
ReservedEdx: 0

My CPU is a 12th Gen Intel(R) Core(TM) i7-12800HX.

samlihaha avatar May 20 '25 06:05 samlihaha

In fact, this WSL code runs on the Windows host, which means the __cpuid instruction only returns the hardware features as seen by Windows itself. Windows does not use PerfmonPmu or PerfmonLbr yet, so these bits are definitely 0.

But the WSL Hyper-V VM could still have these features enabled and use them; that is not a problem, because the Hyper-V VM is separate from the host.

https://github.com/microsoft/WSL/blob/4b5cb64e795d2f7625e7c296eb100f9c1f75b7ab/src/windows/service/exe/WslCoreVm.cpp#L1640-L1651

samlihaha, May 20 '25

Yeah, I think you are right. When Hyper-V is enabled, the Windows host (the root partition) also runs on top of the Hyper-V virtualization layer. Although the root partition doesn't get these features enabled by default, the WSL2 VM should still be able to use them from the hypervisor, just like a normal Hyper-V VM does.

We need a better way (or just a config option) to determine which features the physical CPU supports, rather than relying on the virtual CPU seen by the host OS.

lwintermelon, May 25 '25

https://github.com/microsoft/WSL/blob/4b5cb64e795d2f7625e7c296eb100f9c1f75b7ab/src/windows/service/exe/WslCoreVm.cpp#L1640-L1651

So, for Alder Lake, can we force-enable those two options like this?

vmSettings.ComputeTopology.Processor.EnablePerfmonPmu = true; 
vmSettings.ComputeTopology.Processor.EnablePerfmonLbr = true; 

foocoder, May 28 '25

Yes, I already tried forcing all of these options to true (I also tested the other EnablePerfmon options from the JSON schema), but for every single one of them I get the following error when launching my modified WSL:

> wsl
The hypervisor could not perform the operation because an invalid parameter was specified.
Error code: Wsl/Service/CreateInstance/CreateVm/HCS/0xc0350005

I also tried enabling performance monitoring hardware in Hyper-V, but this fails with a similar error. Maybe this only works on Windows Enterprise/Server, or only with special hardware or drivers? (I remember that performance monitoring hardware with Hyper-V worked on my work laptop with Alder Lake, which runs Windows Enterprise.)

clemenswasser, Jun 24 '25

@clemenswasser I'm facing the same problem: I can't get vPMU to work in Hyper-V on Alder Lake, and someone else reports the same problem.

So the only workaround is dual-boot.

lwintermelon, Jun 24 '25

I just checked my work laptop, and there I get the same error when trying to enable PMU for a Hyper-V VM. So this seems to be a general problem with Windows on Alder Lake and newer, regardless of Windows edition (Pro, Enterprise, etc.).

clemenswasser, Jun 24 '25

@clemenswasser wrote: "I also tried enabling performance monitoring hardware when using Hyper-V, but this fails with a similar error."

Which commands etc. did you use?

The VTune docs mention "Disable the Credential Guard and Device Guard on Hyper-V". Maybe that has an impact? https://www.intel.com/content/www/us/en/docs/vtune-profiler/user-guide/2025-4/profiling-on-hyper-v.html

Trass3r, Jul 12 '25

@Trass3r No, these are already disabled for me. But both WSL (via HCS) and Hyper-V fail with an "invalid parameter" error when trying to enable any PMU features on Alder Lake.

clemenswasser, Jul 13 '25

Indeed, Set-VMProcessor VMName -Perfmon @("ipt", "pmu", "lbr", "pebs") does not work in Hyper-V.

Trass3r, Jul 13 '25

I came here after trying and failing to run rr (rr-debugger/rr) in a WSL2 guest. I'd be glad if support were added!

$ dmesg | grep -F -i 'PMU'
[    0.164404] Performance Events: unsupported CPU family 6 model 183 no PMU driver, software events only.
[    3.679313] RAPL PMU: API unit is 2^-32 Joules, 0 fixed counters, 10737418240 ms ovfl timer

CSharperMantle, Nov 07 '25