Docs for config, matching, etc
Hi! Great tool, but it took me a while to understand how to match sensors that are not covered by the default config. I ended up rewriting the config from scratch for easier discovery and querying, and wrote a HOWTO to explain how it works. Hope it helps somebody, that's why I'm putting it into issues.
metadata:
description: HWiNFO to Prometheus Metric Mapping data
README: |
hwinfo-prometheus mapping HOWTO
mapping is a list of C# regular expressions with named capture groups.
each regex is matched against every sensor name retrieved from hwinfo.
sensor name is a string that is visible in hwinfo UI.
if regex matches, named groups get extracted and transformed into prometheus tags.
other sensor data is extracted into tags too. also, you can set tags manually. see details below.
prometheus metric name is constructed from:
* "hwi_" prefix
* value of regex group "Entity" as another prefix
* value of regex group "MetricName", or, if absent, sensor name from hwinfo
* sensor unit suffix (mhz, v, c, etc)
for example, "hwi_ram_used_mb"
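a minimal sketch of this naming scheme, written in Python for illustration only (the tool itself is C#). the normalization rules (lowercasing, turning spaces into underscores, dropping symbols from the unit) are assumptions inferred from the example metric names in this HOWTO, and `slug` and `build_metric_name` are hypothetical helper names, not part of the tool:

```python
import re


def slug(text: str) -> str:
    """Lowercase and collapse each run of non-alphanumerics into one underscore."""
    return re.sub(r"[^0-9a-z]+", "_", text.lower()).strip("_")


def build_metric_name(sensor: str, unit: str, entity: str = "", metric_name: str = "") -> str:
    """Join prefix, Entity, MetricName (or the sensor name) and the unit suffix."""
    # Assumption: symbols are dropped from the unit, so "GT/s" -> "gts" and "%" -> "".
    unit_suffix = re.sub(r"[^0-9a-z]", "", unit.lower())
    parts = ["hwi", slug(entity), slug(metric_name or sensor), unit_suffix]
    # Empty parts (no Entity, or a unit like "%") are skipped entirely.
    return "_".join(part for part in parts if part)


print(build_metric_name("Physical Memory Used", "MB", entity="RAM", metric_name="Used"))
print(build_metric_name("GPU D3D Usage", "%", entity="GPU", metric_name="Usage"))
```

under these assumptions a "%" unit adds no suffix, which is consistent with the "hwi_gpu_usage" metric shown further down in this HOWTO.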
prometheus tags:
* "sensor" from sensor name (the one used for regex matching)
* "unit" from sensor unit, eg. "MHz", "%"
* "sensor_type" from sensor category, eg. "SENSOR_TYPE_POWER"
* "source" from sensor source - folder name in hwinfo UI, eg. "CPU [#0]: AMD Ryzen 9 7945HX"
* "category" from regex group "MetricCategory"
* other named groups are converted to tags
special regex groups:
* "Entity" - metric name prefix, after "hwi_", empty by default
* "MetricName" - its value goes to metric name. if empty, original sensor name string is used
* "MetricCategory" - the only group that gets renamed to "category" tag for some reason
manually set values for tags:
if you can't match certain text in sensor name, but need to override metric name, prefix, or set custom tag value,
you can add an empty group with name in special format like this:
"(?<MyTag_MyValue>)", eg. "(?<Entity_custom>)" or "(?<MetricName_test123>)".
the group will have this value as a result. Underscores are not supported in values!
for example, "(?<Entity>GPU) (?<kind>.+) (?<MetricName>Usage)"
will match sensors:
* "GPU Computing (Cuda) Usage"
* "GPU D3D Usage"
and will result in prometheus metrics:
* hwi_gpu_usage{kind="Computing (Cuda)",unit="%",sensor_type="SENSOR_TYPE_USAGE",sensor="GPU Computing (Cuda) Usage",source="GPU [#0]: NVIDIA GeForce RTX 4090 Laptop",host="AMARU"} 0
* hwi_gpu_usage{kind="D3D",unit="%",sensor_type="SENSOR_TYPE_USAGE",sensor="GPU D3D Usage",source="GPU [#0]: NVIDIA GeForce RTX 4090 Laptop",host="AMARU"} 4.030669334839197
note how "GPU" became the prefix, "Usage" became the metric name, and the middle part went into the "kind" tag
another example, "(?<Entity_GPU>)(?<MetricName>PCIe Link Speed)"
will match sensor "PCIe Link Speed"
and will result in prometheus metric hwi_gpu_pcie_link_speed_gts{unit="GT/s",sensor_type="SENSOR_TYPE_OTHER",sensor="PCIe Link Speed",source="GPU [#0]: NVIDIA GeForce RTX 4090 Laptop",host="AMARU"} 2.5
note how original string did not have "GPU" anywhere and we set "Entity" value manually
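the two examples above can be reproduced with a small Python sketch. Python spells named groups `(?P<name>...)`, so the sketch rewrites the C#-style `(?<name>...)` first. it assumes unanchored matching and that a group name containing an underscore is split at the first underscore into tag and value. `to_python` and `extract_groups` are hypothetical names, and the real tool may resolve groups differently:

```python
import re


def to_python(pattern: str) -> str:
    """Naively rewrite (?<name>...) as (?P<name>...), leaving lookbehinds untouched."""
    return re.sub(r"\(\?<(?![=!])", "(?P<", pattern)


def extract_groups(pattern: str, sensor: str):
    """Return the resolved group values for a sensor name, or None if there is no match."""
    match = re.search(to_python(pattern), sensor)  # assumption: unanchored search
    if match is None:
        return None
    resolved = {}
    for name, text in match.groupdict().items():
        if "_" in name:
            # Manual value: "Entity_GPU" means Entity = "GPU", whatever was matched.
            key, value = name.split("_", 1)
            resolved[key] = value
        elif text is not None:
            resolved[name] = text
    return resolved


print(extract_groups("(?<Entity>GPU) (?<kind>.+) (?<MetricName>Usage)", "GPU Computing (Cuda) Usage"))
print(extract_groups("(?<Entity_GPU>)(?<MetricName>PCIe Link Speed)", "PCIe Link Speed"))
```

"Entity" and "MetricName" would then feed the metric name, and the remaining keys (here "kind") would become tags.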
mappings below should give you a general idea how to transform/unify/extract metrics
also all unmatched sensors are appended at the end with "hwi_zzzunmapped_" prefix
you can explore them, use as-is or transform for your needs
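the catch-all only works as a fallback if it comes last. the Python sketch below assumes patterns are tried in list order and the first match wins. this HOWTO does not state that explicitly, so treat it as an assumption. patterns are written in Python syntax (`(?P<name>...)`), and `first_match` is a hypothetical helper:

```python
import re

PATTERNS = [
    r"(?P<Entity>GPU) (?P<kind>.+) (?P<MetricName>Usage)",
    r"(?P<Entity_GPU>)(?P<MetricName>PCIe Link Speed)",
    r"(?P<Entity_ZZZUNMAPPED>)",  # empty pattern: matches every sensor name
]


def first_match(sensor: str, patterns=PATTERNS):
    """Return the index of the first pattern that matches, or None."""
    for index, pattern in enumerate(patterns):
        if re.search(pattern, sensor):
            return index
    return None


print(first_match("GPU D3D Usage"))         # handled by the specific GPU pattern
print(first_match("Battery Charge Level"))  # falls through to the catch-all
```

if the catch-all were moved to the top of the list, it would shadow every other pattern under this assumption.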
this syntax/approach is somewhat limited. if you want complex templating, eg. have multiple disk drives as tags,
there is no way to extract sensor source (folder from hwinfo with device name) and manipulate it.
instead, use prometheus server relabeling. for aggregation use recording rules or calculate stats in promql queries.
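to illustrate what such a relabeling step would do with the "source" tag, here is a Python sketch that splits it into separate labels. the label names `device`, `device_index` and `device_model` and the regex are made up for this example. on a real server the same idea would be expressed as a `metric_relabel_configs` rule rather than Python code:

```python
import re

# Hypothetical layout: "<device> [#<index>]: <model>", as in "CPU [#0]: AMD Ryzen 9 7945HX".
SOURCE_RE = re.compile(r"(?P<device>[^\[]+?) \[#(?P<device_index>\d+)\]: (?P<device_model>.+)")


def split_source(labels: dict) -> dict:
    """Return a copy of the labels with the "source" value split into extra labels."""
    # fullmatch mirrors Prometheus relabeling, where the regex must match the whole value.
    match = SOURCE_RE.fullmatch(labels.get("source", ""))
    if match is None:
        return dict(labels)  # leave labels unchanged if the layout differs
    return {**labels, **match.groupdict()}


print(split_source({"source": "GPU [#0]: NVIDIA GeForce RTX 4090 Laptop", "unit": "%"}))
```

sources that do not follow this layout are passed through unchanged in the sketch.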
mapping:
- name: IndividualValues
patterns:
# RAM/VRAM
- '(?<Entity_RAM>)(?<memory>Virtual|Physical)\sMemory\s(?<MetricName>Committed|Used|Available|Load)'
- '(?<Entity_RAM>)Page File Usage(?<MetricName_Load>)(?<memory_Swap>)'
- '(?<Entity_VRAM>)(?<memory>GPU\s(\w+\s)?Memory)\s(?<MetricName>Usage|Available|Allocated|Dedicated|Dynamic)'
# RAM timings
- '(?<Entity_RAM>)Memory (?<MetricName>(Clock Ratio)|Clock)'
- '(?<Entity_RAM>)(?<MetricName>Tcas|Trcd|Trp|Tras|Trc|Trfc|Command Rate)'
# Cores
- '(?<Entity>Core) (?<CoreNo>\d+) (?<MetricName>VID|Clock|Ratio)'
- '(?<Entity>Core) (?<CoreNo>\d+) T(?<hyperThread>0|1) (?<MetricName>(Effective Clock|Usage|Utility))'
- '(?<Entity>Core) (?<CoreNo>\d+) (?<State>\w+) (?<MetricName>Residency)'
- '(?<Entity>Core) (?<CoreNo>\d+) (?<MetricName>Power)'
# Core temps
- '(?<Entity>Core)(?<CoreNo>\d+) \((?<ccd>CCD\d+)\)(?<MetricName_temp>)'
# CPU
- '(?<Entity_CPU>)(?<MetricName>(Bus Clock)|(Average Effective Clock))'
- '(?<Entity_CPU>)(?<MetricName_Usage>)(?<kind_max>)Max CPU/Thread Usage'
- '(?<Entity_CPU>)(?<MetricName_Usage>)(?<kind_total>)Total CPU Usage'
- '(?<Entity_CPU>)(?<MetricName_Utility>)(?<kind_total>)Total CPU Utility'
- '(?<Entity_CPU>)Package (?<State>\w+) (?<MetricName>Residency)'
- '(?<Entity_CPU>)(?<MetricName>DRAM \w+ Bandwidth)'
- '(?<Entity_CPU>)(?<MetricName>Average Active Core Count)'
# CPU temps
- '(?<Entity>CPU) (?<kind_tctl>)\(Tctl/Tdie\)(?<MetricName_Temp>)'
- '(?<Entity>CPU) (?<kind_average>)Die \(average\)(?<MetricName_Temp>)'
- '(?<Entity>CPU) (?<kind>CCD\d+) \(Tdie\)(?<MetricName_Temp>)'
- '(?<Entity>CPU) (?<kind_hotspot>)IOD Hotspot(?<MetricName_Temp>)'
# CPU temps and voltage
- '(?<Entity>CPU) VDDCR_(?<kind>\w+) (?<MetricName>\w+) \((?<interface>.*)\)'
- '(?<Entity>CPU) VDD_(?<kind>\w+) (?<MetricName>\w+) \((?<interface>.*)\)'
# CPU clocks
- '(?<Entity_CPU>)(?<MetricName>.*) Clock \(.CLK\)'
- '(?<Entity_CPU>)(?<MetricName>Frequency Limit) - Global'
# L3 temps and clocks
- '(?<Entity_CPU>)(?<MetricName_L3>)L3 Cache \((?<ccd>CCD\d+)\)'
# CPU current
- '(?<Entity>CPU|SoC|MISC) (?<MetricName>Core Current|Current) \((?<interface>.*)\)'
- '(?<Entity>CPU) (?<MetricName>TDC|EDC)$'
# CPU power
- '(?<Entity>CPU) (?<kind>.*) (?<MetricName>Power)\s?\(?(?<interface>[^)]*)\)?'
- '(?<Entity_CPU>)(?<kind>Core\+.*) (?<MetricName>Power)\s?\(?(?<interface>[^)]*)\)?'
# CPU limits
- '(?<Entity_CPU>)(CPU|APU) (?<kind>.*) Limit(?<MetricName_Limit>)'
- '(?<Entity_CPU>)(?<kind>Thermal) Limit(?<MetricName_Limit>)'
- '(?<Entity_CPU>)(?<MetricName_Throttling>)Thermal Throttling \((?<kind>.*)\)'
# GPU
- '(?<Entity>GPU) (?<kind>.+) (?<MetricName>Load)'
- '(?<Entity>GPU) (?<kind>.+) (?<MetricName>Usage)'
- '(?<Entity_GPU>)(?<MetricName>PCIe Link Speed)'
- '(?<Entity_GPU>)(?<kind>.*?) ((Error Count)|Errors|Count)(?<MetricName_Errors>)'
# GPU temps
- '(?<Entity>GPU) (?<kind_average>)Temperature(?<MetricName_Temp>)'
- '(?<Entity>GPU) (?<kind>.*) Temperature(?<MetricName_Temp>)'
# GPU voltage
- '(?<Entity>GPU) (?<kind>.+) (?<MetricName>Voltage)'
# GPU clocks
- '(?<Entity>GPU) (?<kind_core>)(?<MetricName>Clock)'
- '(?<Entity>GPU) (?<kind>\w+) (?<MetricName>Clock)'
# GPU power
- '(?<Entity>GPU) (?<kind_total>)(?<MetricName>Power)'
- '(?<Entity>GPU) (?<kind>.*) (?<MetricName>Power).*'
# GPU limits
- '(?<Entity>GPU) (?<kind>Thermal) Limit(?<MetricName_Limit>)'
- '(?<Entity_GPU>)Performance Limit - (?<kind>.*)(?<MetricName_Limit>)'
# Lenovo Legion Toolkit
- '(?<Entity_BAT>)(?<MetricName_Temp>)Battery Temperature'
- '(?<Entity>\w+) (?<MetricName>Fan)'
# append all other metrics without mappings to end
- '(?<Entity_ZZZUNMAPPED>)'
# this feature throws exceptions for me. also, better use prometheus/telegraf/whatever for extra functionality
wmiService:
sources: []
Thanks a lot for the effort!
I haven't had much time to dedicate to improving the system. If you don't mind, I'll add your docs to the actual docs as well and give you credit for it.
The wmiService part should work, but my own use cases for it have been quite limited. I've personally monitored process memory usage with it.
You can open another issue if you have a specific use case, and I can try to repro it.
sure, use this however you like :)
with default config (that includes wmi section) i have an exception on /metrics page out of the box, something is off with WMI. that's why i disabled it:
System.ArgumentException: Error in WMI Provider Query: SELECT Name, WorkingSetPrivate FROM Win32_PerfFormattedData_PerfProc_Process
---> System.Management.ManagementException: Invalid query
at System.Management.ManagementException.ThrowWithExtendedInfo(ManagementStatus errorCode)
at System.Management.ManagementObjectCollection.ManagementObjectEnumerator.MoveNext()
at SensorMonHTTP.WMIProvider.GetDataItemsAsync(Object[] parameterTuples)
--- End of inner exception stack trace ---
at SensorMonHTTP.WMIProvider.GetDataItemsAsync(Object[] parameterTuples)
at PrometheusProcessor.ServiceProcessor.<>c__DisplayClass23_0.<<InitializeProcessors>b__2>d.MoveNext()
--- End of stack trace from previous location ---
at PromDapterSvc.Controllers.MetricsController.getPrometheusContent(String filter, Dictionary`2 paramDictionary)
at PromDapterSvc.Controllers.MetricsController.Get(String filter, String option)
maybe it's because my OS is win 10 home. don't bother reproducing it, i'm not going to use it personally
That OS difference might explain it. I have the same block in my WMI services query and it's working fine.
I'll try to dig up a bit more info on Win 10 Home vs Pro differences, or on whether WMI needs to be enabled or something.