CPU performance is degraded on version 0.22.1
Hello, I spent days searching for a CPU issue in my Dynatrace Python extension (which is basically a Prometheus scraper for a Solace Prometheus exporter).
I finally noticed a big performance gap between versions 0.21.1 and 0.22.1:
0.22.1 consumes 4 times more CPU than 0.21.1. I downgraded to 0.21.1 and CPU usage is back to normal.
Here is my usage:
# Prometheus parser: https://prometheus.github.io/client_python/parser/
import hashlib
import math

from prometheus_client.parser import text_string_to_metric_families

for prometheus_line in utf8_lines:
    for family in text_string_to_metric_families(prometheus_line):
        for sample in family.samples:
            skip = False  # By default, no metric is skipped unless we filter it
            if sample.name in wanted_solace_metrics:
                found_metrics += 1
                # self.logger.info("line: " + prometheus_line)
                # print("Name: {0} Labels: {1} Value: {2}".format(*sample))
                # NaN detection with the math library
                if math.isnan(sample.value):
                    invalid_metrics += 1
                    self.logger.info("NaN value skipped " + sample.name)
                    # If the value is NaN, we ignore it
                    break
                else:
                    valid_metrics += 1
                    dims = {**sample[1], "node": nodename, "clustername": clustername}
                    # Remove unwanted dimensions
                    # Define the keys to remove
                    keys = ['client_name', 'client_address', 'client_profile', 'flow_id', 'acl_profile']
                    for key in keys:
                        result_pop = dims.pop(key, None)
                        if sample.name not in censored_metrics_list and result_pop is not None:
                            # print("DETECTED")
                            censored_metrics_list.append(sample.name)
                    # Parse exporter errors
                    if "error" in dims.keys():
                        solace_prometheus_exporter_error = str(dims["error"]).replace('\"', "").strip()
                        sanitized_solace_prometheus_exporter_error = {"error": solace_prometheus_exporter_error}
                        dims.update(sanitized_solace_prometheus_exporter_error)
                    # Remove unwanted queue protocols or modify queue name patterns
                    if "queue_name" in dims.keys():
                        # Avoid the ingestion issue with bad queue names carrying a trailing \n
                        queue_name = str(dims["queue_name"]).strip()
                        sanitized_queue_name = {"queue_name": queue_name}
                        # Update the queue_name in the dims payload
                        dims.update(sanitized_queue_name)
                        queue_name_lower = queue_name.lower()
                        if queue_name_lower.startswith('#mqtt') or queue_name_lower.startswith('#cfgsync') or queue_name_lower.startswith('#p2p') or queue_name_lower.startswith('#pq') or queue_name_lower == "":
                            skip = True
                    # Manage non-skipped metrics
                    if skip is False:
                        # Keep queue quota calculated metrics
                        if sample.name == "solace_queue_spool_usage_bytes":
                            queue_usage.append({"metric_name": sample.name, "md5_dims": hashlib.md5(str(dims).encode("utf-8")).hexdigest(), **dims, "METRICvalueMETRIC": sample.value})
                        if sample.name == "solace_queue_spool_quota_bytes":  # we store this metric in a separate table to calculate disk usage later
                            queue_quota.append({"metric_name": sample.name, "md5_dims": hashlib.md5(str(dims).encode("utf-8")).hexdigest(), **dims, "METRICvalueMETRIC": sample.value})
                        # Send valid points (with dimension strings as md5 if necessary)
                        if sample.name not in censored_metrics_list:
                            # Append valid points
                            valid_points.append({"metric_name": sample.name, **dims, "METRICvalueMETRIC": sample.value})
                        else:
                            # If the metric is aggregated, we add an md5sum of all the dimensions as an index
                            # to find duplicates and ease "groupby" without pandas
                            valid_points.append({"metric_name": sample.name, "md5_dims": hashlib.md5(str(dims).encode("utf-8")).hexdigest(), **dims, "METRICvalueMETRIC": sample.value})
If this code is still correct for 0.22.1, I think there is a performance issue in the newer 0.22.x versions.
Best regards, Charles
Hello, thank you for the report. There were a couple of significant changes to parsing released in 0.22. Just to make sure: you are importing prometheus_client.parser, not openmetrics, correct? If so, my guess is that https://github.com/prometheus/client_python/pull/1070 is causing the performance issue, and we can look at creating some benchmarks to improve performance.
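For reference, both import paths exist in prometheus_client; the first is the Prometheus text-format parser, the second is the OpenMetrics parser:

# Prometheus text-format parser (the one this issue is about)
from prometheus_client.parser import text_string_to_metric_families

# OpenMetrics parser, which lives in a separate module
from prometheus_client.openmetrics.parser import (
    text_string_to_metric_families as openmetrics_text_string_to_metric_families,
)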
Hello, here is my import:
from prometheus_client.parser import text_string_to_metric_families
The example came from the parser documentation (https://prometheus.github.io/client_python/parser/).
Best regards, Charles
Thanks! I'll take a look when I can.
I started looking into this by creating a benchmark (see https://github.com/prometheus/client_python/pull/1116); however, I am seeing essentially identical mean performance when switching between 0.21.1 and 0.22.1. Is there something about the data that I could change to make it more representative of your use case?
Benchmark: test_benchmark_text_string_to_metric_families (time in ns)

Version   Min       Max         Mean      StdDev   Median    IQR     Outliers    OPS (Mops/s)  Rounds  Iterations
v0.21.1   125.0000  2,132.6842  132.6872  11.6190  131.5789  1.0789  4043;20506  7.5365        198334  38
v0.22.1   127.0800  866.2500    133.1531  7.3024   132.0900  1.2400  1913;4980   7.5102        75234   100
Hello @csmarchbanks, I am working in a large company. The Prometheus exporters I am scraping with the client_python lib expose a Solace PubSub+ broker, and the dataset is very, very big. It is so big that we need to parallelize scraping across many endpoints for the same Solace broker Prometheus API. Perhaps you could benchmark 1,000 or 10,000 times more Prometheus lines to reproduce a significant performance degradation.
I tried to calculate the number of Prometheus lines we ingest on one broker endpoint between two garbage collection cycles for one Solace broker: 37,004 per minute. As you can see in my code, we filter which lines should be ingested, so I estimate we read about 100,000 lines per minute for the biggest broker. Then, on the same (Dynatrace ActiveGate) machine, we configure multiple exporters which scrape multiple endpoints of multiple Solace brokers. In the production environment where CPU usage is 4 times higher, we are scraping 9 brokers (between 18 and 27 Prometheus endpoints), so perhaps 900,000 lines per minute would be a representative benchmark to reproduce the performance issue.
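If it helps, here is a rough sketch of how a more representative payload could be generated; the metric names and label values below are made up, only the volume and label shape matter:

import random

# Hypothetical generator for a large, Solace-like exposition payload
QUEUE_METRICS = [
    "solace_queue_spool_usage_bytes",
    "solace_queue_spool_quota_bytes",
    "solace_queue_msg_count",
]


def build_sample_payload(num_queues=10000):
    lines = []
    for metric in QUEUE_METRICS:
        lines.append(f"# TYPE {metric} gauge")
        for i in range(num_queues):
            labels = f'queue_name="queue/{i}",vpn_name="vpn{i % 10}"'
            lines.append(f"{metric}{{{labels}}} {random.random() * 1e6}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    payload = build_sample_payload()
    print(payload.count("\n"), "lines generated")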
I hope this helps.
Best regards, Charles
It turns out I just needed to make sure to iterate across everything yielded by the generator, and then the regression shows up. My first attempt was only measuring the time to return the first metric family... The small benchmark I created now shows results taking 5-6x as long, which is close enough to your report.
I'll look into optimizing the new code path next!
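To illustrate the difference, here is a minimal pytest-benchmark sketch (assuming pytest-benchmark is installed; the sample text and function names are my own, not the code from the PR): timing only the first yielded family measures much less work than consuming the whole generator, which is where the slowdown shows up.

from prometheus_client.parser import text_string_to_metric_families

# Hypothetical exposition text standing in for the real benchmark data
SAMPLE_TEXT = "\n".join(
    f'solace_test_metric_{i}{{queue_name="q{i}"}} {i}' for i in range(1000)
) + "\n"


def parse_first_family_only(text):
    # Returns after the first yielded family, so most of the text may never be parsed
    return next(text_string_to_metric_families(text))


def parse_everything(text):
    # Consumes every family and every sample, i.e. the full parsing cost
    return [family.samples for family in text_string_to_metric_families(text)]


def test_benchmark_first_family_only(benchmark):
    # Roughly the kind of measurement that misses the regression
    benchmark(parse_first_family_only, SAMPLE_TEXT)


def test_benchmark_full_parse(benchmark):
    # The kind of measurement where the regression shows up
    benchmark(parse_everything, SAMPLE_TEXT)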
I created a first batch of optimizations that, according to the benchmark, should help, although they won't bring things back to the original performance yet: https://github.com/prometheus/client_python/pull/1117. If you have a chance, I'd love for you to double-check that they help in your environment as well.