
CPU performance is degraded on version 0.22.1

Open osd42 opened this issue 7 months ago • 7 comments

Hello, I spent days searching for a CPU issue in my Dynatrace Python extension (which is basically a Prometheus scraper for a Solace Prometheus exporter).

I finally noticed a big performance gap between versions 0.21.1 and 0.22.1.

Version 0.22.1 consumes 4 times more CPU than 0.21.1. I downgraded to 0.21.1 and my CPU usage is back to normal.

Here is my usage:

            # Prometheus parser: https://prometheus.github.io/client_python/parser/
            for prometheus_line in utf8_lines:
                for family in text_string_to_metric_families(prometheus_line):
                    for sample in family.samples:
                        skip = False # By default, no metric is skipped unless we filter it

                        if sample.name in wanted_solace_metrics:
                            found_metrics += 1
                            # self.logger.info("line: " + prometheus_line)
                            # print("Name: {0} Labels: {1} Value: {2}".format(*sample))

                            # NaN detection with math library
                            if math.isnan(sample.value):
                                invalid_metrics += 1
                                self.logger.info("NaN value skipped " + sample.name)
                                # If the value is NaN, we ignore it
                                break
                            else:
                                valid_metrics += 1
                                dims = {**sample.labels, "node": nodename, "clustername": clustername}

                                # Remove unwanted dimensions
                                # Define the keys (dimensions) to remove
                                keys = ['client_name', 'client_address', 'client_profile', 'flow_id', 'acl_profile']
                                for key in keys:
                                    result_pop = dims.pop(key, None)
                                    if sample.name not in censored_metrics_list and result_pop is not None:
                                        # print("DETECTED")
                                        censored_metrics_list.append(sample.name)

                                # parse exporter errors
                                if "error" in dims.keys():
                                    solace_prometheus_exporter_error = str(dims["error"]).replace('\"', "").strip()
                                    sanitized_solace_prometheus_exporter_error = {"error": solace_prometheus_exporter_error}
                                    dims.update(sanitized_solace_prometheus_exporter_error)

                                # Remove unwanted queue protocols or modify queue name patterns
                                if "queue_name" in dims.keys():
                                    # Avoid the ingestion issue caused by bad queue names with a trailing \n
                                    queue_name = str(dims["queue_name"]).strip()
                                    sanitized_queue_name = {"queue_name": queue_name}

                                    # update the queue_name in the dims payload
                                    dims.update(sanitized_queue_name)

                                    queue_name_lower = queue_name.lower()
                                    if queue_name_lower.startswith(('#mqtt', '#cfgsync', '#p2p', '#pq')) or queue_name_lower == "":
                                        skip = True

                                # Handle non-skipped metrics
                                if skip is False:
                                    # Keep the metrics used for the calculated queue quota/usage
                                    if sample.name == "solace_queue_spool_usage_bytes":
                                        queue_usage.append({"metric_name": sample.name, "md5_dims": hashlib.md5(str(dims).encode("utf-8")).hexdigest(), **dims, "METRICvalueMETRIC": sample.value})
                                    if sample.name == "solace_queue_spool_quota_bytes":  # stored in a separate table to calculate disk usage later
                                        queue_quota.append({"metric_name": sample.name, "md5_dims": hashlib.md5(str(dims).encode("utf-8")).hexdigest(), **dims, "METRICvalueMETRIC": sample.value})

                                    # send valid points (with dimensions strings as md5 if necessary)
                                    if sample.name not in censored_metrics_list:
                                        # Append valid points
                                        valid_points.append({"metric_name": sample.name, **dims, "METRICvalueMETRIC": sample.value})
                                    else:
                                        # If the metric is aggregated, we add an md5 of all the dimensions as an index to find duplicates and ease "groupby" without pandas
                                        valid_points.append({"metric_name": sample.name, "md5_dims": hashlib.md5(str(dims).encode("utf-8")).hexdigest(), **dims, "METRICvalueMETRIC": sample.value})

If this code is still correct for 0.22.1, then I think there is a performance issue in the newer 0.22.x versions.

Best regards, Charles

osd42 · Jun 17 '25

Hello, thank you for the report. There were a couple of significant changes to parsing released in 0.22. Just to make sure: you are importing from prometheus_client.parser, not the openmetrics parser, correct? If so, my guess is that https://github.com/prometheus/client_python/pull/1070 is causing the performance issue, and we can look at creating some benchmarks to improve the performance.
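For reference, these are the two import paths in question (an illustrative sketch, not part of the original report; the sample line is made up):

    # Prometheus text-format parser (the one discussed in this issue):
    from prometheus_client.parser import text_string_to_metric_families

    # The OpenMetrics parser lives in a different module with the same function name:
    # from prometheus_client.openmetrics.parser import text_string_to_metric_families

    text = 'solace_queue_spool_usage_bytes{queue_name="q1"} 42.0\n'
    for family in text_string_to_metric_families(text):
        for sample in family.samples:
            print(sample.name, sample.labels, sample.value)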

csmarchbanks · Jun 24 '25

Hello, here is my import:

from prometheus_client.parser import text_string_to_metric_families

The example was found in this documentation.

Best regards, Charles

osd42 · Jun 25 '25

Thanks! I'll take a look when I can.

csmarchbanks · Jun 25 '25

I started looking into this by creating a benchmark (see https://github.com/prometheus/client_python/pull/1116); however, I am seeing basically identical mean performance when switching between 0.21.1 and 0.22.1. Is there something about the data that I could change to make it more representative of your use case?

v0.21.1

------------------------------------------------------------------- benchmark: 1 tests ------------------------------------------------------------------
Name (time in ns)                                      Min         Max      Mean   StdDev    Median     IQR    Outliers  OPS (Mops/s)  Rounds  Iterations
---------------------------------------------------------------------------------------------------------------------------------------------------------
test_benchmark_text_string_to_metric_families     125.0000  2,132.6842  132.6872  11.6190  131.5789  1.0789  4043;20506        7.5365  198334          38
---------------------------------------------------------------------------------------------------------------------------------------------------------

v0.22.1

----------------------------------------------------------------- benchmark: 1 tests ----------------------------------------------------------------
Name (time in ns)                                      Min       Max      Mean  StdDev    Median     IQR   Outliers  OPS (Mops/s)  Rounds  Iterations
-----------------------------------------------------------------------------------------------------------------------------------------------------
test_benchmark_text_string_to_metric_families     127.0800  866.2500  133.1531  7.3024  132.0900  1.2400  1913;4980        7.5102   75234         100
-----------------------------------------------------------------------------------------------------------------------------------------------------

csmarchbanks · Jun 26 '25

Hello @csmarchbanks, I work at a large company. The Prometheus exporters I am scraping with the client_python lib expose metrics for a Solace PubSub broker, and the dataset is very large. It is so big that we need to parallelize many scraping endpoints for the same Solace broker Prometheus API. Perhaps you can benchmark far more Prometheus lines (1,000 or 10,000 times more) to reproduce a significant performance degradation.

I tried to calculate the number of Prometheus lines we ingest on one broker endpoint between two garbage collections for one Solace broker: 37,004 per minute. As you can see in my code, we filter which lines should be ingested, so I estimate we read 100,000 lines per minute for the biggest broker. Then, on the same (Dynatrace ActiveGate) machine, we configure multiple exporters which scrape multiple endpoints of multiple Solace brokers. On the production environment where CPU usage is 4 times higher, we are scraping 9 brokers (between 18 and 27 Prometheus endpoints), so perhaps 900,000 lines per minute would be a representative benchmark to reproduce the performance issue.
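To give a sense of scale, a synthetic payload along these lines might be representative enough (a rough sketch; the queue and VPN label values are made up, only the metric name appears in the code above):

    from prometheus_client.parser import text_string_to_metric_families

    # Build ~900k exposition lines with some label cardinality, roughly a
    # minute's worth of scrapes across all brokers (numbers are illustrative).
    payload = "".join(
        'solace_queue_spool_usage_bytes{queue_name="queue/%d",vpn_name="vpn%d"} %d\n'
        % (i, i % 10, i)
        for i in range(900_000)
    )

    # Consume every family and sample, as the real scraping loop does.
    total = sum(len(family.samples) for family in text_string_to_metric_families(payload))
    print(total)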

I hope this helps.

Best regards, Charles

osd42 · Jul 01 '25

It turns out I just needed to make sure to iterate across everything yielded by the generator, and then the regression shows up. My first attempt was only measuring the time to return the first metric family... The small benchmark I created now shows results taking 5-6x as long, which is close enough to your report.

I'll look into optimizing the new codepath next!
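The difference boils down to whether the benchmark exhausts the generator or only pulls the first family. A hypothetical sketch using the pytest-benchmark fixture (not the actual contents of #1116):

    from prometheus_client.parser import text_string_to_metric_families

    # Many distinct metric names, so the generator yields many families.
    TEXT = "".join("metric_%d 1.0\n" % i for i in range(10_000))

    def parse_first_family_only():
        # Stops after the first family, so most of the parsing work is skipped.
        return next(text_string_to_metric_families(TEXT))

    def parse_everything():
        # Exhausts the generator, which is what a real scraper does.
        return list(text_string_to_metric_families(TEXT))

    def test_benchmark_text_string_to_metric_families(benchmark):
        benchmark(parse_everything)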

csmarchbanks · Jul 02 '25

I created a first batch of optimizations that, according to the benchmark, should help, although they won't bring things back to the original performance yet: https://github.com/prometheus/client_python/pull/1117. If you have a chance, I'd love for you to double-check that they help in your environment as well.
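One quick way to sanity-check this in a given environment is to time a full parse of a captured scrape body before and after changing versions (illustrative sketch; scrape_body.txt is a hypothetical saved exporter response):

    import time
    from prometheus_client.parser import text_string_to_metric_families

    # A saved response body from the Solace exporter (hypothetical file name).
    with open("scrape_body.txt") as f:
        text = f.read()

    start = time.perf_counter()
    samples = sum(len(family.samples) for family in text_string_to_metric_families(text))
    elapsed = time.perf_counter() - start
    print(f"parsed {samples} samples in {elapsed:.3f}s")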

csmarchbanks · Jul 03 '25