
Prometheus reads all .db files from PROMETHEUS_MULTIPROC_DIR without regard for their contents

Open · htbrandao opened this issue 3 years ago • 1 comment

prometheus-client==0.15.0

The collect method of prometheus_client.multiprocess.MultiProcessCollector reads every .db file in PROMETHEUS_MULTIPROC_DIR, so unrelated files such as SQLite databases are parsed as metric files, leading to errors and memory issues.

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/........./........../.............py", line 20, in .........................
    data = generate_latest(registry)
  File "/usr/lib/python3.8/site-packages/prometheus_client/exposition.py", line 198, in generate_latest
    for metric in registry.collect():
  File "/usr/lib/python3.8/site-packages/prometheus_client/registry.py", line 97, in collect
    yield from collector.collect()
  File "/usr/lib/python3.8/site-packages/prometheus_client/multiprocess.py", line 153, in collect
    return self.merge(files, accumulate=True)
  File "/usr/lib/python3.8/site-packages/prometheus_client/multiprocess.py", line 45, in merge
    metrics = MultiProcessCollector._read_metrics(files)
  File "/usr/lib/python3.8/site-packages/prometheus_client/multiprocess.py", line 73, in _read_metrics
    for key, value, _ in file_values:
  File "/usr/lib/python3.8/site-packages/prometheus_client/mmap_dict.py", line 43, in _read_all_values
    value = _unpack_double(data, pos)[0]
struct.error: unpack_from requires a buffer of at least 1634562696 bytes for unpacking 8 bytes at offset 1634562688 (actual buffer size is 12288)
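
For reference, this can be reproduced by dropping any non-Prometheus .db file into the multiprocess directory. A minimal sketch (the directory, metric name, and SQLite file name below are made up for illustration):

import os
import sqlite3
import tempfile

# PROMETHEUS_MULTIPROC_DIR must be set before prometheus_client is imported
multiproc_dir = tempfile.mkdtemp()
os.environ['PROMETHEUS_MULTIPROC_DIR'] = multiproc_dir

from prometheus_client import CollectorRegistry, Counter, generate_latest
from prometheus_client import multiprocess

# A real multiprocess metric, written to a file such as counter_<pid>.db
requests_total = Counter('demo_requests_total', 'Example counter')
requests_total.inc()

# An unrelated SQLite database that happens to live in the same directory
conn = sqlite3.connect(os.path.join(multiproc_dir, 'app_cache.db'))
conn.execute('CREATE TABLE t (x INTEGER)')
conn.commit()
conn.close()

registry = CollectorRegistry()
multiprocess.MultiProcessCollector(registry)

# Fails while the SQLite file is parsed as an mmap'd metrics file
# (a struct.error like the one in the traceback above)
generate_latest(registry)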

The issue is in the collect() method:

def collect(self):
    files = glob.glob(os.path.join(self._path, '*.db'))
    return self.merge(files, accumulate=True)

I believe there should be some kind of file validation to ensure that each file actually belongs to a Prometheus metric, e.g.:

import re

POSSIBLE_PROMETHEUS_FILENAMES_INCLUDE = {'counter', '...', 'histogram', 'gauge_livesum'}

def collect(self):
    files = glob.glob(os.path.join(self._path, '*.db'))
    # assert files
    # Keep only files whose basename starts with a known metric-type prefix,
    # e.g. 'counter_123.db' -> 'counter'
    valid_files = [
        f for f in files
        if re.split(r'_\d+\.db$', os.path.basename(f))[0] in POSSIBLE_PROMETHEUS_FILENAMES_INCLUDE
    ]
    return self.merge(valid_files, accumulate=True)
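
A variant of the same idea that avoids hard-coding every name is to only accept files shaped like <type prefix>_<pid>.db. The prefix list below is an assumption and would have to be kept in sync with what the client actually writes:

import glob
import os
import re

# Assumed file shape: <type prefix>_<pid>.db, e.g. counter_123.db or
# gauge_livesum_123.db; the prefixes here are an assumption, not the full list.
_METRIC_FILE_RE = re.compile(r'^(counter|summary|histogram|gauge_[a-z]+)_\d+\.db$')

def collect(self):
    files = glob.glob(os.path.join(self._path, '*.db'))
    valid_files = [f for f in files if _METRIC_FILE_RE.match(os.path.basename(f))]
    return self.merge(valid_files, accumulate=True)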

htbrandao · Jan 10 '23

Best practice would certainly be to have all of the Prometheus files in their own folder, especially since the folder is supposed to be wiped between process starts. That said, some additional safety seems reasonable to me.
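
A sketch of that setup for a gunicorn deployment (the directory path is an example; the child_exit hook with mark_process_dead is the usual cleanup pattern from the client's multiprocess docs):

# gunicorn.conf.py
import os
import shutil

PROM_DIR = '/tmp/prometheus_multiproc'

# Recreate the directory before any worker starts, so only the client's own
# .db files ever live in it.
shutil.rmtree(PROM_DIR, ignore_errors=True)
os.makedirs(PROM_DIR, exist_ok=True)
os.environ['PROMETHEUS_MULTIPROC_DIR'] = PROM_DIR

def child_exit(server, worker):
    # Remove the dead worker's live gauge files
    from prometheus_client import multiprocess
    multiprocess.mark_process_dead(worker.pid)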

csmarchbanks · Feb 07 '23