
Downsampling

Open mhausenblas opened this issue 4 years ago • 17 comments

The Cortex roadmap contains, as of time of writing, an item called Downsampling with a high-level description as follows:

Downsampling means storing fewer samples, e.g. one per minute instead of one every 15 seconds. This makes queries over long periods more efficient. It can reduce storage space slightly if the full-detail data is discarded.
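
To make the idea concrete, here is a minimal, purely illustrative Go sketch of "storing fewer samples" (keeping one sample per window). A production implementation such as Thanos's stores per-window aggregates (count, sum, min, max, counter) rather than a single value; the names below are made up for this example.

```go
package main

import "fmt"

// Sample is a simplified (timestamp in milliseconds, value) pair.
type Sample struct {
	T int64
	V float64
}

// downsampleLast keeps one sample per window: the last raw sample seen in
// each window. This only illustrates the "fewer samples" idea; a real
// implementation (e.g. Thanos) stores per-window aggregates
// (count, sum, min, max, counter) instead of a single value.
func downsampleLast(raw []Sample, windowMs int64) []Sample {
	var out []Sample
	for _, s := range raw {
		bucket := s.T - s.T%windowMs
		if n := len(out); n > 0 && out[n-1].T == bucket {
			out[n-1].V = s.V // keep the last value seen in this window
			continue
		}
		out = append(out, Sample{T: bucket, V: s.V})
	}
	return out
}

func main() {
	// Raw data scraped every 15s for 5 minutes: 20 samples.
	var raw []Sample
	for i := int64(0); i < 20; i++ {
		raw = append(raw, Sample{T: i * 15_000, V: float64(i)})
	}
	ds := downsampleLast(raw, 60_000) // keep one sample per minute
	fmt.Println(len(raw), "->", len(ds), "samples") // 20 -> 5
}
```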

Given the increased interest in the community around this feature it seems now is a good time to have a more focused conversation around scope and potential solution approaches.

There are a number of things we may want to address in the proposal, however, we would at least want to cover the following questions:

  • Will we keep the original high-resolution data or delete it (after some grace period) once it has been downsampled?
  • What is the granularity of the feature (timeseries of a tenant, specific timeseries, etc.)?
  • Should we align with/re-use downsampling as it is offered by Thanos?
  • What does the UX look like, and what potential implications does the feature have in terms of footprint and performance?

With this issue, I'd like to officially kick off the work addressing the Downsampling feature in Cortex. That is, I intend to submit a proposal that introduces said feature in a future version of Cortex.

mhausenblas avatar Jun 26 '21 10:06 mhausenblas

From a timeline perspective, my plan was to have a WIP proposal together that we can discuss at the next Community Call on 2021-07-15.

mhausenblas avatar Jun 26 '21 10:06 mhausenblas

Nice! From the Thanos side, our downsampling has proven to be precise and gave a huge boost to queries over longer retention periods (a year). However, there are a few things we wanted to improve and consider in the future. Things you might want to try on the path to introducing Cortex downsampling, if you want to use any of the Thanos experience:

  • We need better downsampling planning. Right now downsampling happens only after full compaction and only after 2w of data.
  • Make sure to document more clearly that it does not help with cardinality.
  • The choice of which resolution of data PromQL uses can still be optimized. There is some work to pass the correct hints through PromQL, e.g. for subqueries.
  • Maybe far-fetched, but having downsampled data in separate blocks was never leveraged. From my perspective now, I would probably put downsampled chunks in the same TSDB block as the raw data to maintain the same index. This would cut baseline memory, the number of blocks, and the size on disk and in object storage considerably. There is a caveat though which makes this idea irrelevant for Cortex (see note 2).

Note 1: There are many temptations to make the downsampling resolution configurable. We managed to explain to users why it's not needed, and in the end I believe it worked fine for us. Avoiding configurability here makes the PromQL experience with downsampled data nicely deterministic.

Note 2: Cortex works on smaller blocks (1d?). In Thanos we have 2w blocks in most long-retention cases. I think for downsampled data it makes sense to ensure series actually have enough samples to get downsampled, so potentially increasing block sizes for those.
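
For orientation only (the constant names below are illustrative, not Thanos's actual identifiers), the fixed resolution scheme the note above refers to boils down to three levels:

```go
package main

import "fmt"

// Fixed downsampling resolutions, in milliseconds, mirroring the
// raw / 5m / 1h scheme Thanos uses. Keeping this set fixed (rather than
// user-configurable) is what makes the PromQL experience deterministic.
const (
	resRaw = int64(0)              // raw data, no downsampling
	res5m  = int64(5 * 60 * 1000)  // one aggregated sample per 5 minutes
	res1h  = int64(60 * 60 * 1000) // one aggregated sample per hour
)

func main() {
	fmt.Println(resRaw, res5m, res1h) // 0 300000 3600000
}
```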

Let me know if you need any help, we can help from Thanos side!

bwplotka avatar Jun 26 '21 11:06 bwplotka

We need better downsampling planning. Right now downsampling happens only after full compaction and only after 2w of data.

This may be less a problem in Cortex, where the largest compacted block is 1d.

Make sure to document more clearly that it does not help with cardinality.

It doesn't reduce cardinality, but it does reduce the total number of samples to process (the PromQL engine's query time typically grows linearly with the number of samples to process). Am I correct?
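
A rough back-of-the-envelope check of that linear relationship (the intervals below are assumptions, chosen only to show the reduction factor): a one-year range query over a single series scraped every 15s touches roughly 2.1M raw samples, versus about 105k at a 5m resolution and about 8.8k at 1h.

```go
package main

import "fmt"

func main() {
	const secondsPerYear = 365 * 24 * 3600

	// Samples per series touched by a 1-year range query, for a few
	// hypothetical sample intervals. The engine's work grows roughly
	// linearly with this count.
	for _, intervalSec := range []int{15, 300, 3600} { // raw 15s, 5m, 1h
		fmt.Printf("interval %4ds -> %8d samples per series per year\n",
			intervalSec, secondsPerYear/intervalSec)
	}
}
```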

Note 2: Cortex works on smaller blocks (1d?). In Thanos we have 2w blocks in most long-retention cases. I think for downsampled data it makes sense to ensure series actually have enough samples to get downsampled, so potentially increasing block sizes for those.

Good point.

Assuming a scrape interval of 20s, we have 86400/20=4320 samples per series in the case of no churning. If we downsample to 1 sample every 5 minutes, we get 288 samples per day, which looks still good.

However, if a series churns every hour, we have 3600/20=180 samples for that churning series. If we downsample to 1 sample every 5 minutes, we get 12 samples, which looks like quite a waste considering we could fit 120 samples in a chunk.

Does my analysis match what you had in mind? Or did you mean something different?
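
A small sketch of that arithmetic (scrape interval, churn period, and downsample step are the assumed values from the comment above):

```go
package main

import "fmt"

// samples returns how many samples a series produces over lifetimeSec
// seconds, at one sample every intervalSec seconds.
func samples(lifetimeSec, intervalSec int) int {
	return lifetimeSec / intervalSec
}

func main() {
	const (
		scrapeSec     = 20     // assumed scrape interval
		downsampleSec = 5 * 60 // assumed downsample step: 1 sample every 5 minutes
		daySec        = 86400
		hourSec       = 3600
	)

	// Long-lived series (no churn): 4320 raw vs 288 downsampled samples per day.
	fmt.Println(samples(daySec, scrapeSec), samples(daySec, downsampleSec))

	// Series that churns every hour: 180 raw vs only 12 downsampled samples,
	// far below the ~120 samples a chunk is typically sized for.
	fmt.Println(samples(hourSec, scrapeSec), samples(hourSec, downsampleSec))
}
```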

pracucci avatar Jun 30 '21 15:06 pracucci

which looks like quite a waste considering we could fit 120 samples in a chunk.

A chunk with 12 samples will still be much smaller than a chunk with 120 samples (under ideal conditions even 10x smaller), but it takes the same space in the index to refer to it. Is that what you refer to as "waste"? Smaller chunks are still a win in my opinion.

(We could "fit" as many samples into a chunk as we want, it's only that Prometheus settled on 120.)

pstibrany avatar Jun 30 '21 18:06 pstibrany

A chunk with 12 samples will still be much smaller than a chunk with 120 samples (under ideal conditions even 10x smaller), but it takes the same space in the index to refer to it. Is that what you refer to as "waste"? Smaller chunks are still a win in my opinion.

Sure, smaller chunks are still a win. Moreover, PromQL engine performance is directly related to the number of samples to process (we've seen it many times), so the fewer the samples, the faster the engine is (I know there are other factors; this is intentionally an oversimplification).

Back to our discussion: by "waste" (I used an incorrect term) I meant what you were mentioning. The ratio between index size and chunk size (in bytes) may change drastically with downsampling, and the index could end up bigger than the chunks. In a scenario with a low churn rate, it may make sense to have downsampled blocks spanning a period larger than 1d.
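
A very rough illustration of how that ratio can flip (both byte figures below are assumptions for the sake of argument, not measured values): with ~1.4 bytes per sample for a compressed chunk and a fixed per-chunk overhead in the index, the index entry for a 12-sample chunk is already comparable in size to the chunk data itself.

```go
package main

import "fmt"

func main() {
	const (
		bytesPerSample     = 1.4 // assumed average bytes per sample in a compressed chunk
		indexBytesPerChunk = 16  // assumed per-chunk metadata overhead in the index
	)

	for _, samplesPerChunk := range []int{120, 12} {
		chunkBytes := bytesPerSample * float64(samplesPerChunk)
		ratio := float64(indexBytesPerChunk) / chunkBytes
		fmt.Printf("%3d samples/chunk: ~%.0f chunk bytes, index/chunk ratio ~%.2f\n",
			samplesPerChunk, chunkBytes, ratio)
	}
}
```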

pracucci avatar Jul 01 '21 08:07 pracucci

This issue has been automatically marked as stale because it has not had any activity in the past 60 days. It will be closed in 15 days if no further activity occurs. Thank you for your contributions.

stale[bot] avatar Oct 01 '21 22:10 stale[bot]

As reported at the last community meeting, now that we have GAed our services, I can focus on this again.

mhausenblas avatar Oct 02 '21 05:10 mhausenblas

This issue has been automatically marked as stale because it has not had any activity in the past 60 days. It will be closed in 15 days if no further activity occurs. Thank you for your contributions.

stale[bot] avatar Jan 02 '22 09:01 stale[bot]

I'm on it.

mhausenblas avatar Jan 02 '22 20:01 mhausenblas

I'm very interested in reviewing this proposal. There is a case to be made for allowing a reduction in tags/labels when aggregating and downsampling to optimize for longer-term storage and cardinality improvements. That may be a level-up use of downsampling, but it was one of the best aspects of the continuous queries concept in InfluxDB.

benhastings avatar Mar 01 '22 20:03 benhastings

This issue has been automatically marked as stale because it has not had any activity in the past 60 days. It will be closed in 15 days if no further activity occurs. Thank you for your contributions.

stale[bot] avatar Jun 12 '22 11:06 stale[bot]

This issue has been automatically marked as stale because it has not had any activity in the past 60 days. It will be closed in 15 days if no further activity occurs. Thank you for your contributions.

stale[bot] avatar Mar 20 '23 03:03 stale[bot]

Still on the roadmap.

jeromeinsf avatar Mar 20 '23 12:03 jeromeinsf

Adding a +1

martin-walsh avatar Jul 11 '23 15:07 martin-walsh

I am trying to work on a PoC https://github.com/yeya24/cortex/tree/downsampling-poc by just porting over what Thanos has so far to the compactor.

Things I still need to do and think about:

  1. We probably need to do more research into the potential of > 1d blocks, in order to have enough samples to leverage the downsampling feature better.
  2. Can we reuse the same block for downsampled chunks? It is probably not easy, as we would need to rewrite the block index to add the new chunk refs, and we would need a way in the index file to identify downsampled chunks.
  3. We need to change how queriers decide which blocks to query beforehand, taking downsampling resolution into consideration.
  4. Thanos has max_resolution. It is not bad, but it is probably not very user friendly when using Grafana or the API, as it is not an official Prometheus API param. We need to think about whether there is a way to infer that based on user queries and request parameters (see the rough sketch below).
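
On point 4, one possible heuristic, shown only as a hypothetical sketch (nothing in the PoC works this way): infer the coarsest usable resolution from the query step, so that callers never have to pass max_resolution explicitly.

```go
package main

import (
	"fmt"
	"time"
)

// maxResolutionForStep is a hypothetical helper: pick the coarsest
// downsampling resolution that still yields several samples per query
// step, falling back to raw data for small steps.
func maxResolutionForStep(step time.Duration) time.Duration {
	const samplesPerStep = 5 // assumed safety factor
	switch {
	case step >= samplesPerStep*time.Hour:
		return time.Hour // 1h downsampled data is dense enough
	case step >= samplesPerStep*5*time.Minute:
		return 5 * time.Minute // 5m downsampled data is dense enough
	default:
		return 0 // raw data only
	}
}

func main() {
	for _, step := range []time.Duration{15 * time.Second, time.Hour, 12 * time.Hour} {
		fmt.Println("step", step, "-> max resolution", maxResolutionForStep(step))
	}
}
```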

yeya24 avatar Aug 13 '23 20:08 yeya24