
TSDB: Option to configure TSDB Block Reload Interval

Open Naman-B-Parlecha opened this issue 8 months ago • 14 comments

Workaround for #16649

This PR adds a new TSDB flag, --storage.tsdb.block-reload-interval, that lets users customize how frequently TSDB blocks are reloaded, giving more control over block discovery and reload behavior. This reduces the wait time from up to 1 minute to a configurable interval.

The default 1-minute behavior is maintained when the flag is not specified. Intervals from seconds to hours are allowed, depending on use case requirements, and longer intervals can reduce I/O overhead.

The minimum value for this flag is 1s and the default is 1m.

Usage:

--storage.tsdb.block-reload-interval=10s 

Test:

time=2025-10-27T22:14:53.210+05:30 level=INFO source=db.go:1107 msg="reload interval" duration=2s
time=2025-10-27T22:14:55.212+05:30 level=INFO source=db.go:1107 msg="reload interval" duration=2s
time=2025-10-27T22:14:57.216+05:30 level=INFO source=db.go:1107 msg="reload interval" duration=2s
time=2025-10-27T22:14:59.218+05:30 level=INFO source=db.go:1107 msg="reload interval" duration=2s
time=2025-10-27T22:15:01.220+05:30 level=INFO source=db.go:1107 msg="reload interval" duration=2s
time=2025-10-27T22:15:03.223+05:30 level=INFO source=db.go:1107 msg="reload interval" duration=2s
time=2025-10-27T22:15:05.226+05:30 level=INFO source=db.go:1107 msg="reload interval" duration=2s
time=2025-10-27T22:15:07.229+05:30 level=INFO source=db.go:1107 msg="reload interval" duration=2s
time=2025-10-27T22:15:09.232+05:30 level=INFO source=db.go:1107 msg="reload interval" duration=2s
time=2025-10-27T22:15:11.235+05:30 level=INFO source=db.go:1107 msg="reload interval" duration=2s
time=2025-10-27T22:15:13.239+05:30 level=INFO source=db.go:1107 msg="reload interval" duration=2s
time=2025-10-27T22:15:15.243+05:30 level=INFO source=db.go:1107 msg="reload interval" duration=2s

Does this PR introduce a user-facing change?

[FEATURE]: --storage.tsdb.block-reload-interval flag to configure TSDB Block Reload Interval

Naman-B-Parlecha avatar Jun 14 '25 08:06 Naman-B-Parlecha

Hey @jesusvazquez could you take a look at this, and suggest any changes if required?

Naman-B-Parlecha avatar Jun 18 '25 18:06 Naman-B-Parlecha

Would it be possible to add a feature where a block reload is triggered by invoking some function? I'm thinking in terms of an event-driven architecture.

HawkingRadiation42 avatar Jun 21 '25 02:06 HawkingRadiation42

@HawkingRadiation42 Yep, I was working on it and will push it this weekend.

Naman-B-Parlecha avatar Jun 21 '25 09:06 Naman-B-Parlecha

Would it be possible to add a feature where a block reload is triggered by invoking some function? I'm thinking in terms of an event-driven architecture.

This PR is a workaround for tuning the reload interval instead of waiting the fixed 1 minute.

Naman-B-Parlecha avatar Jun 21 '25 09:06 Naman-B-Parlecha

Hey @bboreham, I will plan and refactor this PR in the next few days.

Naman-B-Parlecha avatar Oct 08 '25 18:10 Naman-B-Parlecha

I do wonder whether this should be in the configuration file or just a flag. It seems to me that it would be better to avoid changing this at run-time.

Additionally, I think we should set a minimum duration for reloads (10s, 1s?) to safeguard from very small values, as currently it can go as low as 1ms.

roidelapluie avatar Oct 31 '25 10:10 roidelapluie

Agreed. We should add a minimum duration of 1s rather than something like 10s, because the idea came up due to the 1-minute wait time seen in https://github.com/prometheus/prometheus/issues/16649, so 1s would be better.

WDYT @roidelapluie @aknuds1

Naman-B-Parlecha avatar Oct 31 '25 12:10 Naman-B-Parlecha

Sounds good to me @Naman-B-Parlecha.

aknuds1 avatar Oct 31 '25 12:10 aknuds1

We could also open an issue for an endpoint that triggers a reload via the HTTP API, maybe when the lifecycle or admin HTTP APIs are enabled, so we can keep 1m but reload ad hoc. But if you wish, let's discuss that in a fresh issue.

roidelapluie avatar Oct 31 '25 14:10 roidelapluie

Hey @aknuds1 @roidelapluie, I have added the test cases, PTAL!

Naman-B-Parlecha avatar Nov 07 '25 07:11 Naman-B-Parlecha

The PR description says there is a storage.tsdb.block_reload_interval configuration parameter, but it's not implemented.

aknuds1 avatar Nov 07 '25 14:11 aknuds1

Hey @aknuds1, sorry for the delay, I had exams going on. I have refactored the tests to use require.Eventually() and dropped the sleep.

I also updated the PR description to reflect the latest changes. PTAL!

Naman-B-Parlecha avatar Nov 14 '25 08:11 Naman-B-Parlecha

Taking another look.

aknuds1 avatar Nov 28 '25 07:11 aknuds1

Hey @aknuds1, I have addressed all the suggestions! The one change I didn't make is the comment; instead I increased the time by adding a buffer of 1 second, if that works?

Naman-B-Parlecha avatar Dec 02 '25 12:12 Naman-B-Parlecha

@aknuds1 done. The failing fuzz tests are not related to this PR.

Thanks for helping out!!

Naman-B-Parlecha avatar Dec 14 '25 11:12 Naman-B-Parlecha

I've asked @roidelapluie about fuzzing being broken. I see he has a PR open to move to native fuzzing, I'm wondering whether that's all it takes to unbreak fuzzing.

aknuds1 avatar Dec 14 '25 13:12 aknuds1

I think it's safe to merge this even though fuzzing doesn't go through, because it's broken for other reasons than changes in this PR.

aknuds1 avatar Dec 15 '25 08:12 aknuds1

Hi, I tweaked the release note.

Before:

[FEATURE]: --storage.tsdb.block-reload-interval flag to configure TSDB Block Reload Interval

After:

[ENHANCEMENT]: TSDB: add flag --storage.tsdb.block-reload-interval to configure TSDB Block Reload Interval

(I don't think a flag to tweak an existing feature counts as a FEATURE)

bboreham avatar Dec 17 '25 16:12 bboreham

Thanks @bboreham - good point.

aknuds1 avatar Dec 17 '25 17:12 aknuds1