prometheus icon indicating copy to clipboard operation
prometheus copied to clipboard

scrape + discovery: Required changes for serverless use

Open bwplotka opened this issue 2 years ago • 3 comments

Rebases on top of @ridwanmsharif changes https://github.com/GoogleCloudPlatform/prometheus/pull/122 with following updates:

  • Simplified discovery, I think we can always skip wait and make it more robust.
  • Adjusted TestManagerStopAfterScrapeAttempt to simulate bit more how external users might use it.
  • Removed one test, we test it TestManagerStopAfterScrapeAttempt already
  • Cleaned up comments.
  • Fixed lock bug that was here already.

More details on locking:

Checked -race on TestManagerStopAfterScrapeAttempt and indeed without your locks.

It produced RACE, I would argue it's not possible as everything is guarded by mtxScrape:

// reload -> ApplyConfig mtxScrape // stop -> Stop/StopAfterScrapeAttempt/ApplyConfig mtxScrape // sync -> Sync -> manager reload mtxScrape

Turned out - there was bug in our locking (mtxScrape Unlock was too soon). Fixed, and now no more locks are needed.

BTW @ridwanmsharif epic work on DiscoveryReloadOnStartup and SkipInitialWait in discovery. Super nice solutions! I changed it only slightly, but feel free to reject 🤗

bwplotka avatar Nov 22 '23 14:11 bwplotka

Fuzzing issue seems unrelated, I would ignore.

bwplotka avatar Nov 22 '23 14:11 bwplotka

Looks like races on my discovery changes. They might also occur on your changes, just we don't test those with this special flag 🤔

bwplotka avatar Nov 22 '23 14:11 bwplotka

Merged https://github.com/GoogleCloudPlatform/prometheus/pull/122 for now as this needs some love.

The race is on test (using discoveryManager.targets without lock while .Run is till there), but I need to verify we won't have some other consequences for non sidecar cases... t

bwplotka avatar Nov 22 '23 18:11 bwplotka