scrape + discovery: Required changes for serverless use
Rebases on top of @ridwanmsharif changes https://github.com/GoogleCloudPlatform/prometheus/pull/122 with following updates:
- Simplified discovery, I think we can always skip wait and make it more robust.
- Adjusted TestManagerStopAfterScrapeAttempt to simulate bit more how external users might use it.
- Removed one test, we test it TestManagerStopAfterScrapeAttempt already
- Cleaned up comments.
- Fixed lock bug that was here already.
More details on locking:
Checked -race on TestManagerStopAfterScrapeAttempt and indeed without your locks.
It produced RACE, I would argue it's not possible as everything is guarded by mtxScrape:
// reload -> ApplyConfig mtxScrape // stop -> Stop/StopAfterScrapeAttempt/ApplyConfig mtxScrape // sync -> Sync -> manager reload mtxScrape
Turned out - there was bug in our locking (mtxScrape Unlock was too soon). Fixed, and now no more locks are needed.
BTW @ridwanmsharif epic work on DiscoveryReloadOnStartup and SkipInitialWait in discovery. Super nice solutions! I changed it only slightly, but feel free to reject 🤗
Fuzzing issue seems unrelated, I would ignore.
Looks like races on my discovery changes. They might also occur on your changes, just we don't test those with this special flag 🤔
Merged https://github.com/GoogleCloudPlatform/prometheus/pull/122 for now as this needs some love.
The race is on test (using discoveryManager.targets without lock while .Run is till there), but I need to verify we won't have some other consequences for non sidecar cases... t