Check manifest to check index report
This is a proposed solution to #1124. This is just a draft as I would like feedback as to whether this is the right approach and will add tests if this is along the right lines. The PR does two things:
-
Removes the retry logic from the Indexer's controller. The retry logic never actually triggered a retry, any request that may timeout also returned the Terminal next state which is processed after the
if retrysection to break out of the for loop and exit. Having the retry in there meant that empty index reports could be persisted if a query timed out incheckManifest(see the issue description). There was then two options- Fix the retry logic
- Delete the retry logic
I opted for deleting the retry logic as it simplified the code. If you would prefer the retry logic was fixed then I think it would just need a
continueadding to the loop and fixing the persisting of an empty manifest when this state occurs (probably with a retry check when callingSetIndexReport). I don't know if consideration would also want to be given to how many times a retry happened to not get stuck in this state. All these thoughts made me think simpler code was better code so deleted it! -
If all
Scanners complete for a manifest then the index report should be persisted in theIndexFinishedstate with all the required information in it for querying. If it does not end up in this state it will not be possible to query the manifest but the current implementation ofcheckManifestwould also not allow for a rescan. I have updatedcheckManifestto make sure that the manifest is inIndexFinishedstate and if not it will reprocess the whole image from scratch. My main question with this part of the change is whetherFetchLayersis the right state to go into. By the time we have check the index report we know all theScanners have database information for them so would we be better skipping toCoalesceso that we do not need to pull the image again? I opted forFetchLayersfor two reasons, both of which I suspect someone more familiar with the code may disagree with:- We don't necessarily know when the index report did not end up in the right state, I felt FetchLayers was the safest option for handling an unexpected state but this only holds true if there was any situation where
ManifestScannedreturns true but not everything is in the database. - I was worried there may be information required in
Coalescethat is only set in previous states rather than being in the database. - I also worry that this could cause excessive amounts of work for when there is a genuine reason a manifest could not be scanned, for instance if it has invalid contents that cannot be scanned by Clair. This PR does not do anything to differentiate between genuine errors and accidental errors so could trigger lots of work.
- We don't necessarily know when the index report did not end up in the right state, I felt FetchLayers was the safest option for handling an unexpected state but this only holds true if there was any situation where
Would love the thoughts of someone familiar with this code!
Would really love to see this improvement. Today after an upgrade problem we have a load of manifests with a committed index report with State: IndexError, and success: false. But resubmitting the manifest for another scan attempt just goes down the Manifest already scanned path, so they are stuck with bad index reports.
@hdonnay @crozzy any chance of getting this prioritised at some point? Thanks :)