claircore icon indicating copy to clipboard operation
claircore copied to clipboard

Check manifest to check index report

Open iainduncani opened this issue 2 years ago • 1 comments

This is a proposed solution to #1124. This is just a draft as I would like feedback as to whether this is the right approach and will add tests if this is along the right lines. The PR does two things:

  1. Removes the retry logic from the Indexer's controller. The retry logic never actually triggered a retry, any request that may timeout also returned the Terminal next state which is processed after the if retry section to break out of the for loop and exit. Having the retry in there meant that empty index reports could be persisted if a query timed out in checkManifest (see the issue description). There was then two options

    1. Fix the retry logic
    2. Delete the retry logic

    I opted for deleting the retry logic as it simplified the code. If you would prefer the retry logic was fixed then I think it would just need a continue adding to the loop and fixing the persisting of an empty manifest when this state occurs (probably with a retry check when calling SetIndexReport). I don't know if consideration would also want to be given to how many times a retry happened to not get stuck in this state. All these thoughts made me think simpler code was better code so deleted it!

  2. If all Scanners complete for a manifest then the index report should be persisted in the IndexFinished state with all the required information in it for querying. If it does not end up in this state it will not be possible to query the manifest but the current implementation of checkManifest would also not allow for a rescan. I have updated checkManifest to make sure that the manifest is in IndexFinished state and if not it will reprocess the whole image from scratch. My main question with this part of the change is whether FetchLayers is the right state to go into. By the time we have check the index report we know all the Scanners have database information for them so would we be better skipping to Coalesce so that we do not need to pull the image again? I opted for FetchLayers for two reasons, both of which I suspect someone more familiar with the code may disagree with:

    1. We don't necessarily know when the index report did not end up in the right state, I felt FetchLayers was the safest option for handling an unexpected state but this only holds true if there was any situation where ManifestScanned returns true but not everything is in the database.
    2. I was worried there may be information required in Coalesce that is only set in previous states rather than being in the database.
    3. I also worry that this could cause excessive amounts of work for when there is a genuine reason a manifest could not be scanned, for instance if it has invalid contents that cannot be scanned by Clair. This PR does not do anything to differentiate between genuine errors and accidental errors so could trigger lots of work.

Would love the thoughts of someone familiar with this code!

iainduncani avatar Oct 31 '23 09:10 iainduncani

Would really love to see this improvement. Today after an upgrade problem we have a load of manifests with a committed index report with State: IndexError, and success: false. But resubmitting the manifest for another scan attempt just goes down the Manifest already scanned path, so they are stuck with bad index reports.

@hdonnay @crozzy any chance of getting this prioritised at some point? Thanks :)

paulaldridge avatar Mar 25 '24 14:03 paulaldridge