S3-based layer uploads are duplicated

Open jay-dee7 opened this issue 2 years ago • 0 comments

Describe the bug When we push a container image, its contents are stored under several paths inside an S3 bucket. For example, when user johndoe pushes the johndoe/test image, this is how we store it in S3:

  • Image Layer Hash -> sha256:someuniquehashvalue
  • Layer UUID -> 093f2a0e-95b6-40ad-aae5-56df1beb546c - UUID v4 (unique for every upload session, no matter what)
  1. layers/093f2a0e-95b6-40ad-aae5-56df1beb546c
  2. johndoe/test/manifests/<tags>

This layers/ path is shared across users and works as a pool of layers. It was designed this way so that we can check whether we already have a layer and reuse it. However, the check compares the UUIDs of upload sessions, which are unique per upload regardless of the layer's contents. Since the UUIDs never match, we end up uploading the container image layer on every push. This nullifies the optimization and wastes storage.
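To make the failure mode concrete, here is a minimal in-memory sketch in Python. It is not OpenRegistry's actual code: the `LayerPool` class, its method names, and the dict standing in for the S3 bucket are all hypothetical. It contrasts keying the pool by the upload session UUID with keying it by the layer's SHA-256 digest, which is one possible way to make the existence check meaningful.

```python
import hashlib
import uuid


class LayerPool:
    """Hypothetical stand-in for the shared layers/ prefix in the bucket."""

    def __init__(self):
        self.objects = {}  # object key -> layer bytes

    def put_by_session_uuid(self, data: bytes) -> str:
        # Scheme described in the issue: the key is a fresh UUID v4 per
        # upload, so the existence check cannot match an earlier upload.
        key = f"layers/{uuid.uuid4()}"
        if key not in self.objects:
            self.objects[key] = data
        return key

    def put_by_digest(self, data: bytes) -> str:
        # Content-addressed scheme (an assumption, not the current design):
        # identical bytes always map to the same key.
        digest = hashlib.sha256(data).hexdigest()
        key = f"layers/sha256:{digest}"
        if key not in self.objects:  # skips the write for a known layer
            self.objects[key] = data
        return key


layer = b"identical layer contents"

by_uuid = LayerPool()
by_uuid.put_by_session_uuid(layer)
by_uuid.put_by_session_uuid(layer)  # same bytes pushed under a new tag

by_digest = LayerPool()
by_digest.put_by_digest(layer)
by_digest.put_by_digest(layer)

print(len(by_uuid.objects))    # 2: one copy per upload session
print(len(by_digest.objects))  # 1: the second push is deduplicated
```

With UUID keys the pool grows by one object per push; with digest keys the second push finds the existing object and writes nothing.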

To Reproduce Upload a container image with a tag, then re-tag the unmodified image with a different tag value and push it again. You'll now have twice as many layers, even though there should be only one set of layers under the layers/ path.

Expected behavior If a layer already exists, the API should either return a success response or mount the layer.
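The two outcomes above could look roughly like the following Python sketch. The function names, the `Response` type, and the set of known digests are hypothetical; the status codes (200 for an existing blob, 201 for a successful mount, 202 for a newly opened upload session) follow my reading of the OCI Distribution Specification, and a real mount request would normally also carry a `from` repository, which is omitted here.

```python
import uuid
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Response:
    status: int
    headers: dict = field(default_factory=dict)


def head_blob(pool: set, digest: str) -> Response:
    # Existence probe: on a 200 the client can skip the upload entirely.
    if digest in pool:
        return Response(200, {"Docker-Content-Digest": digest})
    return Response(404)


def start_upload(pool: set, repo: str, mount: Optional[str] = None) -> Response:
    # Mount request for a layer already in the shared pool: report success
    # without opening an upload session or writing new bytes.
    if mount is not None and mount in pool:
        return Response(201, {"Location": f"/v2/{repo}/blobs/{mount}"})
    # Unknown layer: fall back to a regular upload session.
    session = uuid.uuid4()
    return Response(202, {"Location": f"/v2/{repo}/blobs/uploads/{session}"})


# Digests already stored in the pool, shared across users (placeholder value).
pool = {"sha256:someuniquehashvalue"}

print(head_blob(pool, "sha256:someuniquehashvalue").status)                     # 200
print(start_upload(pool, "johndoe/test", "sha256:someuniquehashvalue").status)  # 201
print(start_upload(pool, "johndoe/test", "sha256:notstoredyet").status)         # 202
```

Either path avoids a second copy: the probe lets the client skip the push, and the mount lets the server acknowledge the layer without storing it again.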

Screenshots

The CIDs below clearly show that the data in these layers is the same, but we still have 4 copies.

image

Log Files nil

Desktop (please complete the following information):

  • OS: macOS
  • Version 13.3.1 (a)

jay-dee7 • May 06 '23 17:05