Cleaning Jib's local base image cache

Open chanseokoh opened this issue 6 years ago • 8 comments

We don't delete cached base image layers. (Note application layers are under a build directory.)

Some ideas:

  • Automatic (configurable?) or manual (new task/goal)? Or both?
  • LRU? Add access time metadata? Keep a log?
  • Print info on local cache usage during builds?

See #1982 as well.

chanseokoh avatar Sep 05 '19 19:09 chanseokoh

Out of curiosity, where is this cache?

It'd be nice to have a bit more transparency about what is cached where. I know about the cache under build/, but I'm trying to track down slow builds whenever we update our base image (slower than I think are justified for a pull, that is) and I have no idea where base images might be cached. Are they docker saved, cached in the daemon, etc.?

ndarilek avatar Sep 11 '19 19:09 ndarilek

On Linux, the base image cache is at $HOME/.cache/google-cloud-tools-java/jib, unless you set -Djib.useOnlyProjectCache=true, in which case Jib uses the same cache in build/ for both base image and application layers.
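For watching how that directory grows during a slow build, a small script along these lines may help. It is only a sketch: the path is the Linux default mentioned above, and the helper names `dir_size_bytes` and `default_jib_base_cache` are made up for illustration, not part of Jib.

```python
import os
from pathlib import Path


def dir_size_bytes(root: Path) -> int:
    """Sum the sizes of all regular files under root (0 if root is missing)."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            try:
                # lstat so that symlinks are not followed or double counted
                total += (Path(dirpath) / name).lstat().st_size
            except OSError:
                # A file may disappear while a build is running; skip it.
                pass
    return total


def default_jib_base_cache() -> Path:
    """Assumed Linux default location, taken from the comment above."""
    return Path.home() / ".cache" / "google-cloud-tools-java" / "jib"


if __name__ == "__main__":
    cache = default_jib_base_cache()
    if cache.is_dir():
        print(f"{cache}: {dir_size_bytes(cache) / 1e6:.1f} MB")
    else:
        print(f"{cache} does not exist")
```

Running it repeatedly while a build is pulling a base image should show whether the download is making progress. With -Djib.useOnlyProjectCache=true the directory to watch is the project's build/ directory instead.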

FYI, Jib doesn't use Docker for anything other than for building to the Docker daemon (i.e. ./gradlew jibDockerBuild or ./mvnw jib:dockerBuild) or using a Docker daemon base image (which hasn't been released yet). Jib maintains its own cache separate from Docker.

TadCordle avatar Sep 11 '19 19:09 TadCordle

Thanks. Pardon my hijacking this for support questions--I promise not to continue beyond this point--but how do I set that property in build.gradle, or if I can't do it there, in skaffold.yaml? I tried:


jib {
   from {
     image = 'gcr.io/jit-access-dev/skrybe-deps:latest'
   }
   to {
     image = 'gcr.io/jit-access-dev/backend'
   }
   useOnlyProjectCache = true
}

and that failed because I'm setting an unknown property. Sorry, been reading through Jib docs trying to diagnose what is likely a case of this issue, and given that I have Jib running in a Gradle daemon that may or may not be inheriting the launching shell's environment, which is important because said environment is configured to use Docker in Minikube, I have no idea what may be at fault here. I didn't even know there was a separate base image cache, though I assumed there had to be. Anyhow, given that we're only using project-specific base images here, I want that cache automatically in my project directory so I have some transparency about what is getting created and when, and don't have to poke around in ~/.cache. So, yeah, brain fried by the layers within layers. :)

Thanks.

ndarilek avatar Sep 11 '19 19:09 ndarilek

jib.useOnlyProjectCache is a system property without a corresponding configuration parameter, so I think you either need to pass it via commandline, or put it in a gradle.properties file. It looks like you can add something like systemProp.jib.useOnlyProjectCache=true to gradle.properties, but I haven't tested, so I'm not sure.

As for skaffold.yaml, I think you can just pass it as an arg.

...
build:
  artifacts:
  - image: ...
    jibGradle:
      args:
      - -Djib.useOnlyProjectCache=true
...

TadCordle avatar Sep 11 '19 19:09 TadCordle

Thanks, that worked!

Not sure how to flag this, but the contents of this thread would be a great FAQ entry. I'm trying to diagnose issues where image builds take forever, and Debugging 101 for a process that downloads something is to watch the directory size to see how it grows. I didn't see anything in docker images, nor did my build directory grow substantially. And I had no clue that base images were in a separate cache. Makes sense in retrospect, but not when you're frantically trying to put out one fire so you can get to another. :)

So, some sort of "Where are things cached?" FAQ entry explaining the differences between app and base layers, along with where they're cached, would be great. It also never occurred to me that Jib would use .cache/google-cloud-tools-java/jib--I was looking through .gradle directories when docker images -a came up blank.

Anyhow, thanks again for the help.

ndarilek avatar Sep 11 '19 19:09 ndarilek

FYI, you can set the environment variables for the docker command in recent versions.

jib {
  dockerClient.environment = [ DOCKER_HOST: '...', DOCKER_TLS_VERIFY: '...' ]
}

You can also pass the environment through -Djib.dockerClient.environment=key1="value1",key2="value2".

Having an FAQ entry sounds like a great idea. We will also think about exposing this information in other ways.

For diagnosing #1946, I'd start with a standalone build without Skaffold or Minikube. See #1970 and #1917 for ideas. And do https://github.com/GoogleContainerTools/jib/issues/1946#issuecomment-529054279 to understand what exactly is happening. We can follow up in #1946. I'd like to know and fix the problem as much as you do.

chanseokoh avatar Sep 11 '19 20:09 chanseokoh

Thanks. Another for the FAQ list--are environment variables inherited from the running environment? Or do they need to be explicitly specified in the Gradle configuration? Not asking you to answer because I think it's irrelevant to me at this point--just noting it as something I stumbled over being familiar with Docker but not with Jib.

I.e. is it enough to ship a script as part of our repo to set up the Docker environment in the executing shell to use Minikube, or do the variables need to be specified in Gradle as well?

I think my current issue is a very slow download process, significantly slower in Gradle/Java than in stock Docker. I'll follow up on the relevant issue once I've confirmed. But you're answering/hinting at lots of questions that occurred to me when I tried debugging this and figuring out where my images were going, so I'm following up.

ndarilek avatar Sep 11 '19 20:09 ndarilek

When considering any cache cleaning options, please take into account people with slow, unreliable, and/or metered Internet connections.

Automatic removal of cached images may be an insignificant concern on a high-speed connection with unlimited usage. But when it takes an hour to download 50MB (real scenario with T-Mobile), the picture is very different. The same goes when there is a 15GB per month quota before throughput is greatly throttled (Verizon). That is assuming the connection is reliable enough for the download to finish. In these situations, once something has been successfully downloaded, having to download it again can be a real problem, and on metered connections it also costs money.

Obviously manual removal is an option. Automatic removal can work for the above cases provided it can be configured. For example, LRU combined with a minimum TTL (months to years) can work well, perhaps with an option to exempt some images from removal. A nice feature would be to allow reviewing the cleanup plan before anything is erased (i.e. a dry run). If a few minutes of review can save hours of waiting to re-download an inadvertently removed image, it is worthwhile.
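To make the suggestion concrete, here is a rough sketch of LRU eviction with a minimum TTL, an exemption list, and a dry run. It is not how Jib works or a proposed implementation. It assumes that each immediate subdirectory of the cache is one removable entry and that file access times are a usable "last used" signal (they may not be on filesystems mounted with noatime). All names (`Entry`, `scan`, `plan_cleanup`, `apply_plan`) are hypothetical.

```python
import shutil
import time
from dataclasses import dataclass
from pathlib import Path


@dataclass
class Entry:
    path: Path
    size: int          # bytes
    last_used: float   # seconds since the epoch


def scan(root: Path) -> list:
    """Treat each immediate subdirectory of root as one cache entry (assumption)."""
    entries = []
    for child in root.iterdir():
        if not child.is_dir():
            continue
        files = [f for f in child.rglob("*") if f.is_file()]
        size = sum(f.stat().st_size for f in files)
        last_used = max((f.stat().st_atime for f in files),
                        default=child.stat().st_atime)
        entries.append(Entry(child, size, last_used))
    return entries


def plan_cleanup(entries, max_bytes, min_ttl_seconds, exempt=frozenset(), now=None):
    """Pick least recently used entries until the total fits in max_bytes.

    Entries younger than min_ttl_seconds, or named in exempt, are never picked,
    so the result may still exceed max_bytes.
    """
    now = time.time() if now is None else now
    total = sum(e.size for e in entries)
    plan = []
    for e in sorted(entries, key=lambda e: e.last_used):  # oldest use first
        if total <= max_bytes:
            break
        if e.path.name in exempt or now - e.last_used < min_ttl_seconds:
            continue
        plan.append(e)
        total -= e.size
    return plan


def apply_plan(plan, dry_run=True):
    """Print the plan; delete only when dry_run is False."""
    for e in plan:
        verb = "would remove" if dry_run else "removing"
        print(f"{verb} {e.path} ({e.size} bytes)")
        if not dry_run:
            shutil.rmtree(e.path)
```

Keeping the planning step separate from the deletion step is what makes the dry run cheap: `apply_plan(plan_cleanup(scan(cache_dir), ...))` prints what would go, and nothing is erased until it is called again with `dry_run=False`.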

sstock avatar Dec 03 '21 16:12 sstock

Closing as not planned.

JoeWang1127 avatar Aug 12 '22 13:08 JoeWang1127