
kaniko build using too much memory

Open jyipks opened this issue 6 years ago • 46 comments

I am building a rather large Docker image; the final size is ~8GB. It builds fine in DinD, but we would like to use kaniko. The kaniko pod running the Dockerfile balloons in memory usage and gets killed by Kubernetes. How can I make kaniko work for me, or am I stuck with DinD?

Please help, thank you

jyipks avatar Dec 11 '19 02:12 jyipks

/cc @priyawadhwa Can we provide users anything to measure the memory usage?

@jyipks Can you tell us if you have set resource limits in the kaniko pod spec? Also, please tell us your cluster specification.

tejal29 avatar Jan 10 '20 22:01 tejal29

@tejal29 @jyipks the only thing I can think of is upping the resource limits on the pod as well

priyawadhwa avatar Jan 14 '20 21:01 priyawadhwa
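For anyone hitting the OOM kill, here is a minimal sketch of what an explicit memory request/limit on a kaniko build pod could look like; the pod name, registry, paths, and values are illustrative, not taken from this issue:

```yaml
# Illustrative kaniko build pod with explicit memory requests/limits;
# raise the limit if the executor is OOM-killed during snapshotting.
apiVersion: v1
kind: Pod
metadata:
  name: kaniko-build                       # hypothetical name
spec:
  restartPolicy: Never
  containers:
    - name: kaniko
      image: gcr.io/kaniko-project/executor:latest
      args:
        - "--dockerfile=Dockerfile"
        - "--context=dir:///workspace"
        - "--destination=registry.example.com/app:latest"   # placeholder registry
      resources:
        requests:
          memory: "4Gi"
        limits:
          memory: "8Gi"
      volumeMounts:
        - name: workspace
          mountPath: /workspace
  volumes:
    - name: workspace
      emptyDir: {}
```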

I had no resource limits on the kaniko pods. This was on a 3-node cluster, 4 cores and 16GB each. From Grafana I believe the pod attempted to use more than 15GB. I was building a custom jupyter-notebook image that normally comes out to ~8GB when built via docker build.

jyipks avatar Jan 14 '20 23:01 jyipks

Does kaniko keep everything in memory as it's building the image, or does it write to a temp directory? If it uses a temp directory, can you please tell us where it is?

thanks

jyipks avatar Jan 21 '20 22:01 jyipks

This sounds like #862. @jyipks do you remember if you were using the --reproducible flag?

mamoit avatar Jul 28 '20 13:07 mamoit

No, I've never used that flag before.

jyipks avatar Jul 28 '20 13:07 jyipks

This also happens when trying to do an npm install - I also have never used that flag before.

rvaidya avatar Mar 11 '21 03:03 rvaidya

same problem

max107 avatar Mar 12 '21 20:03 max107

Same problem on a GitLab runner: latest Debian with latest Docker. Building a 12MB docker image uses 15GB to 35GB of memory.

tarakanof avatar Mar 17 '21 14:03 tarakanof

We're facing the same issue on a GitLab CI custom runner. We're building a docker image for Node; the build started hanging on webpack every time, and the machine ends up running out of memory and crashing. It used to work fine without any issue. Our docker image is a little less than 300MB and our machine has 8GB of RAM.

fknop avatar Mar 26 '21 23:03 fknop

Similar issue on GitLab CI on GKE. We're building a Python image based on the official python base image, and it consumes about 12GB of RAM.

meseta avatar Mar 30 '21 12:03 meseta

We're seeing similar issues with gradle builds as well

jamil-s avatar Apr 19 '21 21:04 jamil-s

Would also like to learn more about this. Kaniko doesn't have a feature equivalent to docker build --memory, does it?

nichoio avatar Apr 20 '21 09:04 nichoio

We're seeing similar issues too. For example, this job failed with OOM: https://gitlab.com/gitlab-com/gl-infra/tamland/-/jobs/1405946307

The job includes some stacktrace information, which may help in diagnosing the problem.

The parameters that we were using, including --snapshotMode=redo are here: https://gitlab.com/gitlab-com/gl-infra/tamland/-/commit/0b399381d30655059ec78461640674af7562c708#587d266bb27a4dc3022bbed44dfa19849df3044c_116_125

suprememoocow avatar Jul 07 '21 13:07 suprememoocow

I'm having the same problem as well. But in my case it's a Java-based build and the Maven cache repo is included as an ignore-path. The number of changes that should occur outside of that is fairly minimal, yet I'm easily seeing 5+ GB of RAM being used where the previous build used at most 1.2GB. We'd love to be able to use smaller instances for our builds.

mikesir87 avatar Sep 07 '21 13:09 mikesir87
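As a side note, the ignore-path mentioned above is passed on the executor command line via `--ignore-path`. A hedged sketch of a GitLab CI job doing this; the job name, CI variables, and the /root/.m2 path are assumptions for illustration:

```yaml
# Hypothetical GitLab CI job; --ignore-path keeps the Maven repository
# out of kaniko's filesystem snapshots.
build-image:
  image:
    name: gcr.io/kaniko-project/executor:debug
    entrypoint: [""]
  script:
    - >-
      /kaniko/executor
      --context "$CI_PROJECT_DIR"
      --dockerfile "$CI_PROJECT_DIR/Dockerfile"
      --destination "$CI_REGISTRY_IMAGE:$CI_COMMIT_SHORT_SHA"
      --ignore-path /root/.m2
```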

I rolled back to 1.3.0 from 1.6.0 and now it seems to work again

trallnag avatar Oct 13 '21 10:10 trallnag

This should be closed in the 1.7.0 release as of #1722.

Phylu avatar Oct 19 '21 10:10 Phylu

I rolled back to 1.3.0 from 1.6.0 and now it seems to work again

1.7 has a gcloud credentials problem, rolling back to 1.3.0 worked.

s3f4 avatar Oct 23 '21 21:10 s3f4

Do you know when the tag gcr.io/kaniko-project/executor:debug (as well as :latest) gets updated? It still points to the v1.6.0 version: https://console.cloud.google.com/gcr/images/kaniko-project/GLOBAL/executor

Exagone313 avatar Oct 29 '21 15:10 Exagone313

I was also experiencing memory issues in the last part of the image build with v1.7.0.

INFO[0380] Taking snapshot of full filesystem...        
Killed

I tried all kinds of combinations with --compressed-caching=false and removing the --reproducible flag, downgrading to v1.3.0 and stuff. I finally got the build to pass by using the --use-new-run flag.

--use-new-run

Use the experimental run implementation for detecting changes without requiring file system snapshots. In some cases, this may improve build performance by 75%.

So I guess you should put that into your toolbox while banging your head against the wall :)

Zachu avatar Jan 20 '22 08:01 Zachu
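For reference, the flag goes straight on the executor command line. A minimal, hypothetical GitLab CI sketch (job name and CI variables are illustrative, not from the comment above):

```yaml
# Hypothetical job using --use-new-run to avoid the cost of full filesystem snapshots.
build-image:
  image:
    name: gcr.io/kaniko-project/executor:debug
    entrypoint: [""]
  script:
    - >-
      /kaniko/executor
      --context "$CI_PROJECT_DIR"
      --dockerfile "$CI_PROJECT_DIR/Dockerfile"
      --destination "$CI_REGISTRY_IMAGE:$CI_COMMIT_SHORT_SHA"
      --use-new-run
```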

Also got this issue when building with v1.9.1.

INFO[0133] Taking snapshot of full filesystem...        
Killed

reverted back to v1.3.0 and it works.

Idok-viber avatar Oct 13 '22 09:10 Idok-viber

I am using 1.9.0 and it seems to eat quite a lot of memory. With or without --compressed-caching=false and --use-new-run I sporadically get the same issue: "The node was low on resource: memory. Container build was using 5384444Ki, which exceeds its request of 0. Container helper was using 24720Ki, which exceeds its request of 0." "The node was low on resource: memory. Container helper was using 9704Ki, which exceeds its request of 0. Container build was using 6871272Ki, which exceeds its request of 0."

7GB to build a simple image? The memory consumption is ridiculous. Why does the same build with standard Docker just work with 40x less memory requested?

cforce avatar Nov 07 '22 09:11 cforce

Reiterating what I stated in https://github.com/GoogleContainerTools/kaniko/issues/2275 as well:

We're having this issue as well with 1.9.1-debug. End size of the image should be ~9GB, but the kaniko build (on GKE) fails due to hitting the memory limit. See the attached image to share in my agony.

gaatjeniksaan avatar Mar 23 '23 12:03 gaatjeniksaan

Had this issue with kaniko v1.8.0-debug; also tried v1.3.0-debug, same issue: the pod gets killed or evicted due to memory pressure on the (previously idle) node. This was the case when building an image nearly 2.5GB large, with the --cache=true flag.

Solution for me was to use v1.9.2-debug with the following options: --cache=true --compressed-caching=false --use-new-run --cleanup

Further advice (from research of other previous issues): DO NOT use the flags --single-snapshot or --cache-copy-layers

tamer-hassan avatar Apr 03 '23 02:04 tamer-hassan
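For anyone wanting to try that combination, here is a hedged GitLab CI sketch using the exact flags from the comment above; the job name and CI variables are illustrative assumptions:

```yaml
# Hypothetical job on v1.9.2-debug with the flag combination reported to work above.
build-image:
  image:
    name: gcr.io/kaniko-project/executor:v1.9.2-debug
    entrypoint: [""]
  script:
    - >-
      /kaniko/executor
      --context "$CI_PROJECT_DIR"
      --dockerfile "$CI_PROJECT_DIR/Dockerfile"
      --destination "$CI_REGISTRY_IMAGE:$CI_COMMIT_SHORT_SHA"
      --cache=true
      --compressed-caching=false
      --use-new-run
      --cleanup
```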

I've got the same issue. In my case, I'm using a git context, and cloning it alone takes 10Gi+ and gets killed before the build even starts on the latest versions. I tried with a node with more than 16Gi and it worked 1 out of 3 times.

codezart avatar May 03 '23 20:05 codezart

Kaniko feels dead; I propose switching to podman.

cforce avatar May 06 '23 06:05 cforce

We have the same problem; we get this in GitLab CI:

INFO[0172] Taking snapshot of full filesystem...        
Killed
Cleaning up project directory and file based variables
ERROR: Job failed: command terminated with exit code 137

jonaskello avatar Jun 08 '23 19:06 jonaskello

Solution for me was to use v1.9.2-debug with the following options: --cache=true --compressed-caching=false --use-new-run --cleanup

This worked for me, thank you very much.

FYI for anyone else running into this.

starkmatt avatar Jun 13 '23 07:06 starkmatt

I have the same problem in v1.12.1-debug

INFO[0206] Taking snapshot of full filesystem...        
Killed

zzzinho avatar Jul 10 '23 09:07 zzzinho

Hello everyone, just to give my input: here are some CPU/RAM metrics with different kaniko versions.

Just to clarify, the container where the build runs is a GitHub Actions hosted runner with 2 cores and 4GB RAM.

Picture 1 - kaniko 1.9.2-debug with cache enabled --> push failed with message "Killed" (screenshot attached)

Picture 2 - kaniko 1.9.2-debug with cache enabled and these settings: --compressed-caching=false --use-new-run --cleanup --> push failed with message "Killed" (screenshot attached)

Picture 3 - kaniko 1.12.1-debug with cache enabled and these settings: --compressed-caching=false --use-new-run --cleanup --> push failed with message "Killed" (screenshot attached)

Picture 4 - kaniko 1.3.0-debug with cache enabled (the flag --compressed-caching is not supported in this version) --> push WORKS (screenshot attached)

The resulting image is around 500MB, and the container uses around 1 core and less than the container's memory limit (4GB). The build works if we increase the memory limit to 16GB, which is overkill and a waste of resources. The jobs that are killed are in fact using almost half the memory (~2GB) of the job that was successful (~3GB).

I would say that somewhere after version 1.3.0 something broke in kaniko, because even with all the flags set the builds do not work, despite the memory usage being way lower than with v1.3.0. (Update: the builds started to fail from version v1.9.1.)

Thanks for your help

UPDATE: I also tested other, older kaniko versions (screenshots attached for each):

with kaniko 1.5.2-debug with cache enabled

with kaniko 1.6.0-debug with cache enabled

with kaniko 1.8.1-debug with cache enabled

with kaniko 1.9.0-debug with cache enabled

Starting with kaniko v1.9.1 the builds started to fail.

ricardojdsilva87 avatar Jul 12 '23 16:07 ricardojdsilva87