Use of ephemeral-storage is problematic
Actual behavior
Overview
Our Kubernetes cluster has limited ephemeral storage at its disposal, and it is to our benefit to reduce the
need for it. Building with Kaniko in combination with Jenkins is problematic because (it appears) the output of
the various Dockerfile RUN commands is written to the root filesystem of the kaniko container.
Thus, the default ephemeral-storage limit is exceeded even by something as simple as apt-get update.
Builds only become possible by increasing resources->limits->ephemeral-storage, but this runs the
risk of exceeding the ephemeral storage quota, or of having jobs suspended until sufficient
resources are available.
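For context, a minimal sketch of what raising that limit looks like on the kaniko executor container; the image tag and the sizes are placeholders, not recommendations:

```yaml
containers:
  - name: kaniko
    image: gcr.io/kaniko-project/executor:latest  # placeholder tag
    resources:
      requests:
        ephemeral-storage: "2Gi"   # placeholder
      limits:
        ephemeral-storage: "10Gi"  # has to keep growing as the build grows
```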
One approach to avoiding this problem is simply to map either a PVC or a generic ephemeral volume to the
part of the filesystem that is accumulating data, thus avoiding the use of ephemeral storage; however,
this is met with a build failure (described below). Additionally, the facility for switching the /kaniko directory is problematic for other reasons. No clear fix exists, though there may well be one
for someone with more thorough knowledge of the code, hence this issue.
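As an illustration of that approach, a generic ephemeral volume mapped over the directory that accumulates data might look like the sketch below; the volume name, storage class, size, and mount path are assumptions:

```yaml
volumes:
  - name: build-scratch
    ephemeral:
      volumeClaimTemplate:
        spec:
          accessModes: ["ReadWriteOnce"]
          storageClassName: standard   # placeholder
          resources:
            requests:
              storage: 20Gi            # placeholder
containers:
  - name: kaniko
    volumeMounts:
      - name: build-scratch
        mountPath: /kaniko             # or whichever directory is accumulating data
```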
Mapping of KANIKO_DIR
There are several issues related to this (e.g. #2067). But even with the latest fixes in place, this
fails because of the common practice of providing login credentials when executing in Kubernetes (in combination
with Jenkins). Specifically, the directory KANIKO_DIR/.docker is projected from a Secret or ConfigMap and is therefore
a read-only filesystem: it cannot be deleted. Yet when a non-default KANIKO_DIR
is specified, kaniko always deletes the previous (default) directory, which hits a "read-only file system" error
when it tries to unlink it. Conversely, if the mapping is instead made to the destination directory, a
read-only filesystem error is raised by a chmod that cannot be performed. There is, however, a workaround:
avoid the feature altogether and use an init container to copy the credentials into a writable location;
see, for example, build-init, and the sketch below.
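A minimal sketch of that workaround, assuming an emptyDir volume and the secret from the example below; the init container name and image are placeholders. The projected (read-only) secret is mounted somewhere harmless and copied into a writable emptyDir that is then mounted at /kaniko/.docker, so kaniko is free to modify or delete it:

```yaml
volumes:
  - name: docker-config            # writable copy
    emptyDir: {}
  - name: jenkins-cfg              # the projected (read-only) secret shown below
    projected:
      sources:
        - secret:
            name: rencibuild-imagepull-secret
            items:
              - key: .dockerconfigjson
                path: config.json
initContainers:
  - name: build-init               # hypothetical name, mirrors the build-init reference above
    image: busybox:1.36            # placeholder image
    command: ["sh", "-c", "cp /tmp/jenkins-cfg/config.json /docker-config/config.json"]
    volumeMounts:
      - name: jenkins-cfg
        mountPath: /tmp/jenkins-cfg
      - name: docker-config
        mountPath: /docker-config
containers:
  - name: kaniko
    volumeMounts:
      - name: docker-config
        mountPath: /kaniko/.docker  # writable, so kaniko's delete/chmod can succeed
```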
Example mapping
The volume definition
volumes:
- name: jenkins-cfg
projected:
sources:
- secret:
name: rencibuild-imagepull-secret
items:
- key: .dockerconfigjson
path: config.json
And for the kaniko executor container
volumeMounts:
- name: jenkins-cfg
mountPath: /kaniko/.docker
The kaniko code that raises the error
As can be seen below, there is no way around this directly. A nice addition would be an option to not attempt the deletion.
The problematic section of kaniko src is here
This line fails trying to delete /kaniko/.docker
if err := os.RemoveAll(constants.DefaultKanikoPath); err != nil {
return err
}
Or, alternatively, the chmod fails if the mapping is to the destination directory:
if _, err := util.CopyDir(constants.DefaultKanikoPath, dir, util.FileContext{}, util.DoNotChangeUID, util.DoNotChangeGID); err != nil {
return err
}
The unavoidable use of /usr and /var
Typically, Dockerfiles install new software (for example via apt-get update), which principally alters
the contents of /var and /usr. Unfortunately, these commands add and modify files under /usr
and /var on the root filesystem of the kaniko container itself.
Normal processing
Prior to the build, these are unpopulated in the kaniko container
/ # du -s *
0 bin
1388 busybox
0 dev
16 etc
30139 home
50725 kaniko
0 proc
0 sys
0 usr
1 workspace
But as the build runs, they are populated
/ # du -s *
12 app
0 bin
0 boot
1388 busybox
0 dev
884 etc
34087 home
85763 kaniko
...
0 sys
4 tmp
479796 usr
162764 var
Because these 2 directories are in the root filesystem, eventually the pod exceeds its ephemeral-storage limit.
Trying to map /usr and /var leaves them unpopulated
Perhaps there is some detection code, but when /usr and /var are mapped similarly to how
the /kaniko directory is mapped above, the build fails with the following message:
error building image: error building stage: failed to execute command: starting command: fork/exec /bin/bash: no such file or directory
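For reference, this is the kind of mapping that was attempted (a sketch; the volume names and the use of emptyDir are assumptions). The point is that /usr and /var become mounted volumes rather than part of the container's root filesystem, which matches the observation above that the mapped directories stay unpopulated:

```yaml
volumes:
  - name: usr-overlay
    emptyDir: {}
  - name: var-overlay
    emptyDir: {}
containers:
  - name: kaniko
    volumeMounts:
      - name: usr-overlay
        mountPath: /usr   # masks the image's own /usr contents
      - name: var-overlay
        mountPath: /var
```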
Expected behavior
There should be a means to eliminate excessive use of ephemeral-storage.
To Reproduce
Steps to reproduce the behavior:
Any build that produces a container larger than the ephemeral storage limit
Additional Information
Example Jenkinsfile providing the environment, and also the Dockerfile
- Kaniko Image (fully qualified with digest)
Both the released 1.8.1 image and kaniko built from the master branch.
Triage Notes for the Maintainers
| Description | Yes/No |
|---|---|
| Please check if this a new feature you are proposing | |
| Please check if the build works in docker but not in kaniko | |
| Please check if this error is seen when you use --cache flag | |
| Please check if your dockerfile is a multistage dockerfile | |
We have the same problems with GKE Autopilot, which limits ephemeral storage to 10Gi.
Hi, thanks for your reply and investigation. Yesterday I also did some checks and found that using --kaniko-dir has no impact at all.
My test creates a cockroach image with database files inside, just to get it up and running for e2e test purposes.
The Dockerfile is quite simple: just one FROM cockroachdb/cockroach:v23.1.13, a RUN that installs findutils, and a copy of the data files.
I have found that Kaniko creates a /cockroach folder and populates data there, which I believe is related to the image we use in the FROM line. And then, as you mentioned, it populates the /kaniko folder with temporary data.
I mounted my volume onto the /cockroach folder so that Kaniko stored the image data inside the volume, and I also used the --kaniko-dir=/cockroach/kaniko option, but it did not help at all. I saw that it just copied the executor, warmer, and other files from /kaniko to /cockroach/kaniko and then proceeded to use /kaniko for the temporary numeric folders. Mounting /cockroach does help us stay under 10Gi, but /kaniko still accumulates data of its own and seems to ignore the --kaniko-dir option. With a more complicated Dockerfile this won't help.
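For clarity, a sketch of the setup described in the comment above, as I understand it; the image tag, volume definition, and sizes are assumptions:

```yaml
containers:
  - name: kaniko
    image: gcr.io/kaniko-project/executor:latest  # placeholder tag
    args:
      - --dockerfile=Dockerfile
      - --context=dir:///workspace       # placeholder context
      - --kaniko-dir=/cockroach/kaniko   # reportedly ignored for the temporary numeric folders
    volumeMounts:
      - name: cockroach-data
        mountPath: /cockroach            # keeps the extracted image data off the root filesystem
volumes:
  - name: cockroach-data
    ephemeral:
      volumeClaimTemplate:
        spec:
          accessModes: ["ReadWriteOnce"]
          resources:
            requests:
              storage: 20Gi              # placeholder
```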
On Tue, Jan 30, 2024 at 5:42 PM Aaron Prindle wrote:
Thanks for flagging this issue. Unfortunately, given that GKE Autopilot has a 10Gi max ephemeral storage limit and the way kaniko works, kaniko has some issues with base images >= 10Gi. Kaniko can, however, get around issues with kaniko itself filling up the storage (via the information it stores in /kaniko). I've added some information below:
I believe there is a method to overcome the Autopilot 10GB boot disk limit for some use cases: using the "--kaniko-dir" flag (/kaniko by default) we can change the directory kaniko uses to store its own runtime/temp files, including tar layers, files shared across multi-stage docker builds, etc., and this directory can be placed on top of an ephemeral volume of scalable size, which should allow for a solution here IIUC. I believe the issue currently is that this /kaniko directory grows past 10GB while it sits on top of the 10GB disk. NOTE: this will not fix issues if the base image you are trying to use is >= 10GB.
In my own testing building images with kaniko via GKE Autopilot I am able to overcome filesystem limits using this approach (NOTE: https://github.com/ethereum/go-ethereum used as misc large dir w/ Dockerfile to hit FS limit)
$ kubectl get po -n wi
NAME                         READY   STATUS      RESTARTS   AGE
kaniko-root-go-ethereum      0/1     Error       0          155m
kaniko-root-go-ethereum-v2   0/1     Completed   0          54m
The first run (kaniko-root-go-ethereum) fails with a disk space issue (trying to repro what Rakuten is experiencing):
$ kubectl get po -n wi kaniko-root-go-ethereum -o yaml
relevant snippet:
message: 'Pod ephemeral local storage usage exceeds the total limit of containers 1Gi. '
phase: Failed
podIP: 10.8.0.67
podIPs:
- ip: 10.8.0.67
qosClass: Guaranteed
reason: Evicted
startTime: "2023-11-11T02:23:11Z"
Using the method of setting --kaniko-dir=/scratch and mounting an ephemeral volume at /scratch (full yaml, kaniko-root-go-ethereum.yaml: https://gist.github.com/aaron-prindle/abecd71ff3ecc30d85e68a42d754045b)
results in a complete build with no FS issue (kaniko-root-go-ethereum-v2).
NOTE: the above example uses --no-push and --skip-push-permissions, so the image is built and not pushed, but pushing images should work assuming the SA has GCR push auth.
Using --kaniko-dir=/scratch in combination with an ephemeral volume at /scratch is, I believe, a possible solution based on my testing; a sketch of this combination follows.
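A minimal sketch of that combination, abbreviating the full manifest in the gist linked above; the image tag, context, and volume sizing are placeholders:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: kaniko-scratch              # placeholder name
spec:
  containers:
    - name: kaniko
      image: gcr.io/kaniko-project/executor:latest  # placeholder tag
      args:
        - --dockerfile=Dockerfile
        - --context=dir:///workspace   # placeholder context
        - --kaniko-dir=/scratch        # kaniko's runtime/temp files land here
        - --no-push
      volumeMounts:
        - name: scratch
          mountPath: /scratch
  volumes:
    - name: scratch
      ephemeral:
        volumeClaimTemplate:
          spec:
            accessModes: ["ReadWriteOnce"]
            resources:
              requests:
                storage: 50Gi          # placeholder size
```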