General dockerfile cleanup
PR checklist
- [ ] Read the contribution guidelines.
- [ ] Ran the shell script under
./bin/to update Petstore sample so that CIs can verify the change. (For instance, only need to run./bin/{LANG}-petstore.shand./bin/security/{LANG}-petstore.shif updating the {LANG} (e.g. php, ruby, python, etc) code generator or {LANG} client's mustache templates). Windows batch files can be found in.\bin\windows\. - [ ] Filed the PR against the correct branch:
3.0.0branch for changes related to OpenAPI spec 3.0. Default:master. - [ ] Copied the technical committee to review the pull request if your PR is targeting a particular programming langauge.
Description of the PR
I've taken a little time to switch unnecessary add statements to copy and re-order layers for build the docker images to decrease build time.
fix #6835
Thanks for this. I have some questions/comments, and tests listed below.
I'm curious, how is moving COPY to the top of the file supposed to decrease build times? Are you referring to CI build or local build? I ask because I have had different experiences with this, and was wondering if you could point me to where this is documented.
This PR will change the hashes of the layers, causing newer versions of the image to be unable to reuse existing layers. This is a minor issue, considering the Dockerfile in master is stuctured such that all the changing layers were near the end anyway. Moving to the top would cause newer builds to not reuse any existing layers if docker-entrypoint.sh were to change. Granted, this is really only ~4MB for the bash executable, but this could easily change in the future and there's no real clear reason to break that reusability here.
Related to build times, here is the difference between master and this branch (both with the base image cached)…
Example, from master:
$ time docker build -t swagger-codegen-original .
…
Successfully built 041d13e21035
Successfully tagged swagger-codegen-original:latest
real 2m28.225s
user 0m7.595s
sys 0m2.552s
From this branch:
$ time docker build -t swagger-codegen-6836 .
Successfully built 5bc9dddda13b
Successfully tagged swagger-codegen-6836:latest
real 3m13.839s
user 0m7.641s
sys 0m2.530s
Looking into the effect on layer reuse, here's master's layers:
$ docker history swagger-codegen-original
IMAGE CREATED CREATED BY SIZE COMMENT
041d13e21035 4 minutes ago /bin/sh -c #(nop) CMD ["build"] 0B
259eeae22c8b 5 minutes ago /bin/sh -c #(nop) ENTRYPOINT ["docker-ent... 0B
255ae05e7d2f 5 minutes ago /bin/sh -c #(nop) COPY file:7cf9103f5df247... 899B
fab4e913cb31 5 minutes ago /bin/sh -c mvn -am -pl "modules/swagger-co... 124MB
b1a0d54ef80c 7 minutes ago /bin/sh -c #(nop) WORKDIR /opt/swagger-cod... 0B
7c48ac7d4676 7 minutes ago /bin/sh -c #(nop) VOLUME [/usr/share/mave... 0B
2d2eed025482 7 minutes ago /bin/sh -c #(nop) ADD dir:7ff018fb28bef47c... 179MB
5a91cb34c0dc 7 minutes ago /bin/sh -c mkdir /opt 0B
5a1d5edb14ab 7 minutes ago /bin/sh -c set -x && apk add --no-cach... 3.57MB
508f201e351d 7 minutes ago /bin/sh -c #(nop) ENV GEN_DIR=/opt/swagge... 0B
240eccea6c3b 17 months ago /bin/sh -c cd /tmp && wget https://arch... 10MB
<missing> 17 months ago /bin/sh -c #(nop) ENV MAVEN_HOME=/usr/shar... 0B
<missing> 17 months ago /bin/sh -c #(nop) MAINTAINER Jim Schubert ... 0B
<missing> 17 months ago /bin/sh -c set -x && apk add --no-cache ... 141MB
<missing> 17 months ago /bin/sh -c #(nop) ENV JAVA_ALPINE_VERSION=... 0B
<missing> 17 months ago /bin/sh -c #(nop) ENV JAVA_VERSION=8u92 0B
<missing> 18 months ago /bin/sh -c #(nop) ENV PATH=/usr/local/sbin... 0B
<missing> 18 months ago /bin/sh -c #(nop) ENV JAVA_HOME=/usr/lib/j... 0B
<missing> 18 months ago /bin/sh -c { echo '#!/bin/sh'; echo 's... 87B
<missing> 18 months ago /bin/sh -c #(nop) ENV LANG=C.UTF-8 0B
<missing> 18 months ago /bin/sh -c #(nop) ADD file:614a9122187935f... 4.8MB
And this branch's layers:
$ docker history swagger-codegen-6836
IMAGE CREATED CREATED BY SIZE COMMENT
5bc9dddda13b About a minute ago /bin/sh -c #(nop) CMD ["build"] 0B
5afcb4ad8496 About a minute ago /bin/sh -c #(nop) ENTRYPOINT ["docker-ent... 0B
2f187693b82e About a minute ago /bin/sh -c mvn -am -pl "modules/swagger-co... 124MB
37c08be2abf5 4 minutes ago /bin/sh -c #(nop) WORKDIR /opt/swagger-cod... 0B
525c50f9a2ce 4 minutes ago /bin/sh -c #(nop) VOLUME [/usr/share/mave... 0B
82afdd9f5355 4 minutes ago /bin/sh -c #(nop) COPY dir:7bdf3d90c060b31... 179MB
70c1e8c93352 4 minutes ago /bin/sh -c mkdir /opt 0B
8769830bc22a 4 minutes ago /bin/sh -c set -x && apk add --no-cach... 3.57MB
3f2c43d2df62 4 minutes ago /bin/sh -c #(nop) ENV GEN_DIR=/opt/swagge... 0B
96500aef99a9 4 minutes ago /bin/sh -c #(nop) COPY file:9a88bfd5c3619d... 933B
240eccea6c3b 17 months ago /bin/sh -c cd /tmp && wget https://arch... 10MB
<missing> 17 months ago /bin/sh -c #(nop) ENV MAVEN_HOME=/usr/shar... 0B
<missing> 17 months ago /bin/sh -c #(nop) MAINTAINER Jim Schubert ... 0B
<missing> 17 months ago /bin/sh -c set -x && apk add --no-cache ... 141MB
<missing> 17 months ago /bin/sh -c #(nop) ENV JAVA_ALPINE_VERSION=... 0B
<missing> 17 months ago /bin/sh -c #(nop) ENV JAVA_VERSION=8u92 0B
<missing> 18 months ago /bin/sh -c #(nop) ENV PATH=/usr/local/sbin... 0B
<missing> 18 months ago /bin/sh -c #(nop) ENV JAVA_HOME=/usr/lib/j... 0B
<missing> 18 months ago /bin/sh -c { echo '#!/bin/sh'; echo 's... 87B
<missing> 18 months ago /bin/sh -c #(nop) ENV LANG=C.UTF-8 0B
<missing> 18 months ago /bin/sh -c #(nop) ADD file:614a9122187935f... 4.8MB
As you can see, no layer can be reused from previous builds.
And regarding my point about moving it to the top of the file and changing all subsequent layers…
Here's the layer structure if we were to make a minor change to docker-entrypoint.sh:
$ docker history swagger-codegen-6836-2
IMAGE CREATED CREATED BY SIZE COMMENT
d7d53f44cd45 12 seconds ago /bin/sh -c #(nop) CMD ["build"] 0B
ecc0a6ab2431 13 seconds ago /bin/sh -c #(nop) ENTRYPOINT ["docker-ent... 0B
feb4ebcae626 16 seconds ago /bin/sh -c mvn -am -pl "modules/swagger-co... 124MB
14628c4f3c73 2 minutes ago /bin/sh -c #(nop) WORKDIR /opt/swagger-cod... 0B
61c41982c7aa 2 minutes ago /bin/sh -c #(nop) VOLUME [/usr/share/mave... 0B
471d277a44fb 2 minutes ago /bin/sh -c #(nop) COPY dir:d06f2f65f5588b7... 179MB
871db10b8df3 2 minutes ago /bin/sh -c mkdir /opt 0B
7a876cd9d009 2 minutes ago /bin/sh -c set -x && apk add --no-cach... 3.57MB
e2bdc632ed8d 3 minutes ago /bin/sh -c #(nop) ENV GEN_DIR=/opt/swagge... 0B
7566c2d2fd56 3 minutes ago /bin/sh -c #(nop) COPY file:093478c98cf407... 942B
240eccea6c3b 17 months ago /bin/sh -c cd /tmp && wget https://arch... 10MB
<missing> 17 months ago /bin/sh -c #(nop) ENV MAVEN_HOME=/usr/shar... 0B
<missing> 17 months ago /bin/sh -c #(nop) MAINTAINER Jim Schubert ... 0B
<missing> 17 months ago /bin/sh -c set -x && apk add --no-cache ... 141MB
<missing> 17 months ago /bin/sh -c #(nop) ENV JAVA_ALPINE_VERSION=... 0B
<missing> 17 months ago /bin/sh -c #(nop) ENV JAVA_VERSION=8u92 0B
<missing> 18 months ago /bin/sh -c #(nop) ENV PATH=/usr/local/sbin... 0B
<missing> 18 months ago /bin/sh -c #(nop) ENV JAVA_HOME=/usr/lib/j... 0B
<missing> 18 months ago /bin/sh -c { echo '#!/bin/sh'; echo 's... 87B
<missing> 18 months ago /bin/sh -c #(nop) ENV LANG=C.UTF-8 0B
<missing> 18 months ago /bin/sh -c #(nop) ADD file:614a9122187935f... 4.8MB
And its build time:
real 3m19.987s
user 0m7.934s
sys 0m3.006s
If I switch back to master and build with a minor change to docker-entrypoint.sh, you'll see that I'm able to reuse everything up to where we begin adding the source. This is because the root directory changes when we modify docker-entrypoint.sh, so there's not really a good, clean way around this and there's very little gain considering the repository's structure is intended to change on every commit.
$ docker history swagger-codegen-original-2
IMAGE CREATED CREATED BY SIZE COMMENT
d730383c32b0 17 seconds ago /bin/sh -c #(nop) CMD ["build"] 0B
83c12feccdd2 18 seconds ago /bin/sh -c #(nop) ENTRYPOINT ["docker-ent... 0B
11c86ae38ca8 19 seconds ago /bin/sh -c #(nop) COPY file:b8473adcdfe9fb... 908B
a6f0a1039a63 22 seconds ago /bin/sh -c mvn -am -pl "modules/swagger-co... 124MB
e00dbb9dce4c 2 minutes ago /bin/sh -c #(nop) WORKDIR /opt/swagger-cod... 0B
0af2f20a4658 2 minutes ago /bin/sh -c #(nop) VOLUME [/usr/share/mave... 0B
ca8ef927a400 2 minutes ago /bin/sh -c #(nop) ADD dir:77209b6b57d28bb4... 179MB
5a91cb34c0dc 32 minutes ago /bin/sh -c mkdir /opt 0B
5a1d5edb14ab 32 minutes ago /bin/sh -c set -x && apk add --no-cach... 3.57MB
508f201e351d 32 minutes ago /bin/sh -c #(nop) ENV GEN_DIR=/opt/swagge... 0B
240eccea6c3b 17 months ago /bin/sh -c cd /tmp && wget https://arch... 10MB
<missing> 17 months ago /bin/sh -c #(nop) ENV MAVEN_HOME=/usr/shar... 0B
<missing> 17 months ago /bin/sh -c #(nop) MAINTAINER Jim Schubert ... 0B
<missing> 17 months ago /bin/sh -c set -x && apk add --no-cache ... 141MB
<missing> 17 months ago /bin/sh -c #(nop) ENV JAVA_ALPINE_VERSION=... 0B
<missing> 17 months ago /bin/sh -c #(nop) ENV JAVA_VERSION=8u92 0B
<missing> 18 months ago /bin/sh -c #(nop) ENV PATH=/usr/local/sbin... 0B
<missing> 18 months ago /bin/sh -c #(nop) ENV JAVA_HOME=/usr/lib/j... 0B
<missing> 18 months ago /bin/sh -c { echo '#!/bin/sh'; echo 's... 87B
<missing> 18 months ago /bin/sh -c #(nop) ENV LANG=C.UTF-8 0B
<missing> 18 months ago /bin/sh -c #(nop) ADD file:614a9122187935f... 4.8MB
All that said, there are some modifications that could be made to improve layer reusability.
For instance, because VOLUME and WORKDIR are 0B layers, it doesn't really matter if we add the code before or after these commands. We could move the command after and change it to COPY as you've done here.
We could also increase layer reuse by adding each subdirectory individually in increasing order of popularity or likelihood of modification. For instance, here's an example Dockerfile for what I'm suggesting:
FROM jimschubert/8-jdk-alpine-mvn:1.0
ENV GEN_DIR /opt/swagger-codegen
RUN set -x && \
apk add --no-cache bash
RUN mkdir /opt && mkdir -p ${GEN_DIR}/modules
VOLUME ${MAVEN_HOME}/.m2/repository
WORKDIR ${GEN_DIR}
COPY docker-entrypoint.sh /usr/local/bin/
COPY ./modules/swagger-codegen-maven-plugin ${GEN_DIR}/modules/swagger-codegen-maven-plugin
COPY ./modules/swagger-generator ${GEN_DIR}/modules/swagger-generator
COPY ./modules/swagger-codegen-cli ${GEN_DIR}/modules/swagger-codegen-cli
COPY ./modules/swagger-codegen ${GEN_DIR}/modules/swagger-codegen
# THIS IS THE DIFFICULT PART, Dockerd doesn't support excludes to copy command
COPY [a-zA-Z\\.]* ${GEN_DIR}/
RUN mvn -am -pl "modules/swagger-codegen-cli" package
ENTRYPOINT ["docker-entrypoint.sh"]
CMD ["build"]
The problem with this approach is that it could easily go unnoticed if a new module is added to the project and not copied into the image. This is because the image only builds and tests swagger-codegen-cli (3 modules), but is intended to be a "full" swagger codegen container. It would be surprising after 3 release or so for anyone to assume swagger-codegen-new-module is in the versioned container in which the module was added, only to find the Dockerfile update was missed.
Not only this, but Dockerfile doesn't currently support exclusions in COPY. There may be a way to work out a pattern that excludes only those manually added modules directories, but again I think that's a lot more effort than a maintainable solution.
Any thoughts or comments on any of that? Sorry, I know it was a lot… but I've put a lot of thought into this Dockerfile to reduce it from the previous image (which I think was about 2GB).
The exclude should be handled by using .dockerignore file
https://docs.docker.com/engine/reference/builder/#dockerignore-file
It reduces extra files being copied into the image, and it reduces the size of the context sent to dockerd which can be very helpful if you have a large number of files, and only a small number should be included in the image.
@kenjones-cisco The exclude I mentioned is conditional based on previous COPY instructions. The ignore file prevents these from being sent to the build context, which won't work.
I had commented that the Dockerfile in this file is built on every commit. That doesn't seem to be the case (I thought it was previously as a tracking master build). Since this is only used by engineers who want to build without modification to their system (e.g. if they're stuck on Java 6 or have conflicting maven issues), we can make more assumptions and concessions in this image.
Also, it's not clear to me why run-in-docker.sh doesn't use this image.
The image on master is rebuilt on every commit to master, https://github.com/swagger-api/swagger-codegen/blob/master/circle.yml#L46
The reason it is likely not used as part of run-in-docker.sh is because if you are developing a change, then you need to rebuild the modules, so if you using an image with the modules already included then it is not going to provide the expected results.
This image is used for those actually generate code for their projects.
The GEN_DIR and its contents are remapped in run-in-docker.sh. I guess because it also maps the Maven directory to the local file system, the container provides no additional benefit. Seems like that would cause issues, especially if that script runs in the container as root, it'll cause permissions issues in the host Maven cache.
The permissions are different for sure, but as the files are maven cache which are only used within the container, that is not usually an issue. And the generated files coming from actions run via the container are part of .gitignore they never make it into the repository.
For this project I actually have a custom run-in-docker.sh file because I'm on windows and therefore need cygwin, I have to format pwd correctly before it will work.
All my development for all projects I work on are done similarly to run-in-docker.sh except I leverage a a Dockerfile.dev to represent the projects development environment and a Makefile to generate the different docker run commands. It provides a constant development environment across all OSes and avoids me having to install everything local to my laptop, and then allows me to easily reuse the same for doing builds on a build server with no other dependency besides Docker.
As such I make sure the Makefile has a clean function to remove files created within the container but are local due to the volume mounts. And for files I plan to add to the git repository, I have a simple function added to my .bashrc file that will change the ownership of the files and the permissions.