swagger-codegen icon indicating copy to clipboard operation
swagger-codegen copied to clipboard

General dockerfile cleanup

Open jebentier opened this issue 8 years ago • 8 comments

PR checklist

  • [ ] Read the contribution guidelines.
  • [ ] Ran the shell script under ./bin/ to update Petstore sample so that CIs can verify the change. (For instance, only need to run ./bin/{LANG}-petstore.sh and ./bin/security/{LANG}-petstore.sh if updating the {LANG} (e.g. php, ruby, python, etc) code generator or {LANG} client's mustache templates). Windows batch files can be found in .\bin\windows\.
  • [ ] Filed the PR against the correct branch: 3.0.0 branch for changes related to OpenAPI spec 3.0. Default: master.
  • [ ] Copied the technical committee to review the pull request if your PR is targeting a particular programming langauge.

Description of the PR

I've taken a little time to switch unnecessary add statements to copy and re-order layers for build the docker images to decrease build time.

fix #6835

jebentier avatar Oct 27 '17 17:10 jebentier

Thanks for this. I have some questions/comments, and tests listed below.

I'm curious, how is moving COPY to the top of the file supposed to decrease build times? Are you referring to CI build or local build? I ask because I have had different experiences with this, and was wondering if you could point me to where this is documented.

This PR will change the hashes of the layers, causing newer versions of the image to be unable to reuse existing layers. This is a minor issue, considering the Dockerfile in master is stuctured such that all the changing layers were near the end anyway. Moving to the top would cause newer builds to not reuse any existing layers if docker-entrypoint.sh were to change. Granted, this is really only ~4MB for the bash executable, but this could easily change in the future and there's no real clear reason to break that reusability here.

Related to build times, here is the difference between master and this branch (both with the base image cached)…

Example, from master:

$ time docker build -t swagger-codegen-original .
…
Successfully built 041d13e21035
Successfully tagged swagger-codegen-original:latest

real	2m28.225s
user	0m7.595s
sys	0m2.552s

From this branch:

$ time docker build -t swagger-codegen-6836 .
Successfully built 5bc9dddda13b
Successfully tagged swagger-codegen-6836:latest

real	3m13.839s
user	0m7.641s
sys	0m2.530s

Looking into the effect on layer reuse, here's master's layers:

$ docker history swagger-codegen-original
IMAGE               CREATED             CREATED BY                                      SIZE                COMMENT
041d13e21035        4 minutes ago       /bin/sh -c #(nop)  CMD ["build"]                0B
259eeae22c8b        5 minutes ago       /bin/sh -c #(nop)  ENTRYPOINT ["docker-ent...   0B
255ae05e7d2f        5 minutes ago       /bin/sh -c #(nop) COPY file:7cf9103f5df247...   899B
fab4e913cb31        5 minutes ago       /bin/sh -c mvn -am -pl "modules/swagger-co...   124MB
b1a0d54ef80c        7 minutes ago       /bin/sh -c #(nop) WORKDIR /opt/swagger-cod...   0B
7c48ac7d4676        7 minutes ago       /bin/sh -c #(nop)  VOLUME [/usr/share/mave...   0B
2d2eed025482        7 minutes ago       /bin/sh -c #(nop) ADD dir:7ff018fb28bef47c...   179MB
5a91cb34c0dc        7 minutes ago       /bin/sh -c mkdir /opt                           0B
5a1d5edb14ab        7 minutes ago       /bin/sh -c set -x &&     apk add --no-cach...   3.57MB
508f201e351d        7 minutes ago       /bin/sh -c #(nop)  ENV GEN_DIR=/opt/swagge...   0B
240eccea6c3b        17 months ago       /bin/sh -c cd /tmp    && wget https://arch...   10MB
<missing>           17 months ago       /bin/sh -c #(nop) ENV MAVEN_HOME=/usr/shar...   0B
<missing>           17 months ago       /bin/sh -c #(nop) MAINTAINER Jim Schubert ...   0B
<missing>           17 months ago       /bin/sh -c set -x  && apk add --no-cache  ...   141MB
<missing>           17 months ago       /bin/sh -c #(nop) ENV JAVA_ALPINE_VERSION=...   0B
<missing>           17 months ago       /bin/sh -c #(nop) ENV JAVA_VERSION=8u92         0B
<missing>           18 months ago       /bin/sh -c #(nop) ENV PATH=/usr/local/sbin...   0B
<missing>           18 months ago       /bin/sh -c #(nop) ENV JAVA_HOME=/usr/lib/j...   0B
<missing>           18 months ago       /bin/sh -c {   echo '#!/bin/sh';   echo 's...   87B
<missing>           18 months ago       /bin/sh -c #(nop) ENV LANG=C.UTF-8              0B
<missing>           18 months ago       /bin/sh -c #(nop) ADD file:614a9122187935f...   4.8MB

And this branch's layers:

$ docker history swagger-codegen-6836
IMAGE               CREATED              CREATED BY                                      SIZE                COMMENT
5bc9dddda13b        About a minute ago   /bin/sh -c #(nop)  CMD ["build"]                0B
5afcb4ad8496        About a minute ago   /bin/sh -c #(nop)  ENTRYPOINT ["docker-ent...   0B
2f187693b82e        About a minute ago   /bin/sh -c mvn -am -pl "modules/swagger-co...   124MB
37c08be2abf5        4 minutes ago        /bin/sh -c #(nop) WORKDIR /opt/swagger-cod...   0B
525c50f9a2ce        4 minutes ago        /bin/sh -c #(nop)  VOLUME [/usr/share/mave...   0B
82afdd9f5355        4 minutes ago        /bin/sh -c #(nop) COPY dir:7bdf3d90c060b31...   179MB
70c1e8c93352        4 minutes ago        /bin/sh -c mkdir /opt                           0B
8769830bc22a        4 minutes ago        /bin/sh -c set -x &&     apk add --no-cach...   3.57MB
3f2c43d2df62        4 minutes ago        /bin/sh -c #(nop)  ENV GEN_DIR=/opt/swagge...   0B
96500aef99a9        4 minutes ago        /bin/sh -c #(nop) COPY file:9a88bfd5c3619d...   933B
240eccea6c3b        17 months ago        /bin/sh -c cd /tmp    && wget https://arch...   10MB
<missing>           17 months ago        /bin/sh -c #(nop) ENV MAVEN_HOME=/usr/shar...   0B
<missing>           17 months ago        /bin/sh -c #(nop) MAINTAINER Jim Schubert ...   0B
<missing>           17 months ago        /bin/sh -c set -x  && apk add --no-cache  ...   141MB
<missing>           17 months ago        /bin/sh -c #(nop) ENV JAVA_ALPINE_VERSION=...   0B
<missing>           17 months ago        /bin/sh -c #(nop) ENV JAVA_VERSION=8u92         0B
<missing>           18 months ago        /bin/sh -c #(nop) ENV PATH=/usr/local/sbin...   0B
<missing>           18 months ago        /bin/sh -c #(nop) ENV JAVA_HOME=/usr/lib/j...   0B
<missing>           18 months ago        /bin/sh -c {   echo '#!/bin/sh';   echo 's...   87B
<missing>           18 months ago        /bin/sh -c #(nop) ENV LANG=C.UTF-8              0B
<missing>           18 months ago        /bin/sh -c #(nop) ADD file:614a9122187935f...   4.8MB

As you can see, no layer can be reused from previous builds.

And regarding my point about moving it to the top of the file and changing all subsequent layers… Here's the layer structure if we were to make a minor change to docker-entrypoint.sh:

$ docker history swagger-codegen-6836-2
IMAGE               CREATED             CREATED BY                                      SIZE                COMMENT
d7d53f44cd45        12 seconds ago      /bin/sh -c #(nop)  CMD ["build"]                0B
ecc0a6ab2431        13 seconds ago      /bin/sh -c #(nop)  ENTRYPOINT ["docker-ent...   0B
feb4ebcae626        16 seconds ago      /bin/sh -c mvn -am -pl "modules/swagger-co...   124MB
14628c4f3c73        2 minutes ago       /bin/sh -c #(nop) WORKDIR /opt/swagger-cod...   0B
61c41982c7aa        2 minutes ago       /bin/sh -c #(nop)  VOLUME [/usr/share/mave...   0B
471d277a44fb        2 minutes ago       /bin/sh -c #(nop) COPY dir:d06f2f65f5588b7...   179MB
871db10b8df3        2 minutes ago       /bin/sh -c mkdir /opt                           0B
7a876cd9d009        2 minutes ago       /bin/sh -c set -x &&     apk add --no-cach...   3.57MB
e2bdc632ed8d        3 minutes ago       /bin/sh -c #(nop)  ENV GEN_DIR=/opt/swagge...   0B
7566c2d2fd56        3 minutes ago       /bin/sh -c #(nop) COPY file:093478c98cf407...   942B
240eccea6c3b        17 months ago       /bin/sh -c cd /tmp    && wget https://arch...   10MB
<missing>           17 months ago       /bin/sh -c #(nop) ENV MAVEN_HOME=/usr/shar...   0B
<missing>           17 months ago       /bin/sh -c #(nop) MAINTAINER Jim Schubert ...   0B
<missing>           17 months ago       /bin/sh -c set -x  && apk add --no-cache  ...   141MB
<missing>           17 months ago       /bin/sh -c #(nop) ENV JAVA_ALPINE_VERSION=...   0B
<missing>           17 months ago       /bin/sh -c #(nop) ENV JAVA_VERSION=8u92         0B
<missing>           18 months ago       /bin/sh -c #(nop) ENV PATH=/usr/local/sbin...   0B
<missing>           18 months ago       /bin/sh -c #(nop) ENV JAVA_HOME=/usr/lib/j...   0B
<missing>           18 months ago       /bin/sh -c {   echo '#!/bin/sh';   echo 's...   87B
<missing>           18 months ago       /bin/sh -c #(nop) ENV LANG=C.UTF-8              0B
<missing>           18 months ago       /bin/sh -c #(nop) ADD file:614a9122187935f...   4.8MB

And its build time:

real	3m19.987s
user	0m7.934s
sys	0m3.006s

If I switch back to master and build with a minor change to docker-entrypoint.sh, you'll see that I'm able to reuse everything up to where we begin adding the source. This is because the root directory changes when we modify docker-entrypoint.sh, so there's not really a good, clean way around this and there's very little gain considering the repository's structure is intended to change on every commit.

$ docker history swagger-codegen-original-2
IMAGE               CREATED             CREATED BY                                      SIZE                COMMENT
d730383c32b0        17 seconds ago      /bin/sh -c #(nop)  CMD ["build"]                0B
83c12feccdd2        18 seconds ago      /bin/sh -c #(nop)  ENTRYPOINT ["docker-ent...   0B
11c86ae38ca8        19 seconds ago      /bin/sh -c #(nop) COPY file:b8473adcdfe9fb...   908B
a6f0a1039a63        22 seconds ago      /bin/sh -c mvn -am -pl "modules/swagger-co...   124MB
e00dbb9dce4c        2 minutes ago       /bin/sh -c #(nop) WORKDIR /opt/swagger-cod...   0B
0af2f20a4658        2 minutes ago       /bin/sh -c #(nop)  VOLUME [/usr/share/mave...   0B
ca8ef927a400        2 minutes ago       /bin/sh -c #(nop) ADD dir:77209b6b57d28bb4...   179MB
5a91cb34c0dc        32 minutes ago      /bin/sh -c mkdir /opt                           0B
5a1d5edb14ab        32 minutes ago      /bin/sh -c set -x &&     apk add --no-cach...   3.57MB
508f201e351d        32 minutes ago      /bin/sh -c #(nop)  ENV GEN_DIR=/opt/swagge...   0B
240eccea6c3b        17 months ago       /bin/sh -c cd /tmp    && wget https://arch...   10MB
<missing>           17 months ago       /bin/sh -c #(nop) ENV MAVEN_HOME=/usr/shar...   0B
<missing>           17 months ago       /bin/sh -c #(nop) MAINTAINER Jim Schubert ...   0B
<missing>           17 months ago       /bin/sh -c set -x  && apk add --no-cache  ...   141MB
<missing>           17 months ago       /bin/sh -c #(nop) ENV JAVA_ALPINE_VERSION=...   0B
<missing>           17 months ago       /bin/sh -c #(nop) ENV JAVA_VERSION=8u92         0B
<missing>           18 months ago       /bin/sh -c #(nop) ENV PATH=/usr/local/sbin...   0B
<missing>           18 months ago       /bin/sh -c #(nop) ENV JAVA_HOME=/usr/lib/j...   0B
<missing>           18 months ago       /bin/sh -c {   echo '#!/bin/sh';   echo 's...   87B
<missing>           18 months ago       /bin/sh -c #(nop) ENV LANG=C.UTF-8              0B
<missing>           18 months ago       /bin/sh -c #(nop) ADD file:614a9122187935f...   4.8MB

All that said, there are some modifications that could be made to improve layer reusability.

For instance, because VOLUME and WORKDIR are 0B layers, it doesn't really matter if we add the code before or after these commands. We could move the command after and change it to COPY as you've done here.

We could also increase layer reuse by adding each subdirectory individually in increasing order of popularity or likelihood of modification. For instance, here's an example Dockerfile for what I'm suggesting:

FROM jimschubert/8-jdk-alpine-mvn:1.0

ENV GEN_DIR /opt/swagger-codegen

RUN set -x && \
    apk add --no-cache bash

RUN mkdir /opt && mkdir -p ${GEN_DIR}/modules

VOLUME  ${MAVEN_HOME}/.m2/repository

WORKDIR ${GEN_DIR}

COPY docker-entrypoint.sh /usr/local/bin/
COPY ./modules/swagger-codegen-maven-plugin ${GEN_DIR}/modules/swagger-codegen-maven-plugin
COPY ./modules/swagger-generator ${GEN_DIR}/modules/swagger-generator
COPY ./modules/swagger-codegen-cli ${GEN_DIR}/modules/swagger-codegen-cli
COPY ./modules/swagger-codegen ${GEN_DIR}/modules/swagger-codegen

# THIS IS THE DIFFICULT PART, Dockerd doesn't support excludes to copy command
COPY [a-zA-Z\\.]* ${GEN_DIR}/

RUN mvn -am -pl "modules/swagger-codegen-cli" package

ENTRYPOINT ["docker-entrypoint.sh"]

CMD ["build"]

The problem with this approach is that it could easily go unnoticed if a new module is added to the project and not copied into the image. This is because the image only builds and tests swagger-codegen-cli (3 modules), but is intended to be a "full" swagger codegen container. It would be surprising after 3 release or so for anyone to assume swagger-codegen-new-module is in the versioned container in which the module was added, only to find the Dockerfile update was missed.

Not only this, but Dockerfile doesn't currently support exclusions in COPY. There may be a way to work out a pattern that excludes only those manually added modules directories, but again I think that's a lot more effort than a maintainable solution.

Any thoughts or comments on any of that? Sorry, I know it was a lot… but I've put a lot of thought into this Dockerfile to reduce it from the previous image (which I think was about 2GB).

jimschubert avatar Oct 28 '17 16:10 jimschubert

The exclude should be handled by using .dockerignore file https://docs.docker.com/engine/reference/builder/#dockerignore-file

It reduces extra files being copied into the image, and it reduces the size of the context sent to dockerd which can be very helpful if you have a large number of files, and only a small number should be included in the image.

kenjones-cisco avatar Nov 01 '17 18:11 kenjones-cisco

@kenjones-cisco The exclude I mentioned is conditional based on previous COPY instructions. The ignore file prevents these from being sent to the build context, which won't work.

jimschubert avatar Nov 01 '17 20:11 jimschubert

I had commented that the Dockerfile in this file is built on every commit. That doesn't seem to be the case (I thought it was previously as a tracking master build). Since this is only used by engineers who want to build without modification to their system (e.g. if they're stuck on Java 6 or have conflicting maven issues), we can make more assumptions and concessions in this image.

Also, it's not clear to me why run-in-docker.sh doesn't use this image.

jimschubert avatar Nov 02 '17 13:11 jimschubert

The image on master is rebuilt on every commit to master, https://github.com/swagger-api/swagger-codegen/blob/master/circle.yml#L46

kenjones-cisco avatar Nov 02 '17 13:11 kenjones-cisco

The reason it is likely not used as part of run-in-docker.sh is because if you are developing a change, then you need to rebuild the modules, so if you using an image with the modules already included then it is not going to provide the expected results.

This image is used for those actually generate code for their projects.

kenjones-cisco avatar Nov 02 '17 13:11 kenjones-cisco

The GEN_DIR and its contents are remapped in run-in-docker.sh. I guess because it also maps the Maven directory to the local file system, the container provides no additional benefit. Seems like that would cause issues, especially if that script runs in the container as root, it'll cause permissions issues in the host Maven cache.

jimschubert avatar Nov 02 '17 13:11 jimschubert

The permissions are different for sure, but as the files are maven cache which are only used within the container, that is not usually an issue. And the generated files coming from actions run via the container are part of .gitignore they never make it into the repository.

For this project I actually have a custom run-in-docker.sh file because I'm on windows and therefore need cygwin, I have to format pwd correctly before it will work.

All my development for all projects I work on are done similarly to run-in-docker.sh except I leverage a a Dockerfile.dev to represent the projects development environment and a Makefile to generate the different docker run commands. It provides a constant development environment across all OSes and avoids me having to install everything local to my laptop, and then allows me to easily reuse the same for doing builds on a build server with no other dependency besides Docker. As such I make sure the Makefile has a clean function to remove files created within the container but are local due to the volume mounts. And for files I plan to add to the git repository, I have a simple function added to my .bashrc file that will change the ownership of the files and the permissions.

kenjones-cisco avatar Nov 02 '17 14:11 kenjones-cisco