Package ClusterRunner using Docker container
Feature proposal for the following:
- What: Package ClusterRunner using Docker container
- Why: At work, we try to make all developer productivity tooling available to our developers as Docker containers on CoreOS. This prevents user-space pollution, delegates cross-platform portability to the container engine, and avoids works-on-my-machine problems, to name a few benefits. :)
- How: I'd like to send a pull request that will package the following install steps using a minimal distro in a Docker container:
  mkdir -p ~/.clusterrunner/dist && cd ~/.clusterrunner && curl -L https://cloud.box.com/shared/static/2pl4pi6ykvrbb9d06t4m.tgz > clusterrunner.tgz && tar -zxvf clusterrunner.tgz -C ./dist && cp ./dist/conf/default_clusterrunner.conf clusterrunner.conf && chmod 600 clusterrunner.conf
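For illustration, the install steps above could be wrapped in a Dockerfile roughly like the sketch below. This is not an official image: the base image (`debian:stable-slim`), the package list, and the entrypoint path are all assumptions. In particular, it assumes the archive unpacks an executable named `clusterrunner` at the top of `dist/`, and it installs git because ClusterRunner is described later in this thread as using git to clone repos.

```dockerfile
# Hypothetical Dockerfile.bin: a sketch, not the project's official image.
# Base image is an assumption; any minimal glibc-based distro should be similar.
FROM debian:stable-slim

# curl fetches the tarball; git is an assumed runtime dependency
# (ClusterRunner clones repositories behind the scenes).
RUN apt-get update \
    && apt-get install -y --no-install-recommends ca-certificates curl git \
    && rm -rf /var/lib/apt/lists/*

# Same steps as the one-liner, with /root standing in for ~.
WORKDIR /root/.clusterrunner
RUN mkdir -p dist \
    && curl -L https://cloud.box.com/shared/static/2pl4pi6ykvrbb9d06t4m.tgz > clusterrunner.tgz \
    && tar -zxf clusterrunner.tgz -C ./dist \
    && rm clusterrunner.tgz \
    && cp ./dist/conf/default_clusterrunner.conf clusterrunner.conf \
    && chmod 600 clusterrunner.conf

# Assumption: the extracted archive contains an executable named
# `clusterrunner` directly under dist/.
ENTRYPOINT ["/root/.clusterrunner/dist/clusterrunner"]
```

Deleting the tarball in the same `RUN` that downloads it keeps the archive out of the final image layers.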
Hi, @yamaszone. Thanks for bringing this up. We have actually been thinking about converting our distribution from cx_Freeze to a Dockerfile/container. It would be awesome to see a PR for this. :)
Instead of downloading the binary into the container, what do you think of just starting from a minimal distro with Python 3.4 and running ClusterRunner from source? In that situation, the clusterrunner executable is replaced with python main.py.
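A from-source image along those lines might look like the sketch below. The base image tag (`python:3.4-slim`), the install directory, and the apt package names are assumptions, as is the idea that the build context is the root of the ClusterRunner repository (so that `main.py` and `requirements.txt` are present). It also assumes that some requirements need a C compiler to install.

```dockerfile
# Hypothetical Dockerfile.src: a sketch under stated assumptions.
# Assumes the build context is the root of the ClusterRunner repository.
# The base image tag is an assumption; substitute any Python 3.4 base.
FROM python:3.4-slim

# git: assumed runtime dependency for cloning repositories.
# gcc and libc6-dev: assumed necessary to compile C extensions
# among the Python requirements.
RUN apt-get update \
    && apt-get install -y --no-install-recommends git gcc libc6-dev \
    && rm -rf /var/lib/apt/lists/*

WORKDIR /opt/clusterrunner

# Copy requirements first so the dependency layer is cached across
# source-only changes.
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

COPY . .

# Takes the place of the frozen `clusterrunner` executable.
ENTRYPOINT ["python", "main.py"]
```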
@josephharrington I was thinking the same, but I was also considering the impact of binary vs. source on Docker image size. We may need two image flavors, light (from binary) and full (from source), if the sizes are significantly different. I would prefer to maintain only one flavor, with the image as small as possible. I'd like to do some experiments on this and will share my findings. Do you have any specific distro preference?
I'd be surprised if the binary version was much different in size than Python+ClusterRunner source. I think the way that we create the binary (cx_Freeze) basically packages the Python interpreter and the source together. But you never know. :)
I don't think we have a specific distro preference. We use CentOS for some stuff internally, but I'm sure Ubuntu or something would work fine for this as well. @tjlee0909 @gcurtis @nadeemahmad please chime in if you guys have any opinions on this.
@josephharrington From my quick experimentation, it looks like we will have to maintain two flavors, light and full, because of the significant difference in image size. Here are the results from two quick-and-dirty Dockerfiles, Dockerfile.bin and Dockerfile.src:
clusterrunner full 86c0d2128fca 2 minutes ago 403.9 MB
clusterrunner light 91532c1c1ac8 30 minutes ago 244.2 MB
From the build details, you can see that the gcc and git dependencies of the source build are the main reasons for the difference in image size.
Interesting! That's great that you were able to put those together.
I think that git will actually be required either way since CR uses that behind the scenes to clone repos, etc.
As for gcc, I wonder if that is needed by multiple dependencies in our requirements.txt. For example, if it was only needed for cx_Freeze then that would be perfect because we could remove it. ;) I'm interested in moving away from cx_Freeze in general just because it's a bit hard to maintain and can break in weird ways. (See our setup.py for some of the hackery we had to do to get it working.)
I'm interested in digging a bit deeper to see what we can cut to get the image as small as possible. Maybe some other alternatives would be to remove gcc after installing the requirements, or pre-build the requirements and archive those somewhere. I haven't done much with building lean images so this is new to me. :)
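As a rough sketch of the "remove gcc after installing the requirements" idea: the compiler has to be installed and purged within the same `RUN` instruction, because each instruction commits its own layer and a later removal does not shrink an earlier layer. The base image tag, package names, and paths below are assumptions, not something taken from the ClusterRunner repo.

```dockerfile
# Sketch only: base image, package names, and paths are assumptions.
FROM python:3.4-slim

WORKDIR /opt/clusterrunner
COPY requirements.txt .

# Install the compiler, build the requirements, and purge the compiler in a
# single RUN so that gcc never ends up in a committed layer.
RUN apt-get update \
    && apt-get install -y --no-install-recommends gcc libc6-dev \
    && pip install --no-cache-dir -r requirements.txt \
    && apt-get purge -y --auto-remove gcc libc6-dev \
    && rm -rf /var/lib/apt/lists/*
```

The compiled extensions stay installed after the purge; only the build toolchain is dropped. Whether this closes most of the size gap would still need measuring.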
I'll chat with the rest of our team about this later this week, but feel free to submit a PR in the meantime with whatever you find most useful.
@josephharrington You almost nailed it! ;) gcc is needed by psutil in addition to cx_Freeze. I noticed only a couple of hits for psutil in the codebase, so refactoring it out might be worth looking into. Regarding lean images, I should be able to lend a hand later. Here's my plan:
- [1st Iteration] Send another pull request to Dockerize the binary i.e. for Dockerfile.bin
- [2nd Iteration] Close this pull request by adding Dockerfile.src
- [3rd Iteration] Do the necessary optimizations/refactors to converge the above two into one, i.e. Dockerfile.bin + Dockerfile.src = Dockerfile (build from source)
[1st Iteration] will let me get comfortable with CR's behavior on our test suites and will help me verify the build for [2nd Iteration] later. For [3rd Iteration], I need to dive deep into the codebase, but you guys can definitely get ahead of me on this. I saw the nice user guide docs, but I have yet to come across any detailed technical docs/specs (covering system architecture, class diagrams, sequence diagrams, module organization/dependencies, etc.). @josephharrington Any pointers? BTW, I have RSVPed for tomorrow's meetup at your HQ and am looking forward to meeting you guys!
@yamaszone Your plan sounds great. I can give you some tips on the 3rd step you mentioned whenever we get there. Unfortunately we don't have any good tech docs yet; that's still something we need to put together. I do think we need a good tech overview for contributors though -- hopefully we can do something around that soon.