Create a Docker container
There have been calls to create a Docker container that folks can use with The Workbench, so that they have something similar to `make docker`.
The Docker container should be relatively straightforward to implement, and we can offer two flavours:
- slim: just R, pandoc, and the necessary packages, with room to add other packages. This would be for people who want to render lessons without previewing them.
- fully featured: RStudio, R, and pandoc, so that folks can preview their lessons as they work.
Both of these can be based on containers from the Rocker Project or on the R-Universe base image, in which we can set the `MY_UNIVERSE` environment variable.
I've added a `docker/` folder that contains a Dockerfile and a README. Both need improvement.
I'm not a Docker expert by any means, but the new WASM beta (https://docs.docker.com/desktop/wasm/) makes me think we might leverage it to speed up install and setup time. As I understand it, the WASM shim can run without worrying about the host architecture, which is a problem I've run into with Docker in the past. Unfortunately, using it drops compatibility with older Docker containers. Thoughts?
This looks neat! Alas, I don't think we can leverage WASM yet: support for R in WASM is currently limited to webR (https://github.com/georgestagg/webR#demo), and while it is possible to install and use R packages there, the process is extremely difficult at the moment (https://github.com/georgestagg/webR/issues/11). Moreover, there is no support for pandoc in WASM that we can use.
Ah, true. Well, hopefully things will change as it comes out of beta. I'd be happy to help with implementing the container. To summarize the above: we make two containers, one with R, pandoc, etc., and another fully featured one with RStudio, yes?
Building the Container
Sort of. I think building the container is fairly straightforward. At least it will be far less complex than https://github.com/carpentries/lesson-docker, because we will be adding packages on top of an existing system instead of trying to shoehorn the Jekyll build system into the RStudio container or vice versa.
Right now the bare-bones Docker setup is https://github.com/carpentries/workbench/blob/d134ed1cb85a09b4a5e95d97597350a86503cec2/docker/Dockerfile, which uses the R-Universe container as a base (though preview is not possible).
I believe the equivalent with RStudio included would replace `r-universe/base` with `rocker/rstudio` from the Rocker Project (or `rocker/geospatial` for the r-geospatial lessons): https://rocker-project.org/images/versioned/rstudio.html
User interaction with the container
The remaining issues, namely how to mount volumes, expose ports for previewing, and get the container to write to the host filesystem as a non-root user, are another matter. They are likely solved with docker-compose, but that's currently beyond my technical capabilities (see https://rocker-project.org/images/versioned/rstudio.html#how-to-use for an overview of the possibilities).
One big caveat is how to handle packages for lessons that use R Markdown. Because {renv} uses a global package cache with symbolic links, all of those links will be broken inside the container. So while we mount the lesson itself inside the container, we need to exclude `renv/profiles/lesson-requirements/renv/`. I think this is addressed by https://stackoverflow.com/questions/29181032/add-a-volume-to-docker-but-exclude-a-sub-folder, but it requires either a very long user command or a docker-compose file.
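The subfolder-exclusion trick from that Stack Overflow thread can be sketched as a small script that writes a docker-compose.yml: the lesson is bind-mounted, and a second, anonymous volume is mounted at the renv library path so the host's broken symlinks are hidden inside the container. This is a sketch, not an agreed Workbench setup; the image tag, the mount point `/home/rstudio/lesson`, and the `PASSWORD` value are illustrative assumptions.

```shell
#!/usr/bin/env bash
# Sketch only: writes a docker-compose.yml that bind-mounts the lesson but
# masks its {renv} library with an anonymous volume.
# ASSUMPTIONS: image tag, mount point, and PASSWORD are illustrative values.
set -euo pipefail

lesson_dir="/home/rstudio/lesson"                  # hypothetical mount point
renv_lib="renv/profiles/lesson-requirements/renv"  # path from the discussion

# Unquoted EOF: the shell expands ${lesson_dir} and ${renv_lib} here.
cat > docker-compose.yml <<EOF
services:
  rstudio:
    image: rocker/rstudio:4.3.1
    environment:
      - PASSWORD=change-me          # RStudio Server login password
    ports:
      - "127.0.0.1:8787:8787"       # RStudio Server, localhost only
    volumes:
      - ./:${lesson_dir}            # the lesson itself, from the host
      - ${lesson_dir}/${renv_lib}   # anonymous volume hides the host's renv library
EOF

echo "Wrote docker-compose.yml; start it with: docker compose up"
```

Packages restored from inside the container then live in that anonymous volume rather than on the host; `docker compose down -v` removes it along with the container.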
Hi! I am looking into it and would be happy to help :)
Hi @harivyasi, thanks! I've had a busy few weeks, but next week I'll likely sit down and start on the docker-compose setup. Let me know how you get along, and I'll be happy to add to anything you are working on.
I have a naive Docker + docker-compose setup at https://github.com/alee/python-novice-gapminder/commit/0de50c735f4e32f9da953eb3cc278de6e96ff0b9 that should work for lessons without the R Markdown issue raised earlier. Supporting a container-local {renv} should be fairly straightforward, though: add another mount such as `renv:./renv/profiles/lesson-requirements/renv/` (or something similar) to the docker-compose.yml file. What's a good R Markdown lesson to test that on?
I'm not sure I understand why RStudio support would be needed. Is it for the R Markdown lessons?
The image name in docker-compose.yml should also be templated for each specific lesson.
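One way to template the image name is Compose's own variable substitution, sketched below: a script derives a name from the lesson folder, stores it in `.env`, and the compose file refers to `${LESSON_NAME}`. `LESSON_NAME`, the `-workbench` suffix, and the `/lesson` mount point are hypothetical names, and `build: .` assumes a Dockerfile at the lesson root.

```shell
#!/usr/bin/env bash
# Sketch only: per-lesson image names via Compose variable substitution.
# ASSUMPTIONS: LESSON_NAME, the "-workbench" suffix, and the /lesson mount
# point are hypothetical; `build: .` assumes a Dockerfile at the lesson root.
set -euo pipefail

# Docker image names must be lowercase, so normalise the folder name.
lesson_name="$(basename "$PWD" | tr '[:upper:]' '[:lower:]')"

# Compose reads .env from the project directory when substituting ${...}.
printf 'LESSON_NAME=%s\n' "$lesson_name" > .env

# Quoted 'EOF' keeps ${LESSON_NAME} literal so Compose, not bash, expands it.
cat > docker-compose.yml <<'EOF'
services:
  lesson:
    build: .
    image: "${LESSON_NAME:-lesson}-workbench:latest"
    volumes:
      - ./:/lesson
EOF

echo "Compose will tag the built image as ${lesson_name}-workbench:latest"
```

`docker compose config` prints the file with the substitution applied, which is a quick way to check the result. Folder names containing spaces or other characters not allowed in image names would still need extra cleaning.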
Following on from @alee, I've got a working Docker image and Compose setup for a Workbench lesson. The base image is rocker/rstudio:4.3.1, which takes care of many permissions issues by mapping the user who ran `docker compose up` to any mounts from the image itself.
A hook could be added to the .Rprofile to call `sandpaper::serve()` after the project is opened, if necessary.
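Such a hook might look like the sketch below, which appends R code to the lesson's `.Rprofile`. It uses the `rstudio.sessionInit` hook that RStudio runs once a session is ready (the pattern recommended in the rstudioapi documentation for `.Rprofile` code). It assumes the session starts in the lesson directory; whether every new session should start the preview is a judgement call the thread leaves open.

```shell
#!/usr/bin/env bash
# Sketch only: append an RStudio hook to the lesson's .Rprofile so that
# sandpaper::serve() starts once an RStudio session has initialised.
# ASSUMPTION: the R session's working directory is the lesson root.
set -euo pipefail

# Skip if a hook has already been added, so re-running is harmless.
if ! grep -qs 'rstudio.sessionInit' .Rprofile; then
  cat >> .Rprofile <<'EOF'

# Start the sandpaper preview when RStudio finishes initialising a new session.
setHook("rstudio.sessionInit", function(newSession) {
  if (newSession) sandpaper::serve()
}, action = "append")
EOF
fi
```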
A couple of issues
- The `./site` directory had to be deleted due to permissions issues on the first run. This might have been a conflict left over from a different Docker approach I had tried previously.
- Additional apt packages were needed to support the extended requirements of sandpaper and its dependencies.
Note: as of sandpaper 0.15.0, you can use the `SANDPAPER_SITE` environment variable to move the site path to a different directory and avoid permissions conflicts (see https://carpentries.github.io/sandpaper/news/index.html#sandpaper-0150-2023-11-29).
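For a container, that could mean passing `SANDPAPER_SITE` with `-e` so the generated site lands on a container-only path instead of inside the bind-mounted lesson. The sketch below assumes an image that bundles sandpaper 0.15.0 or later and serves the lesson from `/siteroot` (as the image discussed further down does); `/tmp/sandpaper-site` and the `DRY_RUN` switch are our own illustrative choices.

```shell
#!/usr/bin/env bash
# Sketch only: run a sandpaper container with SANDPAPER_SITE pointing at a
# container-only path, so the generated site is not written into the lesson.
# ASSUMPTIONS: the image has sandpaper >= 0.15.0 and serves from /siteroot;
# /tmp/sandpaper-site and DRY_RUN are illustrative choices.
set -euo pipefail

image="${IMAGE:-ghcr.io/uomresearchit/sandpaper:latest}"
site_dir="/tmp/sandpaper-site"

cmd=(docker run --rm
  -p 4321:4321
  -e "SANDPAPER_SITE=${site_dir}"
  -v "${PWD}:/siteroot/"
  "$image")

# Print the command by default; set DRY_RUN=0 to actually start the container.
if [ "${DRY_RUN:-1}" = "1" ]; then
  printf '%s\n' "${cmd[*]}"
else
  "${cmd[@]}"
fi
```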
We weren't aware of this discussion, but we built a Docker image that works and has been tested on both Linux and macOS. We went for a minimal R approach to keep the container as small as possible. The Dockerfile is available in the repo.
You can get the latest versions with:
For Linux: `docker pull ghcr.io/uomresearchit/sandpaper:latest`
For macOS: `docker pull ghcr.io/uomresearchit/sandpaper:latest_arm`
It can then be run (from the lesson's base directory) with:
`docker run -p 4321:4321 -v "$PWD":/siteroot/ ghcr.io/uomresearchit/sandpaper:latest`
Note: make sure you have the necessary directories before running the container:
`mkdir -p instructors/{data,fig,files} learners/{data,fig,files} profiles/{data,fig,files}`
Because the lesson is on a bind mount, you can edit the lesson files and see the site update a few seconds later at http://localhost:4321/.
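The steps above (create the folders, then run the image) can be wrapped in one script. This is a sketch, not something shipped with the image: it assumes `latest_arm` is the arm64 build (e.g., Apple silicon Macs) and `latest` the x86_64 build, and picks between them by CPU architecture via `uname -m` rather than by operating system. `DRY_RUN` is our own convention; by default the script only prints the `docker run` command.

```shell
#!/usr/bin/env bash
# Sketch only: create the expected folders, then run the sandpaper image.
# ASSUMPTION: `latest_arm` is the arm64 build and `latest` the x86_64 build.
# Run from the lesson's base directory.
set -euo pipefail

# Map the host CPU architecture to one of the published image tags.
pick_tag() {
  case "$1" in
    arm64|aarch64) echo "latest_arm" ;;
    *)             echo "latest" ;;
  esac
}

image="ghcr.io/uomresearchit/sandpaper:$(pick_tag "$(uname -m)")"

# Folders the container expects to exist before it starts.
mkdir -p instructors/{data,fig,files} learners/{data,fig,files} profiles/{data,fig,files}

# Print the command by default; set DRY_RUN=0 to actually start the container.
if [ "${DRY_RUN:-1}" = "1" ]; then
  echo "docker run -p 4321:4321 -v \"$PWD\":/siteroot/ $image"
else
  docker run -p 4321:4321 -v "$PWD":/siteroot/ "$image"
fi
```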
Thanks @fherreazcue, I just tried this out and it is working great! I'm using it with a minimalist docker-compose.yml:
services:
  server:
    image: ghcr.io/uomresearchit/sandpaper:latest # switch to `latest_arm` for macOS
    volumes:
      - ./:/siteroot
    ports:
      - "127.0.0.1:4321:4321"
It works like a charm, and the image is much smaller than the 5 GB R images that were being generated earlier :sweat_smile:
Dear all, I've been working on a centralised, Carpentries-built set of versioned Docker images for the Workbench. Whilst they aren't complete or fully tested, we'd appreciate some feedback. I believe we'll be sharing this with the maintainers group next week, but some initial impressions would be useful! We build x86_64 and arm64 images, and they will be updated whenever sandpaper, varnish, or pegboard make a release.
Why did we start on a set of Carpentries docker images?
- to relieve the community of building, hosting (and potentially paying for), and running their own images and CI processes
- we'd like it to be easier to set up sandpaper across all operating systems
- we'd like to lower the barrier of entry for getting up and running with the Workbench
- we'd like to simplify maintenance and updates
- to try and fix problems with renv package management when using non-archive repositories such as r-universe
  - One of the current issues with the sandpaper GitHub CI process is that it implicitly forces updates via {remotes} and `sandpaper::manage_deps()`. This means pinning packages locally may work, but pushing changes to a lesson repository may result in dependency conflicts.
As such, the main focus of these images is to make both local and CI-based builds possible, including storing renv artifacts to better support versioned environments.
In the meantime:
- the images are on DockerHub
- a README and instructions on running them are on the GitHub repo
- examples of what the newer Dockerised CI GHA workflows will look like can be found on my fork of the R-ecology-lesson:
Things we're aware of:
- the current documentation is long and not designed for learners; this will improve with time and feedback!
- the documentation is also one long blurb rather than being structured nicely into proper gh-pages
- copying lessons into named volumes is an extra step: it is, but it avoids the permissions mess that can follow from using bind mounts
- workbench-docker repo is a little untidy and could do with being made easier to understand
- the GitHub CI process is working via the workflow files I linked above, but not tied in to the current sandpaper GHA CI process
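The named-volume step mentioned in the list can be sketched generically: create a volume, then use a throwaway container to copy the lesson into it, so later containers work on files inside the volume rather than on a host bind mount. The `<folder>-lesson` volume name, the `alpine` helper image, and the `DRY_RUN` switch are illustrative assumptions; the workbench-docker README describes the supported procedure.

```shell
#!/usr/bin/env bash
# Sketch only: copy the current lesson into a named Docker volume using a
# throwaway container.
# ASSUMPTIONS: the "<folder>-lesson" volume name, the alpine helper image,
# and DRY_RUN are illustrative, not the workbench-docker procedure.
set -euo pipefail

volume="${VOLUME:-$(basename "$PWD" | tr '[:upper:]' '[:lower:]')-lesson}"

# Echo commands by default; set DRY_RUN=0 to execute them.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    printf '+ %s\n' "$*"
  else
    "$@"
  fi
}

run docker volume create "$volume"
# Mount the lesson read-only at /src and copy everything, dotfiles included.
run docker run --rm -v "$volume":/dest -v "$PWD":/src:ro alpine cp -a /src/. /dest/
```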
Please do let us know here or in Slack what you think, and of course all feedback is welcome! Thank you all in advance!