Add docker-compose build system
What this PR does / why we need it:
There have been a wide variety of community efforts around deploying Dataverse in containers with tools such as Docker and Kubernetes. I am by no means a Docker expert, but this separation of the code into services for use by docker-compose made sense to me. There are obviously many different ways to approach it. I also wanted to be able to make changes and compile the Java code inside the container itself rather than relying on prebuilt downloads for deployment. This does the following:
- Builds a copy of the `.war` deployable code from source
- Stands up the various services needed:
  - seaweedfs - for S3 storage
  - traefik - reverse proxy; HTTP is automatically redirected to HTTPS
  - postgres - database backend
  - solr - search indexing
  - rserve - R server for running R commands
  - dataverse - the main Dataverse web application
- Sets up two storage options: the default, `<id>=files`, for local storage, and `<id>=s3` for S3 storage
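A minimal sketch of what the two storage options boil down to, assuming they are configured through Dataverse's `dataverse.files.*` JVM options. The function below only prints the options for a local `files` store and an `s3` store, plus the default-store setting; the function name, labels, directory, and bucket name are hypothetical placeholders rather than values from this PR.

```shell
#!/bin/sh
# Sketch only: prints Dataverse storage JVM options for the two stores in this
# PR (IDs "files" and "s3"). The function name, labels, default directory and
# bucket name are hypothetical placeholders.
storage_jvm_options() {
  default_store="${1:-files}"   # store used for new uploads
  data_dir="${2:-/dv/files}"    # local directory for the "files" store
  bucket="${3:-dataverse}"      # bucket on the S3-compatible server (SeaweedFS)

  # Local filesystem store
  printf '%s\n' "-Ddataverse.files.files.type=file"
  printf '%s\n' "-Ddataverse.files.files.label=Local"
  printf '%s\n' "-Ddataverse.files.files.directory=${data_dir}"

  # S3-compatible store
  printf '%s\n' "-Ddataverse.files.s3.type=s3"
  printf '%s\n' "-Ddataverse.files.s3.label=S3"
  printf '%s\n' "-Ddataverse.files.s3.bucket-name=${bucket}"

  # Which store is the default
  printf '%s\n' "-Ddataverse.files.storage-driver-id=${default_store}"
}
```

From a startup script, each printed line could then be passed to `asadmin create-jvm-options`. An S3 store backed by SeaweedFS would most likely also need `dataverse.files.s3.custom-endpoint-url` and `dataverse.files.s3.path-style-access`; asadmin expects colons in option values, such as an endpoint URL, to be escaped as `\:`. Passing `s3` as the first argument makes S3 the default store.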
I would wholeheartedly appreciate suggestions and improvements from others who have much more experience than I do with Dataverse and container technologies. I do feel the lack of an "officially" supported container option in the dataverse repo is making it harder for new developers to get on board and contribute. I hope that we can come up with a solution that targets both local developer needs and people wanting to run Dataverse in development or production settings on a server. Thank you for your time.
Which issue(s) this PR closes:
Unknown if there are backlog issues related to this.
Special notes for your reviewer:
There are additional items that we may wish to discuss or consider including as part of this PR:
- look at post-configuration steps and hardening; is there anything else we want to bring in here?
- skip `install.py` if already installed? The dataverse container takes a long time to start up; is there a better method here? Look at `/conf/docker-compose/dataverse/startup.sh` and see if there are improvements to be made
- persist log files from dataverse?
- test auth providers; I have no experience with these
- look into docker-compose secrets instead of environment variables
- add and test a Windows build script like we have with `prepbuild.sh`
- add GitHub Actions for building, Dependabot, etc.
- ~~move `docker-compose.yml` outside of this sub-directory to the git repo root~~ (done)
- check the recorded user IP address; does the reverse proxy need to be adjusted?
- set up Grafana and System Metrics for seaweedfs: https://github.com/chrislusf/seaweedfs/wiki/System-Metrics
- change default storage provider to s3 from files? remove files as a provider?
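Two items above, Docker secrets and skipping `install.py` on restart, could be approached in `startup.sh` roughly as sketched here. This is a hedged illustration, not the PR's script: `file_env` follows the `*_FILE` convention used by the official postgres image's entrypoint (Compose secrets are mounted as files under `/run/secrets/`), and `install_once` assumes a marker file kept on a persisted volume. `DATAVERSE_STATE_DIR`, the `.installed` marker, and both function names are hypothetical.

```shell
#!/bin/sh
# Sketch only: helper names, DATAVERSE_STATE_DIR and the .installed marker
# are hypothetical, not part of this PR.

# Export VAR, reading its value from the file named by VAR_FILE if that is set
# (the *_FILE convention used by the official postgres image).
file_env() {
  _fe_var="$1"
  _fe_file_var="${_fe_var}_FILE"
  eval "_fe_val=\${${_fe_var}:-}"
  eval "_fe_file=\${${_fe_file_var}:-}"
  if [ -n "$_fe_val" ] && [ -n "$_fe_file" ]; then
    echo "error: both $_fe_var and $_fe_file_var are set" >&2
    return 1
  fi
  if [ -n "$_fe_file" ]; then
    # Command substitution also strips the trailing newline from the secret
    _fe_val="$(cat "$_fe_file")" || return 1
  fi
  export "$_fe_var=$_fe_val"
}

# Run the given install command only once; later starts see the marker and skip it.
install_once() {
  _io_dir="${DATAVERSE_STATE_DIR:-/dv/state}"
  if [ -f "$_io_dir/.installed" ]; then
    echo "already installed, skipping install.py"
    return 0
  fi
  "$@" || return 1
  # Only mark success after the installer exited cleanly
  mkdir -p "$_io_dir" && touch "$_io_dir/.installed"
}
```

In `startup.sh` this might look like `file_env DATAVERSE_DB_PASSWORD` (a hypothetical variable name) followed by `install_once python3 install.py`. Because the marker is written only after the installer succeeds, a failed first run is retried on the next container start.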
Suggestions on how to test this:
Details on building are in /conf/docker-compose/README.md. All commands will be run from that directory.
Things I tested, but on a different branch, so we likely need additional testers here:
- Emails: I was able to hook it into our SMTP server
- The Site URL and FQDN settings
- S3 storage
- Persisting docroot files such as changing the root dataverse logo
- Persisting data from Solr and Postgres between restarts
Does this PR introduce a user interface change? If mockups are available, please link/include them here:
No UI changes, this is all backend build system.
Is there a release notes update needed for this change?:
Likely yes.
Additional documentation:
None
I tried with all your defaults and it seems to work except rserve, which fails with this:
rserve | Error in Rserve::run.Rserve(remote = TRUE, auth = TRUE, pwdfile = "/rserve.pwd", :
rserve | ignoring SIGPIPE signal
rserve | Execution halted
I also had to comment out the
COPY --chown=dataverse:dataverse ./.m2/ /home/dataverse/.m2/
line, but I see you already fixed that. When building on an M1 Mac, I also had to change JAVA_HOME to
ENV JAVA_HOME /usr/lib/jvm/java-1.11.0-openjdk-arm64
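Rather than hard-coding the arm64 path, the Dockerfile could pick JAVA_HOME per architecture. A minimal sketch, assuming Debian/Ubuntu OpenJDK 11 packages (which install under `/usr/lib/jvm/java-11-openjdk-<arch>`): it prefers an explicit argument, then BuildKit's `TARGETARCH` build argument (only visible if the Dockerfile declares `ARG TARGETARCH`), and finally `dpkg --print-architecture`. The function name is hypothetical.

```shell
#!/bin/sh
# Sketch only: pick JAVA_HOME per architecture instead of hard-coding arm64.
# Assumes Debian/Ubuntu OpenJDK 11 packages under
# /usr/lib/jvm/java-11-openjdk-<arch>; the function name is hypothetical.
java_home_for() {
  # 1) explicit argument, 2) BuildKit's TARGETARCH (needs `ARG TARGETARCH`
  # in the Dockerfile), 3) the architecture dpkg reports at build time
  arch="${1:-${TARGETARCH:-}}"
  if [ -z "$arch" ]; then
    arch="$(dpkg --print-architecture)" || return 1
  fi
  printf '%s\n' "/usr/lib/jvm/java-11-openjdk-${arch}"
}
```

Inside a `RUN` step this could be used as `JAVA_HOME="$(java_home_for)" && export JAVA_HOME`. `TARGETARCH` and Debian agree on `amd64` and `arm64` but not on every architecture (32-bit ARM is `arm` versus `armhf`), so the sketch only targets those two.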
@beepsoft Thank you. I refactored the code a fair bit. I was able to build and run this on my Raspberry Pi to test both arm64 and amd64 architectures. Could you try again to see if this resolves the issues you faced?
I also moved `docker-compose.yml` and the Dataverse-specific Dockerfile to the root of the repo. This means `prepbuild.sh` only has one step to do, and it makes it much easier for developers to make changes in the real codebase and test them, compared to copying the entire Dataverse folder.
On M1 arm it now fails for me with this:
Progress (3): 0.8/2.2 MB | 0.4/1.2 MB | 69/250 kB
#25 674.9 [output clipped, log limit 1MiB reached]
#25 ERROR: executor failed running [/bin/sh -c cd /dataverse/ && export dpkgArch="$(dpkg --print-architecture)" && export JAVA_HOME="/usr/lib/jvm/java-11-openjdk-${dpkgArch}" && mvn package -DskipTests]: exit code: 1
------
> [21/25] RUN cd /dataverse/ && export dpkgArch="$(dpkg --print-architecture)" && export JAVA_HOME="/usr/lib/jvm/java-11-openjdk-${dpkgArch}" && mvn package -DskipTests:
------
executor failed running [/bin/sh -c cd /dataverse/ && export dpkgArch="$(dpkg --print-architecture)" && export JAVA_HOME="/usr/lib/jvm/java-11-openjdk-${dpkgArch}" && mvn package -DskipTests]: exit code: 1
ERROR: Service 'dataverse' failed to build : Build failed
I checked that JAVA_HOME is calculated correctly.
I also tried running mvn package -DskipTests locally, and it builds all right.
I also tried running just mvn dependency:resolve instead of mvn package -DskipTests in the Dockerfile and this fails as well.
Hmm interesting, thanks. I'll see if I can get a colleague to try it on his M1.
I believe I've fixed the Maven build system. If others could test this again that would be great. Thanks
I can confirm it builds and runs all right on M1 now.
Hi @carlsonp, for this next sprint we are catching up on community PRs. Would you mind updating/refreshing this PR from develop? Thanks!
I've rebased onto develop.
Just to let everybody reading this pull request in the future know: @carlsonp, @pdurbin, and I talked today about this and related work like #8832, #8834, or #8320.
We're all good: we don't want to block each other, we seem to agree about long-term goals (which is what I am up for), and we want to coordinate and iterate.
After all, the work done here and elsewhere isn't as far apart as one might assume. And we all agree that these kinds of things must go upstream to be F.A.I.R. (haha!) and easy to use.
TODO for me: split out the counter-processor library into its own service
Have a /makedatacount dir that is a Docker Volume or K8s volume to share those logs between Dataverse and the COUNTER processor
TODO for me: add Trivy container scanning
Coverage: 20.013% (+0.02%) from 19.997% when pulling 134695d3ba2262ac93f66b02bc5a33875a63df31 on carlsonp:docker-compose into f63f0e85bfb8f2e8526551b260744f17f2d99915 on IQSS:develop.
Started working on using the existing base container as the base for the dataverse build. Moved the other containers into a similar Maven build system in the modules folder. I'm stuck on getting dataverse to launch and will need help with the Payara pre- and post-boot scripts.
I checked with ❤️@carlsonp❤️ and he's fine with us closing this in favor of this PR:
- #9439
Here's the quickstart for devs from the PR, by the way: https://dataverse-guide--9439.org.readthedocs.build/en/9439/container/dev-usage.html
@carlsonp thanks for helping with the containerization effort! 🎉 🚀
@carlsonp I just wanted to let you know that ideas in this PR are still being discussed: https://github.com/IQSS/dataverse/pull/9439#discussion_r1138329433 ❤️