Volumes should not be defined in base images
Base images should avoid setting VOLUME since it is currently impossible to unset in child images: https://github.com/moby/moby/issues/3465
Setting PGDATA is a trivial way to adjust which directory PostgreSQL saves data to (which is also noted in the image description). See also https://github.com/docker-library/postgres/issues/375 for another discussion of this same topic.
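For example (the paths, password, and tag below are only illustrative), redirecting the data directory away from the declared volume looks roughly like this:
docker run -d \
  -e POSTGRES_PASSWORD=example \
  -e PGDATA=/data/pg \
  -v /srv/pgdata:/data/pg \
  postgres:11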
Yes, but a pointless volume is still created.
@huggla is right. This is maybe OK if you use docker run and only have a few volumes. But if you use docker-compose, or perhaps even swarm, there are unaccounted-for volumes on your Docker host which are connected to the containers and thus cannot simply be removed. Even worse, these volumes are not named; they just get a random id.
Just to show you. I run an application deployed with docker-compose.
$ docker volume ls
DRIVER              VOLUME NAME
local               f91eefad9a2e564e27d6fd204e94990b39206d641cb0bfaca1cb3dd36cee2b9f
local               portus_certificates
local               portus_postgres
local               portus_registry
local               portus_static
There are two volumes for the postgres container, as you can validate with docker inspect
$ docker inspect --format="{{.Mounts}}" portus_db_1
[{volume portus_postgres /var/lib/docker/volumes/portus_postgres/_data /var/lib/postgres/data local rw true } {volume f91eefad9a2e564e27d6fd204e94990b39206d641cb0bfaca1cb3dd36cee2b9f /var/lib/docker/volumes/f91eefad9a2e564e27d6fd204e94990b39206d641cb0bfaca1cb3dd36cee2b9f/_data /var/lib/postgresql/data local true }]
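For what it's worth, the strays can at least be spotted and cleaned up from the CLI (note that docker volume prune removes every local volume not referenced by a container, named ones included, so review the list first):
docker volume ls -qf dangling=true   # volumes not referenced by any container
docker volume prune                  # remove them (asks for confirmation)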
I ran into the same problem and spent a few hours trying to understand why random volumes were being created in docker-compose even though I'd set one for /var/lib/postgresql/data myself. I think the docs should be clearer about this.
I can add another reason not to use VOLUME:
We run automated tests with a Postgres image that is pre-filled with data at build time. This way the container starts a lot faster, which saves compute time. Now imagine running these tests on every commit and pull request.
You end up creating hundreds of empty volumes with that process. Currently we use our own Dockerfile, copy-pasted from the official repo, with only the VOLUME line commented out.
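Roughly, the idea is something like the following (the 11/ directory and image tag are just examples; adjust to whatever version layout the repo currently uses):
git clone https://github.com/docker-library/postgres.git
# comment out the VOLUME declaration so no anonymous volume is created
sed -i 's/^VOLUME /# VOLUME /' postgres/11/Dockerfile
docker build -t postgres-novolume:11 postgres/11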
This is also causing me issues on Kubernetes; the behavior this image relies on is forbidden in Kubernetes for production, see https://kubernetes.io/docs/concepts/storage/persistent-volumes/ :
HostPath (Single node testing only – local storage is not supported in any way and WILL NOT WORK in a multi-node cluster)
2018-12-05 23:46:19 (39.5 MB/s) - '/usr/local/bin/gosu.asc' saved [543/543]
- mktemp -d
- export GNUPGHOME=/tmp/tmp.Ii0f14Usol
- gpg --keyserver ha.pool.sks-keyservers.net --recv-keys B42F6819007F00F88E364FD4036A9C25BF357DD4
gpg: keybox '/tmp/tmp.Ii0f14Usol/pubring.kbx' created
gpg: keyserver receive failed: Cannot assign requested address
How did you get it to build? I always run into the same issue.
@ta32
gpg: keyserver receive failed: Cannot assign requested address
https://github.com/inversepath/usbarmory-debian-base_image/issues/9
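That error usually means the build host cannot reach ha.pool.sks-keyservers.net (frequently an IPv6 resolution problem on hosts without working IPv6, as in the issue linked above). One workaround, assuming keyserver.ubuntu.com is reachable from your network, is to retry against a different keyserver:
gpg --keyserver hkp://keyserver.ubuntu.com:80 \
    --recv-keys B42F6819007F00F88E364FD4036A9C25BF357DD4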
I'm not trying to argue for or against the VOLUME in the Dockerfile, but could someone explain the benefits of VOLUME or its intended use case? I'm just curious to learn best practices around Docker.
Without the VOLUME declaration, if you are using the image for testing purposes it writes data into the container's filesystem, and that data is lost when the container is deleted. But even with the VOLUME, every time you create a container it just spawns a new anonymous volume, so you get essentially the same behavior, except you leave volumes all over the place.
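To illustrate (container names below are only examples):
docker run -d --name pg-test postgres:11
docker rm -f pg-test   # the container is gone, but its anonymous data volume stays behind
docker volume ls       # ...and shows up here under a random 64-character id

# passing -v to docker rm deletes a container's anonymous volumes along with it:
docker run -d --name pg-test2 postgres:11
docker rm -fv pg-test2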
And the workaround is horrendous. Manually forking every version of postgres and changing one line.
It would be great if the project kept the current behavior alongside a no-volume variant, and then deprecated the current behavior over a few versions.
@workmaster2n I don't think this is really best practice; it is just quicker to get something working when you don't know what you are doing. Best practice is to know your tools reasonably well.
@workmaster2n see https://github.com/docker-library/official-images/pull/2437#issuecomment-266578827 for a decent summary of when we (the Official Images maintainers) recommend that image maintainers include a VOLUME (and when not to)
So this VOLUME is what's hiding the data/ directory in the bind mount that I put on /var/lib/postgresql/ :open_mouth:
Say /srv/data/postgresql/data/ contains a perfectly valid PostgreSQL database with gobs of data. Now,
docker run --rm -it \
-v /srv/data/postgresql:/var/lib/postgresql \
postgres psql -U postgres
and try to find a sliver of data. No such luck :cold_sweat:
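Peeking at the mount table makes the shadowing obvious (the container name here is just a throwaway):
docker create --name pg-peek -v /srv/data/postgresql:/var/lib/postgresql postgres
docker inspect --format '{{range .Mounts}}{{.Type}} {{.Destination}}{{"\n"}}{{end}}' pg-peek
# bind /var/lib/postgresql
# volume /var/lib/postgresql/data   <- anonymous volume from the image's VOLUME, hiding data/
docker rm -v pg-peek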
I actually had my data in /srv/data/postgresql/11/ and used
docker run --rm -it \
-v /srv/data/postgresql:/var/lib/postgresql \
-e PGDATA=/var/lib/postgresql/11 \
postgres psql -U postgres
and that worked fine.
I figured I could drop setting PGDATA by moving 11/ to data/ and was surprised I could no longer find any of the data. Using -v /srv/data/postgresql/data:/var/lib/postgresql/data fixes things, though.
Anyway, I think I'll stick with using $PG_MAJOR/ style directories as that makes upgrading across major versions a bit easier (see #37).
You can still have $PG_MAJOR style directories on your host without having to set PGDATA.
docker run --rm -it \
-v /srv/data/postgresql/11/:/var/lib/postgresql/data/ \
postgres:11
Thanks for the suggestion.
I do like to have access to other places below /srv/data/postgresql/ though, e.g. backups/, so I can scribble there instead of in the PGDATA directory. I guess I could achieve the same by adding another volume. Anyway, as usual, there is more than one solution and everyone gets to use whatever suits them :smile_cat:
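For completeness, a sketch of that two-mount variant (the backups/ path and its mount point inside the container are just examples):
docker run --rm -it \
  -v /srv/data/postgresql/11/:/var/lib/postgresql/data/ \
  -v /srv/data/postgresql/backups/:/backups/ \
  postgres:11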