Volumes should not be defined in base images

Open huggla opened this issue 7 years ago • 16 comments

Base images should avoid setting VOLUME since it is currently impossible to unset in child images: https://github.com/moby/moby/issues/3465

huggla, Jan 29 '18 08:01

Setting PGDATA is a trivial way to adjust which directory PostgreSQL saves data to (which is also noted in the image description). See also https://github.com/docker-library/postgres/issues/375 for another discussion of this same topic.

tianon, Feb 13 '18 17:02

Yes, but a pointless volume is still created.

huggla, Feb 13 '18 18:02

@huggla is right. This may be OK if you use docker run and have only a few volumes. But if you use docker-compose, or even swarm, then there are unaccounted-for volumes on your Docker host, which are attached to the container and thus cannot be removed. Even worse, these volumes are not named; they have a random ID.

Just to show you. I run an application deployed with docker-compose.

$ docker volume ls
DRIVER    VOLUME NAME
local     f91eefad9a2e564e27d6fd204e94990b39206d641cb0bfaca1cb3dd36cee2b9f
local     portus_certificates
local     portus_postgres
local     portus_registry
local     portus_static

There are two volumes for the postgres container, as you can validate with docker inspect

$ docker inspect --format="{{.Mounts}}" portus_db_1
[{volume portus_postgres /var/lib/docker/volumes/portus_postgres/_data /var/lib/postgres/data local rw true } {volume f91eefad9a2e564e27d6fd204e94990b39206d641cb0bfaca1cb3dd36cee2b9f /var/lib/docker/volumes/f91eefad9a2e564e27d6fd204e94990b39206d641cb0bfaca1cb3dd36cee2b9f/_data /var/lib/postgresql/data local true }]

jannemann, Feb 14 '18 08:02
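
For readers who want to spot such volumes, here is a minimal sketch. It assumes that anonymous volumes show up with 64-character lowercase hexadecimal names, as in the listing above. The function name anonymous_volume_names is hypothetical, and the check is only a heuristic, since a named volume could in principle carry such a name.

```shell
#!/bin/sh
# Heuristic sketch: print the volume names that look like anonymous volumes,
# i.e. names made of exactly 64 lowercase hex characters.
# Reads one volume name per line on stdin.
anonymous_volume_names() {
    # grep exits non-zero when nothing matches; treat that as "no results".
    grep -E '^[0-9a-f]{64}$' || true
}

# Intended use on a Docker host (not executed here):
#   docker volume ls -q | anonymous_volume_names
```

Piping the names from the listing above through the function would leave only the f91eefad... entry.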

I ran into the same problem and spent a few hours trying to understand why random volumes were being created in docker-compose even though I'd set one for /var/lib/postgresql/data myself. I think the docs should be clearer about this.

cantino, Apr 28 '18 19:04

I can add another reason not to use VOLUME:

We run automated tests against a Postgres image that is pre-filled with data at build time. This way the image starts a lot faster, which saves computing time. Now imagine running these tests on every commit and pull request.

That process creates hundreds of empty volumes. Currently we use our own Dockerfile, copy-pasted from the official repo, with only the VOLUME line commented out.

hKaspy, Oct 05 '18 10:10
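
The workaround described above could be scripted roughly as follows. This is a sketch under assumptions, not the commenter's actual setup: the function name comment_out_volume is hypothetical, and it simply prefixes any line starting with a VOLUME instruction with "# ".

```shell
#!/bin/sh
# Sketch: comment out VOLUME instructions in a Dockerfile read from stdin.
# Only lines that begin (after optional whitespace) with "VOLUME " are touched.
comment_out_volume() {
    sed -E 's/^([[:space:]]*VOLUME[[:space:]])/# \1/'
}

# Intended use on a local copy of a Dockerfile (not executed here):
#   comment_out_volume < Dockerfile > Dockerfile.novolume
```

The resulting file still has to be rebuilt for every upstream version, which is the maintenance cost this thread is about.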

This is also causing me issues on Kubernetes: the behavior you use is forbidden in Kubernetes for production. From https://kubernetes.io/docs/concepts/storage/persistent-volumes/ :

HostPath (Single node testing only – local storage is not supported in any way and WILL NOT WORK in a multi-node cluster)

ms4720, Oct 26 '18 03:10

2018-12-05 23:46:19 (39.5 MB/s) - '/usr/local/bin/gosu.asc' saved [543/543]

+ mktemp -d
+ export GNUPGHOME=/tmp/tmp.Ii0f14Usol
+ gpg --keyserver ha.pool.sks-keyservers.net --recv-keys B42F6819007F00F88E364FD4036A9C25BF357DD4
gpg: keybox '/tmp/tmp.Ii0f14Usol/pubring.kbx' created
gpg: keyserver receive failed: Cannot assign requested address

@hKaspy, how did you get it to build? I always run into the same issue.

ta32, Dec 05 '18 23:12

@ta32

gpg: keyserver receive failed: Cannot assign requested address

https://github.com/inversepath/usbarmory-debian-base_image/issues/9

ms4720, Dec 06 '18 02:12

I'm not trying to argue for or against the VOLUME in the Dockerfile, but could someone explain the benefits of the VOLUME or intended use case? I'm just curious to learn best practices around Docker.

workmaster2n, Dec 11 '18 05:12

Without the volume call, if you are using it for testing purposes it will write data into the container and that data will be lost upon container deletion. But even with the volume, every time you create a container it just spawns a new anonymous volume, so you get the exact same behavior, but you leave volumes all over the place.

And the workaround is horrendous. Manually forking every version of postgres and changing one line.

mindreader, Jan 07 '19 18:01

It would be great if the project kept the current behavior alongside a no-volume branch, while deprecating the current behavior over a few versions.

ms4720, Jan 08 '19 02:01

@workmaster2n I don't think this is really best practice; it is just quicker to get something working when you don't know what you are doing. Best practice is to know your tools reasonably well.

ms4720, Jan 08 '19 02:01

@workmaster2n see https://github.com/docker-library/official-images/pull/2437#issuecomment-266578827 for a decent summary of when we (the Official Images maintainers) recommend that image maintainers include a VOLUME (and when not to)

tianon, Jan 25 '19 22:01

So this VOLUME is what's hiding the data/ directory in the bind mount that I put on /var/lib/postgresql/ :open_mouth:

Say /srv/data/postgresql/data/ contains a perfectly valid PostgreSQL database with gobs of data. Now,

docker run --rm -it \
    -v /srv/data/postgresql:/var/lib/postgresql \
    postgres psql -U postgres

and try to find a sliver of data. No such luck :cold_sweat:

I actually had my data in /srv/data/postgresql/11/ and used

docker run --rm -it \
    -v /srv/data/postgresql:/var/lib/postgresql \
    -e PGDATA=/var/lib/postgresql/11 \
    postgres psql -U postgres

and that worked fine.
I figured I could drop setting PGDATA by moving 11/ to data/ and was surprised I could no longer find any of the data. Using -v /srv/data/postgresql/data:/var/lib/postgresql/data fixes things though.

Anyway, I think I'll stick with using $PG_MAJOR/ style directories as that makes upgrading across major versions a bit easier (see #37).

paddy-hack, Feb 05 '21 03:02
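
The surprise described above can be expressed as a small check. This is a sketch, assuming the image declares its VOLUME at /var/lib/postgresql/data; the variable IMAGE_VOLUME and the function classify_mount_target are hypothetical names used only for illustration.

```shell
#!/bin/sh
# Assumption: the path the image declares as VOLUME.
IMAGE_VOLUME=/var/lib/postgresql/data

# Classify a container-side mount target relative to the image's VOLUME path:
#   exact     the mount replaces the anonymous volume
#   shadowed  the mount is a parent directory, so an anonymous volume is
#             still mounted on top of it and hides the host's data/ directory
#   other     anything else
classify_mount_target() {
    target=${1%/}    # drop a single trailing slash, if any
    case "$IMAGE_VOLUME" in
        "$target")   echo exact ;;
        "$target"/*) echo shadowed ;;
        *)           echo other ;;
    esac
}

classify_mount_target /var/lib/postgresql
classify_mount_target /var/lib/postgresql/data/
```

Under that assumption, the first call corresponds to the bind mount on /var/lib/postgresql that lost sight of the data, and the second to the mount that fixed it.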

You can still have $PG_MAJOR style directories on your host without having to set PGDATA.

docker run --rm -it \
    -v /srv/data/postgresql/11/:/var/lib/postgresql/data/ \
    postgres:11

yosifkit, Feb 05 '21 17:02

Thanks for the suggestion.
I do like to have access to other places below /var/data/postgresql/ though, e.g. backups/, so I can scribble there instead of in the PGDATA directory. I guess I could achieve the same by adding another volume. Anyway, as usual, there is more than one solution and everyone gets to use whatever suits them :smile_cat:

paddy-hack, Feb 06 '21 06:02