Should failing entrypoint scripts clean up data directory?
For instance, I have some scripts that set up some users and such using ENV variables passed to docker run. I want to make sure that if those variables were not passed to the docker run then the initialization process halts. Currently, the data directory is created and can be left in some undetermined state if one of the entrypoint scripts fails.
Should it perhaps create the data directory in a temporary location first, and when all scripts complete successfully - move it to the $PG_DATA location and clean everything up if those scripts fail?
Whew, sorry for the very, very long tail on this. This is definitely something I've thought about a lot and ideally, yes, if the database is not (to our knowledge) "successfully initialized" we'd ideally have a still-empty data directory instead of a potentially-half-initialized one. This has absolutely bitten me directly several times, and I've seen countless examples of other folks being bitten by it.
On the other hand, I've also seen folks quietly relying on the current behavior (:weary: yes, I know :disappointed:), so I'm not sure what the best answer here is. This isn't necessarily an obvious "security" win that warrants a large breaking change like we've done previously, so perhaps it would make sense to make it either opt-in or opt-in on old versions and opt-out on new major releases?
There's also some details on the implementation that I think could get complicated. I don't think we can initialize to a truly "temporary" directory in the traditional sense safely because both the size of the initialized data and the cross-device issues might be prohibitive. I think the best we could probably do is a temporary subdirectory of the provided $PGDATA? I really like your idea much better than any I've had though (things like having a trap that goes and blindly empties $PGDATA if things fail :sweat_smile: :innocent:).
Frankly we could probably do something like $PGDATA/.docker-initdb and then we could even bail early if that directory exists because it would pretty much always mean a failed initialization (which avoids issues with the initialization occurring multiple times but still leads to continuously failing containers).
It's not complete enough to be a full PR yet, but I did play around with this a little bit and my preliminary testing was pretty successful:
diff --git a/docker-entrypoint.sh b/docker-entrypoint.sh
index 09a7564..2cd765b 100755
--- a/docker-entrypoint.sh
+++ b/docker-entrypoint.sh
@@ -60,6 +60,38 @@ docker_create_db_directories() {
fi
}
+# TODO make these opt-in / opt-out (https://github.com/docker-library/postgres/issues/159#issuecomment-1042466970)
+docker_temp_create_initdb_location() {
+ [ -z "${dockerTempOrigPGDATA:-}" ]
+ dockerTempOrigPGDATA="$PGDATA"
+ export PGDATA="$PGDATA/.docker-initdb"
+ if [ -s "$PGDATA/PG_VERSION" ]; then
+ {
+ echo
+ echo "Error: '$PGDATA' exists and appears to contain a database!"
+ echo ' this usually means a previous initialization attempt failed'
+ echo ' it is recommended to check logs for previous container runs'
+ echo
+ } >&2
+ exit 1
+ fi
+ echo "Using '$PGDATA' as a temporary PGDATA for initialization ..."
+}
+docker_temp_migrate_initdb_location() {
+ if [ -n "${dockerTempOrigPGDATA:-}" ]; then
+ # TODO balk if "$dockerTempOrigPGDATA/PG_VERSION" or if *not* "$PGDATA/PG_VERSION" !!! (if we have two initialized databases or don't have one at all)
+ echo "Migrating '$PGDATA' back to '$dockerTempOrigPGDATA' ..."
+ # TODO use find -mindepth -maxdepth instead of dotglob
+ (
+ shopt -s dotglob
+ mv -ft "$dockerTempOrigPGDATA" "$PGDATA"/*
+ )
+ rmdir "$PGDATA"
+ export PGDATA="$dockerTempOrigPGDATA"
+ unset ORIG_PGDATA
+ fi
+}
+
# initialize empty PGDATA directory with new database via 'initdb'
# arguments to `initdb` can be passed via POSTGRES_INITDB_ARGS or as arguments to this function
# `initdb` automatically creates the "postgres", "template0", and "template1" dbnames
@@ -314,6 +346,8 @@ _main() {
# check dir permissions to reduce likelihood of half-initialized database
ls /docker-entrypoint-initdb.d/ > /dev/null
+ docker_temp_create_initdb_location
+
docker_init_database_dir
pg_setup_hba_conf "$@"
@@ -328,6 +362,8 @@ _main() {
docker_temp_server_stop
unset PGPASSWORD
+ docker_temp_migrate_initdb_location
+
echo
echo 'PostgreSQL init process complete; ready for start up.'
echo
Thinking about this initialization in a temporary directory and I think the set of users we could break are ones using the full path (/var/lib/postgresql/data/) when running sed on pg_hba.conf instead of using "$PGDATA/pg_hba.conf" in /docker-entrypoint-initdb.d scripts. They should be using PGDATA, so I think that would be a fair break. Are there other items that would be installed into PGDATA and not work if it moves, like maybe some extensions?
To err on the side of most caution, maybe all current versions (10-14) could have an opt-out variable that is on by default and the next major release (15) would use the behavior by default by removing the automatic opt-out.
+1 on this, it would be very useful to prevent the daemon from starting if any initdb.d script encounters an error.
Above patch by @tianon does not cover the case that an initialization script wants to access the cluster (in our case pgbackrest stanza-create) at the intended (non-temporary) location. We used the following patch instead. If you guys think that's the right direction, I might as well open a pull request.
@@ -308,8 +308,19 @@ _main() {
exec gosu postgres "$BASH_SOURCE" "$@"
fi
+ local initFailed="$PGDATA/INITFAILED"
+
+ if [ -s "$initFailed" ]; then
+ cat >&2 <<-'EOE'
+ Error: Previous database initialization failed.
+ EOE
+ exit 1
+ fi
+
# only run initialization on an empty data directory
if [ -z "$DATABASE_ALREADY_EXISTS" ]; then
+ trap "mkdir "$initFailed"" EXIT
+
docker_verify_minimum_env
# check dir permissions to reduce likelihood of half-initialized database
@@ -329,6 +340,8 @@ _main() {
docker_temp_server_stop
unset PGPASSWORD
+ trap - EXIT
+
cat <<-'EOM'
PostgreSQL init process complete; ready for start up.
I'm not sure I understand what you mean -- my patch still starts the temporary server? :sweat_smile:
@tianon True. Let me clarify: An initialization script might want to access the cluster at the intended (non-temporary) location. Of course, scripts could be patched to find the cluster at the temporary location. I would argue, however, that a less intrusive solution is preferable. Another advantage of not using a temporary location is that in case the cluster is supposed to reside in a docker volume (potentially physically distinct from the volume holding the container), we don't unexpectedly use a potentially hight amount of disk space during initialization and don't need to move the initialized cluster across volumes. In both cases workarounds exist but they require potentially tedious adjustments by the user of this image.
Urg, data_directory (https://github.com/docker-library/postgres/issues/1220) makes this more complicated too. :upside_down_face: