Password of pgautofailover_monitor? Why is it always set up as trust in pg_hba.conf?
We are trying to use pg_auto_failover with Docker Swarm. In Docker Swarm, the reverse DNS lookup of a container IP does not match its hostname (e.g. db-monitor -> 192.168.1.10 -> db-monitor.<slot-id>.<task-id>.<network>), so pg_auto_failover issues some warnings and then uses the container IP in pg_hba.conf.
However, given that container IPs change regularly and that --pg-hba-lan does not affect the replication entries in pg_hba.conf, we decided to use --skip-pg-hba. But now we can no longer use --auth scram-sha-256 and have to include our own postgres-custom.conf, which configures Postgres to use scram-sha-256 again. Why is that? Why can't we use --skip-pg-hba and --auth scram-sha-256 at the same time?
This generally works, but now on the Postgres nodes we see a user pgautofailover_monitor that still has an MD5 password, while the pgautofailover_replicator user has a SCRAM password (because we need to set that password manually after creating the Postgres node). So technically that user should not work; however, we copied the pg_hba.conf generated by pg_auto_failover before switching to --skip-pg-hba, and it contains the entry hostssl all "pgautofailover_monitor" x.x.x.x/24 trust. So the MD5 password isn't used anyway.
We don't like having a trust-based user for a whole subnet in our pg_hba.conf, so we would like to change it to scram-sha-256 just like all the other entries in that file. This brings us to the following questions:
- Why does this user have an MD5 password, even though our postgres-custom.conf file sets scram-sha-256?
- What is the password of that user so that we can update it to scram-sha-256?
- Shouldn't the password be configurable to begin with, and shouldn't pg_auto_failover avoid using trust for that user in pg_hba.conf?
Hi @jnehlmeier; thanks for opening this issue. The situation with HBA and the roles we use is a little complex, and I believe this is a good opportunity to improve our docs on that topic.
The pgautofailover_monitor user is created with the following hard-coded properties. The properties (including the password and the HBA authentication method) are hard-coded for this user because we never actually authenticate with this role: what the monitor does is more like a ping, and it does not send the password over the wire. Also, the user is expected to have no privileges in the database.
#define PG_AUTOCTL_HEALTH_USERNAME "pgautofailover_monitor"
#define PG_AUTOCTL_HEALTH_PASSWORD "pgautofailover_monitor"
At the moment, if you want to control that, you can use standard SQL tooling to ALTER the user once it has been created, and you may replace our hard-coded HBA rule with another one; just make sure that the monitor can still connect to the Postgres nodes.
Hi @DimCitus, thanks for your answer.
I have now changed
hostssl all "pgautofailover_monitor" x.x.x.x/24 trust
to
hostssl all "pgautofailover_monitor" x.x.x.x/24 scram-sha-256
and updated the password on the Postgres nodes to something different. Everything still seems to work.
So if the password does not matter to you, why not set a random password instead of a hard-coded one? It seems nobody needs to know the password.
And why do you forbid --auth scram-sha-256 with pg_autoctl create postgres --skip-pg-hba? Currently this results in an MD5 password being generated and forces developers to create their own configuration file with password_encryption = scram-sha-256. Then the developer has to update the password again so that pg_shadow is consistent and does not contain a mix of MD5 and scram-sha-256 passwords.
The way the password is encrypted (either md5 or SCRAM at this point) is decided by the password_encryption GUC, whose default is md5 before PostgreSQL 14 and scram-sha-256 from PostgreSQL 14 on; see https://www.postgresql.org/docs/current/runtime-config-connection.html#GUC-PASSWORD-ENCRYPTION. At the moment we don't change that GUC from its default when creating the user with the hard-coded password.
We could indeed change the GUC when --auth is given as scram. I will look into it, unless you want to have a try? It should be a fairly well-localized code change, I suppose.
@DimCitus Could it be as easy as removing this check
https://github.com/citusdata/pg_auto_failover/blob/0fecd653f9ef43019e4d76e347afce378adcc481/src/bin/pg_autoctl/cli_create_node.c#L428-L432
for both --auth and --skip-pg-hba and removing
https://github.com/citusdata/pg_auto_failover/blob/0fecd653f9ef43019e4d76e347afce378adcc481/src/bin/pg_autoctl/cli_create_node.c#L447-L450
from --skip-pg-hba?
I think I would start with the following place in the code, where we create the pgautofailover_monitor user with a hard-coded password and HBA rules, and then navigate up from there.
https://github.com/citusdata/pg_auto_failover/blob/0fecd653f9ef43019e4d76e347afce378adcc481/src/bin/pg_autoctl/fsm_transition.c#L262-L286
Thinking about your comment above, I think it would make sense to accept both --auth and --skip-pg-hba. Another approach would be to accept a new parameter --password-encryption and set the Postgres GUC to the given value in the session where we run CREATE USER, which we are going to need anyway. Either we map an --auth value of scram to that GUC setting, or we have an explicit parameter for it.
There is yet another option, though. We should probably look at creating the user without a password at all. CREATE ROLE accepts PASSWORD NULL, and then "password authentication will always fail for that user". I can't remember now whether we already tried that in pg_auto_failover, or what the result would be for the monitor's health checks. Can you give it a try and check the logs of the Postgres nodes for unwanted authentication errors?
Setting the password on the Postgres nodes to NULL seems to work as well. The monitor shows the correct state with the nodes reachable, and switchover etc. can be performed without issue.