async connection API does not properly handle timeouts with multiple hosts
If 10.0.0.1 does not respond via TCP even if 10.0.0.2 is online and functioning the following connection will timeout
require 'pg'
conn = PG.connect(
host: '10.0.0.1,10.0.0.2',
port: 5432,
dbname: 'postgres',
user: 'user',
password: 'password',
connect_timeout: 5,
target_session_attrs: 'read-write',
)
result = conn.exec('SELECT version()')
puts result[0]['version']
The issue is that the async api supported by libpq ignores the "connect_timeout" connection parameter when using PQconnectStart/PQconnectPoll:
You can see discussion of this on the PostgreSQL hackers mailing list. Specifically around this same issue with psycopg3.
Browsing the code a bit I believe when the connect_timeout expires here rather then failing there needs to be a call to PQreset or PQresetStart/PQresetPoll to force libpq to reset the connection and try to access the next host?
pg-1.5.x versions should do proper connection timeout handling. The details are described in this commit: https://github.com/ged/ruby-pg/commit/40b2ad5d8164eea4fb14d0275b6825aaf2d1a562
In your case it should wait 5 seconds per connection and abort after 10 seconds if no connection can be established.
Can you describe how it behaves in your setup? Can you do a bit debugging in the code path of pg that you referenced, to investigate how it behaves exactly?
In your case it should wait 5 seconds per connection and abort after 10 seconds if no connection can be established.
Its not, here an example:
Run a postgres docker:
docker run -d \
-p 5432:5432 \
--name some-postgres \
-e POSTGRES_PASSWORD=password \
-e POSTGRES_HOST_AUTH_METHOD=scram-sha-256 \
postgres
This test program
require 'pg'
puts "pg version is: #{PG::VERSION}"
begin
conn = PG.connect(
host: ENV["PGHOST"],
port: 5432,
dbname: 'postgres',
user: 'postgres',
password: 'password',
connect_timeout: 5,
target_session_attrs: 'read-write',
)
result = conn.exec('SELECT version()')
puts result[0]['version']
ensure
conn&.close
end
Set PGHOST (10.0.0.1 is just some address to which a connection will timeout):
$ export PGHOST="10.0.0.1,127.0.0.1"
$ time ruby test.rb
pg version is: 1.5.9
/Users/amaidak/.gem/ruby/3.2.4/gems/pg-1.5.9/lib/pg/connection.rb:699:in `async_connect_or_reset': connection to server at "10.0.0.1" (10.0.0.1), port 5432 failed: timeout expired (PG::ConnectionBad)
from /Users/amaidak/.gem/ruby/3.2.4/gems/pg-1.5.9/lib/pg/connection.rb:844:in `connect_to_hosts'
from /Users/amaidak/.gem/ruby/3.2.4/gems/pg-1.5.9/lib/pg/connection.rb:772:in `new'
from /Users/amaidak/.gem/ruby/3.2.4/gems/pg-1.5.9/lib/pg.rb:63:in `connect'
from test.rb:6:in `<main>'
real 0m5.082s
user 0m0.046s
sys 0m0.026s
The expected behavior is the connection to 10.0.0.1 times out after 5 seconds but the client connection to 127.0.0.1 succeeds. Instead you can see the connection simply times out after 5 seconds.
Here is a PR with what I think the correct behavior should be: https://github.com/ged/ruby-pg/pull/619
I'm not sure if this is the case with me, but I have a server that resolves to 3 ip addresses. My psql is version 13, and they are a hosted version 14. They do not allow ping but nc on that port 5432 works fine. I try to connect using Sequel and this error prevails:
PG::ConnectionBad: server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.
Just thought I'd add this. Not sure what's going on.
Update: I have Postico 2 working fine for the connection, as a GUI client.