ruby-pg icon indicating copy to clipboard operation
ruby-pg copied to clipboard

async connection API does not properly handle timeouts with multiple hosts

Open ajmaidak opened this issue 1 year ago • 3 comments

If 10.0.0.1 does not respond via TCP even if 10.0.0.2 is online and functioning the following connection will timeout

require 'pg'
conn = PG.connect(
  host: '10.0.0.1,10.0.0.2',
  port: 5432,
  dbname: 'postgres',
  user: 'user',
  password: 'password',
  connect_timeout: 5,
  target_session_attrs: 'read-write',
)

result = conn.exec('SELECT version()')
puts result[0]['version']

The issue is that the async api supported by libpq ignores the "connect_timeout" connection parameter when using PQconnectStart/PQconnectPoll:

The connect_timeout connection parameter is ignored when using PQconnectPoll; it is the application's responsibility to decide whether an excessive amount of time has elapsed.

You can see discussion of this on the PostgreSQL hackers mailing list. Specifically around this same issue with psycopg3.

Browsing the code a bit I believe when the connect_timeout expires here rather then failing there needs to be a call to PQreset or PQresetStart/PQresetPoll to force libpq to reset the connection and try to access the next host?

ajmaidak avatar Dec 20 '24 20:12 ajmaidak

pg-1.5.x versions should do proper connection timeout handling. The details are described in this commit: https://github.com/ged/ruby-pg/commit/40b2ad5d8164eea4fb14d0275b6825aaf2d1a562

In your case it should wait 5 seconds per connection and abort after 10 seconds if no connection can be established.

Can you describe how it behaves in your setup? Can you do a bit debugging in the code path of pg that you referenced, to investigate how it behaves exactly?

larskanis avatar Dec 21 '24 07:12 larskanis

In your case it should wait 5 seconds per connection and abort after 10 seconds if no connection can be established.

Its not, here an example:

Run a postgres docker:

docker run -d \
    -p 5432:5432 \
    --name some-postgres \
    -e POSTGRES_PASSWORD=password \
    -e POSTGRES_HOST_AUTH_METHOD=scram-sha-256 \
    postgres

This test program

require 'pg'

puts "pg version is: #{PG::VERSION}"

begin
  conn = PG.connect(
    host: ENV["PGHOST"],
    port: 5432,
    dbname: 'postgres',
    user: 'postgres',
    password: 'password',
    connect_timeout: 5,
    target_session_attrs: 'read-write',
  )

  result = conn.exec('SELECT version()')
  puts result[0]['version']

ensure
  conn&.close
end

Set PGHOST (10.0.0.1 is just some address to which a connection will timeout):

$ export PGHOST="10.0.0.1,127.0.0.1"
$ time ruby test.rb 
pg version is: 1.5.9
/Users/amaidak/.gem/ruby/3.2.4/gems/pg-1.5.9/lib/pg/connection.rb:699:in `async_connect_or_reset': connection to server at "10.0.0.1" (10.0.0.1), port 5432 failed: timeout expired (PG::ConnectionBad)
        from /Users/amaidak/.gem/ruby/3.2.4/gems/pg-1.5.9/lib/pg/connection.rb:844:in `connect_to_hosts'
        from /Users/amaidak/.gem/ruby/3.2.4/gems/pg-1.5.9/lib/pg/connection.rb:772:in `new'
        from /Users/amaidak/.gem/ruby/3.2.4/gems/pg-1.5.9/lib/pg.rb:63:in `connect'
        from test.rb:6:in `<main>'

real    0m5.082s
user    0m0.046s
sys     0m0.026s

The expected behavior is the connection to 10.0.0.1 times out after 5 seconds but the client connection to 127.0.0.1 succeeds. Instead you can see the connection simply times out after 5 seconds.

Here is a PR with what I think the correct behavior should be: https://github.com/ged/ruby-pg/pull/619

ajmaidak avatar Dec 23 '24 21:12 ajmaidak

I'm not sure if this is the case with me, but I have a server that resolves to 3 ip addresses. My psql is version 13, and they are a hosted version 14. They do not allow ping but nc on that port 5432 works fine. I try to connect using Sequel and this error prevails:

PG::ConnectionBad: server closed the connection unexpectedly
	This probably means the server terminated abnormally
	before or while processing the request.

Just thought I'd add this. Not sure what's going on.

Update: I have Postico 2 working fine for the connection, as a GUI client.

daBee avatar Jan 23 '25 19:01 daBee