Connections flapping when different sessions use DBs on same host
We're using Mongoid 4.0.2, Moped 2.0.4, and Sidekiq 3.3.0. We made some changes to the pool size in our staging environment yesterday and started getting emails from Compose (formerly MongoHQ) saying we were flooding their instances with hundreds or thousands of authentications per minute. This baffled us because we'd been using larger pools in production for a long time without seeing the same issue. Researching the problem led me to all the error reports from last year that were supposed to be fixed in Moped 2, which left me even more confused.
In both staging and production, we declare multiple sessions in our mongoid.yml because we have two different MongoDB databases to talk to. In production the two databases are on completely different hosts; in staging they were on the same instance, and only the database name differed:
```yaml
production: # Values changed to protect the innocent
  sessions:
    default: &production_default
      hosts:
        - c0.foobar.m0.mongolayer.com:12345
        - c0.foobar.m1.mongolayer.com:12345
      username: foo
      password: blahblah
      database: foobar
      options: &production_options
        read: :secondary_preferred
        pool_size: 25
        timeout: 30
    baz:
      <<: *production_default
      hosts:
        - c1.foobar.m0.mongolayer.com:54321
        - c1.foobar.m1.mongolayer.com:54321
      database: foobar-baz # NOTE: hosts are different

staging:
  sessions:
    default: &staging_default
      hosts:
        - candidate.77.mongolayer.com:12345
        - candidate.78.mongolayer.com:12345
      username: stage-foo
      password: blahblah
      database: stage
      options: &staging_options
        read: :secondary_preferred
        pool_size: 5
        timeout: 30
    baz:
      <<: *staging_default
      database: stage-baz # NOTE: hosts stay the same
```
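For completeness, queries reach the second database by pointing individual models at the baz session, roughly like this (the model and collection names here are made up for illustration):

```ruby
# Illustrative only -- the model and collection names are invented.
class AuditEntry
  include Mongoid::Document

  # Bind this model to the "baz" session from mongoid.yml; all other models
  # keep using the default session.
  store_in collection: "audit_entries", session: "baz"

  field :action, type: String
end
```

A Sidekiq job that touches both this model and a default-session model therefore alternates between the two databases from one query to the next.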
Diving into the pooling behavior, I saw that in production, any queries on the second database would double the number of entries in the Moped::Connection::Manager pools hash. In staging, though, the number of pools stayed the same and only the DB name and credentials changed. It looks like this is because the pools hash is keyed only by hostname and port, so a connection to a different database on that same host-and-port simply replaces the existing entry in the hash.
This easily explains the "authentication flood" behavior we were seeing. Every time a query hit a different database than the one before, a new connection pool would replace the old one. It didn't happen in production because the pools pointed to different hosts and thus didn't replace each other.
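To make the mechanism concrete, here is a stripped-down sketch of a registry keyed only by host:port. It is not Moped's actual implementation, just a model of the behavior described above:

```ruby
# Simplified sketch -- NOT Moped's code. It only models a pool hash keyed by
# "host:port" to show why two databases on one host collide while two hosts
# do not.
class FakePoolRegistry
  def initialize
    @pools = {}
  end

  # Keyed only by address: a session for a different database (with different
  # credentials) on the same host:port lands on the same slot.
  def pool_for(address, database:, credentials:)
    existing = @pools[address]
    if existing && (existing[:database] != database || existing[:credentials] != credentials)
      # In the scenario above, this is where the old pool is torn down and a
      # fresh one built, re-authenticating every time the "other" DB is hit.
      puts "replacing pool for #{address}: #{existing[:database]} -> #{database}"
    end
    @pools[address] = { database: database, credentials: credentials }
  end
end

registry = FakePoolRegistry.new

# Staging: both sessions resolve to the same host:port, so they thrash.
registry.pool_for("candidate.77.mongolayer.com:12345", database: "stage",     credentials: "stage-foo")
registry.pool_for("candidate.77.mongolayer.com:12345", database: "stage-baz", credentials: "stage-foo")

# Production: different hosts mean different keys, so no collision.
registry.pool_for("c0.foobar.m0.mongolayer.com:12345", database: "foobar",     credentials: "foo")
registry.pool_for("c1.foobar.m0.mongolayer.com:54321", database: "foobar-baz", credentials: "foo")
```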
We've found a workaround for this bug by simply moving our 'secondary' staging database to a different deployment in Compose. But being unable to stay connected to two sessions on the same host seemed like a moderately important problem, so I'm escalating it here. Please let me know if I can provide any additional information.