Database connections are never closed
PgstacDB maintains a persistent connection to the database through self.connection = pool.getconn(). According to the getconn() documentation, this connection will not be automatically returned to the pool, and therefore won't be closed when idle. An explicit putconn() call is required. Currently, db_max_idle is ignored, which leads to database timeouts during long idle periods. I suggest not implementing a custom self.connection singleton pooling logic, but rather relying on the psycopg_pool implementation. Otherwise, you would need to implement the same feature set as psycopg_pool, including idle handling, which adds unnecessary complexity.
I could prepare a PR dropping the singleton caching and moving all caching logic to the psycopg_pool (as it should imo).
PgstacDB is set up with a disconnect method that is called both when using as a context manager (via the exit() method) as well as by trapping it using atexit.register(self.disconnect) to ensure that connections are automatically returned to the pool.
I'm confused about the purpose of using ConnectionPool and having the db_max_idle setting if they're not meant to be used. If users are expected to initialize PgstacDB via a context manager each time, creating a new pool just to get a single connection and then closing everything afterward, what's the benefit? The typical use case in SQLAlchemy/Redis/Valkey is to maintain the same connection pool throughout the program's lifecycle - that's where settings like db_max_idle make practical sense.
What you describe would also result in a constant memory leak due to repeated atexit.register calls that prevent the entire PgstacDB instance from ever being freed from memory.
All of these seem like unusual design decisions that cause more issues than they solve. I thought PostgreSQL connection pooling was a solved problem.
I've been reviewing the PgstacDB class, and here's what doesn't seem right to me: we return the connection to the pool https://github.com/stac-utils/pgstac/blob/d5901b7f7a6f441475191cf47b2810d959940085/src/pypgstac/src/pypgstac/db.py#L168 but just a few lines later, we set the pool to None https://github.com/stac-utils/pgstac/blob/d5901b7f7a6f441475191cf47b2810d959940085/src/pypgstac/src/pypgstac/db.py#L173
As a result, the pool is never reused across different context managers:
>>> pgdb = PgstacDB()
>>> with pgdb as ctx:
... pgdb.pool is None
...
False
>>> pgdb.pool is None
True
@Zaczero can you share how you are using PgstacDb
I can confirm the pool is not closed on exit and thus the connection is not closed as well
with PgstacDB(dsn="postgresql://username:[email protected]:5439/postgis") as pg:
pool = pg.get_pool()
conn = pg.connect()
print(pool.closed)
>>> False
print(conn.closed)
>>> False
# Manually closing the pool
pool.close()
print(pool.closed)
>>> True
print(conn.closed)
>>> True
I agree the design of the PgstacDB class could be re-worked a bit to make it simpler and reliable
cc @bitner