pgpool Add transaction support

Dec 03 '15 09:12 tnt-dev

Thank you Pavel for this req. Can you please also add how to use this in the README file, and squash the commit? I'll pull this in once done. Thanks!

Dec 03 '15 10:12 ostinelli

README updated, sorry for delay.

Jan 02 '16 17:01 tnt-dev

Thank you for your input. However, if this is the intended usage then I'm afraid it won't work.

The issue is that when you call a transaction, inside of its fun you're calling pgpool:squery/2 again:

pgpool:transaction(db1_name,
  fun(C) ->
    pgpool:squery(C, "DELETE FROM users WHERE foo = true"),
    pgpool:squery(C, "UPDATE users SET bar = 100"),
    pgpool:squery(C, "SELECT * FROM users")
  end
).

Fact is, pgpool:squery/2 checks out a worker:

squery(DatabaseName, Sql) ->
    poolboy:transaction(DatabaseName, fun(Worker) ->
        gen_server:call(Worker, {squery, Sql}, infinity)
    end).

Therefore, every time you run a query inside of the transaction fun, it might actually use a different worker, hence a different connection. To my understanding, you can't have a transaction run on different connections.

We could, though, do something like:

pgpool:transaction(db1_name,
  fun() ->
    pgpool:squery("DELETE FROM users WHERE foo = true"),
    pgpool:squery("UPDATE users SET bar = 100"),
    pgpool:squery("SELECT * FROM users")
  end
).

And then ensure that pgpool:squery/1 always uses the same connection.

Jan 07 '16 19:01 ostinelli

I think there should be a distinct difference between sharing a connection between multiple statements and a true transaction (BEGIN...COMMIT). Poolboy calls their checkout/checkin a transaction, but it's a bit of a misnomer when you think about a database transaction. Since this library uses poolboy for a distinct reason to access Postgres, I think there should be a proper difference, something like with_connection to represent running statements with a shared connection and with_transaction, which not only shares a connection, but actually runs inside a database transaction. The epgpsql driver provides the with_transaction/2 function to do the db transaction.

May 08 '16 20:05 aberman

Yes @aberman, that is exactly my point for not merging this PR in.

BTW, now PGPool has the batch/2 function which can easily be extended to run inside of a database transaction.

Would you like to see this feature in PGPool?

May 08 '16 20:05 ostinelli

@ostinelli, I think the use of a db transaction should be left up to the user of the library. My use of batch statements is usually when I'm doing some offline process and I might have thousands of inserts/updates. 95% of the time I just want to skip the bad insert/update and log it, not have the entire transaction fail.

I think pgpool should provide the tools just like the driver, but shouldn't make all the decisions for the user. To me, the purpose of the library is to tack on some useful features to the driver, which means it's the interface I use to access the db, instead of the driver. That, to me, means it should replicate everything or most of what the driver does as well. That might be outside the scope of what you wanted to do, but I do foresee that the more people that use this, the more requests you're going to get to add that sort of flexibility.

May 08 '16 20:05 aberman