Connections and transactions meta issue
Hi all, we've been experiencing some issues with Databases regarding connection and transaction management, so I've spent some time reading through the different issues and PRs and understanding the code. There are couple cross cutting problems that I think need discussion as to how to best tackle them. I'll do my best to outline the past and pressent issues and their causes.
- Databases makes use of ContextVars to store the Connection objects across tasks. When a query method (execute, fetch, etc) is awaited the
connectionmethod is called which checks for the presence of a connection in the context, if there is one it is returned, otherwise it creates a new connection and sets it in the context (https://github.com/encode/databases/blob/master/databases/core.py#L190-L202). - However, as outlined in https://github.com/encode/databases/issues/230, once a connection is created all coroutines/tasks inherit the context from the parent, which effectively means the same connection will be returned. It's unclear to me if this was an intentional decision or an oversight, but it's the cause of issues regarding transactions opened in concurrent tasks, reported in https://github.com/encode/databases/issues/340 and https://github.com/encode/databases/issues/327.
- A PR was opened and merged addressing this issue which essentially moves away from the previous behaviour of one connection being shared to each time
transactionis called a new connection is created. This change was part of the 0.5.0 release. However, this change led to issues where transactions are not rolled back correctly (https://github.com/encode/databases/issues/403, https://github.com/encode/databases/issues/424, https://github.com/encode/databases/issues/425), although it's not entirely clear to me why (any clarification on this respect would be appreciated).
In summary, there's two valid use cases, concurrent transaction management and predictable non-concurrent transaction rollback, that seem at odds with each other in the current implementation. Ideally we'd find a solution for https://github.com/encode/databases/issues/327 that does not require creating a new connection per transaction.
I also think it might be worth clarifying the connection management model, the docs suggest the connections are handled transparently but it's unclear when actual backend connections are created, from the code it seems they're only created once on Connection.__aenter__ and since the first created connection is set in the context and returned at all times, there's effectively a single backend connection acquired at any point in time.
Hopefully this summary sparks a conversation around possible fixes for these problems, please do let me know if I misunderstood anything.
I spent some more time debugging the current code and I believe I found the issue with the transactions introduced in #328.
As explained earlier it only happens when there's been a previous query to the database, which calls Database.connection creating a connection and setting it in the context. Note that the context was created on init and stored in the Database as an instance variable.
When Database.transaction is called we get the connection from the context and check for previous transactions, since there is none we copy the context and pass a callable to Database.Transaction, however, and this is crucial, the context instance variable is not replaced.
Transaction.__aenter__ is called which starts the backend transaction with the new connection. Once inside the context manager a query method is called on Database, again calling Database.connection and inspecting the original context which returns the previous connection NOT wrapped in the transaction so the changes are persisted.
Finally, on Transaction.__aexit__ the transaction is indeed rolled back, but there's been no query executed on that connection nothing happens.
The first instinct is to replace the context in the Database instance with the new one, however, that would mean the previous connection would be unreachable since there's only one context at any point in time. I believe this was a deliberate design choice, i.e. there should be a single Database instance per task and the same connection is reused at all times.
Personally, I think the best course of action is to revert #328 since it caused a regression and tackle #327 starting from the API design, defining the semantics of what the user expects when opening transactions in concurrent tasks.
@yeraydiazdiaz could one use a stack to keep track of connections?
@yeraydiazdiaz I'm stuck with this problem too, here is my test #425
Is there any status update on this?
@emerson-h Looks like it was fixed in 0.6.0 🎉
The problem still exists
@matthiasBT can you please clarify the behavior that still exists? Perhaps by including a code snippet?
I imagine this issue is affected by the changes to transaction and connection context in #546 which has much discussion. If this issue is still affecting you on the latest version, the next step is to provide a failing test case that illustrates the problem.