Question: why are SQLite transactions in asyncio tasks so much faster?

**Is your feature request related to a problem? Please describe.**
Hi, thanks for tortoise! I've had a lot of fun using it so far.
There is a known issue with the postgres backend where transactions cannot be used in tasks, because tasks copy the context and therefore create stale connection objects (https://github.com/tortoise/tortoise-orm/issues/743).
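For background, the reason tasks misbehave here is that asyncio runs each task in a *copy* of the caller's `contextvars` context, so anything a library stores in a `ContextVar` (such as a connection) is isolated between the task and its caller. A minimal sketch with plain `contextvars` (no tortoise involved; the variable name is just a stand-in for the per-context connection):

```python
import asyncio
import contextvars

# Stands in for the per-context connection object a library might store.
conn_var = contextvars.ContextVar("conn", default="outer-conn")

async def inside_task():
    # The task starts with a copy of the caller's context...
    seen = conn_var.get()
    # ...so setting the variable here does not leak back to the caller.
    conn_var.set("task-conn")
    return seen

async def main():
    seen_by_task = await asyncio.ensure_future(inside_task())
    return seen_by_task, conn_var.get()

seen_by_task, seen_after = asyncio.run(main())
print(seen_by_task)  # "outer-conn": the copy starts out identical
print(seen_after)    # still "outer-conn": the task's set() was isolated
```

This isolation is exactly why a connection bound to the surrounding context can go stale inside a task.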
That is, you cannot do:

```python
async def my_save(item):
    async with in_transaction() as conn:
        await Model(item).save(using_db=conn)

item = ...
# ensure_future creates a Task, which copies the context and will raise
# InterfaceError: cannot call Connection.fetch()
asyncio.ensure_future(my_save(item))
```
However! You can do the same with SQLite, supposedly because its in-process design has no concept of connections.
I've tested a simple SQLite setup with two ways of doing concurrent inserts (single rows, no batching):

```python
async def insert1(item):
    asyncio.ensure_future(my_save(item))

async def insert2(item):
    await my_save(item)
```
The first version (`insert1`) is a lot faster (I don't have a reproducible benchmark script, but it appears to be roughly an order of magnitude faster). Note that there are no collisions or exceptions.
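A hypothetical benchmark harness for comparing the two patterns fairly (the `asyncio.sleep(0)` stand-in and the harness itself are my assumptions, not code from the issue) has to wait for the scheduled futures before stopping the clock, since `ensure_future` returns immediately:

```python
import asyncio
import time

async def my_save(item):
    await asyncio.sleep(0)  # stand-in for the real transactional save

async def insert1(item):
    # Returns the future so the harness can wait on it afterwards.
    return asyncio.ensure_future(my_save(item))

async def insert2(item):
    await my_save(item)

async def bench(insert, n=1000):
    start = time.perf_counter()
    futures = [await insert(i) for i in range(n)]
    # insert1 leaves its futures still running; wait for them too,
    # otherwise we only time task creation.
    await asyncio.gather(*(f for f in futures if f is not None))
    return time.perf_counter() - start

async def main():
    t1 = await bench(insert1)
    t2 = await bench(insert2)
    return t1, t2

t1, t2 = asyncio.run(main())
print(f"insert1: {t1:.4f}s, insert2: {t2:.4f}s")
```

With the final `gather`, both timings cover the full lifetime of every insert rather than just the scheduling step.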
**Describe the solution you'd like**
- I'd like to understand why there even is a difference. Both coroutines should do essentially the same thing, namely allow other concurrent transactions to execute.
- Ideally, I'd like to be able to use tasks for postgres as well (open issue here: https://github.com/tortoise/tortoise-orm/issues/743).
- I'd like to understand whether I'm breaking any of SQLite's consistency guarantees by using tasks.
**Describe alternatives you've considered**
Well, it's a performance-related question, not strictly a bug, so using the current version of tortoise is fine; it's just perhaps slower than it could be.
**Additional context**
I'm using tortoise in an event-driven system that should scale up to millions of coroutines and tens of thousands of simultaneous transactions. Ideally, all of those are completely independent, meaning I'd benefit from a massively concurrent codepath down to the DB. I'd also like to use signals, which is why bulk inserts are out of the question.
I'm unable to reproduce this; my `ensure_future` latency is actually slightly higher than with `await`.
My results:

```
Getting average read/write latency for 1000 iterations...
---------------------------------------------------------
ensure_future average: 0.705ms (read) | 0.343ms (write)
await average: 0.690ms (read) | 0.344ms (write)
---------------------------------------------------------
Finished benchmarks, took 6.263s
```
Are you sure that you were running your timer inside of `my_save`? `ensure_future` only schedules a task, so you might have been benchmarking just the time it took to create that task, while the actual work completed later in the background.
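The effect described above is easy to demonstrate in isolation. In this sketch (the `sleep` duration is an arbitrary stand-in for a database write), a timer stopped right after `ensure_future` reports almost nothing, while a timer stopped after awaiting the future includes the real work:

```python
import asyncio
import time

async def work():
    await asyncio.sleep(0.01)  # simulated database write

async def main():
    start = time.perf_counter()
    fut = asyncio.ensure_future(work())
    scheduled = time.perf_counter() - start  # near zero: task merely created
    await fut
    completed = time.perf_counter() - start  # includes the actual work
    return scheduled, completed

scheduled, completed = asyncio.run(main())
print(f"scheduled after {scheduled:.6f}s, completed after {completed:.6f}s")
```

Any benchmark that stops the clock before awaiting the future is measuring `scheduled`, not `completed`.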
Thanks @BobDotCom for testing this out!
I agree that something seems amiss; I was probably benchmarking task creation instead of runtime, and the bottleneck lay somewhere else. I'll try to revisit this at some point, but feel free to close the issue.