
Question: why are sqlite transactions in asyncio tasks so much faster?

Open · trifle opened this issue 3 years ago · 2 comments

Is your feature request related to a problem? Please describe.

Hi, thanks for tortoise! I've had a lot of fun using it so far.

There is a known issue with the postgres backend where transactions cannot be used in tasks, because tasks copy the context and therefore create stale connection objects (https://github.com/tortoise/tortoise-orm/issues/743).

That is, you cannot do:

import asyncio
from tortoise.transactions import in_transaction

async def my_save(item):
    async with in_transaction() as conn:
        await Model(item).save(using_db=conn)

item = ...
asyncio.ensure_future(my_save(item))  # <- this creates a future
# which copies the context and will raise InterfaceError: cannot call Connection.fetch()

However! You can do the same with sqlite, supposedly because its in-process design has no concept of connections.
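For context, a minimal sketch of such an SQLite setup, reusing my_save from above (the db.sqlite3 file and the app.models module path are placeholders, not part of any actual project):

import asyncio
from tortoise import Tortoise

async def main():
    # in-process SQLite via the aiosqlite backend; URL and module path are illustrative
    await Tortoise.init(db_url="sqlite://db.sqlite3", modules={"models": ["app.models"]})
    await Tortoise.generate_schemas()

    items = [...]  # whatever payload Model expects, as in my_save above
    # the same task-based save that raises InterfaceError on postgres runs fine here
    tasks = [asyncio.ensure_future(my_save(item)) for item in items]
    await asyncio.gather(*tasks)

    await Tortoise.close_connections()

asyncio.run(main())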

I've tested a simple sqlite setup with two ways of concurrent inserts (single rows, no batch):

async def insert1(item):
    # fire-and-forget: schedule my_save as a task and return immediately
    asyncio.ensure_future(my_save(item))

async def insert2(item):
    # wait for my_save (and its transaction) to complete before returning
    await my_save(item)

The first version (insert1) is a lot faster (I don't have a reproducible benchmark script, but it seems to be an order of magnitude faster). Note that there are no collisions or exceptions.
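
A rough sketch of what such a timing loop could look like (not the actual benchmark; the iteration count is arbitrary, and it reuses insert1/insert2 from above):

import time

async def bench(insert, n=1000):
    # measures how long it takes for the n insert() calls to return;
    # for insert1 the scheduled tasks may still be running afterwards
    start = time.perf_counter()
    for i in range(n):
        await insert(i)
    return time.perf_counter() - start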

Describe the solution you'd like

  • I'd like to understand why there is a difference at all. Both coroutines should do essentially the same thing, namely allow other concurrent transactions to execute.
  • Ideally, I'd like to be able to use tasks for postgres as well (open issue here: https://github.com/tortoise/tortoise-orm/issues/743).
  • I'd like to understand whether I'm breaking any consistency guarantees of sqlite by using tasks.

Describe alternatives you've considered

Well, it's a performance-related question, not strictly a bug. So using the current version of tortoise is fine; it's just perhaps slower than it could be.

Additional context

I'm using tortoise in an event-driven system that should scale up to millions of coroutines and tens of thousands of simultaneous transactions. Ideally, all of those are completely independent, meaning I'd benefit from a massively concurrent code path down to the DB. I'd also like to use signals, which is why bulk inserts are out of the question.

trifle · Mar 18 '22 10:03

I'm unable to reproduce this; my ensure_future latency is actually slightly higher than with await.

My results:

Getting average read/write latency for 1000 iterations...
---------------------------------------------------------
ensure_future average:   0.705ms (read) | 0.343ms (write)
await average:           0.690ms (read) | 0.344ms (write)
---------------------------------------------------------
Finished benchmarks, took 6.263s

Are you sure that you were running your timer inside of my_save? ensure_future schedules a task, so you might have only been benchmarking the time it took to create that task, while the actual work then ran in the background.
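
Roughly, the pitfall looks like this (a sketch reusing my_save and item from the original example):

import asyncio
import time

async def timed_save(item):
    start = time.perf_counter()
    task = asyncio.ensure_future(my_save(item))  # returns as soon as the task is scheduled
    scheduled = time.perf_counter() - start      # tiny: measures task creation only

    await task                                   # the transaction actually runs here
    completed = time.perf_counter() - start      # comparable to timing the awaited version
    return scheduled, completed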

BobDotCom · May 18 '22 04:05

Thanks @BobDotCom for testing this out!

I agree that something seems amiss; I was probably really benchmarking task creation instead of runtime, and the bottleneck lay somewhere else. I'll try to revisit this at some point, but feel free to close the issue.

trifle · May 18 '22 06:05