plugin-sdk icon indicating copy to clipboard operation
plugin-sdk copied to clipboard

feat: Sort state client cache keys before flush

Open askurydzin opened this issue 1 year ago • 9 comments

Summary

in some concurrent scenarios several cache keys can be updated by several routines, thus, in order to avoid the deadlocks (in postgresql for example) we should update keys in sorted order.

askurydzin avatar Aug 02 '24 19:08 askurydzin

Hi @askurydzin 👋 Thanks for the PR. Do you mind opening an issue reproducing the scenario you're experiencing? That would help us understand the fix better. You can open the issue in our main repository.

Alternatively can you add a unit test that's failing without this fix (e.g. one that uses the state client and has a deadlock in Postgres)?

erezrokah avatar Aug 05 '24 07:08 erezrokah

Hello, @erezrokah! It would be quite hard to reproduce it as it requires writing some demo plugin from scratch which uses state client. Would it be fine if I just briefly explain why primary keys for db needed to be sorted during insert/update?

When it goes about two concurrent transaction T1 with set of keys { K1, K2, K3 } and T2 with set of keys T2 { K2, K3, K4 }, and we don't sort keys, following issue happens in postgres destination:

T1 BEGIN T2 BEGIN
INSERT K1 (exclusive-lock for K1) INSERT K2 (exclusive-lock for K2)
INSERT K2 (wait share-lock for K1 / locked by K2) INSERT K1 (wait share-lock for K1 / locked by K1)

In the end both transactions appear to be in a dead-lock state.

So, sorting state table primary keys before inserting allows to avoid such situations.

askurydzin avatar Aug 05 '24 10:08 askurydzin

It would be quite hard to reproduce it as it requires writing some demo plugin from scratch which uses state client. Would it be fine if I just briefly explain why primary keys for db needed to be sorted during insert/update?

Is the issue not reproducible on any of our existing plugins? I think it's better to have a reproduction otherwise we can't confirm the fix is working.

erezrokah avatar Aug 05 '24 10:08 erezrokah

@erezrokah no, I think it's not reproducible in any of existing plugins as I'm not sure if any of existing plugins (that I may have an access to) use state client. I stumbled across the issue when writing my own plugin (it's private). The issue is quite straight-forward, but if existence of the plugin vulnerable to the issue is the case for this SDK, I cannot write some demo plugin because it wouldn't be an existing plugin per se, and it may seem too artificial example.

Hence, closing the PR. Sorry for wasting your time.

askurydzin avatar Aug 05 '24 15:08 askurydzin

Hence, closing the PR. Sorry for wasting your time.

Not wasting anyone's time that's for sure. We appreciate the feedback.

There's a public plugin with a state client here https://github.com/cloudquery/cloudquery/blob/e94dda546631c5ef801852bbae997ad11f615f2e/plugins/source/hackernews/resources/services/items/items_fetch.go#L34

To clarify the request is not for one of our own plugins to reproduce this, but to have a test that fails without the fix, and passes with the fix, or a minimal reproduction scenario that shows the bug (maybe a very trimmed down version of your own plugin without any private code).

erezrokah avatar Aug 05 '24 15:08 erezrokah

@erezrokah Hm, if the run of the mentioned plugin consistently emits the set of keys that may overlap, I think running it in parallel may reproduce the thing, then give me some time and I'll check if it work.

askurydzin avatar Aug 05 '24 15:08 askurydzin

Hi @askurydzin, where you able to write a unit test that reproduces the issue?

erezrokah avatar Oct 01 '24 08:10 erezrokah

@erezrokah sorry, not yet, will provide steps to reproduce this week.

askurydzin avatar Oct 01 '24 16:10 askurydzin

@erezrokah I've prepared a repo with demo plugin with somewhat ridiculous logic, but it's aimed to demonstrate the deadlocks occurring with state-table overlapping keys: https://github.com/askurydzin/cq-source-state-deadlock

askurydzin avatar Oct 02 '24 12:10 askurydzin