Non-linear search for payment credentials by duplicating the address column

Open nielstron opened this issue 3 years ago • 0 comments

Describe your idea, in simple words.

Currently, search the UTxO set for UTxOs by payment credential is slow (~2s on 1GB of data) due to it being performed as linear search by sqlite. We can improve on that.

Currently kupo serializes addresses internally like this

     ┏━━━━━┯━━━━━━━━━━━━━━━━━━━━━━━━┯━━━━━━━━┯━━━━━━━━━━━━━━━━━━━━━┓
     ┃ tag │ delegation credentials │ header │ payment credentials ┃
     ┗━━━━━┷━━━━━━━━━━━━━━━━━━━━━━━━┷━━━━━━━━┷━━━━━━━━━━━━━━━━━━━━━┛

which makes total sense if we only want to leverage SQLite indices for delegation credentials. We can improve on this to efficiently search for payment credential usage with a simple trick: Introduce address_rev(or a similar name) with this layout

     ┏━━━━━┯━━━━━━━━━━━━━━━━━━━━━━━━┯━━━━━━━━┯━━━━━━━━━━━━━━━━━━━━━┓
     ┃ tag │ payment credentials │ header │ delegation credentials ┃
     ┗━━━━━┷━━━━━━━━━━━━━━━━━━━━━━━━┷━━━━━━━━┷━━━━━━━━━━━━━━━━━━━━━┛

Just switches the credentials around... Add an index and voila!

Why is it a good idea?

In the case that users want to query the UTxO set by payment credentials this is useful. I know of at least two handy use cases of this. This will dramatically speed up queries in this use case.

Potential problems: We are increasing the size of the DB too. How much? It seems quite significant trying around with it for a bit (the size of kupo.sqlite3 on testnet is 3.1GB vs 2.4GB after vacuum. timings for queries TBD) . See #58 for an alternative idea

Are you willing to work on it yourself?

Yes

Aug 11 '22 21:08 nielstron