Non-linear search for payment credentials by duplicating the address column
Describe your idea, in simple words.
Currently, search the UTxO set for UTxOs by payment credential is slow (~2s on 1GB of data) due to it being performed as linear search by sqlite. We can improve on that.
Currently kupo serializes addresses internally like this
┏━━━━━┯━━━━━━━━━━━━━━━━━━━━━━━━┯━━━━━━━━┯━━━━━━━━━━━━━━━━━━━━━┓
┃ tag │ delegation credentials │ header │ payment credentials ┃
┗━━━━━┷━━━━━━━━━━━━━━━━━━━━━━━━┷━━━━━━━━┷━━━━━━━━━━━━━━━━━━━━━┛
which makes total sense if we only want to leverage SQLite indices for delegation credentials.
We can improve on this to efficiently search for payment credential usage with a simple trick:
Introduce address_rev(or a similar name) with this layout
┏━━━━━┯━━━━━━━━━━━━━━━━━━━━━━━━┯━━━━━━━━┯━━━━━━━━━━━━━━━━━━━━━┓
┃ tag │ payment credentials │ header │ delegation credentials ┃
┗━━━━━┷━━━━━━━━━━━━━━━━━━━━━━━━┷━━━━━━━━┷━━━━━━━━━━━━━━━━━━━━━┛
Just switches the credentials around... Add an index and voila!
Why is it a good idea?
In the case that users want to query the UTxO set by payment credentials this is useful. I know of at least two handy use cases of this. This will dramatically speed up queries in this use case.
Potential problems: We are increasing the size of the DB too. How much? It seems quite significant trying around with it for a bit (the size of kupo.sqlite3 on testnet is 3.1GB vs 2.4GB after vacuum. timings for queries TBD) . See #58 for an alternative idea
Are you willing to work on it yourself?
Yes