WIP: Archive index design
WIP: Archive index design
Background
For the purposes of this discussion, an IPLD Store is a Mapping of CID to (binary) data[^1].
The industry standard for Mappings of this scale is to use a key-value database. Databases may be embedded (co-resident) or client-server (maybe co-resident).
[^1]: Breaking this assumption would unlock using graph databases. I think we should look into that at some point.
Motivate and problem statement
Archival nodes need an IPLD Store for the entire history of a chain.
For mainnet, at time of writing this is O(50TiB), with O(50G) CIDs.
A key selling point for Forest is viability as an archival node on commodity hardware.
This is currently implemented using many .forest.car.zst files, but lookup performance is relatively poor, so we should consider other solutions for beta users.
Evaluation criteria
- Disk usage
- Incremental updates
- Unlike e.g snapshot users, archive users aren't going to want to rebuild their database every night.
- Convenience
- Import/export tooling
- Compatability with e.g snapshot files
- Lookup latency
- Implementation complexity
Existing IPLD Stores on Filecoin nodes
Every Filecoin node needs an IPLD Store.
Forest engineers have experience with the following IPLD stores.
They are all embedded, presumably because:
- embedded is simpler/respoects the node boundary.
- embedded suits this scale of database.
Rocksdb
erDiagram
rocksdb-ipldstore {
u8[] cid PK
u8[] data
}
ParityDB
erDiagram
paritydb-ipldstore {
u8[] cid PK
u8[] data
}
.forest.car.zst files
erDiagram
cid one -- one or more inline-index: hash
inline-index {
u64 hash
u64 offset
}
inline-index one -- one file: lookup
file {
u64 offset PK
u8[] data
}
Candidates
MySQL
Datatypes cheat sheet:
| MySQL | Rust (unsigned) |
|---|---|
TINYINT |
u8 |
SMALLINT |
u16 |
MEDIUMINT |
u24[^2] |
INT |
u32 |
BIGINT |
u64 |
[^2]: not a core primitive type
Aside: storing CIDs
Which scales better?
Here is a faithful translation of CidGeneric<64>, the default cid::Cid
erDiagram
faithful-cid64 {
ENUM version PK
BIGINT cid_codec PK
BIGINT hash_code PK
VARBINARY(64) hash_digest PK
}
Most hashes look like this
Keep a table for each variant
erDiagram
base32-cidv1-raw-blake2b-256 {
BINARY(32) digest PK
}
(or somewhere in between)
consideration: what would require code changes?
Custom Binary Search
Custom Hash Table
SQLite3
Evaluation
Evaluation at mainnet scale
| Solution | Disk usage | Latency | Inc | Convenience | Complexity |
|---|