graph-node Add a `graphman prune` command

The command will have the form `graphman prune <subgraph> <offset>` and will remove all entity versions that were deleted or updated at least `offset` blocks before the current subgraph head. For a pruned subgraph, the index-node API will report the block up to which history was removed as the earliest block, and queries at a block height before that block will fail with an error. Pruning will not affect the results of queries at a block after the pruned block; it simply limits how far back time-travel queries can reach.
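To make the intended semantics concrete, here is a minimal Python sketch of the pruning rule on a toy in-memory model. It is not how graph-node stores or prunes data: the `Version` type, the half-open `[lower, upper)` block range, and the `prune` and `query` helpers are all hypothetical names for illustration. It also assumes that a query at exactly the earliest block is still allowed.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class Version:
    """One version of an entity (hypothetical model, not graph-node's schema)."""
    entity_id: str
    value: str
    lower: int            # first block at which this version is visible
    upper: Optional[int]  # block at which it was updated/deleted; None = still current


def prune(versions: List[Version], head: int, offset: int) -> Tuple[List[Version], int]:
    """Drop versions that were superseded at least `offset` blocks before `head`.

    Returns the surviving versions and the new earliest queryable block.
    """
    earliest = head - offset
    kept = [v for v in versions if v.upper is None or v.upper > earliest]
    return kept, earliest


def query(versions: List[Version], earliest: int, block: int) -> Dict[str, str]:
    """Time-travel query: entity values as of `block`.

    Assumption: a query at exactly `earliest` is still answerable.
    """
    if block < earliest:
        raise ValueError(f"block {block} is before the earliest block {earliest}")
    return {
        v.entity_id: v.value
        for v in versions
        if v.lower <= block and (v.upper is None or block < v.upper)
    }
```

In this model, a version survives only if it is still current or was superseded after the cutoff, so any query at or after the cutoff sees exactly the versions it would have seen before pruning.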

Since pruning removes a huge amount of data that is usually not accessed by queries, it speeds up queries significantly.

As part of this issue, pruning will be a one-time action, i.e., it only removes history at the point in time when it is run.

Jun 15 '22 17:06 lutter

@lutter on the "earliest block", I wonder if that might break some applications' assumptions (which might rely on it, for example, to identify how synced a subgraph is).

Jun 16 '22 11:06 azf20

> @lutter on the "earliest block", I wonder if that might break some applications' assumptions (which might rely on it, for example, to identify how synced a subgraph is).

The earliest block number would still be accurate; it's just that the hash for the earliest block gets filled with a dummy value. Right now, the earliest block is always the start block for the subgraph; with pruning, the block number could/would move up over time. But I have no idea if anybody is relying on the block hash for anything.

Jun 16 '22 19:06 lutter