
IPFS GC & Lotus Splitstore

Open Stebalien opened this issue 4 years ago • 4 comments

See https://github.com/protocol/web3-dev-team/pull/120 & https://github.com/protocol/web3-dev-team/pull/8

NOTE: I'm discussing long-term solutions here, not short-term ones. Unfortunately, due to time constraints, we'll likely have to go with a special-purpose solution for now.

The solution-space for GC in the lotus splitstore is very similar to IPFS pinning/GC:

  1. We need a way to open some form of "transaction" where anything touched in the transaction isn't garbage collected until we've had a chance to "pin" it.
  2. We need a way to unpin old tipsets.
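The two requirements above can be sketched roughly as follows. This is a minimal in-memory illustration in Go, not the splitstore or go-ipfs API: `PinTracker`, `Txn`, `Touch`, and `Collectable` are hypothetical names, and `CID` is a plain string standing in for a real content identifier.

```go
package main

import "sync"

// CID is a stand-in for a real content identifier (assumption: a real
// implementation would use a proper CID type instead of a string).
type CID string

// PinTracker is a hypothetical tracker. A block is safe from GC while it is
// pinned or while at least one open transaction has touched it.
type PinTracker struct {
	mu      sync.Mutex
	pinned  map[CID]struct{}
	touched map[CID]int // number of open transactions holding each block
}

func NewPinTracker() *PinTracker {
	return &PinTracker{pinned: map[CID]struct{}{}, touched: map[CID]int{}}
}

// Txn remembers which blocks it touched so Close can release them.
type Txn struct {
	tracker *PinTracker
	blocks  map[CID]struct{}
	closed  bool
}

func (p *PinTracker) Begin() *Txn {
	return &Txn{tracker: p, blocks: map[CID]struct{}{}}
}

// Touch protects c from GC for the lifetime of the transaction.
func (x *Txn) Touch(c CID) {
	x.tracker.mu.Lock()
	defer x.tracker.mu.Unlock()
	if x.closed {
		return
	}
	if _, seen := x.blocks[c]; !seen {
		x.blocks[c] = struct{}{}
		x.tracker.touched[c]++
	}
}

// Close drops the transaction's protection. Blocks that were pinned in the
// meantime stay protected by their pin.
func (x *Txn) Close() {
	x.tracker.mu.Lock()
	defer x.tracker.mu.Unlock()
	if x.closed {
		return
	}
	x.closed = true
	for c := range x.blocks {
		x.tracker.touched[c]--
		if x.tracker.touched[c] == 0 {
			delete(x.tracker.touched, c)
		}
	}
}

func (p *PinTracker) Pin(c CID) {
	p.mu.Lock()
	defer p.mu.Unlock()
	p.pinned[c] = struct{}{}
}

// Unpin removes a pin, e.g. for an old tipset.
func (p *PinTracker) Unpin(c CID) {
	p.mu.Lock()
	defer p.mu.Unlock()
	delete(p.pinned, c)
}

// Collectable reports whether GC may remove (or move) the block.
func (p *PinTracker) Collectable(c CID) bool {
	p.mu.Lock()
	defer p.mu.Unlock()
	_, isPinned := p.pinned[c]
	return !isPinned && p.touched[c] == 0
}
```

The intended flow is: `Begin`, `Touch` every block written or read, `Pin` whatever should survive, then `Close`. Only blocks that are neither pinned nor held by an open transaction are reported as collectable.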

Differences include:

  1. In the splitstore, we want to move unreferenced blocks instead of just deleting them.
  2. Lotus needs the ability to pin IPLD selectors (i.e., pin a tipset without pinning parents/sectors) while this is only a "nice to have" in go-ipfs.
  3. The latest go-ipfs GC/pinning proposal would have written a mapping for every pin/block pair, which would be way too slow to pin new tipsets. Prior IPFS GC proposals avoided this issue by assuming that the children of pinned blocks were already pinned (recursive pins). But that assumption doesn't hold with arbitrary selectors.

Difference 3 is the hardest one to reconcile, but not impossible, and go-ipfs would benefit significantly from such an optimization.

To do this generically, we'd need to track how many pinned parent blocks (or direct pins) reference a given block and its children via a specific selector. Prior IPFS GC proposals left off the "via a specific selector" part.
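One way to picture this bookkeeping is a reference count keyed by (block, selector) rather than by block alone. The Go sketch below is only an illustration under simplifying assumptions: selectors are reduced to named rules over link names, `CID` is a string, and `RefCounter`, `Selector`, `Follow`, and `Default` are hypothetical names rather than IPLD or go-ipfs types.

```go
package main

// CID is a stand-in for a real content identifier (assumption).
type CID string

// Selector is a drastically simplified, hypothetical selector: for each named
// link it says which selector applies to the child.
type Selector struct {
	Follow  map[string]string // link name -> selector name for the child
	Default string            // selector for other links; "" means do not follow
}

// key identifies "this block, reached via this selector".
type key struct {
	c   CID
	sel string
}

// RefCounter counts how many pinned parents (or direct pins) reference each
// (block, selector) pair.
type RefCounter struct {
	blocks    map[CID]map[string]CID // block -> named links to children
	selectors map[string]Selector
	counts    map[key]int
}

func NewRefCounter(blocks map[CID]map[string]CID, selectors map[string]Selector) *RefCounter {
	return &RefCounter{blocks: blocks, selectors: selectors, counts: map[key]int{}}
}

// Pin adds a direct pin on c under the named selector.
func (r *RefCounter) Pin(c CID, sel string) { r.add(key{c, sel}) }

// Unpin removes a direct pin previously added with the same selector.
func (r *RefCounter) Unpin(c CID, sel string) { r.release(key{c, sel}) }

func (r *RefCounter) add(k key) {
	r.counts[k]++
	if r.counts[k] > 1 {
		return // children were already counted under this selector
	}
	r.eachChild(k, r.add)
}

func (r *RefCounter) release(k key) {
	if r.counts[k] == 0 {
		return // not referenced; nothing to release
	}
	r.counts[k]--
	if r.counts[k] > 0 {
		return
	}
	delete(r.counts, k)
	r.eachChild(k, r.release)
}

// eachChild calls fn for every child the selector chooses to follow.
func (r *RefCounter) eachChild(k key, fn func(key)) {
	s := r.selectors[k.sel]
	for name, child := range r.blocks[k.c] {
		next, ok := s.Follow[name]
		if !ok {
			next = s.Default
		}
		if next != "" {
			fn(key{child, next})
		}
	}
}

// Referenced reports whether any selector still references c. The linear scan
// keeps the sketch short; a real index would avoid it.
func (r *RefCounter) Referenced(c CID) bool {
	for k := range r.counts {
		if k.c == c {
			return true
		}
	}
	return false
}
```

With a "tipset" selector that follows the state link but not the parents link, pinning a new tipset only touches the counts of blocks whose count goes from zero to one, and unpinning an old tipset releases its state without affecting blocks still shared with newer tipsets.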

Stebalien, Jun 29 '21 01:06

(not sure where else to put this)

Stebalien, Jun 29 '21 01:06

cc @vyzo, @raulk, & @gammazero.

I don't see this being the short-term solution given the complexity in "difference 3" (unless someone can think of a simple solution to that), but we should keep this in mind when designing the splitstore. Ideally the go-ipfs and lotus GC solutions would eventually converge.

Stebalien, Jun 29 '21 01:06

Some brief thoughts from conversations I've had recently:

> Lotus needs the ability to pin IPLD selectors (i.e., pin a tipset without pinning parents/sectors) while this is only a "nice to have" in go-ipfs.

@warpfork was recently talking to me about a request from Ceramic for this type of behavior in go-ipfs.

> Prior IPFS GC proposals avoided this issue by assuming that the children of pinned blocks were already pinned (recursive pins). But that assumption doesn't hold with arbitrary selectors.

The current mark + sweep GC would handle this just fine, although it'd still suffer from our existing problems. It might be worth noting that Peergos has been using their own version of mark + sweep that, IIUC, leverages transactions to allow more parallelism and side-step the global lock: by tracking which blocks are in open transactions, we know what can be avoided during GC. (Note: both go-ds-leveldb and badger support transactions, but they're currently unused within go-ipfs.)

This means GC can still take a while if you have a lot of blocks, you still have to read + process all your pins every so often, and you still can't immediately remove a block, but you avoid the biggest pain in the process, which is the inability to operate during GC.
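As a rough illustration of that idea, here is a single-threaded Go sketch of a mark + sweep pass that skips blocks held by open transactions. It is not the Peergos or go-ipfs implementation: `Collect`, the string `CID` type, and the assumption that every block written inside a transaction is recorded in `inTxn` are all hypothetical.

```go
package main

import "sort"

// CID is a stand-in for a real content identifier (assumption).
type CID string

// Collect runs one mark + sweep pass over store, which maps each block to its
// children. pins are the GC roots and inTxn holds blocks touched by open
// transactions. It deletes unreachable, unheld blocks and returns them sorted.
func Collect(store map[CID][]CID, pins []CID, inTxn map[CID]bool) []CID {
	// Mark: walk everything reachable from the pins.
	marked := make(map[CID]bool)
	stack := append([]CID(nil), pins...)
	for len(stack) > 0 {
		c := stack[len(stack)-1]
		stack = stack[:len(stack)-1]
		if marked[c] {
			continue
		}
		marked[c] = true
		stack = append(stack, store[c]...)
	}

	// Sweep: unmarked blocks are garbage unless an open transaction holds
	// them, in which case they are left for a later pass.
	var removed []CID
	for c := range store {
		if !marked[c] && !inTxn[c] {
			removed = append(removed, c)
		}
	}
	for _, c := range removed {
		delete(store, c)
	}
	sort.Slice(removed, func(i, j int) bool { return removed[i] < removed[j] })
	return removed
}
```

The mark phase still visits every pin, which matches the cost described above; what changes is that writers only need to register their blocks in a transaction instead of waiting on a global lock.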

aschmahmann, Jun 29 '21 04:06

> The current mark + sweep GC would handle this just fine, although it'd still suffer from our existing problems. It might be worth noting that Peergos has been using their own version of mark + sweep that, IIUC, leverages transactions to allow more parallelism and side-step the global lock: by tracking which blocks are in open transactions, we know what can be avoided during GC. (Note: both go-ds-leveldb and badger support transactions, but they're currently unused within go-ipfs.)

This is basically how the splitstore currently works.

Stebalien, Jun 29 '21 15:06