bdk Disconnect blocks from `Wallet`

Description

In working on bdk_kyoto, we are finding that most of the functionality exists for the Wallet to just directly interoperate with the light client without much hassle. I am now just considering a couple snags that would be resolved by this PR.

Context

The light client emits potentially sparse but monotonically increasing blocks along with their height. These may be applied to Wallet with apply_block_connected_to, using the last local chain tip in the usual case.

If blocks are reorganized, the light client will emit the disconnected block header(s) along with their height(s).

Block Reorganizations

It is not intuitively clear to me how one would handle reorganizations from the Wallet API standpoint. One option is to use apply_block_connected_to with the last block that wasn't reorganized, but I struggled with reasoning through that. Within bdk_kyoto, we keep a LocalChain and apply ChangeSet to it, making the process much simpler by passing None for the height that was disconnected. I think this is a far more intuitive to use None here and apply a ChangeSet.

Allowing for this introduces a nice invariant, that one could always use the tip of the LocalChain when calling apply_block_connected_to as long as the reorg ChangeSet is applied first

Batch chain updates

insert_checkpoint is fine, however the light client will emit the 10 most recent headers when synced to the chain of most work. Inserting them as a batch via a ChangeSet offers a convenience that I suspect may also improve the RPC in the future when blocks are filtered out by CBF.

tl,dr: pls allow altering LocalChain in Wallet

Checklists

All Submissions:

[x] I've signed all my commits
[x] I followed the contribution guidelines
[x] I ran cargo fmt and cargo clippy before committing

New Features:

[x] I've added tests for the new feature
[x] I've added docs for the new feature

Bugfixes:

[ ] This pull request breaks the existing API
[ ] I've added tests to reproduce the issue which are now passing
[ ] I'm linking the issue being fixed by this PR

Oct 19 '24 18:10 rustaceanrob

~~Am I correct in understanding that this change would allow you to remove LocalChain from bdk_kyoto and just directly query and update the wallet's chain?~~ Nevermind above, but I am wondering if there's a way to remove LocalChain from bdk_kyoto.

Oct 20 '24 19:10 notmandatory

Just my opinion, I would prefer a method that explicitly removed blocks

/// Disconnect the given `block` and all blocks that came after it...
pub fn disconnect_block(&mut self, block: BlockId) -> bool {
    if let Ok(changeset) = self.chain.disconnect_from(block) {
        if !changeset.is_empty() {
            self.stage.merge(changeset.into());
            return true;
        }
    }
    false
}

instead of apply_local_chain_changeset which could lead to ambiguity of the API around how blocks may be connected, which is normally done by applying a wallet Update. We shouldn't allow a wallet user to reach for this method as a way of connecting blocks, and I don't think a local_chain::ChangeSet is something a wallet user should normally encounter or try to construct.

the light client will emit the 10 most recent headers when synced to the chain of most work.

side note: This is generally posed as a kind of reorg protection, but do we actually need all ten headers if we're able to respond to block-disconnect events as they occur?

Oct 20 '24 21:10 ValuedMammal

Nevermind above, but I am wondering if there's a way to remove LocalChain from bdk_kyoto.

Maybe, but either this approach or adding a disconnect_block would allow for even more flexibility. Anyone doing work in Rust can just use bdk_wallet and kyoto out of the box and it works without a hitch.

Oct 20 '24 21:10 rustaceanrob

Just my opinion, I would prefer a method that explicitly removed blocks

Sure, that makes sense. I wasn't sure how public the ChangeSet are meant to be in general. Certainly is more intuitive to have insert and disconnect APIs. I can change this PR to that approach.

Oct 20 '24 21:10 rustaceanrob

I like @ValuedMammal 's suggestion since it clearer when/why you'd use this function. The new wallet test would then look like below. One nit, could we call this function disconnect_checkpoint() so it'd be clear it's related to insert_checkpoint ?

#[test]
fn test_localchain_changeset_applies() {
    let (mut wallet, _txid) = get_funded_wallet_wpkh();
    // We reorg a block
    let checkpoint = wallet.local_chain().get(2_000).expect("existing block");
    assert!(wallet.disconnect_block(checkpoint.block_id()));
    let wallet_tip = wallet.local_chain().tip().block_id();
    assert_ne!(wallet_tip.height, 2_000);
    // Then we add some back
    wallet.insert_checkpoint(BlockId::from((2_000, BlockHash::all_zeros()))).expect("valid block");
    wallet.insert_checkpoint(BlockId::from((3_000, BlockHash::all_zeros()))).expect("valid block");
    let wallet_tip = wallet.local_chain().tip().block_id();
    assert_eq!(wallet_tip.height, 3_000);
}

Oct 20 '24 22:10 notmandatory

Okay, updated with those suggestions.

Oct 21 '24 00:10 rustaceanrob

Thanks for catching those. Updated again

Oct 21 '24 16:10 rustaceanrob

ACK 76876dd2f749d6a5dca00c6570ce13c3e82477c9

Oct 23 '24 08:10 LagginTimes

In reference to the comment, there is some reason to believe, although no certainty that the new block a transaction confirms in will be connected at the same height as the reorganized block. kyoto (and potentially bitcoind_rpc with the new filtering PR) would only emit blocks of relevance, so I am uncertain if this would cause problems in the following example:

Block A confirms at height 10 Block B confirms at height 11 with a user TX -> Wallet::apply_block_relevant is called Block B is disconnected by block C at height 11 and D at height 12 with D containing the TX -> Wallet::apply_block_relevant is called on D

What happens here? With Wallet::disconnect_block called on B before D is applied, there is no ambiguity.

From my perspective this is simply the compliment method to a method that already exists, Wallet::insert_checkpoint. Either both should exist or a serious improvement in the documentation on insert_checkpoint needs to occur. In projects that may simply use kyoto and bdk_wallet without bdk_kyoto, I am either very confused or simply blocked by not being able to manually connect or disconnect blocks, which is their preferred workflow (see Listen and Confirm traits)

Oct 29 '24 14:10 rustaceanrob

I still strongly think we shouldn't have this API and should remove bitcoindevkit/bdk#1276. The underlying system we are modeling is a directed acyclic graph and it doesn't have a delete operation. In reality, you can only grow the graph along one path to change the tip and "disconnect" other blocks from the main chain. I think our data structures should reflect this. By not modelling the chain precisely we lose the ability to do bitcoindevkit/bdk_wallet#148 and therefore query what the wallet's balance/utxo set was at any point int he graph.

In reference to the comment, there is some reason to believe, although no certainty that the new block a transaction confirms in will be connected at the same height as the reorganized block. kyoto (and potentially bitcoind_rpc with the new filtering PR) would only emit blocks of relevance, so I am uncertain if this would cause problems in the following example:

Block A confirms at height 10 Block B confirms at height 11 with a user TX -> Wallet::apply_block_relevant is called Block B is disconnected by block C at height 11 and D at height 12 with D containing the TX -> Wallet::apply_block_relevant is called on D

What happens here? With Wallet::disconnect_block called on B before D is applied, there is no ambiguity.

There is no ambiguity even without calling disconnect block. Wallet::apply_block with block C will make it canonical instead of block B. When bitcoindevkit/bdk_wallet#148 is done then block C will only become canonical if it has more PoW than block B -- if not it will become canonical when D is added (which will point to C).

From my perspective this is simply the compliment method to a method that already exists, Wallet::insert_checkpoint. Either both should exist or a serious improvement in the documentation on insert_checkpoint needs to occur. In projects that may simply use kyoto and bdk_wallet without bdk_kyoto, I am either very confused or simply blocked by not being able to manually connect or disconnect blocks, which is their preferred workflow (see Listen and Confirm traits)

I think you're right that insert_checkpoint is confusing since it doesn't say which block this is connected to. I don't know why this method exists either. I have a feeling it was there to make building up a test scenario easy but leaked into the API. We should remove it.

Oct 29 '24 23:10 LLFourn

As a preface, in this issue, (from what I understand) you've expressed it is desirable to handle blocks directly, and I agree. That is why I would like for Wallet to handle blocks and reorganizations from arbitrary sources.

Let me expand on the example. Just to reiterate:

Block A confirms at height 10 Block B confirms at height 11 with a user TX -> Wallet::apply_block_connected_to is called Block B is disconnected by block C at height 11 and D at height 12 with D containing the TX -> Wallet::apply_block_connected_to is called on D

I hope we are in accordance that a compact block filter client, or an efficient implementation of an RPC client, does not - and should not - emit every block. Only blocks with relevant scripts are emitted from such a system. Here is the problem in the example I am trying to express. Let's say the client started with A as a checkpoint. The user syncs to block B, and block B will be emitted as it is relevant. The user then closes the app and opens it later. At that time the system informs the user that B is no longer a member of the chain of most work. At no point will block C be emitted, as the relevant transaction came in block D. Block D will be emitted of course.

In the Wallet internal representation of the chain, B is connected to A, and D is connected to (?). I haven't tested this scenario, but my thinking is there would be an error in the best case and B is incorrectly represented as part of the chain in the worst case.

The underlying system we are modeling is a directed acyclic graph and it doesn't have a delete operation. In reality, you can only grow the graph along one path to change the tip and "disconnect" other blocks from the main chain.

Right, so if an oracle of such a graph has informed you that a node is no longer along the chain of most work, what is the issue with calling a disconnect function? I will also add that, although true, this graph in practice has never had an edge longer than length 1 that diverts from the path of most weight. There is no incentive to build on lower work edges, so if a different edge has been extended by more work, what is the harm in removing an irrelevant edge?

edit: Final comment, I think it's still possible to make it work with the current API, I'm just not clear on what exactly the danger of this method is. Mostly referring to the graph representation where low work edges can be pretty reasonably removed.

Oct 30 '24 02:10 rustaceanrob

As a preface, in this issue, (from what I understand) you've expressed it is desirable to handle blocks directly, and I agree. That is why I would like for Wallet to handle blocks and reorganizations from arbitrary sources.

Let me expand on the example. Just to reiterate:

Block A confirms at height 10 Block B confirms at height 11 with a user TX -> Wallet::apply_block_connected_to is called Block B is disconnected by block C at height 11 and D at height 12 with D containing the TX -> Wallet::apply_block_connected_to is called on D

I hope we are in accordance that a compact block filter client, or an efficient implementation of an RPC client, does not - and should not - emit every block. Only blocks with relevant scripts are emitted from such a system. Here is the problem in the example I am trying to express. Let's say the client started with A as a checkpoint. The user syncs to block B, and block B will be emitted as it is relevant. The user then closes the app and opens it later. At that time the system informs the user that B is no longer a member of the chain of most work. At no point will block C be emitted, as the relevant transaction came in block D. Block D will be emitted of course.

Right I understand what you are trying to tackle now. Thanks. kyoto will need to tell what block that is has previously emitted that D is connected to. If it's not connected to any block kyoto has previously emitted it should use one from the chain the user has passed to it at startup (we do this is in the bdk rpc crate). Right now this will mean it will have to emit C (to contradict B) and A (the ancestor of C and B) which is where it actually connects to the user's existing chain. When bitcoindevkit/bdk_wallet#148 is done it will only have to emit A and C can be orphaned implicitly.

The semantics of "disconnect block" is really to say that the next block I'm emitting isn't connected to the "disconnected" one. The correct way to deal with this kind of API is not by disconnecting a block from the wallet but just with some state interpreting each event that kyoto outputs and trying to figure out implicitly which block the next block event is connected to. But it's simpler if rather than asserting disconnections kyoto just asserted which block the new block is connected to. To be clear before bitcoindevkit/bdk_wallet#148 this means just turning every disconnect block event into a block emitting event with the new block at that height (without txdata if it's not relevant).

Another point is that you might want to actually emit an event for every single block, but just don't include the tx data if it's not relevant. The wallet would need this to know when relative timelocks are satisfied and coinbases are mature etc.

Like @evanlinjin mentioned the actual solution here is just not to do any of this at all. Kyoto should just be a ChainOracle. There's no real need to keep two chain states going here and keeping them sync'd up.

Right, so if an oracle of such a graph has informed you that a node is no longer along the chain of most work, what is the issue with calling a disconnect function? I will also add that, although true, this graph in practice has never had an edge longer than length 1 that diverts from the path of most weight.

There is no real way to use this observation to make the problem of maintaining a model of a DAG any simpler unfortunately. See here for a real world example of the problem with trying to model a graph as a stack:

https://x.com/evoskuil/status/1849158276065464602

Oct 30 '24 04:10 LLFourn

Right now this will mean it will have to emit C (to contradict B) and A (the ancestor of C and B)

Ideally, the node does not have to manage state as to what blocks were previously emitted. What you are describing is the disconnected blocks could also be emitted along with their replacements, or simply with the replacements alone. This is the exact opposite information that ldk-node needs, and my personal take is emitting the blocks that were made irrelevant is much easier to understand.

If I were to try to integrate with ldk-node right now and I encountered a reorganization, I would get a checkpoint iterator, filter on blocks that were not removed, and get the new tip. With the new tip I would call Wallet::apply_block_connected_to. Then indeed, I would be extending a new edge of the graph from the correct node, but at the cost of strange semantics.

We don't have to go through with this PR, but I am not eager to change the light client public facing API when it would contradict the direction that other consuming libraries have taken.

Oct 30 '24 04:10 rustaceanrob

Hate to beat a dead horse here but I am sort-of coming around to this model. I'm just struggling now to understand what the use of one of these dead edges in the graph would be. If we take great care to make sure every edge we've seen remains in the graph, what is the advantage of that?

Oct 30 '24 17:10 rustaceanrob

If we take great care to make sure every edge we've seen remains in the graph, what is the advantage of that?

Eventually we want to make changes to local chain (likely some new type) monotone so that changes can be applied in any order and you get the same result. In the above example that assumption is violated because it depends on the order of the changes applied. Not sure if that answers the question but this has helped me in thinking about it.

Oct 30 '24 18:10 ValuedMammal

Sure, so having stricter topological requirements would enable more flexibility from an API point of view. Yeah, I mean I can punt on this for now.

Oct 30 '24 18:10 rustaceanrob