BlockSci bug: transactions not correctly associated with addresses in v0.5

System Information (if applicable)

BlockSci version: 0.5
Using AMI: no
Total memory: 32GB

auto a = getAddressFromString("3ChVP627KU5w4zu2rieFPF3wGXWQgmhvrs",chain->getAccess());
auto equivAddress = (*a).getEquivAddresses(false);
auto pointers = equivAddress.getOutputPointers() | ranges::to_vector;
std::cout << "Equiv Address: " << pointers.size() << " pointers" << std::endl;

----- Result --------
Equiv Address: 1 pointers

But according to blockchain.info, the address has 59 transactions: https://www.blockchain.com/btc/address/3ChVP627KU5w4zu2rieFPF3wGXWQgmhvrs

I suspect I am missing something regarding the type of the address, as my parser folder seems sane.

Can someone help me with that?

Thanks in advance, Clément

May 08 '19 12:05 ClementPavue

Second line should be the following (is that just a typo in your example?):

auto equivAddress = a.getEquivAddresses(false);

Running this on 0.6 gives me Equiv Address: 54 pointers (on a slightly outdated state of the blockchain)

May 08 '19 15:05 maltemoeser

Yes, sorry, bad copy/paste (I updated the first message). So it's a bug in 0.5, if I read you correctly?

May 08 '19 15:05 ClementPavue

So it's a bug in 0.5, if I read you correctly?

No, I haven't been able to test this on v0.5 yet to see if it's reproducible.

May 08 '19 20:05 maltemoeser

Can reproduce this in v0.5. My best guess is that it's related to #217: if I retrieve the address through the output of a transaction, I get a different scriptNum than the one from getAddressFromString. However, this seems to be fixed in v0.6.

May 09 '19 12:05 maltemoeser

I reached the same diagnosis as you. I'm currently re-parsing the chain to see whether the bug is solved for me in 0.6.

May 10 '19 13:05 ClementPavue

Hi. I encountered this bug on v0.5. It also reproduces with PUBKEYHASH addresses: 183hmJGRuTEi2YDCWy5iozY8rZtFwVgahM, 1FeexV6bAHb8ybZjqQMjJrcCrHGW9sb6uF, etc. (from the rich list).

>>> chain.address_from_string("183hmJGRuTEi2YDCWy5iozY8rZtFwVgahM").balance()
777
>>> chain.address_from_string("1FeexV6bAHb8ybZjqQMjJrcCrHGW9sb6uF").balance()
777

This is caused by a bug in AddressState::reloadBloomFilter in the parser:

reloadBloomFilter<blocksci::AddressType::MULTISIG_PUBKEY>() is called
reloadBloomFilter clears addressBloomFilter for blocksci::DedupAddressType::PUBKEY
reloadBloomFilter reloads addresses from db.db.getAddressRange<blocksci::AddressType::MULTISIG_PUBKEY>() instead of db.db.getAddressRange<blocksci::AddressType::PUBKEY>()
findAddress returns AddressLocation::NotFound for the existing address
resolveAddress assigns new addressNum to the existing address

The same happens for SCRIPTHASH, where reloadBloomFilter<blocksci::AddressType::WITNESS_SCRIPTHASH>() is called.
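
The sequence above can be illustrated with a small toy model. This is only a sketch, not BlockSci's parser code: `ToyAddressState`, `reload_filter`, `find_address`, `resolve_address` and `demo` are hypothetical names, a plain Python set stands in for the bloom filter (so there are no false positives), and the database is reduced to two dictionaries keyed by address type.

```python
class ToyAddressState:
    """Toy stand-in for the parser's address state (hypothetical, not BlockSci code)."""

    def __init__(self):
        # Assumed layout: one key -> addressNum table per address type.
        self.db = {"PUBKEY": {}, "MULTISIG_PUBKEY": {}}
        # A set stands in for the bloom filter of the PUBKEY dedup type.
        self.pubkey_filter = set()
        self.next_num = 1

    def reload_filter(self, source_type):
        # Clear the PUBKEY filter, then refill it from the given address range.
        # Passing "MULTISIG_PUBKEY" here mimics the bug; "PUBKEY" mimics the fix.
        self.pubkey_filter.clear()
        self.pubkey_filter.update(self.db[source_type])

    def find_address(self, key):
        # If the filter does not contain the key, report "not found"
        # without consulting the database at all.
        if key not in self.pubkey_filter:
            return None
        return self.db["PUBKEY"].get(key)

    def resolve_address(self, key):
        num = self.find_address(key)
        if num is None:
            # Not found: assign a fresh addressNum and overwrite the index entry.
            num = self.next_num
            self.next_num += 1
            self.db["PUBKEY"][key] = num
            self.pubkey_filter.add(key)
        return num


def demo(source_type):
    """Resolve the same key before and after a filter reload."""
    state = ToyAddressState()
    first = state.resolve_address("addr")
    state.reload_filter(source_type)
    second = state.resolve_address("addr")
    return first, second


buggy = demo("MULTISIG_PUBKEY")  # filter refilled from the wrong range
fixed = demo("PUBKEY")           # filter refilled from the matching range
print(buggy, fixed)
```

In this model the buggy reload leaves the filter empty for the existing key, so the second resolution hands out a new number and overwrites the index entry, while the reload from the matching range returns the original number.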

It seems that the bug is fixed in v0.6 by commit 2bc7f72aa4f4a29c840411ab175feadc1d849e1e (the sizeIncreaseRatio change is also important; fixing only the type results in massive reloading). I would appreciate it if you could backport the patch to v0.5, since I think this bug leads to seriously wrong analysis results.

Aug 06 '19 07:08 ytoku

Unfortunately I didn’t understand the impact of this issue when it was brought up last year. I re-visited this now as I was preparing for a new release, and it indeed seems to impact all addresses that have been re-used after certain block heights.

As correctly identified by ytoku, the bug is caused by an incorrect reloading of the parser’s bloom filter at a specific block height (for Bitcoin: at height 580383 for Pubkey/Pubkey-Hash/Witness-Pubkey-Hash addresses, at height 572072 for Scripthash/Witness-Scripthash addresses). Addresses that were reused afterwards are not correctly associated with their previously assigned address ID, but instead received a new ID. This new ID is also returned by the address index.

Due to how the parser caches addresses, subsequent address occurrences could receive the old ID again, leading to two possibilities:

If the address had only been seen once before the incorrect reloading, subsequent uses would use the new ID. In this case, looking up the address by string would return an address that misses the first use.
If the address had been seen more than once before, subsequent occurrences would receive the old ID again. In this case, looking up the address by string would return an address that shows only a single use.
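
The two cases can be written down as a small bookkeeping sketch. It only encodes the outcomes described above and does not model why the parser's cache behaves this way; `assigned_ids` and `uses_visible_by_string` are hypothetical helper names, and the IDs 1 and 2 are arbitrary placeholders.

```python
def assigned_ids(uses_before, uses_after, old_id=1, new_id=2):
    """Return the ID recorded for each occurrence of one address.

    uses_before: occurrences before the incorrect filter reload
    uses_after:  occurrences after it
    """
    ids = [old_id] * uses_before
    if uses_after == 0:
        return ids
    # The first reuse after the reload receives a fresh ID.
    ids.append(new_id)
    # Later occurrences keep the new ID only if the address had been
    # seen exactly once before; otherwise they fall back to the old ID.
    later = new_id if uses_before == 1 else old_id
    ids.extend([later] * (uses_after - 1))
    return ids


def uses_visible_by_string(ids, indexed_id=2):
    """Count occurrences reachable via the ID that the address index returns."""
    return sum(1 for i in ids if i == indexed_id)


seen_once = assigned_ids(uses_before=1, uses_after=3)
seen_often = assigned_ids(uses_before=5, uses_after=3)
print(seen_once, uses_visible_by_string(seen_once))
print(seen_often, uses_visible_by_string(seen_often))
```

With one earlier use, the lookup by string sees every use except the first; with several earlier uses, it sees only the single occurrence that received the new ID, which matches the "1 pointers" result reported at the top of this thread.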

I’ve compiled some more details here. Users of the v0.6 branch (after Oct 2018) should not have been affected by this.

Jul 29 '20 17:07 maltemoeser