PyBC

CVarInt format is not de-serialised correctly

Open · garethjns opened this issue 7 years ago · 2 comments

The Core wallet serialises some (larger?) integers in the CVarInt format, which pybit.py3.common.Common does not read correctly. Any time a CVarInt is parsed, it comes back as a huge number that breaks everything downstream of it.

https://bitcoin.stackexchange.com/questions/51620/cvarint-serialization-format
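For reference, here is a minimal Python sketch of how a CVarInt can be decoded. It assumes the encoding described in the linked answer and in Bitcoin Core's `ReadVarInt` (`serialize.h`): 7-bit groups, most significant first, where a set high bit means another byte follows and 1 is added at each continuation so every value has exactly one encoding. `read_cvarint` is a hypothetical helper, not part of pybit.

```python
from typing import Tuple


def read_cvarint(data: bytes, pos: int = 0) -> Tuple[int, int]:
    """Decode one Bitcoin Core CVarInt starting at data[pos].

    Returns (value, new_pos). Each byte carries 7 bits of the value, most
    significant group first. A set high bit means another byte follows, and
    the running value is incremented by 1 before the next group is shifted in.
    """
    n = 0
    while True:
        if pos >= len(data):
            raise ValueError("truncated CVarInt")
        ch = data[pos]  # indexing bytes gives an int
        pos += 1
        n = (n << 7) | (ch & 0x7F)
        if not ch & 0x80:
            # High bit clear: this was the last byte
            return n, pos
        # High bit set: more bytes follow, apply the +1 offset
        n += 1
```

For example, `bytes([0x80, 0x00])` decodes to 128 and `bytes([0x82, 0xFE, 0x7F])` to 65535, matching the example table in Bitcoin Core's `serialize.h`.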

    def read_var(self,
                 pr: bool = False) -> bytes:
        """
        Read the next variable-length integer (CompactSize), as described in
        the specification:
        https://en.bitcoin.it/wiki/Protocol_documentation#Variable_length_integer
        Returns the raw little-endian bytes of the value, without the 0xfd,
        0xfe or 0xff prefix byte; callers must convert them to an int.
        """
        # For debugging
        start = self.cursor

        # Get the prefix byte
        # CVarInt needs to be detected here?
        by = self.read_next(1)
        o = ord(by)

        if pr:
            print(by)

        if o < 253:
            # The prefix byte is the value itself
            # (by is still a 1-byte bytes object here, not an int)
            out = by

        elif o == 253:  # 0xfd
            # Value is in the next 2 bytes (little-endian)
            out = self.read_next(2)

        elif o == 254:  # 0xfe
            # Value is in the next 4 bytes (little-endian)
            out = self.read_next(4)

        else:  # 0xff (o is always in 0..255)
            # Value is in the next 8 bytes (little-endian)
            out = self.read_next(8)

        if pr:
            # Bytes are little-endian; int.from_bytes avoids needing codecs
            print(int.from_bytes(out, "little"))

        return out
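Note that `read_var` returns the raw payload bytes rather than an integer, so every caller has to convert them. A hypothetical standalone helper that decodes a CompactSize straight to an `int` (a sketch, not pybit's API) could look like this:

```python
from typing import Tuple


def read_compact_size(data: bytes, pos: int = 0) -> Tuple[int, int]:
    """Decode one CompactSize integer starting at data[pos].

    Returns (value, new_pos). Values below 0xfd are stored in the prefix
    byte itself; 0xfd, 0xfe and 0xff introduce a 2-, 4- or 8-byte
    little-endian value.
    """
    if pos >= len(data):
        raise ValueError("no data")
    prefix = data[pos]
    pos += 1
    if prefix < 0xFD:
        return prefix, pos
    width = {0xFD: 2, 0xFE: 4, 0xFF: 8}[prefix]
    if pos + width > len(data):
        raise ValueError("truncated CompactSize")
    value = int.from_bytes(data[pos:pos + width], "little")
    return value, pos + width
```

Returning `(value, new_pos)` also gives back the number of bytes consumed, which the original docstring mentioned but the method never returned.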

garethjns, Feb 11 '19 20:02

Two things that may help:

  1. There are two different integer encodings: CCompactSize and CVarInt. I think what you want is the first one; CVarInt is not used in the block .dat files.

See Pieter Wuille's answer here: https://bitcoin.stackexchange.com/questions/51620/cvarint-serialization-format, which says: "Still wrong. He's asking about the CVarInt used in UTXOs internally in Bitcoin Core. Not the variable-length push in scripts, or the CCompactLen used in the P2P protocol."
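To illustrate the difference, this sketch encodes the same number both ways. The helper names are hypothetical; `encode_cvarint` mirrors the loop in Bitcoin Core's `WriteVarInt` as I understand it (emit 7-bit groups least significant first, subtract 1 between groups, then reverse).

```python
def encode_compact_size(n: int) -> bytes:
    """CompactSize: one byte below 0xfd, otherwise a prefix plus LE integer."""
    if n < 0xFD:
        return bytes([n])
    if n <= 0xFFFF:
        return b"\xfd" + n.to_bytes(2, "little")
    if n <= 0xFFFFFFFF:
        return b"\xfe" + n.to_bytes(4, "little")
    return b"\xff" + n.to_bytes(8, "little")


def encode_cvarint(n: int) -> bytes:
    """CVarInt: base-128, most significant group first, with a -1 offset."""
    groups = []
    while True:
        # Only the final (least significant) group has its high bit clear
        groups.append((n & 0x7F) | (0x80 if groups else 0x00))
        if n <= 0x7F:
            break
        n = (n >> 7) - 1
    return bytes(reversed(groups))


print(encode_compact_size(300).hex())
print(encode_cvarint(300).hex())
```

For 300 the CompactSize form is `fd2c01` and the CVarInt form is `812c`, so reading one format with the other's decoder silently yields the wrong value and leaves the cursor in the wrong place.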

  2. I think the problem is that the .dat files may be corrupted or contain garbage between blocks.

See Pieter Wuille's answer here:

https://bitcoin.stackexchange.com/questions/86159/differences-to-ccompactsize-and-cvarint

"Blocks have well defined data only. But the files can contain garbage in addition to the blocks. There is an index database that stores which position each block is stored at."
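If the index is not available, one heuristic is to walk a `blk*.dat` file record by record and skip anything that does not start with the network magic. This sketch assumes each record is the 4 mainnet magic bytes `f9 be b4 d9`, a 4-byte little-endian length and then the serialized block. `iter_blk_records` is hypothetical, and a false magic match inside garbage is still possible, so the index remains the authoritative source.

```python
from typing import Iterator, Tuple

# Assumed: mainnet message-start bytes as they appear in blk*.dat files
MAINNET_MAGIC = bytes.fromhex("f9beb4d9")


def iter_blk_records(data: bytes,
                     magic: bytes = MAINNET_MAGIC) -> Iterator[Tuple[int, bytes]]:
    """Yield (offset_of_block, raw_block) for each record in a blk*.dat buffer.

    Bytes between records that do not start with the magic (for example zero
    padding from file preallocation) are skipped by searching forward for the
    next occurrence of the magic.
    """
    pos = 0
    while True:
        start = data.find(magic, pos)
        if start < 0 or start + 8 > len(data):
            return
        size = int.from_bytes(data[start + 4:start + 8], "little")
        body_start = start + 8
        body_end = body_start + size
        if size == 0 or body_end > len(data):
            # Implausible length: treat this as a false match and keep looking
            pos = start + 1
            continue
        yield body_start, data[body_start:body_end]
        # Resume after this block so block contents are never searched
        pos = body_end
```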

I think the right solution would be to parse the LevelDB database in the index/ directory to get the exact location of each block in the .dat files.

You can get each block's offset in its file from this database; the keys and values are described here: https://bitcoin.stackexchange.com/questions/28168/what-are-the-keys-used-in-the-blockchain-leveldb-ie-what-are-the-keyvalue-pair

(Be careful with the end of that thread, though: it is misleading.)
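Once a value has been read from the index (for example with a LevelDB binding such as plyvel), the block position itself is CVarInt-encoded, so this is where a CVarInt decoder is actually needed. The sketch below assumes the `CDiskBlockIndex` layout of recent Bitcoin Core versions: keys are `b'b'` plus the 32-byte block hash, and the value is a sequence of CVarInts (client version, height, status, transaction count, then file number and data/undo positions depending on the status flags) followed by the 80-byte block header. The flag values, field names and `parse_block_index_value` are assumptions to check against the Core version that wrote the files.

```python
from typing import Dict, Tuple

# Assumed values of Bitcoin Core's BlockStatus flags (check chain.h)
BLOCK_HAVE_DATA = 8
BLOCK_HAVE_UNDO = 16


def read_cvarint(data: bytes, pos: int) -> Tuple[int, int]:
    """Decode one CVarInt at data[pos]; return (value, new_pos)."""
    n = 0
    while True:
        if pos >= len(data):
            raise ValueError("truncated CVarInt")
        ch = data[pos]
        pos += 1
        n = (n << 7) | (ch & 0x7F)
        if not ch & 0x80:
            return n, pos
        n += 1


def parse_block_index_value(value: bytes) -> Tuple[Dict[str, int], bytes]:
    """Split an assumed CDiskBlockIndex value into CVarInt fields and header."""
    fields: Dict[str, int] = {}
    pos = 0
    for name in ("version", "height", "status", "n_tx"):
        fields[name], pos = read_cvarint(value, pos)
    status = fields["status"]
    if status & (BLOCK_HAVE_DATA | BLOCK_HAVE_UNDO):
        # N in blkNNNNN.dat / revNNNNN.dat
        fields["file"], pos = read_cvarint(value, pos)
    if status & BLOCK_HAVE_DATA:
        # Assumed to point just past the 8-byte magic + size prefix
        fields["data_pos"], pos = read_cvarint(value, pos)
    if status & BLOCK_HAVE_UNDO:
        fields["undo_pos"], pos = read_cvarint(value, pos)
    header = value[pos:pos + 80]
    if len(header) != 80:
        raise ValueError("block header truncated")
    return fields, header
```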

papipig, Nov 13 '21 21:11

Hi @papipig, thank you very much for the info and links!

garethjns, Nov 24 '21 18:11