Big File Support Spec

Open MuxZeroNet opened this issue 8 years ago • 4 comments

Please provide the specification for the big file support feature. If you don't know where to start, here are some example topics to write about.

How big files are hashed. How merkle trees are made.

Piece size, hashing algorithm, number of leaf nodes, etc.

How a big file is represented in content.json

Piece field format, hashing algorithm, keywords, etc.

Does big file support introduce changes to the network protocol?

Are big files transmitted over the current network protocol? Is the network protocol changed?

What are the preferred ways to store an incomplete big file?

How did you do it?

The status of the current specification.

Draft, pending review, recently revised, final version, etc.

MuxZeroNet • Oct 15 '17 20:10

Relevant code snippets from BigfilePlugin.py

content["files_optional"][file_relative_path] = {
    "sha512": merkle_root,
    "size": upload_info["size"],
    "piecemap": piecemap_relative_path,
    "piece_size": piece_size
}
# ...
return {
    "merkle_root": merkle_root,
    "piece_num": len(piecemap_info["sha512_pieces"]),
    "piece_size": piece_size,
    "inner_path": inner_path
}
# ...
def hashBigfile(self, file_in, size, piece_size=1024 * 1024, file_out=None):
    # method source code...
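
The hashBigfile signature above takes a piece_size that defaults to 1 MiB, and the returned info carries one sha512_pieces entry per piece. A minimal sketch of that per-piece step follows, under stated assumptions: the name hash_pieces is hypothetical, and truncating SHA-512 to 32 bytes is a guess based on the 64-character hex digest quoted from the unit tests in this thread. How the merkle root is derived from these digests is not shown in the snippets, so the sketch stops at the piece list.

```python
import hashlib
import io


def hash_pieces(file_in, piece_size=1024 * 1024):
    """Return one digest per piece_size chunk of file_in (hypothetical helper)."""
    digests = []
    while True:
        piece = file_in.read(piece_size)
        if not piece:
            break  # end of file reached
        # Assumption: SHA-512 truncated to 32 bytes (64 hex characters).
        digests.append(hashlib.sha512(piece).digest()[:32])
    return digests


# Tiny in-memory example: 10 bytes split into pieces of 4, 4 and 2 bytes.
pieces = hash_pieces(io.BytesIO(b"a" * 10), piece_size=4)
print(len(pieces), pieces[0].hex())
```

The last piece may be shorter than piece_size; it is hashed as-is in this sketch.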

Relevant code snippets from the unit tests.

merkle_root, piece_size, piecemap_info = site.content_manager.hashBigfile(...)

piecemap_info["sha512_pieces"][0].encode("hex")

msgpack.pack({file_name: piecemap_info}, stream)

assert file_node["piecemap"] == inner_path + ".piecemap.msgpack"

assert piecemap["sha512_pieces"][0].encode("hex") == \
"a73abad9992b3d0b672d0c2a292046695d31bebdcb1e150c8410bbe7c972eff3"

MuxZeroNet • Nov 15 '17 04:11

Thanks for the suggestions, they really helped :) I have added the information to the docs: https://zeronet.readthedocs.io/en/latest/help_zeronet/network_protocol/#bigfile-plugin

How big files are hashed. How merkle trees are made.

https://zeronet.readthedocs.io/en/latest/help_zeronet/network_protocol/#bigfile-merkle-root

How a big file is represented in content.json

https://zeronet.readthedocs.io/en/latest/help_zeronet/network_protocol/#bigfile-piecemap

Does big file support introduce changes to the network protocol?

Yes, it adds getPieceFields and setPieceFields.

What are the preferred ways to store an incomplete big file?

The ZeroNet client creates a sparse file with the final size of the big file at the beginning of the download process. (On Windows this requires the fsutil command.)
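
As a rough illustration of the preallocation step described above, the sketch below creates a file of the final size up front and then writes one piece at its offset. The helper names preallocate and write_piece are hypothetical and not taken from ZeroNet. Whether the resulting file is actually sparse on disk depends on the operating system and filesystem, and the Windows fsutil step is not covered here.

```python
import os
import tempfile


def preallocate(path, size):
    """Create a file whose logical size is `size` bytes (hypothetical helper)."""
    with open(path, "wb") as f:
        # Extending with truncate() leaves the unwritten range as a hole
        # on filesystems that support sparse files.
        f.truncate(size)


def write_piece(path, piece_index, piece_size, data):
    """Write one downloaded piece at its final offset in the file."""
    with open(path, "r+b") as f:
        f.seek(piece_index * piece_size)
        f.write(data)


piece_size = 1024 * 1024
with tempfile.TemporaryDirectory() as tmp:
    target = os.path.join(tmp, "bigfile.bin")
    preallocate(target, 3 * piece_size)
    # Pieces can arrive in any order; here the last one arrives first.
    write_piece(target, 2, piece_size, b"x" * piece_size)
    print(os.path.getsize(target))
```

Because every piece has a fixed offset, pieces can be written in any order without moving data that is already on disk.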

The status of the current specification.

It's already implemented and being used, but nothing is written in stone :)

HelloZeroNet • Nov 17 '17 02:11

Packed format: turns the string into a list of ints by counting runs of repeating characters. The first count is always for 1s, so a string that begins with 0 gets a leading 0. Examples: 1110000001 becomes [3, 6, 1], 0000000001 becomes [0, 9, 1], 1111111111 becomes [10].

Checker-board pattern?

Y N Y
3 6 1
Y N Y
0 9 1
Y
10

MuxZeroNet • Nov 18 '17 02:11

Yes, it assumes that most of the piecefield will not be fragmented and that users will download pieces in batches.
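
The packed format quoted in this thread, and the checker-board worst case, can be sketched as follows. This is a minimal illustration written from the three examples in the quoted description, not ZeroNet's own code; the function names pack_piecefield and unpack_piecefield are hypothetical.

```python
from itertools import groupby
from typing import List


def pack_piecefield(bits: str) -> List[int]:
    """Run-length encode a string of '0'/'1'; the first count is for '1's."""
    if not bits:
        return []
    runs = [len(list(group)) for _, group in groupby(bits)]
    if bits[0] == "0":
        runs.insert(0, 0)  # zero leading '1's, so the runs still alternate 1, 0, 1, ...
    return runs


def unpack_piecefield(runs: List[int]) -> str:
    """Inverse of pack_piecefield: expand alternating run lengths."""
    parts = []
    value = "1"  # runs always start with '1's
    for count in runs:
        parts.append(value * count)
        value = "0" if value == "1" else "1"
    return "".join(parts)


print(pack_piecefield("1110000001"))  # [3, 6, 1]
print(pack_piecefield("0000000001"))  # [0, 9, 1]
print(pack_piecefield("1111111111"))  # [10]
print(pack_piecefield("1010101010"))  # [1, 1, 1, 1, 1, 1, 1, 1, 1, 1]
```

The last call is the checker-board case: every run has length 1, so the packed list has as many entries as the original string has characters and the encoding saves nothing.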

HelloZeroNet • Nov 18 '17 09:11