[Question] Confused about a sentence in section 6.2 (Parsing Tree)

Open tanshunyuan opened this issue 11 months ago • 2 comments

Hello, in section 6.2 you've mentioned the following

Unlike tags and commits, tree objects are binary objects, but their format is actually quite simple. A tree is the concatenation of records of the format:

[mode] space [path] 0x00 [sha-1]

I'm confused by this particular sentence: "Unlike tags and commits, tree objects are binary objects."

As I understand it, all objects (blobs, tags, commits) are stored as binary files identified by a hash:

- A blob contains the content of a file.
- A tag contains a reference to a commit.
- A commit contains multi-line key-value pairs: tree, author, etc.
- A tree object contains multiple lines of references to other trees / blobs in the work tree.

Why is there a need to specifically point out that tree objects are binary objects, when all objects are stored as binary files?

tanshunyuan avatar Mar 05 '25 04:03 tanshunyuan

It is indeed the case that every object in a repository is stored in a binary format, because of the zlib compression. What this sentence means is that the (uncompressed) format of commits and tags is a text-based format (RFC 2822), whereas an uncompressed tree is binary. IOW, a commit decompresses to text; a tree does not.
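To make the distinction concrete, here is a minimal Python sketch, not the guide's own code. It parses an uncompressed tree body following the `[mode] space [path] 0x00 [sha-1]` record layout quoted above, and compares it with a commit body. The helper names (`parse_tree`, `is_text`) and the sample data (`README.md`, the author line) are made up for illustration.

```python
import hashlib

def parse_tree(raw: bytes):
    """Split an uncompressed tree body into (mode, path, sha_hex) records."""
    records = []
    pos = 0
    while pos < len(raw):
        space = raw.index(b" ", pos)      # the mode ends at the first space
        nul = raw.index(b"\x00", space)   # the path ends at the NUL byte
        mode = raw[pos:space].decode("ascii")
        path = raw[space + 1:nul].decode("utf-8")
        sha = raw[nul + 1:nul + 21]       # 20 raw bytes, not 40 hex characters
        records.append((mode, path, sha.hex()))
        pos = nul + 21
    return records

def is_text(data: bytes) -> bool:
    """Return True if the bytes decode cleanly as UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False

# Hypothetical sample data: one tree entry pointing at the empty blob.
blob_sha = hashlib.sha1(b"blob 0\x00").digest()
tree_body = b"100644 README.md\x00" + blob_sha

# Hypothetical commit body: the tree is referenced by its hex id, as text.
commit_body = (
    b"tree " + b"0" * 40 + b"\n"
    b"author Someone <someone@example.com> 0 +0000\n"
    b"\n"
    b"Initial commit\n"
)

print(parse_tree(tree_body))
print(is_text(commit_body), is_text(tree_body))
```

The commit body is readable as-is, while the tree body embeds the raw 20-byte hash, which is why it has to be parsed as bytes rather than split into lines.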

Another way of putting this is that file-level compression is an implementation detail (Git does use other storage mechanisms, like packfiles). The SHA-1 identity of an object is computed on its uncompressed contents.
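A small Python sketch of that point, assuming the usual loose-object layout (a header made of the type, a space, the size, and a NUL byte, followed by the body). The function names `object_id` and `loose_object_bytes` are hypothetical, chosen for this example only.

```python
import hashlib
import zlib

def serialize(fmt: bytes, body: bytes) -> bytes:
    """Build the uncompressed form: '<type> <size>\\0<body>'."""
    return fmt + b" " + str(len(body)).encode("ascii") + b"\x00" + body

def object_id(fmt: bytes, body: bytes) -> str:
    """Hash the uncompressed header + body; compression plays no part."""
    return hashlib.sha1(serialize(fmt, body)).hexdigest()

def loose_object_bytes(fmt: bytes, body: bytes, level: int) -> bytes:
    """What would be written to disk: the same bytes, zlib-compressed."""
    return zlib.compress(serialize(fmt, body), level)

body = b"hello\n"
fast = loose_object_bytes(b"blob", body, 1)
best = loose_object_bytes(b"blob", body, 9)

# Both compressed forms decompress to identical content, so the id is
# the same whichever compression level was used to store the object.
print(object_id(b"blob", body))
print(zlib.decompress(fast) == zlib.decompress(best))
```

The compressed bytes on disk are always binary, but they are only a container: the object's identity and its text or binary nature are properties of what comes out after decompression.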

thblt avatar Mar 05 '25 06:03 thblt

If I understand it correctly, here are the uncompressed formats of each object type:

- commit - text
- tag - text
- tree - binary
- blob - binary / text

After compression via zlib, all these objects will be stored in a binary format.

Would it be clearer if the sentence were updated to something like the following, which explicitly names the two formats?

Unlike tags and commits, which use a text-based format, tree objects use a binary format, but that format is actually quite simple ...

tanshunyuan avatar Mar 06 '25 02:03 tanshunyuan