[Question] Confused about a sentence in section 6.2 (Parsing Tree)
Hello, in section 6.2 you mention the following:
Unlike tags and commits, tree objects are binary objects, but their format is actually quite simple. A tree is the concatenation of records of the format:
[mode] space [path] 0x00 [sha-1]
I'm confused by this particular sentence: Unlike tags and commits, tree objects are binary objects.
As I understand it, all objects (blobs, tags, commits) are stored as binary files represented by a hash:
A blob contains the content of a file.
A tag contains a reference to a commit.
A commit contains multi-line key-value pairs: tree, author, etc.
A tree object contains multiple lines of references to other trees / blobs in the work tree.
Why is there a need to specifically point out that tree objects are binary objects, when all objects are stored as binary files?
It is indeed the case that every object in a repository is stored in a binary format, because of the zlib compression. What this sentence means is that the (uncompressed) format of commits and tags is a text-based format (RFC 2822-style), whereas an uncompressed tree is binary. In other words, a commit decompresses to text, and a tree does not.
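To make the difference concrete, here is a minimal sketch of reading one tree record in the `[mode] space [path] 0x00 [sha-1]` layout quoted above. The function name `parse_tree_record` and the sample entry are made up for illustration, and the sketch assumes a 20-byte SHA-1 field. The point is that the hash sits in the record as 20 raw bytes rather than as 40 hex characters, which is why the tree as a whole is not text.

```python
def parse_tree_record(raw: bytes, start: int = 0):
    """Parse one `[mode] space [path] 0x00 [sha-1]` record (hypothetical helper).

    Returns (mode, path, sha_hex, next_offset).
    """
    space = raw.index(b" ", start)            # end of the ASCII mode
    mode = raw[start:space].decode("ascii")
    nul = raw.index(b"\x00", space)           # end of the path
    path = raw[space + 1:nul].decode("utf-8")
    sha_hex = raw[nul + 1:nul + 21].hex()     # 20 raw bytes -> 40 hex chars
    return mode, path, sha_hex, nul + 21


# A hand-built record for illustration; the file name is invented.
sample_sha = "e69de29bb2d1d6434b8b29ae775ad8c2e48c5391"
record = b"100644 README\x00" + bytes.fromhex(sample_sha)

mode, path, sha_hex, next_offset = parse_tree_record(record)
print(mode, path, sha_hex)
```

A full tree would be read by calling this in a loop, feeding `next_offset` back in as `start` until the end of the data.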
Another way of putting this is that file-level compression is an implementation detail (git does use other storage mechanisms, like packfiles). The SHA identity of an object is computed on its uncompressed contents.
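As a rough illustration of that last point, the sketch below hashes the uncompressed serialization (a `type size` header, a NUL byte, then the body) and only afterwards compresses it. The helper names `serialize`, `object_id` and `loose_object_bytes` are invented for this example, and it assumes SHA-1 object ids.

```python
import hashlib
import zlib


def serialize(obj_type: bytes, body: bytes) -> bytes:
    """Uncompressed object: '<type> <size>' header, a NUL byte, then the body."""
    return obj_type + b" " + str(len(body)).encode("ascii") + b"\x00" + body


def object_id(obj_type: bytes, body: bytes) -> str:
    """SHA-1 over the uncompressed serialization (hypothetical helper)."""
    return hashlib.sha1(serialize(obj_type, body)).hexdigest()


def loose_object_bytes(obj_type: bytes, body: bytes, level: int = 6) -> bytes:
    """What would be written to disk: the same bytes, zlib-compressed."""
    return zlib.compress(serialize(obj_type, body), level)


body = b"test content\n"
oid = object_id(b"blob", body)

# Two different compression levels decompress to the same bytes,
# so the id computed from those bytes does not depend on the compression.
fast = loose_object_bytes(b"blob", body, level=1)
best = loose_object_bytes(b"blob", body, level=9)
assert zlib.decompress(fast) == zlib.decompress(best) == serialize(b"blob", body)
print(oid)
```

The id depends only on the type, size and body, so the same object keeps the same hash however it ends up being stored.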
If I understand correctly, here are the uncompressed formats of each object type:
commit - text
tag - text
tree - binary
blob - binary / text
After compression via zlib, all these objects will be stored in a binary format.
Would it be clearer if the sentence were updated to something like the following, which explicitly names the two different formats?
Unlike tags and commits, which use a text-based format, tree objects use a binary format, but their format is actually quite simple ...