Performance tuning suggestions
Since it seems like more people have joined the "customize git commit hashes because why not" bandwagon (welcome!), I figured I'd share some insights from doing a lot of performance tuning on my own implementation from a few years ago. Obviously, none of this should be taken to discourage new implementations or to ruin any fun.
In decreasing order of effectiveness, the main things that helped were:
- Running on a GPU (this improved performance by roughly a factor of 10, depending on hardware)
- Caching SHA1 buffer state. The tl;dr is that if you put the nonce at the end of your commit message (or signature, if applicable), you can cache the state of the SHA1 algorithm as applied to all but the last ~64 bytes of the commit, and then reapply the cached state to the last 64 bytes on each attempt. This improved performance by a factor of about 4-5 or more, depending on how big the commit is.
- If the nonce is always the same length, then the trailer that SHA1 appends to the data when hashing (the padding plus the message length) is also effectively fixed, and I was able to improve the performance of lucky-commit by about 25% by not recomputing it each time.
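To make the caching idea concrete, here is a minimal sketch in Python. It is not how lucky-commit is actually written; it just leans on `hashlib`'s `copy()` method to snapshot the SHA1 state after the fixed part of the commit has been hashed. The names `whitespace_nonce` and `find_nonce`, the 16-byte nonce length, and the attempt limit are all made up for illustration, and the nonce is assumed to sit at the very end of the commit.

```python
import hashlib
from typing import Optional


def whitespace_nonce(counter: int, length: int) -> bytes:
    """Encode the low `length` bits of `counter` as spaces (0) and tabs (1)."""
    return bytes(0x09 if (counter >> i) & 1 else 0x20 for i in range(length))


def find_nonce(body: bytes, hex_prefix: str, nonce_len: int = 16,
               max_attempts: int = 1 << 16) -> Optional[bytes]:
    """Search for a trailing whitespace nonce giving a hash with `hex_prefix`.

    `body` is the raw commit contents (tree/author/committer/message)
    without the "commit <size>\\0" header that git prepends before hashing.
    """
    # The nonce length is fixed, so the size in the git header is fixed too.
    header = b"commit %d\x00" % (len(body) + nonce_len)

    # Hash everything before the nonce exactly once and keep that state.
    cached = hashlib.sha1(header + body)

    # Never try more counters than there are distinct nonces.
    for counter in range(min(max_attempts, 1 << nonce_len)):
        nonce = whitespace_nonce(counter, nonce_len)
        attempt = cached.copy()   # restore the cached state
        attempt.update(nonce)     # only the nonce (plus trailer) is rehashed
        if attempt.hexdigest().startswith(hex_prefix):
            return nonce
    return None
```

Calling `find_nonce(body, "00")` on a small commit body should return a 16-byte run of spaces and tabs after a few hundred attempts on average. `hashlib` adds the SHA1 trailer itself on every `hexdigest()` call, so the third optimization in the list only becomes available once you control the compression function directly.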
(Also mentioning @mkrasnitski from https://github.com/mkrasnitski/git-power-rs)
Thanks for the optimization tricks! Glad to see other people have had similar ideas; it seemed pretty obvious when I wrote it.
I'll have to look into caching SHA1 state and brute forcing from the end. Currently, the nonces are stored about halfway through the data (example here) so that would likely be a significant speedup.
As for a GPU implementation, that would be significantly faster. I haven't had time to implement that yet and wasn't sure if it was worth the effort. Seeing that you've done it for basically the same problem is certainly good news though.
If the goal is to hide our changes to the commit, wouldn't this make it unviable to append to the end of a commit? The nonce will appear in the commit message this way, and even extra whitespace is technically detectable (if non-standard whitespace chars are used). This would also not be compatible with signed commits, where we are basically forced to put the nonce in the middle of the commit data.
EDIT: Thinking a bit further, it may still be worth it to cache the state of the SHA-1 buffer up to the location of the nonce, so basically just the commit header.
> If the goal is to hide our changes to the commit, wouldn't this make it unviable to append to the end of a commit? The nonce will appear in the commit message this way, and even extra whitespace is technically detectable (if non-standard whitespace chars are used).
The way I implemented this was to append a combination of spaces and tabs to the end of the commit message. This is technically user-visible, but I haven't found it to be a noticeable problem in practice (e.g. GitHub automatically trims commit messages when displaying them in the web UI).
> This would also not be compatible with signed commits, where we are basically forced to put the nonce in the middle of the commit data.
You're right about signed commits -- for that case I put the whitespace at the end of the signature (and cache up to that point). This reduces performance compared to the no-signature case, but is still much faster than not caching at all, or putting the whitespace at the beginning of the signature.
> Thinking a bit further, it may still be worth it to cache the state of the SHA-1 buffer up to the location of the nonce
Yes, this is a good idea. To ensure that the number of 64-byte blocks that need to be rehashed on each attempt is minimal, I also add padding before the nonce so that the nonce starts at an offset that is a multiple of 64 bytes from the start of the commit header (the full padding format is described here). My nonces are 48 bytes long (they contain only spaces and tabs, so each byte carries just one bit of entropy), so this might be less likely to be necessary when using shorter nonces.
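As a rough illustration of the alignment arithmetic (a sketch only, not the exact padding format linked above), the Python below computes how many padding bytes to insert so the nonce lands on a 64-byte boundary, and how many SHA1 blocks then have to be rehashed per attempt. The function names are hypothetical. The one subtle point is that git's `commit <size>\0` header contains the object size in decimal, so adding padding can change the header length; the loop simply tries each padding length in turn.

```python
BLOCK = 64  # SHA1 consumes its input in 64-byte blocks


def header_len(object_size: int) -> int:
    """Length of the "commit <size>\\0" header git prepends before hashing."""
    return len(b"commit %d\x00" % object_size)


def aligned_padding(prefix_len: int, nonce_len: int, suffix_len: int = 0) -> int:
    """Padding bytes to insert before the nonce so it starts on a block boundary.

    prefix_len: bytes of commit data before the padding
    suffix_len: bytes after the nonce (e.g. the rest of a signed commit)
    """
    for pad in range(2 * BLOCK):
        size = prefix_len + pad + nonce_len + suffix_len
        if (header_len(size) + prefix_len + pad) % BLOCK == 0:
            return pad
    # Assumption: only reachable for tiny objects whose size field gains
    # more than one digit within the search range.
    raise ValueError("no aligning padding found")


def blocks_per_attempt(nonce_offset: int, total_len: int) -> int:
    """SHA1 blocks that must be rehashed when only bytes from nonce_offset change.

    total_len is the hashed length including the git header. SHA1 appends
    a 0x80 byte, zero bytes, and an 8-byte length, rounding up to a block.
    """
    padded_len = ((total_len + 8) // BLOCK + 1) * BLOCK
    return padded_len // BLOCK - nonce_offset // BLOCK
```

For example, with 200 bytes of commit data before a 48-byte trailing nonce, `aligned_padding(200, 48)` gives 45, which puts the nonce at offset 256 and leaves one block to rehash per attempt. Without the padding the nonce would start at offset 211 and straddle two blocks.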