Bisected regression runs genOffsetHash leading to wasteful loop when processing ofs-delta
I noticed that go-git is using a ton of CPU and memory when doing simple operations like iterating through a tree on large repositories.
Test program
package main
import (
"log"
"os"
"github.com/go-git/go-git/v5"
)
func main() {
log.SetFlags(log.LstdFlags | log.Lmicroseconds)
repo, err := git.PlainOpen(os.Args[1]); if err != nil { panic(err) }
headRef, err := repo.Head(); if err != nil { panic(err) }
commit, err := repo.CommitObject(headRef.Hash()); if err != nil { panic(err) }
tree, err := commit.Tree(); if err != nil { panic(err) }
log.Println("Iterating tree")
for _, entry := range tree.Entries {
size, err := tree.Size(entry.Name); if err != nil { panic(err) }
log.Println(entry.Name, size)
}
log.Println("Done")
}
Test environment
go version go1.23.6 linux/arm64
M1 MacBook Air (2020)
Expected behavior
The entire process completes in about 0.1s.
2025/03/08 20:54:07.990596 Iterating tree
2025/03/08 20:54:07.990964 .get_maintainer.ignore 194
2025/03/08 20:54:07.990993 .gitattributes 105
2025/03/08 20:54:07.991031 .gitignore 2180
2025/03/08 20:54:07.991156 .mailmap 42849
2025/03/08 20:54:07.991184 .rustfmt.toml 369
2025/03/08 20:54:07.991209 COPYING 496
...
2025/03/08 20:54:07.992446 Done
Actual behvior
It takes about 1.3s.
2025/03/08 20:52:52.634226 Iterating tree
...
2025/03/08 20:52:52.635310 .get_maintainer.ignore 194
2025/03/08 20:52:52.635343 .gitattributes 105
2025/03/08 20:52:53.781688 .gitignore 2180 <-- spends a lot of time here
2025/03/08 20:52:53.783122 .mailmap 42849
2025/03/08 20:52:53.783330 .rustfmt.toml 369
2025/03/08 20:52:53.783372 COPYING 496
...
2025/03/08 20:52:53.809643 Done
In my scenario it's .gitignore that's being slow to tree.Size on.
Profiling
It looks like the main problem is how genOffsetHash uses maps. I'm no expert when it comes to the Go runtime so I'll leave it here.
First bad commit
commit bb5c35196ee55b82f6506d69257de0e59cd8c41d
Date: Sun Nov 17 00:34:11 2024 +0000
plumbing: packfile, Refactor Parser and Scanner logic
The changes focus on increasing thread-safety, simplifying the code base,
enabling support for sha256 and improving general time and space complexities.
(snip: some statistics showing fewer allocs)
Call stack
Interestingly, genOffsetHash is only called occasionally, and it takes a long time whenever it is called.
github.com/go-git/go-git/v5/plumbing/format/idxfile.(*MemoryIndex).genOffsetHash(0x400016ed88)
/home/runxiyu/go-git/plumbing/format/idxfile/idxfile.go:230 +0x308
github.com/go-git/go-git/v5/plumbing/format/idxfile.(*MemoryIndex).FindHash(0x400016ed88, 0x4ec70ae3)
/home/runxiyu/go-git/plumbing/format/idxfile/idxfile.go:191 +0x1d0
github.com/go-git/go-git/v5/plumbing/format/packfile.(*Packfile).getByOffset(0x40000a3180, 0x4ec70ae3)
/home/runxiyu/go-git/plumbing/format/packfile/packfile.go:175 +0x34
github.com/go-git/go-git/v5/plumbing/format/packfile.(*Packfile).getMemoryObject(0x40000a3180, 0x40001b4820)
/home/runxiyu/go-git/plumbing/format/packfile/packfile.go:310 +0x224
github.com/go-git/go-git/v5/plumbing/format/packfile.(*Packfile).objectFromHeader(0x40000a3180?, 0xf357138a8184676e?)
/home/runxiyu/go-git/plumbing/format/packfile/packfile.go:281 +0x44
github.com/go-git/go-git/v5/plumbing/format/packfile.(*Packfile).getByOffset(0x40000a3180, 0x73450f85)
/home/runxiyu/go-git/plumbing/format/packfile/packfile.go:189 +0xb0
github.com/go-git/go-git/v5/plumbing/format/packfile.(*Packfile).GetByOffset(0x40000a3180, 0x73450f85)
/home/runxiyu/go-git/plumbing/format/packfile/packfile.go:81 +0xd0
github.com/go-git/go-git/v5/plumbing/format/packfile.(*Packfile).GetSizeByOffset(0x40000a3180, 0x73450f85)
/home/runxiyu/go-git/plumbing/format/packfile/packfile.go:91 +0x34
github.com/go-git/go-git/v5/storage/filesystem.(*ObjectStorage).encodedObjectSizeFromPackfile(0x4000160398, {0xa6, 0x1e, 0x47, 0x78, 0xd0, 0x11, 0xcf, 0x70, 0x6e, ...})
/home/runxiyu/go-git/storage/filesystem/object.go:326 +0x228
github.com/go-git/go-git/v5/storage/filesystem.(*ObjectStorage).EncodedObjectSize(0x4000160398, {0xa6, 0x1e, 0x47, 0x78, 0xd0, 0x11, 0xcf, 0x70, 0x6e, ...})
/home/runxiyu/go-git/storage/filesystem/object.go:340 +0xb0
github.com/go-git/go-git/v5/plumbing/object.(*Tree).Size(0x400015e410, {0x400011ce40?, 0x2?})
/home/runxiyu/go-git/plumbing/object/tree.go:100 +0x88
main.main()
/home/runxiyu/crap/test-go-git.go:25 +0x2d8
func (idx *MemoryIndex) genOffsetHash() error {
defer idx.mu.Unlock()
idx.mu.Lock()
count, err := idx.Count()
if err != nil {
return err
}
idx.offsetHash = make(map[int64]plumbing.Hash, count)
idx.offsetHashIsFull = true
var hash plumbing.Hash
i := uint32(0)
for firstLevel, fanoutValue := range idx.Fanout {
mappedFirstLevel := idx.FanoutMapping[firstLevel]
for secondLevel := uint32(0); i < fanoutValue; i++ {
fmt.Println("secondLevel", secondLevel) <-- [1]
copy(hash[:], idx.Names[mappedFirstLevel][secondLevel*objectIDLength:])
offset := int64(idx.getOffset(mappedFirstLevel, int(secondLevel)))
idx.offsetHash[offset] = hash
secondLevel++
}
}
return nil
}
The second-level loop at [1] seems to be called idx.Count() times, which is 10950545 in my case. Quite reasonably this slows things down.
genOffsetHash was only called for the file that slowed things down. It wasn't called at all before the problematic commit.
Now, I think setting a map 10950545 times sounds problematic... but this seems to have always been the behavior here. I'm looking into what caused genOffsetHash to be called.
func (idx *MemoryIndex) FindHash(o int64) (plumbing.Hash, error) {
var hash plumbing.Hash
var ok bool
fmt.Println(o) // I added these random debug prints because I can't use debuggers properly
fmt.Println(idx.offsetHash[o])
idx.mu.RLock()
if idx.offsetHash != nil {
if hash, ok = idx.offsetHash[o]; ok {
idx.mu.RUnlock()
return hash, nil
}
}
idx.mu.RUnlock()
// Lazily generate the reverse offset/hash map if required.
if !idx.offsetHashIsFull || idx.offsetHash == nil {
if err := idx.genOffsetHash(); err != nil {
return plumbing.ZeroHash, err
}
hash, ok = idx.offsetHash[o]
}
if !ok {
return plumbing.ZeroHash, plumbing.ErrObjectNotFound
}
return hash, nil
}
This has previously returned
1933905797
a61e4778d011cf706e6784818a1357f392f3a669
2025/03/08 21:38:56.182156 .gitignore 2180
but after the problematic commit it returns
1933905797
a61e4778d011cf706e6784818a1357f392f3a669
1933905797
a61e4778d011cf706e6784818a1357f392f3a669
1321667299
0000000000000000000000000000000000000000
Previously FindHash is only called one time per file, but now it's being called at least two times for each file, and for .gitignore it's called three times (with the last one returning a null hash).
Notes
It's on main but not in any releases yet
The 'assign to a map a million times' behaviour could be seen in commits as old as 83649a1c08e0dcc103de54652ed5b9c33d301326 when running genOffsetHash e.g. when fetching blob contents from packfiles
Additionally, memory usage also explodes
I believe I'm experiencing the same issue. In my case I'm cloning a repository to obtain a file at a known path on a given revision (could be any tag, branch, or sha). It seems the only way to do do this is using go-git is to clone the repository in full (if there is a way to do so without a full checkout that would be awesome!) For example I attempted to clone this 145M repository running the following code. The process swelled to over 25GB of memory over the course of >5 minutes before I had kill it .
package main
import (
"log"
"github.com/go-git/go-billy/v5/memfs"
"github.com/go-git/go-git/v5"
"github.com/go-git/go-git/v5/storage/memory"
)
func main() {
cloneOpts := &git.CloneOptions{
URL: "https://github.com/red-hat-data-services/RHOAI-Build-Config.git",
NoCheckout: true,
}
_, err := git.Clone(memory.NewStorage(), memfs.New(), cloneOpts)
if err != nil {
log.Fatalf("clone error: %w", err)
}
}
It's worth noting as well, though this may warrant opening a separate issue, that git.CloneContext with a timeout does not allow users to stop unbound time/memory during cloning. The bulk of the memory and time are spent in Repository.updateReferences which does not receive the context passed to its caller. Given that fetchAndUpdateReferences can be and expensive and time-consuming operation, it's unfortunate that supplying a Context to use during cloning will only have any effect if the context is cancelled during the network IO.
Not sure what you mean by a "full checkout"... it works with bare repos so that's definitely not required. Are you referring to shallow clones (which would produce a much smaller packfile)?
@runxiyu yes, sorry for the confusion. “Full Checkout” was intended to mean CloneOptions.Depth: 0. “Full Clone” would have been less confusing, but as you said my issue is worsened by being unable to shallow clone (in case the user’s revision is not at the head of any branch/tag)
@aThorp96 maybe a sparse-checkout would make things slightly better on your case, as you don't need to go through the entire worktree.
Previously FindHash is only called one time per file, but now it's being called at least two times for each file, and for .gitignore it's called three times (with the last one returning a null hash).
@runxiyu this does not seem to be the case with the code you provided. Are you taking about a different code snippet?
There is certainly room for improvement in the index handling, but in terms of processing time, taking ~1s to process the repository and loop the tree, considering this is a repository with over a million commits and over 80k files, does not sound too bad. Was it the same code snippet you used to time the execution before and after that commit or the 0.1s was a comparison against the git cli?
This is the behavior that I observe on an aarch64 linux machine. The 0.1s was right before the mentioned commit, and the 1s is the mentioned commit
@runxiyu I timed the specific ops in your code and tested with the latest released version (without the "bad commit") and against the latest v6. The difference in performance is neglegible:
| Operation | v5.14.0 | v6 (v6-transport) |
|---|---|---|
| PlainOpen | 64.15µs | 43.561µs |
| Head | 35.646µs | 29.424µs |
| CommitObject | 58.574624ms | 61.600751ms |
| Tree | 1.188076258s | 1.16559348s |
As far as I can see, the changes pointed out do not have an impact on the execution path.
That being said, you mentioned aarch64. One thing that would certainly have a worse performance on that architecture is the sha1cd implementation, as that is not as optimised outside amd64. Can you try calling hash.RegisterHash(crypto.SHA1, sha1.New) as the first thing in your code and check whether there is an impact on the execution? Please note that this removes collision detection and should not be used against servers you don't trust.
https://github.com/user-attachments/assets/67fc6936-4433-40fd-a527-22d7a801dd26
I haven't investigated sha1cd yet (though I probably don't need it, as I am a server myself, and I handle pushes through git-receive-pack instead of go-git) But here's a demo of how things seem to slow down between v5.4.0 and main
@runxiyu thanks for sharing the video. This aligns with what I mentioned before as sha1cd was introduced on v5.5.0. Can you confirm what version of the sha1cd dependency you have in your go.mod please?
The last benchmarks results I have on arm64 are a few years old. Do you mind running this just to see whether you getting an even worse performance than expected?
sha1cd benchmark:
go test -benchmem -run=^$ -bench ^Benchmark ./...
PASS
ok github.com/pjbgf/sha1cd 0.001s
? github.com/pjbgf/sha1cd/cgo [no test files]
? github.com/pjbgf/sha1cd/internal [no test files]
goos: linux
goarch: arm64
pkg: github.com/pjbgf/sha1cd/test
BenchmarkCalculateDvMask/generic-8 1000000000 0.0000004 ns/op 0 B/op 0 allocs/op
BenchmarkCalculateDvMask/cgo-8 1000000000 0.0000014 ns/op 0 B/op 0 allocs/op
BenchmarkHash8Bytes/sha1-8 18426642 58.86 ns/op 135.92 MB/s 0 B/op 0 allocs/op
BenchmarkHash8Bytes/sha1cd_generic-8 712749 1911 ns/op 4.19 MB/s 192 B/op 2 allocs/op
BenchmarkHash8Bytes/sha1cd_cgo-8 236654 4585 ns/op 1.74 MB/s 2688 B/op 1 allocs/op
BenchmarkHash320Bytes/sha1-8 6731809 189.4 ns/op 1689.87 MB/s 0 B/op 0 allocs/op
BenchmarkHash320Bytes/sha1cd_generic-8 453511 3388 ns/op 94.46 MB/s 192 B/op 2 allocs/op
BenchmarkHash320Bytes/sha1cd_cgo-8 145783 7714 ns/op 41.48 MB/s 2688 B/op 1 allocs/op
BenchmarkHash1K/sha1-8 2288988 507.4 ns/op 2018.08 MB/s 0 B/op 0 allocs/op
BenchmarkHash1K/sha1cd_generic-8 346070 5330 ns/op 192.11 MB/s 192 B/op 2 allocs/op
BenchmarkHash1K/sha1cd_cgo-8 100228 14346 ns/op 71.38 MB/s 2688 B/op 1 allocs/op
BenchmarkHash8K/sha1-8 339747 3707 ns/op 2209.64 MB/s 0 B/op 0 allocs/op
BenchmarkHash8K/sha1cd_generic-8 43082 29421 ns/op 278.44 MB/s 192 B/op 2 allocs/op
BenchmarkHash8K/sha1cd_cgo-8 76657 23501 ns/op 348.57 MB/s 2688 B/op 1 allocs/op
BenchmarkHashWithCollision/sha1cd_generic-8 297627 5786 ns/op 110.62 MB/s 192 B/op 2 allocs/op
BenchmarkHashWithCollision/sha1cd_cgo-8 91353 16493 ns/op 38.81 MB/s 2688 B/op 1 allocs/op
PASS
ok github.com/pjbgf/sha1cd/test 25.041s
? github.com/pjbgf/sha1cd/ubc [no test files]
https://github.com/user-attachments/assets/4d2ba4ef-4941-422b-9c0a-7f2ee61c198a
I think this pretty conclusively shows that this commit is the issue here
commit bb5c35196ee55b82f6506d69257de0e59cd8c41d
Author: Paulo Gomes <[email protected]>
Date: Sun Nov 17 00:34:11 2024 +0000
plumbing: packfile, Refactor Parser and Scanner logic
The changes focus on increasing thread-safety, simplifying the code base,
enabling support for sha256 and improving general time and space complexities.
As far as I can see, it's something happening in packfile/indexfile handling rather than sha1cd
@runxiyu can you please test #1523 again, I believe the latest changes should make an impact on the issue you are experiencing.
@pjbgf Still stuck for about a second on the .gitignore file in that Asahi Linux repo, as of
commit f9f2ae49b16909af2adee3c26e03dc54d8e5fe10
tree 183fe4feacf2c9aebc8e4884e67544cb389e14f2
parent 0b2ab6d34260bc70733a67460da5f3bebbea9488
author Paulo Gomes <[email protected]> 1748507032 +0100
committer Paulo Gomes <[email protected]> 1748507032 +0100
gpgsig -----BEGIN SSH SIGNATURE-----
U1NIU0lHAAAAAQAAAEoAAAAac2stc3NoLWVkMjU1MTlAb3BlbnNzaC5jb20AAAAgSDBGoe
bYEec9307bC2ANGbdfPx+1JwqiGybGfpGHa7IAAAAEc3NoOgAAAANnaXQAAAAAAAAABnNo
YTUxMgAAAGcAAAAac2stc3NoLWVkMjU1MTlAb3BlbnNzaC5jb20AAABAtJgwBjRlCxoYPF
ASIYKxt1SeuIAdZbDac6vSRbSX5nOu1hFakGuHIkhcssKejqrgK/9iWwOn+9xoQsnJmv+q
BgEAAKx9
-----END SSH SIGNATURE-----
plumbing: transport, Use KeepDescriptors by default
In filesystem storage, the packfile caching is only used if KeepDescriptors
is enabled. This should make overall operations using filesystem storage more
efficient.
Signed-off-by: Paulo Gomes <[email protected]>