git lfs fetch a range of commits
Describe the issue
I'm trying to copy a single branch between two remotes - pulling from one and pushing to the other - which means I also need to pull the LFS files associated with the branch. Without them I get errors when I try to push, as you'd expect.
I cannot use --all because there are terabytes of LFS files in the source repo associated with other branches that I'm not interested in.
I'd hoped git lfs fetch might support a revision range, so I could do git lfs fetch origin <branch_start>..<branch_tip>, but this doesn't seem to work.
The error messages I get when I push tell me which OIDs I'm missing, but I can't see any easy way to figure out which commits reference them, so I can't easily turn them into revs to pass to git lfs fetch.
I've tried a simple for rev in $(git rev-list <branch_start> <branch_tip>); do git lfs fetch $rev; done and it seems to work, but it's horrendously slow - I assume because fetch scans the entire tree at each revision and can't exploit the fact that most tree hashes don't change between revisions.
It looks like my best path forward might be to write my own tool that:
- Uses git rev-list to get all revisions in my branch
- Walks the trees for each revision using git check-attr to collect all LFS pointers at each rev, and parses them to extract the OID
- Filters out OIDs which are already in my LFS storage
- Passes the remaining revisions that have 1+ missing OID to git lfs fetch

which of course I can do, but it feels like I'm re-implementing a nontrivial chunk of what LFS fetch actually does. So I'm wondering if I missed something - is there an easier way to do this?
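For reference, the pointer-handling half of that plan is fairly small. Here's a rough sketch of the "parse pointers and filter against local storage" steps; parse_pointer and have_locally are my own hypothetical names, but the pointer format and the .git/lfs/objects/<oid[:2]>/<oid[2:4]>/<oid> layout are the standard ones Git LFS uses:

```python
import os
import re

# A Git LFS pointer file looks like:
#   version https://git-lfs.github.com/spec/v1
#   oid sha256:<64 hex chars>
#   size <bytes>
OID_RE = re.compile(r"^oid sha256:([0-9a-f]{64})$", re.MULTILINE)

def parse_pointer(blob_text):
    """Return the sha256 OID if blob_text is an LFS pointer, else None."""
    if not blob_text.startswith("version https://git-lfs.github.com/spec/v1"):
        return None
    m = OID_RE.search(blob_text)
    return m.group(1) if m else None

def have_locally(oid, git_dir=".git"):
    """LFS stores objects under .git/lfs/objects/<oid[:2]>/<oid[2:4]>/<oid>."""
    return os.path.exists(
        os.path.join(git_dir, "lfs", "objects", oid[:2], oid[2:4], oid))
```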
System environment Problem exists on both Windows and macOS.
Output of git lfs env
git-lfs/3.4.0 (GitHub; windows amd64; go 1.20.6; git d06d6e9e)
git version 2.43.0.windows.1
Hey,
I'm not sure of a good way to do this efficiently. You probably want to use git rev-list A..B --not --remotes=dest (where dest is the destination remote), which will avoid traversing objects that are already on the destination and make this a lot more efficient, but it's still not going to be screamingly performant.
We internally use git cat-file --batch to make it more efficient to find the objects without spawning a large number of Git processes, which you can do, too. You can also use git cat-file --batch-check first to find those items which are pointer files (which must be less than 1024 bytes), since sometimes people mark a file as an LFS file and then push the large object anyway. However, this will likely require more work than a simple shell one-liner, so you might want to write something like a Ruby script to handle this.
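As a rough sketch of that pipeline (revspec and dest are placeholders from this thread, and candidate_pointers/small_blobs are my own names, not LFS APIs) - note that cat-file's batch format needs %(rest) because rev-list --objects emits "<oid> <path>" lines:

```python
import subprocess

def candidate_pointers(revspec=("A..B", "--not", "--remotes=dest")):
    """Yield OIDs of blobs small enough to be LFS pointers (< 1024 bytes).

    Pipes `git rev-list --objects` into `git cat-file --batch-check` so we
    spawn only two Git processes for the whole range instead of one per rev.
    """
    rev_list = subprocess.Popen(
        ["git", "rev-list", "--objects", *revspec],
        stdout=subprocess.PIPE, text=True)
    check = subprocess.run(
        ["git", "cat-file",
         "--batch-check=%(objectname) %(objecttype) %(objectsize) %(rest)"],
        stdin=rev_list.stdout, capture_output=True, text=True, check=True)
    rev_list.wait()
    yield from small_blobs(check.stdout.splitlines())

def small_blobs(batch_check_lines, limit=1024):
    """Filter batch-check output ("<oid> <type> <size> <path>") to small blobs."""
    for line in batch_check_lines:
        parts = line.split(None, 3)
        if len(parts) >= 3 and parts[1] == "blob" and int(parts[2]) < limit:
            yield parts[0]
```

You'd then read each candidate blob with git cat-file --batch and check whether its content actually parses as a pointer, since a small blob isn't necessarily an LFS pointer.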
I think what you want here for scripting is an equivalent to git lfs push's --object-id flag, which unfortunately doesn't exist yet. It shouldn't be too hard to add if you're interested, but it's ultimately going to be rather difficult to handle as part of scripting without adding that functionality.
Yeah, I ended up solving this using a Python script which did something along the lines of my original post, except that rather than passing revisions to git lfs fetch, it was easier to directly pipe LFS pointer file content to git lfs smudge (discarding the output, but taking advantage of the fact that it caused the object to be downloaded).
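For anyone landing here later, the smudge trick can be sketched like this (fetch_via_smudge and looks_like_pointer are my names for it; git lfs smudge reads a pointer on stdin and writes the object's content to stdout, downloading the object into local LFS storage as a side effect):

```python
import subprocess

def looks_like_pointer(text):
    """Cheap guard: real pointers start with the LFS spec version line."""
    return text.startswith("version https://git-lfs.github.com/spec/v1")

def fetch_via_smudge(pointer_text):
    """Feed one pointer to `git lfs smudge`, discarding the smudged content.

    The side effect we actually want is that the object lands in
    .git/lfs/objects, after which the push can find it.
    """
    if not looks_like_pointer(pointer_text):
        raise ValueError("not an LFS pointer")
    subprocess.run(["git", "lfs", "smudge"],
                   input=pointer_text.encode(),
                   stdout=subprocess.DEVNULL, check=True)
```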
And yes, supporting --object-id for git lfs fetch would be nice and would avoid needing to do the smudge shenanigans. Ideally LFS would expose facilities for synchronizing remote/local LFS object stores without needing to touch commits - push --object-id is half of that story, but we're missing a corresponding fetch piece.