Shallow support v2
This kind-of succeeds the pull request #4331 from @tiennou, as he indicated that he's busy with the libssh backend for now. I've done quite a lot of changes to the interface itself:
- Grafts are now a "real" standalone structure
git_graftsto help refactoring and extension in the future, should the need ever arise. - Grafts can now be directly tied to a file so that one can just call
git_grafts_refresh. This will also make it easier in the future to implement functions likegit_grafts_write, for example when creating shallow clones or if users need to amend the grafts files on their own. - The grafts code is a lot more encapsulated in "grafts.c". The structure itself is now completely opaque to its users, so that all logic with respects to graft commit storage and parsing of files is contained in "grafts.c".
@tiennou pointed out already that grafts are in fact kind of deprecated, so some of the listed benefits may never apply. But I still think that the resulting code is a lot easier to read by totally encapsulating it into its own code module.
Thanks a lot for your review! Ain't got time to do another round of updates just now, but will do so later
Addressed all comments by @tiennou and fixed a memory leak introduced with the refresh code
So, just for kicks, I've rebased the rest of the shallow code on top of that, and pushed it here, and it broke the only testcase I had π.
It seems the syntax between grafts and shallow files is different, so the parsing can't be as shared as you have itβ¦ For the rest, this is still a very WIP rebase β since your grafting API is so much better than mine, I can use it as part of the external API more easily, and now there are quite a few git_oidarray/git_array_oid_t/git_grafts mismatches/cruft. Hopefully having some was-known-good code will help π. You'll see parts of should-now-be-updated-#5105 as well.
So, just for kicks, I've rebased the rest of the shallow code on top of that, and pushed it here, and it broke the only testcase I had .
Bummer. I'll have a look at it. Is it because of the parsing simplification or of yet unknown cause?
It seems the syntax between grafts and shallow files is different, so the parsing can't be as shared as you have itβ¦ For the rest, this is still a very WIP rebase β since your grafting API is so much better than mine, I can use it as part of the external API more easily, and now there are quite a few git_oidarray/git_array_oid_t/git_grafts mismatches/cruft. Hopefully having some was-known-good code will help . You'll see parts of should-now-be-updated-#5105 as well.
Yeah. Honestly, I don't really like this git_oidarray and git_array_oid_t stuff as I think it's weird to use. I thought about just going all in and exposing the git_grafts object directly. So we expose a git_repository_grafts and git_repository_shallow_grafts function that allows one to retrieve the respective grafts, and then we simply expose parts of its API. What do you think about that?
With regards to the format. It should in fact be the same format as "info/grafts", except for that every such line may not have any parents. Stated differently, each line shall hold a single commit ID and nothing else. And that's a perfectly valid grafts file, right?
Could very well be though that I screwed up the parsing part while refactoring to use our own parser. Do you have a simple test grafts file that the new code fails to parse correctly?
Ah. One thing I noticed in your additions: you have a git_grafts_geti function, but indeed the grafting commits are not guaranteed to be in any particular order. Is the ordering of them important to anything?
Edit: yup, that's very likely to be the issue here. You loop around geti, but this will depend on the oidmap behaviour. Some indices in between may not be populated depending on the map's allocated size, so you won't loop through all OIDs.
Is the ordering of them important to anything?
AFAIR, it does not matter β the server will skip whatever we tell it as part of its revwalk. It's only so the server is aware of where to start the shallow pack.
Could very well be though that I screwed up the parsing part while refactoring to use our own parser. Do you have a simple test grafts file that the new code fails to parse correctly?
Yes, the code expects a separator after the OID, and thus for the shallow syntax the parser errors. The format outputted by the write code as part of the test should work β I ran Git against the shallow clone while debugging, but it's not part of clar.
(git_oidarray/git_array_oid_t/git_grafts mismatches/cruft discussion). What do you think about that?
I'm fine with how your API looks, and looks sufficiently generic as to be used for the negotiation step. It's just about which part is made public, since any transport implementation will need to manipulate them (ie. it floats around "git2/sys/transport.h", again).
Yes, the code expects a separator after the OID, and thus for the shallow syntax the parser errors.
I honestly don't get it, sorry. For the shallow clones I performed, git(1) created a ".git/shallow" file that simply contained a single OID followed by newline. Nothing inbetween, no separator. I also tried to run your feature branch, but it crashed in seemingly unrelated locations (for instance dereferencing opts->depth when opts is NULL, and one crash in submodules code).
Anyway -- I've written another test suite grafts::parse that ties down the grafts parsing logic, and as far as I can see it currently behaves exactly like I'd expect it to. Please feel free to create another test case for it that shows what I'm doing wrong.
I honestly don't get it, sorry.
That's strange, I was getting one failure, on the mkfile-based testcase, because of an expected space after the graft OID. I have no idea if an empty graft line works (a.k.a is it a syntax error, or synonymous with a "unparented" commit). IIRC git hard-errors on a broken .git/info/grafts, hence the separate parser I had.
I'll definitely investigate though. AFAIK the only major think missing is "commit the shallow file" once we're sure all objects have been unpacked, a.k.a locking and transactions π.
IIRC git hard-errors on a broken .git/info/grafts, hence the separate parser I had.
It doesn't, it just complains loudly and visibly on every git command that gets executed. I've tested for several different cases how we behave and how git behaves. Stuff like creating cyclic commit graphs by grafting A to B and B to A and such stuff. I was surprised that we actually handled this just fine and behaved exactly like git does.
The only exception I found was grafting a commit to a tree, where we just bailed out but git just displayed the commit, but without any parents. But I guess it's perfectly fine for us to say "Nope, that's wrong, don't do it."
I'll definitely investigate though. AFAIK the only major think missing is "commit the shallow file" once we're sure all objects have been unpacked, a.k.a locking and transactions .
Cool. Just let me know when you think this PR may progress so that I can tackle the next set of features towards shallow support.
I've rebased this branch on the latest master. I've also decided to bail out of the public API and thus removed the function git_repository_shallow_roots for now. I for myself am much too uncertain yet how to design the public API and don't want to paint us into a corner just yet. I think we should get a bit more comfortable with things like both lifecycle and when to refresh and only then start exposing some certain sets of functionality, ideally based on definitive usecases.
I share the concerns, and I agree we don't want to push more design work here until the protocol part is more fleshed out (as you say, refresh & lifecycle issues). It's just that we do allow custom transports, and those transports that want to handle shallow fetches will want access as well, and thus it's doomed to become public anywaysβ¦
Also, I feel like there should be a way to know you're not really done revwalking β ideally, a GUI client might want to inform the user that there are more commits available, to propose a shallow fetch of more of them. Right now you can't say, apart from a blanket "this repo is shallow", and you can't know if the last OID you've revwalked really was the lastβ¦
On Tue, Nov 05, 2019 at 09:19:35AM -0800, Etienne Samson wrote:
I share the concerns, and I agree we don't want to push more design work here until the protocol part is more fleshed out (as you say, refresh & lifecycle issues). It's just that we do allow custom transports, and those transports that want to handle shallow fetches will want access as well, and thus it's doomed to become public anywaysβ¦
Also, I feel like there should be a way to know you're not really done revwalking β ideally, a GUI client might want to inform the user that there are more commits available, to propose a shallow fetch of more of them. Right now you can't say, apart from a blanket "this repo is shallow", and you can't know if the last OID you've revwalked really was the lastβ¦
Yeah, definitely. I'd just like to postpone it for a bit, as I hope that we'll get some more experience with this new stuff e.g. when the shallow protocol support lands. This PR won't make it in for v1.0 anyway, so I'd say we should get it into the next branch as soon as we've started stabilizing for v1.0 to let it cook, then implement the shallow protocol on top of that and finally come up with an external interface before v1.1 is released.
Is this still planned for 1.1? :)
Is this still planned for 1.1? :)
I'd very much like to aim for it. Let me re-roll and fix conflicts to get it going again.
/rebuild
Sorry @pks-t, an error occurred while trying to requeue the build.
Test failure is caused by #5515. Otherwise everything's green
Note/Reminder: #5515 is fixed π
Thanks for the reminder, @litetex. I've rebased the MR again so CI should be green now.
Is there any reason, why this is still not merged? The PR is approved and the checks did all pass...
@litetex Given that this PR touchse quite central parts of libgit2, I'd definitely like @ethomson to have a look before this gets merged. But as always, there's only so much time we all have available and reviewing biggish PRs as this one that introduce quite some new logic is really time consuming
Rebased again to spur interest and check whether it still passes CI.
Rebased again to spur interest and check whether it still passes CI.
It does! π
Sorry, I'm slammed right now, I'll try to take an π early next week.
@ethomson No worries, take your time!
@pks-t @ethomson thank you so much for all the awesome work! πππ
Is there anything that can be assisted with in regards to this PR?
Libgit2/git2go is used in the Flux project (https://github.com/fluxcd/flux2) and used to clone repositories regularly into Kubernetes clusters to provide a GitOps workflow. Libgit2 and go-git are the two libraries supported but Libgit2 is the one that supports git wire protocol v2 which is needed for some Git providers and will most likely be more and more important in the future.
The challenge right now is that go-git supports shallow cloning which takes a lot less space, time and bandwidth and Libgit2 doesn't which means those using git wire protocol v2 will have a a slower experience and taking up more space as well as bandwidth.
With this PR, when it's available with git2go, Libgit2 can be used by everyone.
I hope you have a great weekend! ππ
Any progress? This would be really useful for me.
Kindly bump the question about any progress on this :)