libgit2 icon indicating copy to clipboard operation
libgit2 copied to clipboard

Shallow support v2

Open pks-t opened this issue 6 years ago β€’ 29 comments

This kind-of succeeds the pull request #4331 from @tiennou, as he indicated that he's busy with the libssh backend for now. I've done quite a lot of changes to the interface itself:

  • Grafts are now a "real" standalone structure git_grafts to help refactoring and extension in the future, should the need ever arise.
  • Grafts can now be directly tied to a file so that one can just call git_grafts_refresh. This will also make it easier in the future to implement functions like git_grafts_write, for example when creating shallow clones or if users need to amend the grafts files on their own.
  • The grafts code is a lot more encapsulated in "grafts.c". The structure itself is now completely opaque to its users, so that all logic with respects to graft commit storage and parsing of files is contained in "grafts.c".

@tiennou pointed out already that grafts are in fact kind of deprecated, so some of the listed benefits may never apply. But I still think that the resulting code is a lot easier to read by totally encapsulating it into its own code module.

pks-t avatar Oct 03 '19 11:10 pks-t

Thanks a lot for your review! Ain't got time to do another round of updates just now, but will do so later

pks-t avatar Oct 03 '19 12:10 pks-t

Addressed all comments by @tiennou and fixed a memory leak introduced with the refresh code

pks-t avatar Oct 03 '19 15:10 pks-t

So, just for kicks, I've rebased the rest of the shallow code on top of that, and pushed it here, and it broke the only testcase I had πŸ˜‰.

It seems the syntax between grafts and shallow files is different, so the parsing can't be as shared as you have it… For the rest, this is still a very WIP rebase β€” since your grafting API is so much better than mine, I can use it as part of the external API more easily, and now there are quite a few git_oidarray/git_array_oid_t/git_grafts mismatches/cruft. Hopefully having some was-known-good code will help πŸ˜‰. You'll see parts of should-now-be-updated-#5105 as well.

tiennou avatar Oct 03 '19 15:10 tiennou

So, just for kicks, I've rebased the rest of the shallow code on top of that, and pushed it here, and it broke the only testcase I had .

Bummer. I'll have a look at it. Is it because of the parsing simplification or of yet unknown cause?

It seems the syntax between grafts and shallow files is different, so the parsing can't be as shared as you have it… For the rest, this is still a very WIP rebase β€” since your grafting API is so much better than mine, I can use it as part of the external API more easily, and now there are quite a few git_oidarray/git_array_oid_t/git_grafts mismatches/cruft. Hopefully having some was-known-good code will help . You'll see parts of should-now-be-updated-#5105 as well.

Yeah. Honestly, I don't really like this git_oidarray and git_array_oid_t stuff as I think it's weird to use. I thought about just going all in and exposing the git_grafts object directly. So we expose a git_repository_grafts and git_repository_shallow_grafts function that allows one to retrieve the respective grafts, and then we simply expose parts of its API. What do you think about that?

pks-t avatar Oct 03 '19 15:10 pks-t

With regards to the format. It should in fact be the same format as "info/grafts", except for that every such line may not have any parents. Stated differently, each line shall hold a single commit ID and nothing else. And that's a perfectly valid grafts file, right?

Could very well be though that I screwed up the parsing part while refactoring to use our own parser. Do you have a simple test grafts file that the new code fails to parse correctly?

pks-t avatar Oct 03 '19 15:10 pks-t

Ah. One thing I noticed in your additions: you have a git_grafts_geti function, but indeed the grafting commits are not guaranteed to be in any particular order. Is the ordering of them important to anything?

Edit: yup, that's very likely to be the issue here. You loop around geti, but this will depend on the oidmap behaviour. Some indices in between may not be populated depending on the map's allocated size, so you won't loop through all OIDs.

pks-t avatar Oct 03 '19 15:10 pks-t

Is the ordering of them important to anything?

AFAIR, it does not matter β€” the server will skip whatever we tell it as part of its revwalk. It's only so the server is aware of where to start the shallow pack.

tiennou avatar Oct 03 '19 17:10 tiennou

Could very well be though that I screwed up the parsing part while refactoring to use our own parser. Do you have a simple test grafts file that the new code fails to parse correctly?

Yes, the code expects a separator after the OID, and thus for the shallow syntax the parser errors. The format outputted by the write code as part of the test should work β€” I ran Git against the shallow clone while debugging, but it's not part of clar.

(git_oidarray/git_array_oid_t/git_grafts mismatches/cruft discussion). What do you think about that?

I'm fine with how your API looks, and looks sufficiently generic as to be used for the negotiation step. It's just about which part is made public, since any transport implementation will need to manipulate them (ie. it floats around "git2/sys/transport.h", again).

tiennou avatar Oct 03 '19 17:10 tiennou

Yes, the code expects a separator after the OID, and thus for the shallow syntax the parser errors.

I honestly don't get it, sorry. For the shallow clones I performed, git(1) created a ".git/shallow" file that simply contained a single OID followed by newline. Nothing inbetween, no separator. I also tried to run your feature branch, but it crashed in seemingly unrelated locations (for instance dereferencing opts->depth when opts is NULL, and one crash in submodules code).

Anyway -- I've written another test suite grafts::parse that ties down the grafts parsing logic, and as far as I can see it currently behaves exactly like I'd expect it to. Please feel free to create another test case for it that shows what I'm doing wrong.

pks-t avatar Oct 10 '19 10:10 pks-t

I honestly don't get it, sorry.

That's strange, I was getting one failure, on the mkfile-based testcase, because of an expected space after the graft OID. I have no idea if an empty graft line works (a.k.a is it a syntax error, or synonymous with a "unparented" commit). IIRC git hard-errors on a broken .git/info/grafts, hence the separate parser I had.

I'll definitely investigate though. AFAIK the only major think missing is "commit the shallow file" once we're sure all objects have been unpacked, a.k.a locking and transactions πŸ˜‰.

tiennou avatar Oct 10 '19 17:10 tiennou

IIRC git hard-errors on a broken .git/info/grafts, hence the separate parser I had.

It doesn't, it just complains loudly and visibly on every git command that gets executed. I've tested for several different cases how we behave and how git behaves. Stuff like creating cyclic commit graphs by grafting A to B and B to A and such stuff. I was surprised that we actually handled this just fine and behaved exactly like git does.

The only exception I found was grafting a commit to a tree, where we just bailed out but git just displayed the commit, but without any parents. But I guess it's perfectly fine for us to say "Nope, that's wrong, don't do it."

I'll definitely investigate though. AFAIK the only major think missing is "commit the shallow file" once we're sure all objects have been unpacked, a.k.a locking and transactions .

Cool. Just let me know when you think this PR may progress so that I can tackle the next set of features towards shallow support.

pks-t avatar Oct 11 '19 05:10 pks-t

I've rebased this branch on the latest master. I've also decided to bail out of the public API and thus removed the function git_repository_shallow_roots for now. I for myself am much too uncertain yet how to design the public API and don't want to paint us into a corner just yet. I think we should get a bit more comfortable with things like both lifecycle and when to refresh and only then start exposing some certain sets of functionality, ideally based on definitive usecases.

pks-t avatar Oct 24 '19 12:10 pks-t

I share the concerns, and I agree we don't want to push more design work here until the protocol part is more fleshed out (as you say, refresh & lifecycle issues). It's just that we do allow custom transports, and those transports that want to handle shallow fetches will want access as well, and thus it's doomed to become public anyways…

Also, I feel like there should be a way to know you're not really done revwalking β€” ideally, a GUI client might want to inform the user that there are more commits available, to propose a shallow fetch of more of them. Right now you can't say, apart from a blanket "this repo is shallow", and you can't know if the last OID you've revwalked really was the last…

tiennou avatar Nov 05 '19 17:11 tiennou

On Tue, Nov 05, 2019 at 09:19:35AM -0800, Etienne Samson wrote:

I share the concerns, and I agree we don't want to push more design work here until the protocol part is more fleshed out (as you say, refresh & lifecycle issues). It's just that we do allow custom transports, and those transports that want to handle shallow fetches will want access as well, and thus it's doomed to become public anyways…

Also, I feel like there should be a way to know you're not really done revwalking β€” ideally, a GUI client might want to inform the user that there are more commits available, to propose a shallow fetch of more of them. Right now you can't say, apart from a blanket "this repo is shallow", and you can't know if the last OID you've revwalked really was the last…

Yeah, definitely. I'd just like to postpone it for a bit, as I hope that we'll get some more experience with this new stuff e.g. when the shallow protocol support lands. This PR won't make it in for v1.0 anyway, so I'd say we should get it into the next branch as soon as we've started stabilizing for v1.0 to let it cook, then implement the shallow protocol on top of that and finally come up with an external interface before v1.1 is released.

pks-t avatar Nov 06 '19 06:11 pks-t

Is this still planned for 1.1? :)

Nivl avatar Apr 27 '20 22:04 Nivl

Is this still planned for 1.1? :)

I'd very much like to aim for it. Let me re-roll and fix conflicts to get it going again.

pks-t avatar May 12 '20 09:05 pks-t

/rebuild

pks-t avatar May 12 '20 14:05 pks-t

Sorry @pks-t, an error occurred while trying to requeue the build.

Test failure is caused by #5515. Otherwise everything's green

pks-t avatar May 12 '20 14:05 pks-t

Note/Reminder: #5515 is fixed πŸ˜‰

litetex avatar May 31 '20 19:05 litetex

Thanks for the reminder, @litetex. I've rebased the MR again so CI should be green now.

pks-t avatar Jun 01 '20 10:06 pks-t

Is there any reason, why this is still not merged? The PR is approved and the checks did all pass...

litetex avatar Jun 06 '20 12:06 litetex

@litetex Given that this PR touchse quite central parts of libgit2, I'd definitely like @ethomson to have a look before this gets merged. But as always, there's only so much time we all have available and reviewing biggish PRs as this one that introduce quite some new logic is really time consuming

pks-t avatar Jun 08 '20 13:06 pks-t

Rebased again to spur interest and check whether it still passes CI.

pks-t avatar Jun 27 '20 12:06 pks-t

Rebased again to spur interest and check whether it still passes CI.

It does! 😁

Sorry, I'm slammed right now, I'll try to take an πŸ‘€ early next week.

ethomson avatar Jun 28 '20 10:06 ethomson

@ethomson No worries, take your time!

pks-t avatar Jun 28 '20 13:06 pks-t

@pks-t @ethomson thank you so much for all the awesome work! πŸ˜ŠπŸ˜ŠπŸ‘

Is there anything that can be assisted with in regards to this PR?

Libgit2/git2go is used in the Flux project (https://github.com/fluxcd/flux2) and used to clone repositories regularly into Kubernetes clusters to provide a GitOps workflow. Libgit2 and go-git are the two libraries supported but Libgit2 is the one that supports git wire protocol v2 which is needed for some Git providers and will most likely be more and more important in the future.

The challenge right now is that go-git supports shallow cloning which takes a lot less space, time and bandwidth and Libgit2 doesn't which means those using git wire protocol v2 will have a a slower experience and taking up more space as well as bandwidth.

With this PR, when it's available with git2go, Libgit2 can be used by everyone.

I hope you have a great weekend! πŸš€πŸŽ‰

simongottschlag avatar Apr 16 '21 21:04 simongottschlag

Any progress? This would be really useful for me.

jwpjrdev avatar Jan 19 '22 19:01 jwpjrdev

Kindly bump the question about any progress on this :)

birunts avatar Jun 06 '22 09:06 birunts