GitPython Directly fetching refs returns the wrong symbolic references

When directly fetching a ref without specifying a destination ref. E.g. repo.remotes.origin.fetch(["master"]) instead of repo.remotes.origin.fetch(["refs/heads/master:refs/remotes/origin/master"]) git will produce a different amount of ref-update lines on stderr than in .git/FETCH_HEAD. For this example it would produce both these lines:

 * branch              master     -> FETCH_HEAD
 * [new branch]        master     -> origin/master

.git/FETCH_HEAD however will (kind of obviously) not contain anything to match the -> FETCH_HEAD lines (there can be multiple, more on that later).

037d62a9966743cf7130193fa08d5182df251b27                branch 'master' of https://github.com/gitpython-developers/GitPython

This mismatch causes the truncation code from 1537aabfa3bb32199e321766793c87864f36ee9a to kick in. Unfortunately, because the mismatched output (extra -> FETCH_HEAD lines that don't appear in the FETCH_HEAD file itself) appear in the front this truncation code leaves the stderr lines and .git/FETCH_HEAD lines mismatched. In this particular use case that's easily remedied by truncating the front of the list instead of the back:

diff --git a/git/remote.py b/git/remote.py
index 65916614..72c15ddc 100644
--- a/git/remote.py
+++ b/git/remote.py
@@ -698,9 +698,9 @@ class Remote(LazyMixin, Iterable):
             log.debug("info lines: " + str(fetch_info_lines))
             log.debug("head info : " + str(fetch_head_info))
             if l_fil < l_fhi:
-                fetch_head_info = fetch_head_info[:l_fil]
+                fetch_head_info = fetch_head_info[l_fhi - l_fil :]
             else:
-                fetch_info_lines = fetch_info_lines[:l_fhi]
+                fetch_info_lines = fetch_info_lines[l_fil - l_fhi :]
             # end truncate correct list
         # end sanity check + sanitization

But I'm unsure about submitting this patch as a PR because it might end up causing problems in case that I'm unaware of, where truncating from the end is the correct thing to do (or maybe fetch_head_info should be truncated at the back and only fetch_info_lines at the front...).

Additionally the -> FETCH_HEAD lines seem to appear on stderr once for every refspec without a dst segment (given the refspec syntax [+]<src>[:<dst>] from git help fetch). But, when fetching the same refspec without dst twice, the corresponding dd3cdfc9..037d62a9 master -> origin/master line that signals the update of the local remote ref (.git/refs/remotes/*) appears only once on stderr while it does generate two lines in .git/FETCH_HEAD.

As a result of this I'm convinced that the attempt to try to match the stderr output to the produced .git/FETCH_HEAD is unlikely to be made to work in every scenario...

FYI: for me it's more important that the return value of remote.fetch has each of its .ref members correspond one-to-one with FETCH_HEAD. The above patch would accomplish that iff len(fetch_head_info) <= len(fetch_info_lines). So it's at least an improvement for my needs as I would never get the wrong info/mismatched info. But I might still miss info if there are less lines discovered on stderr than in .git/FETCH_HEAD. So my preference would be to still keep processing .git/FETCH_HEAD lines, even when we've ran out of stderr lines.

So in short: my naive expectation of remote.fetch is that it returns one FetchInfo entry per ref entry in .git/FETCH_HEAD. All the extra metadata about those ref entries I see as nice to have and it would be acceptable for me to not have access to that under some circumstances.

Jan 14 '21 14:01 muggenhor

Thanks so much for taking the time to investigate the implementation! I couldn't agree more that there isn't always a perfect solution to obtain information on which refs got updated, so the only thing one could hope here is to make it a little less imperfect.

Even though I would definitely love to see this improvement in mainline, I also agree with the sentiment that it might break in unpredictable ways.

Maybe there is a middle ground - would it be possible to factor the ref handling into its own implementation and allow others to inject their own? Or to make it a little less complex, provide another method to fetch/pull but with a different algorithm to allow testing in the wild?

That way the existing algorithm could remain stable for now without being in the way of progress.

What do you think?

Jan 17 '21 02:01 Byron

Sorry for the late reply, I got side tracked.

I like your idea for making it possible to replace the ref-handling implementation. That should allow some experimentation with this to incrementally get a better implementation instead of having to get it right immediately.

I think that it's better to parametrize the ref-handling implementation than to add an extra method.

Jan 22 '21 13:01 muggenhor

No sorry ever necessary :).

Great to hear! Yes, going ahead with parameterizing the method should do just fine. Once the new way of doing the work is proven to work, it could even become the default one day.

Jan 23 '21 02:01 Byron