.fetch(..., prune=True) might result in BadName: Ref '...' did not resolve to an object

Open yarikoptic opened this issue 7 years ago • 8 comments

Original issue/record in DataLad: https://github.com/datalad/datalad/issues/2550. We have been running into this occasional crash across a variety of our unit tests, but never had time to look into it in detail, especially since it is very hard to reproduce: it happens only rarely.

The crash looks like this traceback (see the DataLad issue above for more):

ERROR: datalad.distribution.tests.test_update.test_reobtain_data
...
  File "/build/datalad-0.10.0/.pybuild/pythonX.Y_2.7/build/datalad/distribution/update.py", line 169, in __call__
    prune=True)  # prune to not accumulate a mess over time
  File "/build/datalad-0.10.0/.pybuild/pythonX.Y_2.7/build/datalad/support/gitrepo.py", line 1715, in fetch
    fi_list += rm.fetch(refspec=refspec, progress=progress, **kwargs)
  File "/usr/lib/python2.7/dist-packages/git/remote.py", line 789, in fetch
    res = self._get_fetch_info_from_stderr(proc, progress)
  File "/usr/lib/python2.7/dist-packages/git/remote.py", line 706, in _get_fetch_info_from_stderr
    for err_line, fetch_line in zip(fetch_info_lines, fetch_head_info))
  File "/usr/lib/python2.7/dist-packages/git/remote.py", line 706, in <genexpr>
    for err_line, fetch_line in zip(fetch_info_lines, fetch_head_info))
  File "/usr/lib/python2.7/dist-packages/git/remote.py", line 325, in _from_line
    old_commit = repo.rev_parse(operation.split(split_token)[0])
  File "/usr/lib/python2.7/dist-packages/git/repo/fun.py", line 334, in rev_parse
    obj = name_to_object(repo, rev)
  File "/usr/lib/python2.7/dist-packages/git/repo/fun.py", line 147, in name_to_object
    raise BadName(name)
BadName: Ref '2218af9' did not resolve to an object

I suspect the cause is that we pass prune=True to GitPython's .fetch, which forwards it without any internal handling to the git fetch --prune call. My hypothesis is that git then decides to run 'gc', and GitPython's internal gitdb somehow gets out of sync with it, hence the BadName. Is that a viable hypothesis? ;-) Is there anything that could be done on GitPython's end, or maybe we could pass -c gc.auto=0 to that git fetch call to disable auto gc?

Thanks in advance for any guidance.

yarikoptic avatar Jun 09 '18 16:06 yarikoptic

Your hypothesis sounds very reasonable. GitPython's state handling does not cope well with many changes on disk. When implementing grit I will try to keep that in check, and at least make clear that some types may be stateful for performance reasons.

Byron avatar Jun 10 '18 18:06 Byron

Is there a way to pass -c options to this git fetch call? (Sorry for the rtfm type question)

yarikoptic avatar Jun 10 '18 20:06 yarikoptic

Any wisdom on this matter would still be appreciated; this bug haunts us at night (and during the day).

yarikoptic avatar Jul 12 '18 17:07 yarikoptic

@yarikoptic It looks like repo.remotes.origin.fetch(...) does indeed take additional arguments to be passed to the git program doing all the work. Maybe that helps.

Byron avatar Jul 15 '18 13:07 Byron

With the following change:

--- a/datalad/distribution/update.py
+++ b/datalad/distribution/update.py
@@ -166,6 +166,7 @@ class Update(Interface):
             repo.fetch(
                 remote=None if fetch_all else sibling_,
                 all_=fetch_all,
+                c='gc.auto=0',
                 prune=True)  # prune to not accumulate a mess over time

I get this error while running our tests:

  File "/home/yoh/proj/datalad/datalad/datalad/distribution/update.py", line 170, in __call__
    prune=True)  # prune to not accumulate a mess over time
  File "/home/yoh/proj/datalad/datalad/datalad/support/gitrepo.py", line 1715, in fetch
    fi_list += rm.fetch(refspec=refspec, progress=progress, **kwargs)
  File "/usr/lib/python2.7/dist-packages/git/remote.py", line 789, in fetch
    res = self._get_fetch_info_from_stderr(proc, progress)
  File "/usr/lib/python2.7/dist-packages/git/remote.py", line 675, in _get_fetch_info_from_stderr
    proc.wait(stderr=stderr_text)
  File "/usr/lib/python2.7/dist-packages/git/cmd.py", line 418, in wait
    raise GitCommandError(self.args, status, errstr)
GitCommandError: Cmd('/usr/lib/git-annex.linux/git') failed due to: exit code(129)
  cmdline: /usr/lib/git-annex.linux/git -c receive.autogc=0 -c gc.auto=0 fetch -c gc.auto=0 --prune -v origin
  stderr: 'error: unknown switch `c'

Which tells us:

  1. Thanks @Byron, but that passes the option to the fetch subcommand (git fetch -c ...), not to git itself (git -c ... fetch).
  2. My original guess was incorrect: we apparently already pass -c gc.auto=0 to the underlying git somehow, so auto gc is not the cause :-( Grr, what else could it be?
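
To illustrate point 1, here is a minimal sketch of why the option ended up after the subcommand. The helpers kwargs_to_flags and build_git_argv are hypothetical names made up for this illustration; they only mimic the general idea of turning keyword arguments into command-line flags and are not GitPython's actual code.

```python
def kwargs_to_flags(**kwargs):
    """Turn keyword arguments into CLI flags (simplified, hypothetical helper)."""
    flags = []
    for name, value in kwargs.items():
        if value is None or value is False:
            continue  # unset options produce no flag
        if len(name) == 1:
            # single-letter options become "-x" or "-x value"
            flags.append("-" + name)
            if value is not True:
                flags.append(str(value))
        elif value is True:
            flags.append("--" + name.replace("_", "-"))
        else:
            flags.append("--%s=%s" % (name.replace("_", "-"), value))
    return flags


def build_git_argv(subcommand, args=(), global_config=(), **sub_kwargs):
    """Assemble an argv list; global '-c' settings must precede the subcommand."""
    argv = ["git"]
    for setting in global_config:
        argv += ["-c", setting]
    argv.append(subcommand)
    argv += kwargs_to_flags(**sub_kwargs)
    argv += list(args)
    return argv


# Passing c=... as a subcommand keyword puts "-c" after "fetch",
# which is the shape of the failing command line in the error above.
wrong = build_git_argv("fetch", ["origin"], c="gc.auto=0", prune=True, v=True)

# A configuration override belongs between "git" and the subcommand.
right = build_git_argv("fetch", ["origin"], global_config=["gc.auto=0"],
                       prune=True, v=True)

print(wrong)
print(right)
```

The sketch only builds argument lists and never runs git, so it says nothing about whether disabling auto gc would actually avoid the BadName error.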

yarikoptic avatar Jul 24 '18 13:07 yarikoptic

Since I do not think we can do anything else on our (datalad) end to work around it, @Byron, do you think it would be feasible to guard against that BadName exception, e.g. at this level:

File "/usr/lib/python2.7/dist-packages/git/repo/fun.py", line 334, in rev_parse

and, if it happens, to reload the DB?
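
A toy sketch of the "guard and reload" idea, assuming the stale-cache hypothesis holds. Everything here is hypothetical: CachedIndex is not a GitPython class, and BadName is a local stand-in for the exception in the traceback rather than an import from the library.

```python
class BadName(Exception):
    """Local stand-in for the exception seen in the traceback."""


class CachedIndex:
    """Toy object index that caches names and reloads once on a miss."""

    def __init__(self, load_names):
        self._load_names = load_names        # callable returning current names
        self._names = set(load_names())      # snapshot taken at construction

    def refresh(self):
        # Re-read the names from the backing store.
        self._names = set(self._load_names())

    def resolve(self, name):
        if name not in self._names:
            # The snapshot may be stale: reload once before giving up.
            self.refresh()
            if name not in self._names:
                raise BadName(name)
        return name


# Simulate an object that appears on disk after the snapshot was taken.
on_disk = {"aaa1111"}
index = CachedIndex(lambda: on_disk)
on_disk.add("2218af9")

resolved = index.resolve("2218af9")  # succeeds thanks to the reload
print(resolved)
```

Reloading only on a miss keeps the common path as cheap as before; the cost is one extra reload for names that are genuinely absent.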

yarikoptic avatar Jul 24 '18 14:07 yarikoptic

For now I worked around it in https://github.com/datalad/datalad/pull/2712/files#diff-c13d3ecf2ccb909497bd070a2dd379baL166 by catching this exception, closing/flushing everything we typically do for a commit, and then trying to fetch again. Let's see if it shows its ugly face again ;-)
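
The general shape of such a workaround could look like the sketch below. It is not the code from the pull request: call_with_retry, StaleRefError, flaky_fetch and reset_state are hypothetical names used only to show the catch, reset, retry pattern.

```python
class StaleRefError(Exception):
    """Hypothetical stand-in for the BadName error raised during fetch."""


def call_with_retry(operation, reset_state, retry_on, attempts=2):
    """Run operation(); on a matching error, reset state and try again."""
    for attempt in range(attempts):
        try:
            return operation()
        except retry_on:
            if attempt == attempts - 1:
                raise          # out of attempts: let the error propagate
            reset_state()      # e.g. close/flush cached handles before retrying


events = []


def flaky_fetch():
    # Fails on the first call only, imitating the rare crash.
    events.append("fetch")
    if events.count("fetch") == 1:
        raise StaleRefError("2218af9")
    return "fetched"


def reset_state():
    events.append("reset")


result = call_with_retry(flaky_fetch, reset_state, retry_on=StaleRefError)
print(result, events)
```

Limiting the number of attempts means a persistent failure still surfaces instead of looping forever.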

yarikoptic avatar Jul 26 '18 14:07 yarikoptic

@yarikoptic Actually I don't know what would be best here, and I would trust your judgement. For now it seems you are good on your end. In case a fix in GitPython is in order, please feel free to submit a PR.

Byron avatar Aug 05 '18 12:08 Byron