repo.walk in a thread hangs the main thread
When repo.walk is called in a new thread, it blocks the main thread until it finishes.
See the example code below. When run against a huge repo (e.g. qt5 or chromium), the main thread doesn't print any message until repo.walk has finished. (Iterating over the walker instead of calling list behaves the same.)
It seems that repo.diff has the same problem.
from pygit2 import Repository, GIT_SORT_TOPOLOGICAL
from threading import Thread
import sys
import time

def thread_func(repo_dir):
    repo = Repository(repo_dir)
    print(">>>>>>>> begin diff")
    commits = list(repo.walk(repo.head.target, GIT_SORT_TOPOLOGICAL))
    #for commit in repo.walk(repo.head.target, GIT_SORT_TOPOLOGICAL):
    #    continue
    print(">>>>>>>> end diff")

def test(repo_dir):
    t = Thread(target=thread_func, args=[repo_dir])
    t.start()
    while t.is_alive():
        print("main thread...")
        time.sleep(0.01)

if __name__ == "__main__":
    if len(sys.argv) != 2:
        print(">>>>>>>> Invalid argument")
        sys.exit(-1)
    test(sys.argv[1])
That's not the behaviour I observe. If you replace list(...) with the for loop, you will see many prints. In other words, it's list that is blocking, not pygit2. That's expected in my opinion; read about Python's GIL (Global Interpreter Lock): list is a single call, so the GIL won't allow any other thread to run until it returns.
You can either write the code differently, using a for loop, or use multiprocessing.
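A minimal sketch of the "for loop" variant suggested above, using only the standard library. The names walk_in_batches and collect, the batch_size parameter, and the None sentinel are hypothetical, not part of pygit2. The worker iterates in Python, hands results over through a queue in batches, and explicitly offers the GIL after each batch. Whether this keeps the main thread responsive with a real walker depends on how long each step of the iterator holds the GIL, which this thread does not settle.

```python
import queue
import threading
import time

def walk_in_batches(commits, out_queue, batch_size=256):
    """Consume `commits` in a for loop and hand the results over in batches."""
    batch = []
    for commit in commits:
        batch.append(commit)
        if len(batch) >= batch_size:
            out_queue.put(batch)
            batch = []
            time.sleep(0)  # explicitly give other threads a chance to run
    if batch:
        out_queue.put(batch)  # flush the final, possibly short, batch
    out_queue.put(None)  # sentinel (assumption): marks the end of the walk

def collect(commits, batch_size=256, poll_interval=0.01):
    """Run the walk in a worker thread and drain its batches on the calling thread."""
    out_queue = queue.Queue()
    worker = threading.Thread(
        target=walk_in_batches,
        args=(commits, out_queue, batch_size),
        daemon=True,
    )
    worker.start()
    batches = []
    while True:
        try:
            batch = out_queue.get(timeout=poll_interval)
        except queue.Empty:
            continue  # a GUI would process its own events here
        if batch is None:
            break
        batches.append(batch)
    worker.join()
    return batches
```

In the example from this issue, commits would be the iterator returned by repo.walk(repo.head.target, GIT_SORT_TOPOLOGICAL). A GUI would normally poll the queue from a timer rather than loop as collect does.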
As I mentioned, the for loop behaves the same here. My project also uses a for loop, but it still hangs the GUI thread. On Windows it is even worse than on Linux.
I will try multiprocessing to see whether it performs well enough when walking a small repo.
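Since the two reports above disagree, one way to compare them across platforms is to measure how long the main thread is actually stalled while the worker runs. The following is a rough sketch using only the standard library; the helper name longest_main_thread_stall and its tick parameter are hypothetical.

```python
import threading
import time

def longest_main_thread_stall(work, tick=0.01):
    """Run `work` in a thread and return the longest gap, in seconds,
    between two consecutive wake-ups of the calling thread."""
    worker = threading.Thread(target=work)
    longest = 0.0
    last = time.perf_counter()
    worker.start()
    while worker.is_alive():
        time.sleep(tick)  # the calling thread asks to wake up every `tick` seconds
        now = time.perf_counter()
        longest = max(longest, now - last)
        last = now
    worker.join()
    return longest
```

With the code from this issue it could be called as longest_main_thread_stall(lambda: thread_func(repo_dir)). A result close to tick would suggest the main thread keeps running; a result of many seconds would suggest the worker is holding the GIL for long stretches.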