
Lack of HTTP connection reuse makes repeated queries slow

MattBlissett opened this issue 3 years ago

PyGBIF makes a new HTTP connection for every request, which adds a lot of overhead. It would be much quicker to use the requests module with sessions, so the connection can be reused.

(@mdoering's words via email, CC @juliancabezas)

Looking at the Python client code I see it does not reuse connections at all, so it pays the full TCP/TLS handshake overhead on every request: https://stackoverflow.com/questions/24873927/python-requests-module-and-connection-reuse

If I use the requests module manually with HTTP sessions it becomes a lot quicker:

```python
import requests, timeit

# One fresh connection per request (no reuse):
timeit.timeit('_ = requests.get("https://api.gbif.org/v1/species/match?name=Poa%20annua")',
              'import requests', number=100)
# 12.797771417999911

# Reusing one connection via a Session:
timeit.timeit('_ = session.get("https://api.gbif.org/v1/species/match?name=Poa%20annua")',
              'import requests; session = requests.Session()', number=100)
# 3.185472910000044
```

Supporting this would need a bigger change to the pygbif codebase.
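One possible shape for that change, sketched below: a module-level `requests.Session` shared by all request helpers, created lazily on first use. The `get_session` and `gbif_get` names are hypothetical, not part of the current pygbif API; this is just an illustration of how connection reuse could be wired in.

```python
import requests

# Hypothetical pygbif-internal helper: one shared Session per process, so the
# TCP/TLS handshake happens once per host instead of once per request.
_session = None

def get_session():
    """Return the shared requests.Session, creating it on first use."""
    global _session
    if _session is None:
        _session = requests.Session()
    return _session

def gbif_get(path, **params):
    """GET a GBIF API endpoint through the shared session (illustrative only)."""
    return get_session().get("https://api.gbif.org/v1/" + path, params=params)
```

Existing request functions would then call `get_session().get(...)` instead of `requests.get(...)`; the Session also gives a single place to set default headers or a retry policy later.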

MattBlissett avatar Sep 09 '22 11:09 MattBlissett