Lack of HTTP connection reuse makes repeated queries slow
PyGBIF makes a new HTTP connection for every request, which adds significant overhead. It would be much quicker to use the requests module with sessions, so connections can be reused.
(@mdoering's words via email, CC @juliancabezas)
Looking at the python client code I see it does not reuse connections at all, thus it has to do all the TCP/SSL overhead each time! https://stackoverflow.com/questions/24873927/python-requests-module-and-connection-reuse
If I use the requests module manually with HTTP sessions it becomes a lot quicker:
```python
import timeit

# Fresh connection per request: full TCP/TLS handshake each time
timeit.timeit('_ = requests.get("https://api.gbif.org/v1/species/match?name=Poa%20annua")', 'import requests', number=100)
# 12.797771417999911

# Reused connection pool via requests.Session
timeit.timeit('_ = session.get("https://api.gbif.org/v1/species/match?name=Poa%20annua")', 'import requests; session = requests.Session()', number=100)
# 3.185472910000044
```
Supporting this would require a larger change to the pygbif codebase.
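A minimal sketch of what this could look like: a module-level `requests.Session` shared by all API helpers. The helper name `gbif_GET` and the module layout here are illustrative assumptions, not pygbif's actual internals.

```python
# Sketch only: the gbif_GET helper name is an assumption for illustration.
import requests

# Created once at import time; requests.Session keeps a pool of open
# connections, so repeated calls skip the TCP/TLS handshake.
SESSION = requests.Session()

def gbif_GET(url, args, **kwargs):
    """GET a GBIF API endpoint, reusing pooled connections."""
    resp = SESSION.get(url, params=args, **kwargs)
    resp.raise_for_status()
    return resp.json()

if __name__ == "__main__":
    out = gbif_GET("https://api.gbif.org/v1/species/match", {"name": "Poa annua"})
    print(out.get("scientificName"))
```

A shared session would also give one place to set library-wide defaults such as a `User-Agent` header, via `SESSION.headers`.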