Inconsistency when searching for publications
Describe the Bug
I encountered an inconsistency when searching for publications by title. The search behavior is unpredictable: sometimes it returns results, and other times it doesn't, depending on how the session is restarted.
To Reproduce
Steps to reproduce the behavior:
- Search for a specific publication by title from an IPython session.
- Observe that in some cases, results are returned, while in others (after restarting the session), no results are found.
Expected Behavior
The expected behavior is to receive consistent search results every time the query is run, regardless of whether the session is restarted.
Desktop:
- Proxy Service: FreeProxies
- Python Version: 3.11
- Operating System: macOS
- Library Version: 1.5
Possible Fix
The issue might be in the _load_url function located in publication_parser.py. I suggest changing the following line:
self._rows = self._soup.find_all('div', class_='gs_r gs_or gs_scl') + self._soup.find_all('div', class_='gsc_mpat_ttl')
to:
self._rows = self._soup.select("div.gs_r.gs_or.gs_scl") + self._soup.select("div.gsc_mpat_ttl")
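This should make search results more consistent: find_all with a space-separated class_ string matches only when the tag's class attribute is exactly that string, whereas select matches any div that carries all of the listed classes, regardless of their order or of any extra classes.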
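A minimal sketch of the difference, assuming the served markup sometimes lists the same classes in a different order or alongside an extra class. The HTML below is hypothetical and is not taken from a real response; `gs_extra` is a made-up class name used only for illustration.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: rows A, B and C all carry gs_r, gs_or and gs_scl,
# but B lists them in a different order and C has an extra (made-up) class.
HTML = """
<div class="gs_r gs_or gs_scl">A</div>
<div class="gs_scl gs_r gs_or">B</div>
<div class="gs_r gs_or gs_scl gs_extra">C</div>
<div class="gs_r">D</div>
"""

soup = BeautifulSoup(HTML, "html.parser")

# find_all with a space-separated string is compared against the whole
# class attribute value, so only the exact string matches.
exact = [d.get_text() for d in soup.find_all("div", class_="gs_r gs_or gs_scl")]

# The CSS selector only requires that each listed class is present.
css = [d.get_text() for d in soup.select("div.gs_r.gs_or.gs_scl")]

print(exact)  # ['A']
print(css)    # ['A', 'B', 'C']
```

If the class attribute does vary between responses in this way, it would explain why the find_all version returns rows in some sessions and nothing in others.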
Great work! This line change works for me!~ Thanks!!!
Thank you @NisoD for the issue and the PR, and thank you @stellarkey for testing it out and reporting here.
I learnt something from this and I think we'd want to use select instead of find_all everywhere we have multiple attributes. I'll accept the PR but leave this issue open as a reminder to fix it.
Differences between select and find_all (for my future reference):
https://www.crummy.com/software/BeautifulSoup/bs4/doc/#searching-by-css-class
https://community.dataquest.io/t/find-all-and-select-difference-beautifoulsoup/299101