
Raising StopIteration errors for some queries even when the HTTP requests are successful (using ScraperAPI).

Open EthanC111 opened this issue 2 years ago • 3 comments

Describe the bug

Both of the queries provided below raise a StopIteration error even though the HTTP requests are successful.

To Reproduce

import logging
import sys

# Configure logging
logging.basicConfig(
    level=logging.INFO,
    format="%(asctime)s - %(levelname)s - %(message)s",
    handlers=[
        logging.StreamHandler(sys.stdout),
    ],
)

from scholarly import scholarly
from scholarly import ProxyGenerator

scraper_api_key = "YOUR_SCRAPER_API_KEY"
# query = "A Comparison of Transformer and Recurrent Neural Networks on Multilingual Neural Machine Translation"
query = "Reducing the Dimensionality of Data with Neural Networks."

# Route scholarly's requests through ScraperAPI
pg = ProxyGenerator()
success = pg.ScraperAPI(scraper_api_key)
scholarly.use_proxy(pg)

# Search for the publication and fetch the first result
results = scholarly.search_pubs(query)
paper_info = next(results)
print(paper_info)

Expected behavior

The paper information should be printed.
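Instead, next(results) raises StopIteration, which means the iterator returned by search_pubs yields no items even though the page was fetched. A defensive variant of the last two lines of the snippet above (just a sketch to surface the failure; it does not fix the underlying parsing issue) is:

try:
    paper_info = next(results)
    print(paper_info)
except StopIteration:
    print("search_pubs returned an empty iterator even though the HTTP request succeeded")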


Desktop (please complete the following information):

  • Proxy service: ScraperAPI
  • Python version: 3.11.4
  • OS: Linux
  • scholarly version: 1.7.11

Do you plan on contributing? Your response below will clarify whether the maintainers can expect you to fix the bug you reported.

  • [ ] Yes, I will create a Pull Request with the bugfix.


EthanC111 avatar Jul 21 '23 05:07 EthanC111

I believe this happens when the result page uses the new Google Scholar UI that rolled out around June 2023. It mostly occurs when there is a single result. You can try this in publication_parser.py: on line 61, append + self._soup.find_all('div', class_='gs_r gs_or gs_scl gs_fmar')
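For context, BeautifulSoup treats a class_ filter containing spaces as an exact match against the whole class attribute, so the extra gs_fmar class added by the new UI makes the old selector miss the row. A small self-contained illustration (the HTML snippet is made up to mimic a new-UI result row):

from bs4 import BeautifulSoup

# Hypothetical markup mimicking a result row in the new Scholar UI,
# which carries the extra gs_fmar class.
html = '<div class="gs_r gs_or gs_scl gs_fmar">result</div>'
soup = BeautifulSoup(html, 'html.parser')

# A multi-class string is matched against the full class attribute value,
# so the old selector finds nothing while the new one matches.
print(len(soup.find_all('div', class_='gs_r gs_or gs_scl')))          # 0
print(len(soup.find_all('div', class_='gs_r gs_or gs_scl gs_fmar')))  # 1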

ronny3 avatar Aug 29 '23 09:08 ronny3

Thanks for pointing this out. Just to be clear, the full line 61 should be changed from

self._rows = self._soup.find_all('div', class_='gs_r gs_or gs_scl') + self._soup.find_all('div', class_='gsc_mpat_ttl')

to

self._rows = self._soup.find_all('div', class_='gs_r gs_or gs_scl gs_fmar') + self._soup.find_all('div', class_='gsc_mpat_ttl')

Then it works.

kostrykin avatar Sep 04 '23 08:09 kostrykin

I was seeing intermittent failures again in April 2024 and needed to update that line (around line 61) to:

self._rows = self._soup.find_all('div', class_='gs_r gs_or gs_scl gs_fmar') + self._soup.find_all('div', class_='gsc_mpat_ttl') + self._soup.find_all('div', class_='gs_r gs_or gs_scl')
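An alternative that should tolerate further class changes (a sketch only, assuming self._rows is still assembled this way in publication_parser.py) is to match on the stable classes with a CSS selector, since select() matches elements that carry all of the listed classes regardless of any extra ones:

# Sketch: matches both the old rows (gs_r gs_or gs_scl) and the new ones
# that additionally carry gs_fmar, without listing every combination.
self._rows = self._soup.select('div.gs_r.gs_or.gs_scl') + self._soup.find_all('div', class_='gsc_mpat_ttl')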

gdudek avatar Apr 14 '24 02:04 gdudek