Author search blocked by Google

Open alekseybelikov opened this issue 8 months ago • 12 comments

Describe the bug

Author search blocked by Google

To Reproduce

from scholarly import scholarly

# Retrieve the author's data, fill-in, and print
# Get an iterator for the author results
search_query = scholarly.search_author('Steven A Cholewiak')
# Retrieve the first result from the iterator
first_author_result = next(search_query)
scholarly.pprint(first_author_result)

2025-05-11 13:51:33,811 - INFO - Getting https://scholar.google.com/citations?hl=en&view_op=search_authors&mauthors=Steven%20A%20Cholewiak
2025-05-11 13:51:35,408 - INFO - HTTP Request: GET https://scholar.google.com/citations?hl=en&view_op=search_authors&mauthors=Steven%20A%20Cholewiak "HTTP/1.1 302 Found"
2025-05-11 13:51:35,481 - INFO - HTTP Request: GET https://accounts.google.com/Login?hl=en&continue=https://scholar.google.com/citations%3Fhl%3Den%26view_op%3Dsearch_authors%26mauthors%3DSteven%2520A%2520Cholewiak&service=citations "HTTP/1.1 302 Found"
2025-05-11 13:51:35,529 - INFO - HTTP Request: GET https://accounts.google.com/InteractiveLogin?continue=https://scholar.google.com/citations?hl%3Den%26view_op%3Dsearch_authors%26mauthors%3DSteven%2520A%2520Cholewiak&hl=en&service=citations&ifkv=ASKV5MiDdp9s76y10mF7QYoVRbBRi3tb3FF_ZVdN9BvxXzaABqA4aC-ZoFFkPH09yxni5k7g5D0vww "HTTP/1.1 302 Moved Temporarily"
2025-05-11 13:51:35,582 - INFO - HTTP Request: GET https://accounts.google.com/v3/signin/identifier?continue=https%3A%2F%2Fscholar.google.com%2Fcitations%3Fhl%3Den%26view_op%3Dsearch_authors%26mauthors%3DSteven%2520A%2520Cholewiak&hl=en&ifkv=ASKV5Mh-6rf0iFUy8wTCtqoNW-KAPOcBm4zKiKhDdpawNgv3YhNc86PfToW0L-oGpSMsmdgYmy_Ing&service=citations&flowName=GlifWebSignIn&flowEntry=ServiceLogin&dsh=S-1575736983%3A1746967895547990 "HTTP/1.1 200 OK"
2025-05-11 13:51:35,713 - INFO - Found 0 authors
2025-05-11 13:51:35,715 - INFO - No more author pages


StopIteration                             Traceback (most recent call last)
Cell In[11], line 7
      5 search_query = scholarly.search_author('Steven A Cholewiak')
      6 # Retrieve the first result from the iterator
----> 7 first_author_result = next(search_query)
      8 scholarly.pprint(first_author_result)

StopIteration:
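The log above ends with "Found 0 authors", so the iterator is empty and `next()` raises `StopIteration`. A minimal sketch of a guard that makes this failure explicit is shown below. It does not work around the block itself, and `first_or_none` is a hypothetical helper name, not part of scholarly.

```python
from typing import Iterator, Optional, TypeVar

T = TypeVar("T")


def first_or_none(results: Iterator[T]) -> Optional[T]:
    """Return the first item of an iterator, or None if it is empty.

    Hypothetical helper for illustration; not part of scholarly.
    """
    # next() with a default value never raises StopIteration.
    return next(results, None)


# An empty iterator stands in for a search that found 0 authors.
print(first_or_none(iter([])))

# A non-empty iterator stands in for a successful search.
print(first_or_none(iter([{"name": "example"}])))
```

With scholarly, the same helper could wrap the call from the report, e.g. `first_or_none(scholarly.search_author('Steven A Cholewiak'))`, followed by a check for `None` before calling `scholarly.pprint`.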

Expected behavior

Should retrieve the author info

Desktop (please complete the following information):

  • Proxy service: none
  • python version: 3.11.4
  • OS: macOS 14.7.4 (23H420)
  • Version 1.7.11

Do you plan on contributing? Your response below will clarify whether the maintainers can expect you to fix the bug you reported.

  • No

Additional context

Direct requests for Scholar Author ID still work

alekseybelikov avatar May 11 '25 12:05 alekseybelikov

Having the same issue

okoppe avatar May 12 '25 01:05 okoppe

Unfortunately, this happened to me as well today. Tried searching with ID and name.

Neither

scholar_id = 'qc6CJjYAAAAJ'
scholar = scholarly.search_author_id(scholar_id)

nor

search_query = scholarly.search_author('Steven A Cholewiak')
first_author_result = next(search_query)

worked for me.

jiadingfang avatar May 12 '25 06:05 jiadingfang

I got this error as well, but at first thought it was caused by this warning:

/home/fccoelho/Documentos/fccoelho.github.com/.venv/lib/python3.13/site-packages/scholarly/_scholarly.py:312: SyntaxWarning: invalid escape sequence '\d'
  m = re.search("cites=[\d+,]*", object["citedby_url"])

fccoelho avatar May 13 '25 13:05 fccoelho

But after I fixed this bug, I can confirm that I am still getting the StopIteration reported in this issue.
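For reference, a `SyntaxWarning: invalid escape sequence '\d'` on a regex pattern is usually resolved by making the pattern a raw string. The sketch below shows that change on the same pattern. The sample `citedby_url` value is made up for illustration, and this is not necessarily the exact patch applied above.

```python
import re

# Hypothetical sample value; real citedby_url values may look different.
citedby_url = "/scholar?cites=123,456&hl=en"

# Raw string: "\d" is passed to the regex engine as-is, so Python no longer
# warns about an invalid escape sequence in the string literal.
m = re.search(r"cites=[\d+,]*", citedby_url)
print(m.group())  # cites=123,456
```

The pattern matches the same text as before; only the string literal changes.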

fccoelho avatar May 13 '25 13:05 fccoelho

Maybe this is because the Scholar API has changed? Here is one URL that works (tested now): https://scholar.google.com.br/scholar?as_q=&as_epq=&as_oq=&as_eq=&as_occt=any&as_sauthors=Fl%C3%A1vio+Code%C3%A7o+Coelho&as_publication=&as_ylo=&as_yhi=&hl=pt-BR&as_sdt=0%2C5

Notice that authors are now passed in the as_sauthors parameter.

I found another one on the site, more similar to the one used by scholarly:

https://scholar.google.com.br/citations?view_op=search_authors&mauthors=author:Fl%C3%A1vio+author:Code%C3%A7o+author:Coelho&hl=pt-BR&oi=ao

However, in _scholarly.py the URL used does not add the author: prefix to each part of the name: https://github.com/scholarly-python-package/scholarly/blob/9269ff36ad2314e6cc0c5b499efc3b79b844707e/scholarly/_scholarly.py#L18
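To make the difference concrete, here is a rough sketch that builds a query string in the same shape as the URL found on the site, with author: in front of each word of the name. `build_author_search_url` is a hypothetical helper, not scholarly's code, it leaves out the `oi=ao` parameter, and whether this form avoids the login redirect is untested.

```python
from urllib.parse import quote_plus


def build_author_search_url(name, host="scholar.google.com", lang="en"):
    """Build an author-search URL with an 'author:' prefix on each name part.

    Hypothetical helper for illustration; not part of scholarly.
    """
    # "Flávio Coelho" -> "author:Flávio author:Coelho"
    terms = " ".join(f"author:{part}" for part in name.split())
    # quote_plus turns spaces into '+' and percent-encodes non-ASCII
    # characters; safe=':' keeps the colon in 'author:' readable.
    query = quote_plus(terms, safe=":")
    return (
        f"https://{host}/citations?view_op=search_authors"
        f"&mauthors={query}&hl={lang}"
    )


print(build_author_search_url(
    "Flávio Codeço Coelho", host="scholar.google.com.br", lang="pt-BR"
))
```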

fccoelho avatar May 13 '25 13:05 fccoelho

It looks like author searches on Google Scholar now require the user to log in first, so we would probably need to set up some authentication to make it work.
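This matches the log in the original report, where the request chain ends on accounts.google.com with a 200 OK and the parser then finds 0 authors. A small sketch of a check for that situation, assuming the caller has the final URL after redirects; `is_login_redirect` is a hypothetical name and nothing like it is known to exist in scholarly:

```python
from urllib.parse import urlparse


def is_login_redirect(final_url: str) -> bool:
    """Return True if a request ended up on Google's sign-in host.

    Hypothetical helper for illustration; not part of scholarly.
    """
    # The redirect chain in the report ends on accounts.google.com.
    return urlparse(final_url).hostname == "accounts.google.com"


print(is_login_redirect(
    "https://accounts.google.com/v3/signin/identifier?service=citations"
))
print(is_login_redirect(
    "https://scholar.google.com/citations?view_op=search_authors"
))
```

A check like this would let the library report "login required" instead of an empty result, but it does not solve the authentication problem itself.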

finsberg avatar May 13 '25 19:05 finsberg

Is Scholarly still being maintained?

fccoelho avatar May 13 '25 22:05 fccoelho

> Is Scholarly still being maintained?

Good question. Maybe we should switch to OpenAlex or Semantic Scholar APIs. We have to admit Scholarly is very limited in its functions (e.g. cannot even retrieve citations for the last year only) with poor documentation, and constant rate limiting by Google is very annoying.

alekseybelikov avatar May 13 '25 22:05 alekseybelikov

I added a workaround in my own library here: https://github.com/finsberg/pygscholar/pull/41

finsberg avatar May 14 '25 09:05 finsberg

> Is Scholarly still being maintained?

It's getting extremely difficult to maintain it with the constant blocking from Google Scholar.

arunkannawadi avatar May 14 '25 19:05 arunkannawadi

@fccoelho could you open a PR with the workaround that works for you?

arunkannawadi avatar May 14 '25 19:05 arunkannawadi

I have not verified that it solves the supposed user authentication requirement, but I'll run some more tests and see if I can come up with a solution.

fccoelho avatar May 15 '25 11:05 fccoelho