Author search blocked by Google
Describe the bug
Author search blocked by Google
To Reproduce
from scholarly import scholarly
# Retrieve the author's data, fill-in, and print
# Get an iterator for the author results
search_query = scholarly.search_author('Steven A Cholewiak')
# Retrieve the first result from the iterator
first_author_result = next(search_query)
scholarly.pprint(first_author_result)
2025-05-11 13:51:33,811 - INFO - Getting https://scholar.google.com/citations?hl=en&view_op=search_authors&mauthors=Steven%20A%20Cholewiak
2025-05-11 13:51:35,408 - INFO - HTTP Request: GET https://scholar.google.com/citations?hl=en&view_op=search_authors&mauthors=Steven%20A%20Cholewiak "HTTP/1.1 302 Found"
2025-05-11 13:51:35,481 - INFO - HTTP Request: GET https://accounts.google.com/Login?hl=en&continue=https://scholar.google.com/citations%3Fhl%3Den%26view_op%3Dsearch_authors%26mauthors%3DSteven%2520A%2520Cholewiak&service=citations "HTTP/1.1 302 Found"
2025-05-11 13:51:35,529 - INFO - HTTP Request: GET https://accounts.google.com/InteractiveLogin?continue=https://scholar.google.com/citations?hl%3Den%26view_op%3Dsearch_authors%26mauthors%3DSteven%2520A%2520Cholewiak&hl=en&service=citations&ifkv=ASKV5MiDdp9s76y10mF7QYoVRbBRi3tb3FF_ZVdN9BvxXzaABqA4aC-ZoFFkPH09yxni5k7g5D0vww "HTTP/1.1 302 Moved Temporarily"
2025-05-11 13:51:35,582 - INFO - HTTP Request: GET https://accounts.google.com/v3/signin/identifier?continue=https%3A%2F%2Fscholar.google.com%2Fcitations%3Fhl%3Den%26view_op%3Dsearch_authors%26mauthors%3DSteven%2520A%2520Cholewiak&hl=en&ifkv=ASKV5Mh-6rf0iFUy8wTCtqoNW-KAPOcBm4zKiKhDdpawNgv3YhNc86PfToW0L-oGpSMsmdgYmy_Ing&service=citations&flowName=GlifWebSignIn&flowEntry=ServiceLogin&dsh=S-1575736983%3A1746967895547990 "HTTP/1.1 200 OK"
2025-05-11 13:51:35,713 - INFO - Found 0 authors
2025-05-11 13:51:35,715 - INFO - No more author pages
StopIteration                             Traceback (most recent call last)
Cell In[11], line 7
      5 search_query = scholarly.search_author('Steven A Cholewiak')
      6 # Retrieve the first result from the iterator
----> 7 first_author_result = next(search_query)
      8 scholarly.pprint(first_author_result)

StopIteration:
Expected behavior
Should retrieve the author info
Desktop (please complete the following information):
- Proxy service: none
- python version: 3.11.4
- OS: macOS 14.7.4 (23H420)
- Version 1.7.11
Do you plan on contributing? Your response below will clarify whether the maintainers can expect you to fix the bug you reported.
- No
Additional context
Direct requests for Scholar Author ID still work
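The StopIteration in the traceback above is raised because next() is called on an iterator that yielded no authors. Below is a minimal sketch of a guard against that. It does not fix the blocked search; it only turns the crash into a checkable None. The helper name first_or_none is hypothetical and not part of scholarly, and the empty iterator is a stand-in for what search_author returned in the log.

```python
from typing import Iterator, Optional, TypeVar

T = TypeVar("T")


def first_or_none(results: Iterator[T]) -> Optional[T]:
    """Return the first item of an iterator, or None if it is empty."""
    # next() with a default value does not raise StopIteration.
    return next(results, None)


# Stand-in for the empty iterator from the log ("Found 0 authors").
# With scholarly this would be: scholarly.search_author('Steven A Cholewiak')
empty_search = iter([])

author = first_or_none(empty_search)
if author is None:
    print("No authors returned; the request may have been redirected to a login page.")
```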
Having the same issue
Unfortunately, this happened to me as well today. Tried searching with ID and name.
Neither
scholar_id = 'qc6CJjYAAAAJ'
scholar = scholarly.search_author_id(scholar_id)
nor
search_query = scholarly.search_author('Steven A Cholewiak')
first_author_result = next(search_query)
worked for me
I got this error as well, but thought that it was because of this error:
/home/fccoelho/Documentos/fccoelho.github.com/.venv/lib/python3.13/site-packages/scholarly/_scholarly.py:312: SyntaxWarning: invalid escape sequence '\d'
m = re.search("cites=[\d+,]*", object["citedby_url"])
But after I fixed this bug, I can confirm that I am still getting the StopIteration reported in this issue.
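For reference, the usual way to silence this kind of SyntaxWarning is to make the pattern a raw string, so that \d reaches the regex engine unchanged. The sketch below assumes that is the fix meant above; the citedby_url value is made up for illustration and is not taken from scholarly.

```python
import re

# Hypothetical citedby_url value, for illustration only.
citedby_url = "/scholar?cites=123,456&hl=en"

# Raw string: the backslash in "\d" is no longer treated as a string escape,
# so recent Python versions do not warn about an invalid escape sequence.
m = re.search(r"cites=[\d+,]*", citedby_url)
matched = m.group() if m else None
print(matched)  # cites=123,456
```

The pattern itself is unchanged, so it matches the same text as before; only the string literal differs.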
Maybe this is because the Scholar API has changed? Here is one URL that works (tested now): https://scholar.google.com.br/scholar?as_q=&as_epq=&as_oq=&as_eq=&as_occt=any&as_sauthors=Fl%C3%A1vio+Code%C3%A7o+Coelho&as_publication=&as_ylo=&as_yhi=&hl=pt-BR&as_sdt=0%2C5
Notice that authors are now passed in the as_sauthors parameter.
I found another one on the site, more similar to the one used by scholarly:
https://scholar.google.com.br/citations?view_op=search_authors&mauthors=author:Fl%C3%A1vio+author:Code%C3%A7o+author:Coelho&hl=pt-BR&oi=ao
However, in _scholarly.py the URL used does not add the author: prefix to the name: https://github.com/scholarly-python-package/scholarly/blob/9269ff36ad2314e6cc0c5b499efc3b79b844707e/scholarly/_scholarly.py#L18
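To make the difference concrete, here is a rough sketch that rebuilds the second URL above from a plain name by prefixing each name part with author:. The function build_author_search_url is hypothetical and not part of scholarly, the host, hl and oi values are simply copied from that URL, and it is not verified that this form avoids the login redirect.

```python
from urllib.parse import quote


def build_author_search_url(name, host="scholar.google.com.br", hl="pt-BR"):
    """Build an author-search URL with each name part prefixed by 'author:'."""
    # Percent-encode each whitespace-separated part, then join the parts with '+'.
    terms = "+".join("author:" + quote(part) for part in name.split())
    return (
        f"https://{host}/citations?view_op=search_authors"
        f"&mauthors={terms}&hl={hl}&oi=ao"
    )


print(build_author_search_url("Flávio Codeço Coelho"))
```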
It looks like author searches on Google Scholar now require the user to log in first, so we would probably need to set up some authentication to make it work.
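The log in the original report supports this: the search request is redirected from scholar.google.com to accounts.google.com before "Found 0 authors" is logged. A small sketch of how that redirect could be detected from the final URL of a request follows. The helper is_login_redirect is hypothetical, and how to obtain the final URL depends on the HTTP client in use.

```python
from urllib.parse import urlsplit


def is_login_redirect(final_url: str) -> bool:
    """Return True if the URL points at Google's sign-in host."""
    return urlsplit(final_url).hostname == "accounts.google.com"


# Both URLs are taken from the log in the original report.
search_url = (
    "https://scholar.google.com/citations?hl=en"
    "&view_op=search_authors&mauthors=Steven%20A%20Cholewiak"
)
login_url = (
    "https://accounts.google.com/Login?hl=en&continue=https://scholar.google.com/"
    "citations%3Fhl%3Den%26view_op%3Dsearch_authors%26mauthors%3D"
    "Steven%2520A%2520Cholewiak&service=citations"
)

print(is_login_redirect(search_url))  # False
print(is_login_redirect(login_url))   # True
```

A check like this would let a caller report "login required" instead of silently returning zero authors.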
Is Scholarly still being maintained?
Good question. Maybe we should switch to the OpenAlex or Semantic Scholar APIs. We have to admit that Scholarly is very limited in functionality (e.g. it cannot even retrieve citations for just the last year) and poorly documented, and the constant rate limiting by Google is very annoying.
I added a workaround in my own library here: https://github.com/finsberg/pygscholar/pull/41
Is Scholarly still being maintained?
It's getting extremely difficult to maintain it with the constant blocking from Google Scholar.
@fccoelho could you open a PR with the workaround that works for you?
I have not verified that it solves the supposed user authentication requirement, but I'll run some more tests and see if I can come up with a solution.