openlibrary icon indicating copy to clipboard operation
openlibrary copied to clipboard

Fix for #6294: add a book error with notfound trying to match existing work

Open scottbarnes opened this issue 3 years ago • 1 comments

Closes #6294

Fix

Technical

This fixes the "notfound" error that occurs when adding an existing work, but a new edition, via /books/add.

It also fixes a related KeyError that would occur when adding a book via /books/add that exactly matched an existing work and edition.

With respect to the "notfound" error, this happened because make_work() in openlibrary/plugins/upstream/addbook.py was adding /works/ to every doc.key it processed. make_work() is used as a wrapper that processes solr results, all of which already have the "/works/" prefix, so every work was getting a key of /works//works/OL123W, as seen in #6294. This PR removes that line, and thereby removes the duplication of the "/works" prefix.

Additionally, this fixes the KeyError caused when adding a book via /books/add that is identical to an existing work and edition that also lacks an author. The KeyError came from doc['author_name'] in make_author(), when the key didn't exist because there was no author information. The fix here was more sane defaults with doc.get('author_name', '').

To expand on this a bit, make_work() is run as a wrapper on solr results during the duplicate check. This error manifested itself when someone used /books/add to add a book that exactly matched a work and edition that lacked an author; solr would return book matches, run the wrapper (line 169 in /openlibrary/utils/solr.py), and cause a KeyError because author_name didn't exist. I've attached a log of this error as a zip, but here's a screenshot for potentially quicker skimming: image blob_040354495265.html.zip

Testing

I've included tests, but here is some before and after to help illustrate before and after.

As mentioned in #6294, adding a new book that matched an existing work, but not an edition, would cause a "notfound" error. That no longer happens.

Here's an example using Robinson Crusoe, which to begin with has only the 1920 G.W. Jacobs and Co. edition: image

And here's adding a new 1921 edition via /books/add: image

Now there are two editions: 1920, and 1921: image

And here's trying to add an identical 1920 G.W. Jacobs and Co. edition, which picks up where we left off with the 1920 and 1921 editions: image

After submission, this displays: image

Behind the scenes, this is the error (a zip of the logged error is included below the text output)

openlibrary-web-1           | 2022-09-19 21:44:58 [openlibrary.logger] [INFO] solr request: http://solr:8983/solr/openlibrary/select?wt=json&q.op=AND&q=title%3A%22Robinson+Crusoe%22+AND+author_key%3A%22OL18283A%22+AND+publisher%3A%22G.W.+Jacobs+and+Co.%22+AND+publish_date%3A%221920%22&start=0
openlibrary-solr-1          | 2022-09-19 21:44:58.755 INFO  (qtp988850650-28) [   x:openlibrary] o.a.s.c.S.Request [openlibrary]  webapp=/solr path=/select params={q=title:"Robinson+Crusoe"+AND+author_key:"OL18283A"+AND+publisher:"G.W.+Jacobs+and+Co."+AND+publish_date:"1920"&start=0&q.op=AND&wt=json} hits=1 status=0 QTime=2
openlibrary-web-1           | Traceback (most recent call last):
openlibrary-web-1           |   File "/home/openlibrary/.local/lib/python3.10/site-packages/web/application.py", line 280, in process
openlibrary-web-1           |     return self.handle()
openlibrary-web-1           |   File "/home/openlibrary/.local/lib/python3.10/site-packages/web/application.py", line 271, in handle
openlibrary-web-1           |     return self._delegate(fn, self.fvars, args)
openlibrary-web-1           |   File "/home/openlibrary/.local/lib/python3.10/site-packages/web/application.py", line 517, in _delegate
openlibrary-web-1           |     return handle_class(cls)
openlibrary-web-1           |   File "/home/openlibrary/.local/lib/python3.10/site-packages/web/application.py", line 495, in handle_class
openlibrary-web-1           |     return tocall(*args)
openlibrary-web-1           |   File "/openlibrary/infogami/utils/app.py", line 200, in <lambda>
openlibrary-web-1           |     HEAD = GET = POST = PUT = DELETE = lambda self: delegate()
openlibrary-web-1           |   File "/openlibrary/infogami/utils/app.py", line 220, in delegate
openlibrary-web-1           |     return getattr(cls(), method)(*args)
openlibrary-web-1           |   File "/openlibrary/openlibrary/plugins/upstream/addbook.py", line 232, in POST
openlibrary-web-1           |     match = None if created_author else self.find_matches(i)
openlibrary-web-1           |   File "/openlibrary/openlibrary/plugins/upstream/addbook.py", line 291, in find_matches
openlibrary-web-1           |     edition = self.try_edition_match(
openlibrary-web-1           |   File "/openlibrary/openlibrary/plugins/upstream/addbook.py", line 377, in try_edition_match
openlibrary-web-1           |     result = solr.select(q, doc_wrapper=make_work, q_op="AND")
openlibrary-web-1           |   File "/openlibrary/openlibrary/utils/solr.py", line 155, in select
openlibrary-web-1           |     return self._parse_solr_result(
openlibrary-web-1           |   File "/openlibrary/openlibrary/utils/solr.py", line 169, in _parse_solr_result
openlibrary-web-1           |     d.docs = [doc_wrapper(doc) for doc in response['docs']]
openlibrary-web-1           |   File "/openlibrary/openlibrary/utils/solr.py", line 169, in <listcomp>
openlibrary-web-1           |     d.docs = [doc_wrapper(doc) for doc in response['docs']]
openlibrary-web-1           |   File "/openlibrary/openlibrary/plugins/upstream/addbook.py", line 94, in make_work
openlibrary-web-1           |     for key, name in zip(doc.get['author_key'], doc.get['author_name'])
openlibrary-web-1           | TypeError: 'builtin_function_or_method' object is not subscriptable
openlibrary-web-1           |
openlibrary-web-1           | error saved to /var/log/openlibrary/ol-errors/2022-09-19/214458761978.html

duplicate_edition_214458761978.html.zip

After the change, again, entering matching 1920 G.W. Jacobs and Co. edition information: image

And finally, no new 1920 edition has been added: image

Screenshot

Stakeholders

@cdrini

scottbarnes avatar Sep 19 '22 22:09 scottbarnes

@cdrini I have discovered another related bug. This needs more work. I will update the PR when I have further information and a fix.

Edit: after discussing this with @cdrini, I have decided to submit this PR as is. It seems this code path may have been lying fallow for some time, and there are at least two more problems that have made themselves apparent only upon the bug at issue being addressed.

However, this PR fixes the explicitly stated issue, and the related, yet different, issues still present rightly deserve their own, well, issues here in GitHub. I shall investigate a bit more and create the separate issue(s).

scottbarnes avatar Sep 21 '22 04:09 scottbarnes