Incorrect lemma for 'rebooked'

millnerryan opened this issue

The lemma for the word 'rebooked' seems to be incorrect: it is returned as 'rebooke' instead of 'rebook'.

How to reproduce the behaviour

import spacy

# Load the small English pipeline without the parser and NER components
nlp = spacy.load("en_core_web_sm", disable=["parser", "ner"])
doc = nlp("I just rebooked my flight")
print(" ".join(token.lemma_ for token in doc))

Output: I just rebooke my flight

Your Environment

  • Operating System: macOS
  • Python Version Used: 3.9.7
  • spaCy Version Used: 3.1.4
  • Environment Information: spaCy installed in a Conda environment

millnerryan, Aug 29 '22

Thanks for the report! English uses a rule-based lemmatizer that currently has word lists to handle words like "book", but not some of the related forms like "rebook".
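To see how suffix rules can produce "rebooke", here is a simplified, hypothetical re-implementation of rule-based lemmatization in plain Python. It is not spaCy's actual code: the function name `rule_lemmatize`, the tiny index set, and the rule list are assumptions, with the rules assumed to be tried in an order where "strip -ed, add -e" comes before "strip -ed". The idea it illustrates is that candidates found in the known-lemma list are preferred, and if none are known, the first candidate produced wins.

```python
# Hypothetical, simplified sketch of suffix-rule lemmatization.
# Loosely modeled on a rule-based English lemmatizer; NOT spaCy's actual code.
from typing import List, Set, Tuple

# Assumed verb suffix rules (old_suffix, new_suffix), tried in this order.
VERB_RULES: List[Tuple[str, str]] = [
    ("s", ""), ("ies", "y"), ("es", "e"), ("es", ""),
    ("ed", "e"), ("ed", ""), ("ing", "e"), ("ing", ""),
]


def rule_lemmatize(word: str, index: Set[str],
                   rules: List[Tuple[str, str]] = VERB_RULES) -> str:
    known: List[str] = []    # candidates that appear in the known-lemma index
    unknown: List[str] = []  # candidates that do not
    for old, new in rules:
        if word.endswith(old):
            candidate = word[: len(word) - len(old)] + new
            if not candidate:
                continue
            (known if candidate in index else unknown).append(candidate)
    if known:
        return known[0]
    if unknown:
        # No candidate is a known lemma: fall back to the first one produced.
        return unknown[0]
    return word


index = {"book", "fly"}
print(rule_lemmatize("booked", index))    # book
print(rule_lemmatize("rebooked", index))  # rebooke
index.add("rebook")
print(rule_lemmatize("rebooked", index))  # rebook
```

In this sketch, "rebooked" yields the candidates "rebooke" and "rebook"; neither is known, so the first one is returned, which matches the reported output. Once "rebook" is in the index, the known candidate wins.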

If you'd like to update the rules to handle "rebook", you can add "rebook" to the list of known verb lemmas like this:

import spacy

nlp = spacy.load("en_core_web_sm")
# Add "rebook" to the list of known verb lemmas used by the rule-based lemmatizer
nlp.get_pipe("lemmatizer").lookups.get_table("lemma_index")["verb"].append("rebook")

Be aware that the lemmatizer caches its results, so you need to modify the tables before processing any texts; otherwise you might get the cached lemmas instead of seeing the effects of your modifications.
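The effect of the cache can be illustrated with a small, hypothetical memoizing lemmatizer in plain Python. Results are stored here per (word, POS) key, so once "rebooked" has been lemmatized, adding "rebook" to the index afterwards changes nothing until the cache is cleared. The class `CachedLemmatizer` and its two-rule list are made up for illustration; the sketch only mimics the behaviour described above and is not spaCy's internals.

```python
# Hypothetical sketch of why a lemmatizer cache hides later table changes.
# NOT spaCy's implementation; class name and rules are illustrative only.
from typing import Dict, List, Set, Tuple


class CachedLemmatizer:
    def __init__(self, index: Set[str]) -> None:
        self.index = index  # known lemmas; may be modified later
        self.rules: List[Tuple[str, str]] = [("ed", "e"), ("ed", "")]
        self.cache: Dict[Tuple[str, str], str] = {}

    def lemmatize(self, word: str, pos: str) -> str:
        key = (word, pos)
        if key in self.cache:
            # The cached result is returned even if self.index changed since.
            return self.cache[key]
        candidates = [word[: len(word) - len(old)] + new
                      for old, new in self.rules if word.endswith(old)]
        known = [c for c in candidates if c in self.index]
        # Prefer a known lemma, else the first candidate, else the word itself.
        lemma = (known or candidates or [word])[0]
        self.cache[key] = lemma
        return lemma


lemmatizer = CachedLemmatizer({"book"})
print(lemmatizer.lemmatize("rebooked", "VERB"))  # rebooke

lemmatizer.index.add("rebook")  # table modified *after* processing
print(lemmatizer.lemmatize("rebooked", "VERB"))  # still rebooke (cached)

lemmatizer.cache.clear()
print(lemmatizer.lemmatize("rebooked", "VERB"))  # rebook
```

With a real pipeline, the straightforward way to avoid this is the one described above: change the tables right after loading, before passing any text to `nlp`.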

Edited to add: you can save this pipeline with nlp.to_disk, which will save your changes, so if you reload it from the local directory (or repackage it with spacy package) you don't have to make the changes every time.

adrianeboyd, Sep 05 '22

We recently published a new FAQ about lemmatizers that should cover this kind of issue: https://github.com/explosion/spaCy/discussions/11685

adrianeboyd, Oct 28 '22

This issue has been automatically closed because it was answered and there was no follow-up discussion.

github-actions[bot], Nov 05 '22

This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.

github-actions[bot], Dec 06 '22