Incorrect lemma for 'rebooked'
The lemma for the word "rebooked" seems to be incorrect: it is returned as 'rebooke' instead of 'rebook'.
How to reproduce the behaviour
import spacy
load_model = spacy.load('en_core_web_sm', disable=['parser', 'ner'])
doc = load_model("I just rebooked my flight")
print(" ".join([token.lemma_ for token in doc]))
Output: I just rebooke my flight
Your Environment
- Operating System: macOS
- Python Version Used: 3.9.7
- spaCy Version Used: 3.1.4
- Environment Information: spaCy via Conda env
Thanks for the report! English uses a rule-based lemmatizer whose word lists currently cover words like "book" but not some of the related forms like "rebook".
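To make the mechanism concrete, here is a simplified sketch of how a rule-based lemmatizer of this kind can end up with "rebooke". It is not spaCy's actual implementation: `toy_lemmatize`, `VERB_RULES` and the tiny `verb_index` are hypothetical names and toy data, and the sketch assumes the lemmatizer tries suffix rules in order, prefers candidates found in the word list, and otherwise falls back to the first candidate.

```python
# Toy suffix rules (old suffix, replacement), tried in order.
# Illustrative data only; a real lemmatizer's tables are far larger.
VERB_RULES = [
    ("s", ""), ("ies", "y"), ("es", "e"), ("es", ""),
    ("ed", "e"), ("ed", ""), ("ing", "e"), ("ing", ""),
]


def toy_lemmatize(word, index, rules=VERB_RULES):
    """Return one lemma candidate, preferring forms listed in `index`."""
    word = word.lower()
    if word in index:
        return word
    known, unknown = [], []
    for old, new in rules:
        if word.endswith(old):
            candidate = word[: len(word) - len(old)] + new
            if not candidate:
                continue
            # Candidates that are known lemmas win over unknown ones.
            (known if candidate in index else unknown).append(candidate)
    # With no known candidate, the first unknown candidate is used.
    return (known or unknown or [word])[0]


verb_index = {"book", "fly"}
print(toy_lemmatize("booked", verb_index))    # book
print(toy_lemmatize("rebooked", verb_index))  # rebooke
verb_index.add("rebook")
print(toy_lemmatize("rebooked", verb_index))  # rebook
```

Under these assumptions, "booked" works because the candidate "book" is in the list, while for "rebooked" neither "rebooke" nor "rebook" is known, so the candidate from the earlier rule ("ed" to "e") wins until "rebook" is added.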
If you'd like to update the rules to handle "rebook", you can add "rebook" to the list of known verb lemmas like this:
import spacy
nlp = spacy.load("en_core_web_sm")
nlp.get_pipe("lemmatizer").lookups.get_table("lemma_index")["verb"].append("rebook")
Be aware that there's a lemmatizer cache, so you need to modify the tables before processing any texts or you might get the cached lemmas instead of seeing the effects of your modifications.
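The ordering problem can be illustrated with a small stand-in that does not depend on spaCy. `ToyCachedLemmatizer` is a hypothetical class that only handles the "ed" suffix and assumes results are cached per (word, part-of-speech) pair; it is meant to show the effect of the cache, not how spaCy implements it.

```python
class ToyCachedLemmatizer:
    """Hypothetical lemmatizer that caches one result per (word, pos)."""

    def __init__(self, verb_index):
        self.verb_index = verb_index
        self.cache = {}

    def lemmatize(self, word, pos="verb"):
        word = word.lower()
        key = (word, pos)
        if key in self.cache:
            # A cached result is returned without consulting the tables.
            return self.cache[key]
        if word.endswith("ed"):
            stem = word[:-2]
            candidates = [stem + "e", stem]
        else:
            candidates = [word]
        known = [c for c in candidates if c in self.verb_index]
        lemma = (known or candidates)[0]
        self.cache[key] = lemma
        return lemma


# Table modified AFTER the word was processed: the stale lemma persists.
late = ToyCachedLemmatizer(["book"])
first = late.lemmatize("rebooked")
late.verb_index.append("rebook")
stale = late.lemmatize("rebooked")

# Table modified BEFORE any text is processed: the change takes effect.
early = ToyCachedLemmatizer(["book"])
early.verb_index.append("rebook")
fresh = early.lemmatize("rebooked")

print(first, stale, fresh)  # rebooke rebooke rebook
```

In a real pipeline the same idea applies: make the table changes right after loading the pipeline and before calling it on any text.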
Edited to add: you can save this pipeline with nlp.to_disk, which also saves your changes, so if you reload it from the local directory (or repackage it with spacy package) you don't have to make the changes every time.
We recently published a new FAQ about lemmatizers that should cover this kind of issue: https://github.com/explosion/spaCy/discussions/11685
This issue has been automatically closed because it was answered and there was no follow-up discussion.