pywsd

POS mismatch breaks similarity

Open aponty opened this issue 6 years ago • 0 comments

Love the tool! Super helpful. However, it bugs out if you try to run the maxsim disambiguation on a sentence where a word's NLTK-tagged POS doesn't match the POS of any of its WordNet synsets.

Try running

from pywsd import disambiguate
from pywsd.similarity import max_similarity as maxsim

sen = 'these potato chips are great'
disambiguate(sen, algorithm=maxsim)

and you get an index-out-of-range error: result in max_similarity in similarity.py is [] because wn.synsets(ambiguous_word, pos=pos) returns nothing. NLTK has (incorrectly) decided that the part of speech of 'potato' is adjective, and there's no synset for that.
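To make the failure mode concrete, here is a minimal sketch of what happens when a "pick the best-scoring sense" step receives an empty candidate list. The helper `pick_best` is hypothetical and only mimics the general shape of the problem; it is not the actual code in similarity.py.

```python
def pick_best(scored_senses):
    """Return the sense with the highest score.

    Hypothetical helper: assumes scored_senses is a list of (score, sense)
    pairs and that it contains at least one pair.
    """
    ranked = sorted(scored_senses, reverse=True)  # highest score first
    return ranked[0][1]  # raises IndexError when ranked is empty


# With no candidate senses (the POS-filtered lookup returned nothing),
# indexing the empty ranking fails.
try:
    pick_best([])
except IndexError as err:
    print("IndexError:", err)
```

Any code path that indexes into the ranked results without first checking for an empty list will fail this way.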

A very simple fix: change line 114 from

for i in wn.synsets(ambiguous_word, pos=pos):

to

for i in wn.synsets(ambiguous_word, pos=pos) or wn.synsets(ambiguous_word):

to provide a fallback when the POS-filtered lookup returns nothing.
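The `or` fallback works because an empty list is falsy in Python, so the right-hand lookup only runs when the POS-filtered one finds nothing. Below is a small sketch of the pattern that runs without NLTK. `toy_synsets`, `TOY_SENSES`, and `candidate_senses` are hypothetical stand-ins for `wn.synsets`, and the sense names are illustrative rather than real WordNet output.

```python
# Toy stand-in for WordNet: maps (word, pos) to a list of sense names.
# These entries are made up for illustration.
TOY_SENSES = {
    ("potato", "n"): ["potato.n.01", "potato.n.02"],
    ("great", "a"): ["great.a.01"],
}


def toy_synsets(word, pos=None):
    """Return senses for word, restricted to pos when one is given."""
    if pos is None:
        return [s for (w, _), senses in TOY_SENSES.items() if w == word for s in senses]
    return TOY_SENSES.get((word, pos), [])


def candidate_senses(word, pos=None):
    """Prefer POS-filtered senses; fall back to all senses if none match."""
    return toy_synsets(word, pos=pos) or toy_synsets(word)


# The tagger said "adjective", but only noun senses exist: the fallback kicks in.
print(candidate_senses("potato", pos="a"))
```

Note that the fallback only helps when the word has senses under some other POS; a word with no senses at all still yields an empty list, so the caller may still want to guard against that case.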

aponty avatar Jun 10 '19 15:06 aponty