python-tf-idf Similarities between documents and query may be >1

The README claims that similarities between documents and queries shouldn't be greater than 1. However:

import tfidf

table = tfidf.tfidf()
table.addDocument("foo", ["alpha", "bravo", "charlie", "delta", "echo", "foxtrot", "golf", "hotel"])
table.addDocument("bar", ["alpha", "bravo", "charlie", "india", "juliet", "kilo"])
table.addDocument("baz", ["kilo", "lima", "mike", "november"])
print(table.similarities(["alpha", "bravo", "charlie", "india"]))

This yields [['foo', 0.5625], ['bar', 1.0416666666666665], ['baz', 0.0]], so the score for "bar" is greater than 1. Whoops!

This is happening because the query isn't being normalized. The ranking of results should still be correct, but it'd be better if we normalized it so we can make guarantees about the output.
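As a rough illustration of the kind of guarantee normalization can give (this is a sketch, not the library's current code), one common approach is to scale both the query and each document to unit Euclidean length and take their dot product, i.e. cosine similarity. With non-negative weights the result always falls in [0, 1]. The helper names `l2_normalize` and `cosine_similarity` are hypothetical, and raw term counts stand in for whatever weighting the library actually uses.

```python
import math
from collections import Counter


def l2_normalize(weights):
    """Scale a term -> weight mapping to unit Euclidean length."""
    norm = math.sqrt(sum(w * w for w in weights.values()))
    if norm == 0.0:
        # An empty or all-zero vector has no direction; return it unchanged.
        return dict(weights)
    return {term: w / norm for term, w in weights.items()}


def cosine_similarity(query_weights, doc_weights):
    """Dot product of the two unit-length vectors over their shared terms."""
    q = l2_normalize(query_weights)
    d = l2_normalize(doc_weights)
    return sum(w * d[term] for term, w in q.items() if term in d)


# Assumption: raw term counts are used as weights, purely for illustration.
docs = {
    "foo": ["alpha", "bravo", "charlie", "delta", "echo", "foxtrot", "golf", "hotel"],
    "bar": ["alpha", "bravo", "charlie", "india", "juliet", "kilo"],
    "baz": ["kilo", "lima", "mike", "november"],
}
query = ["alpha", "bravo", "charlie", "india"]

scores = [
    [name, cosine_similarity(Counter(query), Counter(words))]
    for name, words in docs.items()
]
print(scores)
```

With this scheme "bar" still ranks above "foo" and "baz" still scores 0.0, but no score can exceed 1.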

Mar 21 '16 19:03 hrs

I'm running into the same problem. Could you please fix it? Thanks.

Feb 26 '18 08:02 tianye2856

What solution did you end up using to fix this?

Apr 08 '18 18:04 shanalikhan