python-tf-idf
An extremely simple Python library to perform TF-IDF document comparison.
I've seen tf-idf used in many cases to identify the top-n terms that are most unique to a particular document. I don't know how much active development there is on...
See the description of TF-IDF on Wikipedia, and in particular the example: https://en.wikipedia.org/wiki/Tf%E2%80%93idf#Example_of_tf%E2%80%93idf. In the code, `doc_dict` correctly computes the "TF" (term frequency) part. However, the "IDF" part (inverse document...
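For reference, here is a minimal sketch of the weighting as the Wikipedia example defines it: term frequency is the term's count divided by the document length, and inverse document frequency is the base-10 logarithm of the number of documents over the number of documents containing the term. The function name `tf_idf` and its dict-based input are assumptions made for illustration; this is not the library's `doc_dict` code or its API.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return one {term: weight} dict per document.

    docs: dict mapping a document name to its list of tokens.
    (Hypothetical helper for illustration, not part of the library.)
    """
    n_docs = len(docs)

    # Document frequency: how many documents contain each term at least once.
    df = Counter()
    for tokens in docs.values():
        df.update(set(tokens))

    weights = {}
    for name, tokens in docs.items():
        counts = Counter(tokens)
        total = len(tokens)
        weights[name] = {
            # tf = count / document length, idf = log10(N / df)
            term: (count / total) * math.log10(n_docs / df[term])
            for term, count in counts.items()
        }
    return weights


# The two documents from the Wikipedia example.
weights = tf_idf({
    "d1": ["this", "is", "a", "a", "sample"],
    "d2": ["this", "is", "another", "another",
           "example", "example", "example"],
})
print(weights["d1"]["this"])     # "this" appears in every document, so idf is 0
print(weights["d2"]["example"])  # "example" appears only in d2
```

A term that occurs in every document gets a weight of zero, which is what makes the IDF factor useful for picking out terms that distinguish one document from the rest.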
The README claims that similarities between documents and queries shouldn't be greater than 1. However:

``` python
table = tfidf.tfidf()
table.addDocument("foo", ["alpha", "bravo", "charlie", "delta", "echo", "foxtrot", "golf", "hotel"])
table.addDocument("bar",...
```
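One common way to keep a score in the range 0 to 1 is cosine similarity, where the dot product is divided by the product of both vector lengths. The sketch below assumes non-negative weights stored as `{term: weight}` dicts; `cosine_similarity` is a hypothetical helper and is not claimed to be how the library computes its similarities.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two sparse {term: weight} vectors."""
    # Only terms present in both vectors contribute to the dot product.
    dot = sum(w * b.get(term, 0.0) for term, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    # An empty or all-zero vector has no direction; treat it as no match.
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


# Made-up weights, chosen only to exercise the function.
query = {"alpha": 1.0, "bravo": 1.0}
doc = {"alpha": 2.0, "bravo": 2.0, "charlie": 1.0}
sim = cosine_similarity(query, doc)
print(sim)
```

With non-negative weights the result cannot exceed 1 apart from floating-point rounding, so a score well above 1 suggests that one of the two normalizations is missing.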