How to get the most relevant terms for a specific document?
Hi,
Is there a way to get the most important/relevant terms (i.e., terms with the highest TF/IDF score) for a specific document?
I think this is possible with Lucene in some way: https://stackoverflow.com/questions/38976466/get-n-terms-with-top-tfidf-scores-for-each-documents-in-lucene-pylucene
Hi again,
After spending some time with the Solr documentation, I can finally confirm that this type of query (important terms for a specific document) is possible via the "Term Vector" component.
https://lucene.apache.org/solr/guide/6_6/the-term-vector-component.html
Unfortunately, judging by the documentation and the source code, Solarium does not seem to support this type of query yet. I'd be more than happy if anyone can prove me wrong with a piece of example code.
hi there,
yes, as far as i know you're right and there's currently no built-in support for solr's term vector component in solarium.
in theory (to be clear, i've not tested this myself and it's rather hacky) you can probably query the term vector component by changing the handler of your select query, and the response class to make sure you have access to all response properties:
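$select = $client->createSelect();
$select
    // '/tvrh' style handler; it must exist in your solr config
    ->setHandler('tvrh')
    // MyResult is your own result class that exposes the raw response data
    ->setResultClass(MyResult::class)
;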
then use the CustomizeRequest plugin to add the term vector request parameters you want.
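for reference, here's a rough sketch of what those request parameters could look like, plus a small helper to pick the top terms once you have the numbers. it doesn't use solarium at all and i haven't run it against a live server. the parameter names (tv, tv.fl, tv.tf, tv.df, tv.tf_idf) are the ones from the term vector component documentation linked above; the uniqueKey field name "id", the function names buildTermVectorQuery and topTerms, and the shape of $termStats (term => stats with a 'tf-idf' entry, i.e. what you'd have after pulling a single field's terms out of the decoded response) are assumptions on my part.

```php
<?php
// Sketch only: build the query string for a term vector request.
// Assumes the uniqueKey field is called "id" and that $docId needs no
// extra escaping for the Solr query syntax.
function buildTermVectorQuery(string $docId, string $field): string
{
    return http_build_query([
        'q'         => 'id:' . $docId,
        'fl'        => 'id',
        'tv'        => 'true',   // enable the term vector component
        'tv.fl'     => $field,   // field(s) to return term vectors for
        'tv.tf'     => 'true',   // term frequency within the document
        'tv.df'     => 'true',   // document frequency across the index
        'tv.tf_idf' => 'true',   // the score we want to rank by
        'wt'        => 'json',
    ]);
}

// Sketch only: return the $n terms with the highest 'tf-idf' value.
// $termStats is assumed to look like ['term' => ['tf' => 3, 'df' => 10, 'tf-idf' => 0.3], ...].
function topTerms(array $termStats, int $n): array
{
    $scores = [];
    foreach ($termStats as $term => $stats) {
        if (isset($stats['tf-idf'])) {
            $scores[$term] = (float) $stats['tf-idf'];
        }
    }
    arsort($scores); // highest score first, keys (terms) preserved

    return array_slice($scores, 0, $n, true);
}
```

the field you ask for in tv.fl needs to be indexed with termVectors="true" in the schema, otherwise there's nothing for the component to return.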
the best solution of course would be addition of support for this component to the solarium library itself. that's currently not on our roadmap, but if you're up for it we're more than happy to receive a pull request with this feature.
Another "hack" I've seen very often is to use the Ping query and to modify it instead of Select.
You can also make the term vector component available for Select in solrconfig.xml, by adding it to the components of the select request handler.
@MustafaKarabulut Solarium 6.3.4 has been released with support for the Term Vector Component.