How to get the most relevant terms for a specific document?

Open MustafaKarabulut opened this issue 5 years ago • 3 comments

Hi,

Is there a way to get the most important/relevant terms (i.e., terms with the highest TF/IDF score) for a specific document?

I think this is possible with Lucene in some way: https://stackoverflow.com/questions/38976466/get-n-terms-with-top-tfidf-scores-for-each-documents-in-lucene-pylucene

MustafaKarabulut avatar May 30 '20 15:05 MustafaKarabulut

Hi again,

After spending some time with the Solr documentation, I can confirm that this type of query (important terms for a specific document) is possible via the "Term Vector" component.

https://lucene.apache.org/solr/guide/6_6/the-term-vector-component.html

Unfortunately, judging by the documentation and the source code, it seems Solarium does not yet support this type of query. I'd be more than happy if anyone disagrees and proves me wrong with a piece of example code.

MustafaKarabulut avatar Jun 04 '20 12:06 MustafaKarabulut

hi there,

yes, as far as i know you're right and there's currently no built-in support for solr's term vector component in solarium.

in theory (to be clear, i've not tested this myself and it's rather hacky) you can probably query the term vector component by changing the handler of your select query, and the response class to make sure you have access to all response properties:

// untested: MyResult is a custom result class you'd have to write yourself
$select = $client->createSelect();
$select
    ->setHandler('tvrh')
    ->setResultClass(MyResult::class)
;

then use the CustomizeRequest plugin to pass the request parameters you want (for example tv.tf_idf=true).
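to make the "request parameters" part more concrete, here's a rough, untested sketch in plain php (no solarium involved). the parameter names (tv, tv.fl, tv.tf, tv.df, tv.tf_idf, json.nl) come from the solr guide linked above; the function names, the "id" unique key and the shape of the term data (what you'd get per field with json.nl=map) are assumptions for illustration only.

```php
<?php
declare(strict_types=1);

/**
 * Request parameters for Solr's Term Vector component.
 * Assumes the unique key field is called "id" and that $field
 * is indexed with termVectors="true" (both are assumptions).
 */
function termVectorParams(string $docId, string $field): array
{
    return [
        'q'         => 'id:' . $docId,
        'tv'        => 'true',   // switch the component on
        'tv.fl'     => $field,   // field(s) to return term vectors for
        'tv.tf'     => 'true',   // term frequency in this document
        'tv.df'     => 'true',   // document frequency in the index
        'tv.tf_idf' => 'true',   // Solr reports tf / df under "tf-idf"
        'wt'        => 'json',
        'json.nl'   => 'map',    // named lists as maps, easier to walk
    ];
}

/**
 * Rank the terms of one field by their "tf-idf" value, highest first.
 * $fieldTerms is assumed to look like
 * ['term' => ['tf' => 2, 'df' => 10, 'tf-idf' => 0.2], ...].
 */
function topTerms(array $fieldTerms, int $limit): array
{
    $scores = [];
    foreach ($fieldTerms as $term => $stats) {
        if (is_array($stats) && isset($stats['tf-idf'])) {
            $scores[$term] = (float) $stats['tf-idf'];
        }
    }
    arsort($scores); // sort by score, descending, keeping the term keys

    return array_slice($scores, 0, $limit, true);
}
```

each entry of termVectorParams() could then be added as a customization of type 'param' through the CustomizeRequest plugin, and topTerms() applied to the field's entry in the termVectors part of the response inside your custom result class.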

the best solution of course would be addition of support for this component to the solarium library itself. that's currently not on our roadmap, but if you're up for it we're more than happy to receive a pull request with this feature.

wickedOne avatar Jun 04 '20 22:06 wickedOne

Another "hack" I've seen very often is to use the Ping query and to modify it instead of Select.

You can also make the Term Vector component available to the Select handler in solrconfig.xml.
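As a rough sketch of that configuration (untested against your setup, and the component name "tvComponent" is just the one used in the Solr guide's example), the relevant part of solrconfig.xml could look like this. The field you query must also be defined with termVectors="true" in the schema.

```xml
<!-- Register the Term Vector component under a name of your choice -->
<searchComponent name="tvComponent" class="solr.TermVectorComponent"/>

<!-- Append it to the components that run for /select -->
<requestHandler name="/select" class="solr.SearchHandler">
  <arr name="last-components">
    <str>tvComponent</str>
  </arr>
</requestHandler>
```

With this in place, a normal Select query should return term vectors as soon as the request carries tv=true, so no handler change is needed on the client side.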

mkalkbrenner avatar Jun 05 '20 08:06 mkalkbrenner

@MustafaKarabulut Solarium 6.3.4 has been released with support for the Term Vector Component.

thomascorthals avatar Dec 04 '23 12:12 thomascorthals