solr icon indicating copy to clipboard operation
solr copied to clipboard

SOLR-15450 add PrefetchingFieldValueFeature

Open tomglk opened this issue 4 years ago • 4 comments

Description

This PR refers to the issue SOLR-15450.

It introduces a new PrefetchingFieldValueFeature which can be used for stored fields without docValues. This feature prefetches all stored fields that itself and other PrefetchingFieldValueFeatures use so that they are available in the cache. This improves the performance because we only need to access the index during the first loading of all fields that should be prefetched instead of reading the fields used by the features separately.

Solution

The PrefetchingFieldValueFeature extends the FieldValueFeature. It only is used to get the values for fields which are stored and have no docValues. For all other fields it delegates the work to the FieldValueFeature because in such cases we want to make use of the docValues. The PrefetchingFieldValueFeature keeps a list of all fields that should be fetched and uses these fields when requesting the document from the docFetcher.

private Set<String> prefetchFields;

final Document document = docFetcher.doc(context.docBase + itr.docID(), prefetchFields);

The prefetchFields are updated after all features were loaded. To do so, it keeps track of the feature stores that were updated, iterates over all PrefetchingFieldValueFeatures per store, collects the fields and then updates them.

Set<String> updatedFeatureStores = new HashSet<>();

preparePrefetchingFieldValueFeatures(updatedFeatureStores);

private void preparePrefetchingFieldValueFeatures(Set<String> updatedFeatureStores) {
    for(String featureStoreName: updatedFeatureStores) {
      final Set<PrefetchingFieldValueFeature> prefetchingFieldValueFeatures = getFeatureStore(featureStoreName)
          .getFeatures().stream()
          .filter(feature -> feature instanceof PrefetchingFieldValueFeature)
          .map(feature -> ((PrefetchingFieldValueFeature) feature))
          .collect(Collectors.toSet());

      final Set<String> prefetchFields = new HashSet<>();
      for (PrefetchingFieldValueFeature feature : prefetchingFieldValueFeatures) {
        prefetchFields.add(feature.getField());
      }
      for (PrefetchingFieldValueFeature feature : prefetchingFieldValueFeatures) {
        feature.setPrefetchFields(prefetchFields);
      }
    }
  }

Tests

TestPrefetchingFieldValueFeature tests that

  • the prefetchFields are only updated after all features were loaded and the basic setting works
  • adding fields to an existing feature works

TestLTROnSolrCloudWithPrefetchingFieldValueFeature tests that

  • the basic reRanking works
  • the PrefetchingFieldValueFeatureScorers correctly prefetch the fields (using assertions on the hasBeenLoaded-property of LazyFields and checking the StoredFields of the Document)

Checklist

Please review the following and check all that apply:

  • [X] I have reviewed the guidelines for How to Contribute and my code conforms to the standards described there to the best of my ability.
  • [X] I have created a Jira issue and added the issue ID to my pull request title.
  • [X] I have given Solr maintainers access to contribute to my PR branch. (optional but recommended)
  • [X] I have developed this patch against the main branch.
  • [X] I have run ./gradlew check -x test.
  • [ ] I have run ./gradlew check.
  • [X] I have run the tests in org.apache.solr.ltr.test.
  • [X] I have added tests for my changes.
  • [X] I have added documentation for the Reference Guide NOTE: very sparsely, I was not sure how much detail I should provide

tomglk avatar Jun 05 '21 15:06 tomglk

Hi @cpoerschke , thank you for the quick response. :) I addressed your suggestions that were quick to include and will look into the exception-handling / toString()-topics next.

Not yet looked at the test changes

I am actually quite glad that you didn't, because I just changed the tests quite a lot. Now there is quite a lot of duplicated code if you look at the two prefetching tests in org.apache.solr.ltr and the TestLTROnSolrCloud. I already had a TestLTROnSolrCloudBase-class ready to extract the duplicates but was not sure whether to include such a change in this PR because it it would entail a few changes in TestLTROnSolrCloud which have nothing to do with the prefetching functionality.

I can make a commit tomorrow(?) only including this change and if you think that it would make the PR too complex we can just revert it.

EDIT: Link to the "restructure-commit" : 88d1897

tomglk avatar Jun 06 '21 18:06 tomglk

Hi @cpoerschke , rereading our conversation over the last two days made clear that we have quite some open constructions sites. To not lose the overview about them (especially with a likely break from my side due to the upcoming Buzzwords) I want to put a summary of possible next steps and questions here:

  • [ ] do we want to collect the prefetchFields after creating the weights instead of on the feature-level?
    • concurrency vs. collecting more often
  • [X] remove the "disablePrefetchingFieldValueFeature" parameter ( 81a5c67 )
    • add try-catch with single field to score()?
      • if yes, then log error instead of warning?
    • added try-catch with error logging
  • [X] use sorted set for prefetchFields ( d0520a9 )
  • [ ] come to a decision regarding options to exclude / include fields
    • do we want that?
      • if yes, how (parameter in json / something per query)
    • proposed solution / answer: No. This could be an additional feature that users can implement (like your outline here https://github.com/apache/solr/pull/166#discussion_r648536798 ) but I would exclude it from this feature because we do not have clear use cases yet

tomglk avatar Jun 13 '21 17:06 tomglk

... summary of possible next steps and questions ...

Splendid idea, thanks for adding it!

cpoerschke avatar Jun 18 '21 19:06 cpoerschke

This PR had no visible activity in the past 60 days, labeling it as stale. Any new activity will remove the stale label. To attract more reviewers, please tag someone or notify the [email protected] mailing list. Thank you for your contribution!

github-actions[bot] avatar Feb 25 '24 00:02 github-actions[bot]