
Inconsistent + buggy behaviour of textContainsPrefix / Undocumented cases

Open mrckzgl opened this issue 2 years ago • 0 comments

This issue results from the following discussion: https://github.com/JanusGraph/janusgraph/discussions/3942

First, the documentation for textContainsPrefix is incomplete (more precisely: contradictory) for the case where the search string contains multiple words / tokens (please see the opening post of the discussion for details). From the single-token case I inferred the following plausible behaviour:

"For each token in the query string, the text string (read: the value of the field being searched) must contain at least one token of which the query token is a prefix."

According to @mad, the in-memory implementation org.janusgraph.core.attribute.Text#CONTAINS_PREFIX works as I inferred, but SolrIndex and LuceneIndex behave differently. Also according to @mad, this could be considered a bug. For LuceneIndex we identified the actual behaviour (if the query string consists of multiple tokens, it works just like the regular tokenized textContains) and a possible fix that would make it behave as described above.
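To make the difference concrete, here is a minimal, standalone sketch of the two behaviours. It is not JanusGraph code: the class name, the method names, and the tokenizer (lower-casing and splitting on anything that is not a letter or digit) are assumptions made for illustration, and the real backends may tokenize differently. The "exact" variant assumes that multi-token textContains requires every query token to appear as a whole token in the text.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical illustration only; not part of JanusGraph.
class TextContainsPrefixSketch {

    // Assumed tokenizer: lower-case, split on runs of non-letter/non-digit characters.
    static List<String> tokenize(String s) {
        List<String> tokens = new ArrayList<>();
        for (String t : s.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!t.isEmpty()) {
                tokens.add(t);
            }
        }
        return tokens;
    }

    // Proposed semantics: every query token must be a prefix of at least one text token.
    // A query without any tokens matches vacuously in this sketch.
    static boolean containsPrefixAllTokens(String text, String query) {
        List<String> textTokens = tokenize(text);
        for (String q : tokenize(query)) {
            boolean matched = false;
            for (String t : textTokens) {
                if (t.startsWith(q)) {
                    matched = true;
                    break;
                }
            }
            if (!matched) {
                return false;
            }
        }
        return true;
    }

    // Assumed textContains-like semantics: every query token must equal some text token.
    static boolean containsAllTokensExact(String text, String query) {
        List<String> textTokens = tokenize(text);
        for (String q : tokenize(query)) {
            if (!textTokens.contains(q)) {
                return false;
            }
        }
        return true;
    }
}
```

With the text "The quick brown fox" and the query "quick bro", containsPrefixAllTokens matches (both "quick" and "bro" are prefixes of some text token), while containsAllTokensExact does not, because "bro" is not a complete token. That is the kind of discrepancy described above for multi-token queries.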

If I were in charge of this, I would first propose agreeing on the desired behaviour of textContainsPrefix for the multi-token case, and I would propose the behaviour above. Then I would implement it consistently across all index backends and, very importantly, add a description of the behaviour to the documentation. But I am not in charge (spoiler: I probably don't have the time for a PR). So what do you, as contributors / maintainers, think about how to resolve the issue?

  • Version: at least 0.6.3
  • Storage Backend: possibly all?
  • Mixed Index Backend: at least SolrIndex, LuceneIndex
  • Link to discussed bug: https://github.com/JanusGraph/janusgraph/discussions/3942
  • Expected Behavior: textContainsPrefix should behave consistently and reasonably if the query string contains multiple tokens. Further, that behaviour should be documented.
  • Current Behavior: textContainsPrefix behaves differently across backends.

mrckzgl avatar Oct 20 '23 13:10 mrckzgl