Inconsistent + buggy behaviour of textContainsPrefix / Undocumented cases
This issue results from the following discussion: https://github.com/JanusGraph/janusgraph/discussions/3942
First, the documentation for textContainsPrefix is incomplete (more precisely: contradictory) for the case where the search string contains multiple words / tokens (please see the original post of the discussion for details). I came up with the following plausible behaviour, inferred from the single-token case:
"For each token in the query string, the text string (read: the value of the field being searched) has to contain at least one token that the query token is a prefix of."
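To make the proposed rule concrete, here is a minimal sketch of it in Python. It is not JanusGraph code: the function names `tokenize` and `contains_prefix_all` are hypothetical, and the tokenizer (lower-casing and splitting on non-word characters) is only an assumption about how text is split into tokens.

```python
import re


def tokenize(text):
    """Assumed tokenizer: lower-case, then split on runs of non-word characters."""
    return [t for t in re.split(r"\W+", text.lower()) if t]


def contains_prefix_all(value, query):
    """Proposed rule: every query token must be a prefix of at least one value token.

    Note: in this sketch an empty query matches every value.
    """
    value_tokens = tokenize(value)
    return all(
        any(vt.startswith(qt) for vt in value_tokens)
        for qt in tokenize(query)
    )


text = "JanusGraph is a distributed graph database"
print(contains_prefix_all(text, "dist gra"))  # True: "dist" -> "distributed", "gra" -> "graph"
print(contains_prefix_all(text, "dist xyz"))  # False: no token starts with "xyz"
```

Under this rule the order of the query tokens does not matter, and a single-token query reduces to the already documented single-token behaviour.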
According to @mad, the in-memory implementation org.janusgraph.core.attribute.Text#CONTAINS_PREFIX works as I inferred, but SolrIndex and LuceneIndex behave differently. Also according to @mad, this could be considered a bug. For LuceneIndex we identified the actual behaviour (if the query string consists of multiple tokens, it works just like the regular tokenized textContains) and a possible fix to make it work as described above.
If I were in charge of this, I would first propose agreeing on the desired behaviour of textContainsPrefix for the multi-token case, and I would propose the behaviour above. Then I would implement it consistently across all index backends and, very importantly, add a description of the behaviour to the documentation. However, I am not in charge (spoiler: I probably don't have the time for a PR). So what do you, as contributors / maintainers, think about how to resolve the issue?
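The difference between the two behaviours can be illustrated with a small Python sketch. It does not call Lucene or JanusGraph: `proposed_match` models the rule proposed above, and `reported_match` models the behaviour reported for LuceneIndex under the assumption that multi-token textContains requires every query token to equal a whole token of the value. All names and the tokenizer are hypothetical.

```python
import re


def tokens(text):
    """Assumed tokenizer: lower-case, then split on runs of non-word characters."""
    return [t for t in re.split(r"\W+", text.lower()) if t]


def proposed_match(value, query):
    """Proposed rule: each query token is a prefix of some value token."""
    value_tokens = tokens(value)
    return all(any(vt.startswith(qt) for vt in value_tokens) for qt in tokens(query))


def reported_match(value, query):
    """Model of the reported behaviour: prefix match for a single query token,
    whole-token match (assumed textContains semantics) for several tokens."""
    query_tokens = tokens(query)
    if len(query_tokens) <= 1:
        return proposed_match(value, query)
    value_tokens = set(tokens(value))
    return all(qt in value_tokens for qt in query_tokens)


value = "distributed graph database"
for query in ["dist", "dist gra", "distributed graph"]:
    # Prints the query, the proposed result, and the modelled reported result.
    print(repr(query), proposed_match(value, query), reported_match(value, query))
```

In this model the two behaviours agree for `"dist"` and `"distributed graph"` and diverge only for `"dist gra"`, which is the multi-token prefix case this issue is about.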
- Version: at least 0.6.3
- Storage Backend: possibly all?
- Mixed Index Backend: at least SolrIndex, LuceneIndex
- Link to discussed bug: https://github.com/JanusGraph/janusgraph/discussions/3942
- Expected Behavior: textContainsPrefix should behave consistently and reasonably when the query string contains multiple tokens. Furthermore, that behaviour should be documented.
- Current Behavior: textContainsPrefix behaves differently across backends when the query string contains multiple tokens, and this case is not clearly documented.