jackrabbit-oak
OAK-9875 - Prefix query on a Long, analyzed field fails when executed over Elastic
In ES, for properties with analyzed=true, use multi-fields for the full-text index:
    "properties": {
      "propa": {
        "type": "long",
        "ignore_malformed": true,
        "fields": {
          "text": {
            "type": "text",
            "analyzer": "oak_analyzer"
          }
        }
      }
    }
This fixes a bug where, if a property had a non-string type and was marked as analyzed, the ES plugin did not create a field in ES for full-text search, so full-text searches on that property would fail.
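As a rough illustration of the mapping shape above, the following sketch builds the same structure with plain Java maps. It is not the Oak implementation: the class name and the `propertyMapping` helper are hypothetical, and it only covers a non-string, non-boolean type such as long.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class MultiFieldMappingSketch {

    // Hypothetical helper (not Oak code): builds the mapping for one typed,
    // non-boolean property. When the property is analyzed, a "text" sub-field
    // is added so the same value is also indexed for full-text search.
    static Map<String, Object> propertyMapping(String esType, boolean analyzed) {
        Map<String, Object> mapping = new LinkedHashMap<>();
        mapping.put("type", esType);
        // Let Elastic accept values that cannot be coerced to esType.
        mapping.put("ignore_malformed", true);
        if (analyzed) {
            Map<String, Object> text = new LinkedHashMap<>();
            text.put("type", "text");
            text.put("analyzer", "oak_analyzer");
            Map<String, Object> fields = new LinkedHashMap<>();
            fields.put("text", text);
            mapping.put("fields", fields);
        }
        return mapping;
    }

    public static void main(String[] args) {
        Map<String, Object> properties = new LinkedHashMap<>();
        properties.put("propa", propertyMapping("long", true));
        System.out.println(properties);
    }
}
```

With a mapping of this shape, a full-text query would target the sub-field, which Elasticsearch addresses as propa.text, while range or equality queries keep using propa.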
Additional changes
- Use the Elastic option ignore_malformed to make Elastic accept documents with values that cannot be coerced to the type of the field. Without this option, if the user provided an invalid value for a long field, Elastic would discard the field and would also not index it as full-text. With ignore_malformed, the top-level field is ignored but the nested full-text index is still updated. This is in line with the Lucene plugin, which always indexes a property, even if its value cannot be converted to the type.
- Previously, the Elastic plugin tried to convert the value to the type of the mapping before sending it to Elastic. This was a workaround for a bug where adding a string that did not represent a valid date to a field of type Date would result in Elastic throwing an exception and discarding the document (see OAK-9665). In the previous implementation, if the Elastic plugin failed to convert the value to the type of the property, it would log a warning and discard the property. But we can no longer ignore the property, because even if it does not have a valid value, we must send it to Elastic to index as full-text in the nested field. Therefore, this PR sets ignore_malformed in Elastic and sends the value to Elastic as received from the user, relying on Elastic to convert it.
- The only exception is for boolean properties, because Elastic does not support ignore_malformed for this type. Here the PR preserves the current behavior of trying to convert the value to a boolean in Oak and not indexing the field if that fails. This means that boolean properties do not support analyzed=true (we log an error when the index is created).
Update: there is an Elasticsearch issue to add support for ignore_malformed to boolean fields: https://github.com/elastic/elasticsearch/issues/89542.
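The value handling described in the bullets above can be sketched roughly as follows. This is a simplified illustration, not the code in this PR: the `FieldType` enum, the `valueToIndex` helper, and the strict true/false parsing are all assumptions made for the example.

```java
import java.util.Optional;

public class FieldValueSketch {

    // Hypothetical subset of mapping types, for illustration only.
    enum FieldType { LONG, DOUBLE, DATE, BOOLEAN }

    // Returns the value to send to Elastic, or empty if the property
    // should not be indexed at all.
    static Optional<Object> valueToIndex(FieldType type, String rawValue) {
        if (type == FieldType.BOOLEAN) {
            // ignore_malformed is not available for boolean fields, so the
            // conversion is done on the client and the property is skipped
            // on failure. Strict parsing here is an assumption.
            if ("true".equalsIgnoreCase(rawValue)) {
                return Optional.of(Boolean.TRUE);
            }
            if ("false".equalsIgnoreCase(rawValue)) {
                return Optional.of(Boolean.FALSE);
            }
            return Optional.empty();
        }
        // All other types: send the value as received and rely on
        // ignore_malformed, so the nested full-text field is still indexed.
        return Optional.of(rawValue);
    }

    public static void main(String[] args) {
        System.out.println(valueToIndex(FieldType.LONG, "not-a-number").isPresent());
        System.out.println(valueToIndex(FieldType.BOOLEAN, "maybe").isPresent());
    }
}
```

The point of the design is that an invalid long such as "not-a-number" still reaches Elastic and can be found through the full-text sub-field, whereas an invalid boolean is dropped before the request is built.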
Minor fixes:
- Fix: AbstractQueryTest#assertResult was not comparing the expected and actual lists correctly; it only checked that all elements of the expected list are also in the actual list. For instance, these two lists would pass the assertion: expected=['a', 'a'], actual=['a', 'b'].
- Improved logging of errors in requests to Elastic.
- Correct a few spelling mistakes.
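To make the assertResult problem concrete, here is a small standalone sketch contrasting a containment-only check with a comparison that respects duplicates. The method names are hypothetical, and the order-insensitive comparison is an assumption: the description above does not say whether the fixed assertion also checks order.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

public class ListComparisonSketch {

    // Containment-only check: passes whenever every expected element
    // appears somewhere in actual, ignoring duplicates and extras.
    static boolean containsAllCheck(List<String> expected, List<String> actual) {
        return actual.containsAll(expected);
    }

    // Stricter check (assumed, order-insensitive): both lists must hold
    // the same elements with the same multiplicities.
    static boolean sameElements(List<String> expected, List<String> actual) {
        List<String> sortedExpected = new ArrayList<>(expected);
        List<String> sortedActual = new ArrayList<>(actual);
        Collections.sort(sortedExpected);
        Collections.sort(sortedActual);
        return sortedExpected.equals(sortedActual);
    }

    public static void main(String[] args) {
        List<String> expected = Arrays.asList("a", "a");
        List<String> actual = Arrays.asList("a", "b");
        System.out.println(containsAllCheck(expected, actual)); // true
        System.out.println(sameElements(expected, actual));     // false
    }
}
```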