highlighted results are misleading
First, thank you for this library and its rich API. I've spent several days carefully studying the documentation and have implemented most of my desired functionality. However, I'm encountering a confusing highlighting result that I can't resolve.
Minimal Reproduction:
import { Document, Charset } from "https://cdn.jsdelivr.net/gh/nextapps-de/[email protected]/dist/flexsearch.compact.module.min.js";
const data = [
  { "id": 1, "title": "Carmencita" },
  { "id": 2, "title": "en-US.json" }
];
const index = new Document({
  document: {
    store: true,
    index: [{
      field: "title",
      // tokenize: "full",
      // encoder: Charset.Default
    }]
  }
});
data.forEach(item => index.add(item));
const result = index.search({
  query: 'en',
  enrich: true,
  highlight: { template: "<b>$1</b>" }
});
Actual Result:
[
  {
    "field": "title",
    "result": [
      {
        "id": 2,
        "doc": {
          "id": 2,
          "title": "en-US.json"
        },
        "highlight": "<b>e</b>n-US.json"
      }
    ]
  }
]
Expected Behavior:
When searching for "en", I expect the highlight to wrap the entire matched term:
"highlight": "<b>en</b>-US.json"
Troubleshooting Attempted:
- Adjusted the tokenize option (tried "full" and defaults)
- Tested different encoder settings (including Charset.Default)
- Verified with multiple term lengths (always highlights only the first character)
The issue persists regardless of configuration. Could someone please advise if I'm missing something or if this is a potential bug? Any guidance would be greatly appreciated!
Hey @Wxh16144, I am hitting the same issue. I opened a bug report because I didn't realize you had already reported this: https://github.com/nextapps-de/flexsearch/issues/523.
@Wxh16144 Thanks a lot for your report. This issue occurs when a string's length changes after encoding. I haven't come up with a nice solution for this yet. You can work around the issue by disabling dedupe in the Encoder options:
const encoder = new Encoder({ dedupe: false });
Or when creating an Index:
const index = new Document({
  document: {
    index: [{
      field: "title",
      tokenize: "forward",
      encoder: { dedupe: false }
    }]
  }
});
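To illustrate the length mismatch described above, here is a minimal sketch in plain JavaScript. It does not use FlexSearch: `collapseRepeats` is a hypothetical stand-in for an encoder step that collapses repeated letters, assumed here only to show how a position found in the encoded string can point at the wrong span in the original text.

```javascript
// Hypothetical stand-in for an encoder step that collapses repeated letters.
// This is NOT FlexSearch's implementation; it only illustrates the length change.
function collapseRepeats(str) {
  return str.toLowerCase().replace(/(.)\1+/g, "$1");
}

const original = "Hello World";
const encoded = collapseRepeats(original); // "helo world" (one character shorter)

// Position of the term "world" in each form of the string.
const encodedPos = encoded.indexOf("world");
const originalPos = original.toLowerCase().indexOf("world");

// Applying the position from the encoded string to the original text
// selects a span that is shifted left by the number of removed characters.
const wrongSpan = original.slice(encodedPos, encodedPos + "world".length);
const rightSpan = original.slice(originalPos, originalPos + "world".length);

console.log({ encoded, encodedPos, originalPos, wrongSpan, rightSpan });
```

In this sketch the span is shifted because one character was removed before the match. Whether the same mechanism explains every report in this thread is an assumption, not something the sketch proves.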
@ts-thomas I am also experiencing this, even with dedupe set to false. Here is an example:
const FlexSearch = require("flexsearch");
const flexIndex = new FlexSearch.Document({
  tokenize: "forward",
  document: {
    id: "id",
    index: [{field: "content", tokenize: "strict", encoder: {dedupe: false}}],
    store: ["content"],
  },
});
flexIndex.add({
  id: 1,
  content: 'https://foo.com/example/path "EXAMPLE_PATH"',
});
console.log(
  flexIndex.search("example", {
    highlight: {
      template: "<mark>$1</mark>",
    },
  })[0].result[0].highlight,
);
Actual output:
https://foo.co<mark>m/examp</mark>le/path <mark>"EXAMPLE</mark>_PATH"
Expected output:
https://foo.com/<mark>example</mark>/path "<mark>EXAMPLE</mark>_PATH"
For extra context, I am using the latest version available via npm, 0.8.212.
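Until the offsets are fixed, one possible stopgap is to highlight the stored text yourself instead of relying on the built-in highlight option. The following is a minimal sketch in plain JavaScript, not part of the FlexSearch API: `escapeRegExp` and `highlightTerms` are hypothetical helper names, and the approach assumes that plain case-insensitive substring matching on the original text is acceptable.

```javascript
// Escape characters that have special meaning in a regular expression.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Hypothetical helper (not a FlexSearch function): wrap every
// case-insensitive occurrence of each query term in the original text,
// using a "$1"-style template like the one passed to the highlight option.
function highlightTerms(text, query, template = "<mark>$1</mark>") {
  const terms = query.split(/\s+/).filter(Boolean).map(escapeRegExp);
  if (terms.length === 0) return text;
  const pattern = new RegExp("(" + terms.join("|") + ")", "gi");
  return text.replace(pattern, template);
}

const content = 'https://foo.com/example/path "EXAMPLE_PATH"';
const highlighted = highlightTerms(content, "example");
console.log(highlighted);
```

This could be applied to the doc field of each enriched result. It matches raw substrings only, so it will not reflect encoder normalization or tokenizer settings, and it does not HTML-escape the text.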