highlighted results are misleading

Open Wxh16144 opened this issue 9 months ago • 3 comments

First, thank you for this library and its rich API. I've spent several days carefully studying the documentation and have implemented most of my desired functionality. However, I'm encountering a confusing highlighting result that I can't resolve.

Minimal Reproduction:

import { Document, Charset } from "https://cdn.jsdelivr.net/gh/nextapps-de/[email protected]/dist/flexsearch.compact.module.min.js";

const data = [
  { "id": 1, "title": "Carmencita" },
  { "id": 2, "title": "en-US.json" }
];

const index = new Document({
  document: {
    store: true,
    index: [{
      field: "title",
      // tokenize: "full",
      // encoder: Charset.Default
    }]
  }
});

data.forEach(item => index.add(item));

const result = index.search({
  query: 'en',
  enrich: true,
  highlight: { template: "<b>$1</b>" }
});

Actual Result:

[
  {
    "field": "title",
    "result": [
      {
        "id": 2,
        "doc": {
          "id": 2,
          "title": "en-US.json"
        },
        "highlight": "<b>e</b>n-US.json"
      }
    ]
  }
]

Expected Behavior:
When searching for "en", I expect the highlight to wrap the entire matched term:
"highlight": "<b>en</b>-US.json"

Troubleshooting Attempted:

  • Adjusted tokenize option (tried "full" and defaults)
  • Tested different encoder settings (including Charset.Default)
  • Verified with multiple term lengths (always highlights only first character)

The issue persists regardless of configuration. Could someone please advise if I'm missing something or if this is a potential bug? Any guidance would be greatly appreciated!

Wxh16144 avatar Jul 03 '25 07:07 Wxh16144

Hey @Wxh16144, I am hitting the same issue. I opened a bug report because I didn't realize you had this reported already: https://github.com/nextapps-de/flexsearch/issues/523.

stanislaw avatar Aug 17 '25 07:08 stanislaw

@Wxh16144 Thanks a lot for your report. This issue happens when a string has a different length after encoding. I haven't come up with a nice solution to this yet. You can work around it by disabling dedupe in the Encoder options:

import { Encoder } from "flexsearch";
const encoder = new Encoder({ dedupe: false });

Or when creating an Index:

const index = new Document({
    document: {
        index: [{
            field: "title",
            tokenize: "forward",
            encoder: { dedupe: false }
        }]
    }
});
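
To illustrate why a length change during encoding can shift a highlight, here is a minimal sketch. It assumes a toy encoder (`toyEncode`, hypothetical) that lowercases and collapses repeated characters, and a naive highlighter (`naiveHighlight`, also hypothetical) that reuses offsets from the encoded text on the original text. It is not FlexSearch's actual implementation, only a model of the failure mode described above.

```javascript
// Toy encoder (NOT FlexSearch's real encoder): lowercases the text and
// collapses runs of the same character, so the output can be shorter.
function toyEncode(text) {
  return text.toLowerCase().replace(/(.)\1+/g, "$1");
}

// Naive highlighter: locates the match in the ENCODED text, then applies
// the same start offset and length to the ORIGINAL text.
function naiveHighlight(original, query) {
  const encodedText = toyEncode(original);
  const encodedQuery = toyEncode(query);
  const start = encodedText.indexOf(encodedQuery);
  if (start === -1) return original;
  const end = start + encodedQuery.length;
  return (
    original.slice(0, start) +
    "<b>" + original.slice(start, end) + "</b>" +
    original.slice(end)
  );
}

// "hello" encodes to "helo" (4 characters), so only 4 characters of the
// original 5-character word end up inside the tag.
const highlighted = naiveHighlight("Hello world", "hello");
console.log(highlighted); // <b>Hell</b>o world
```

With an encoder that preserves length (for example, one that only lowercases), the encoded and original offsets line up and the same highlighter wraps the whole word.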

ts-thomas avatar Sep 07 '25 09:09 ts-thomas

@ts-thomas I am also experiencing this, even with dedupe set to false. Here is an example:

const FlexSearch = require("flexsearch");

const flexIndex = new FlexSearch.Document({
    tokenize: "forward",
    document: {
        id: "id",
        index: [{field: "content", tokenize: "strict", encoder: {dedupe: false}}],
        store: ["content"],
    },
});

flexIndex.add({
    id: 1,
    content: 'https://foo.com/example/path "EXAMPLE_PATH"',
});

console.log(
    flexIndex.search("example", {
        highlight: {
            template: "<mark>$1</mark>",
        },
    })[0].result[0].highlight,
);

Actual output:

https://foo.co<mark>m/examp</mark>le/path <mark>"EXAMPLE</mark>_PATH"

Expected output:

https://foo.com/<mark>example</mark>/path "<mark>EXAMPLE</mark>_PATH"

For extra context, I am using the latest version available via npm, 0.8.212.
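
Until the built-in highlighting behaves as expected, one possible fallback is to highlight the stored field on the application side. The sketch below assumes a hypothetical helper, `highlightLiteral`, that wraps every case-insensitive literal occurrence of the query using the same `$1` template convention. It is not part of the FlexSearch API, it ignores the index's tokenizer and encoder, and it does not HTML-escape the text.

```javascript
// Escape characters that have a special meaning in a regular expression.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Hypothetical helper (not a FlexSearch function): wraps each
// case-insensitive literal occurrence of `query` in `text`.
// `$1` in the template is replaced by the matched substring.
function highlightLiteral(text, query, template = "<mark>$1</mark>") {
  if (!query) return text;
  const pattern = new RegExp(escapeRegExp(query), "gi");
  // A replacer function keeps "$" sequences in the match from being
  // interpreted as replacement patterns.
  return text.replace(pattern, (match) => template.split("$1").join(match));
}

const content = 'https://foo.com/example/path "EXAMPLE_PATH"';
console.log(highlightLiteral(content, "example"));
// https://foo.com/<mark>example</mark>/path "<mark>EXAMPLE</mark>_PATH"
```

Applied to each `doc` field returned by a search with `enrich: true`, this gives literal substring highlighting only. Matches that exist solely because of encoder normalization would not be marked.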

TJSolecki avatar Nov 12 '25 18:11 TJSolecki