
[Bug] Search Tokenizer Config not correctly applied from lunr.tokenizer.separator config

Open s0ar opened this issue 2 years ago • 0 comments

According to the fix (https://github.com/dotnet/docfx/pull/5083) and the code in the default and modern templates, lunr.tokenizer.separator should split on whitespace, dashes, dots, and round brackets. The config for the default template looks OK, but the config for the modern template is not escaped: lunr.tokenizer.separator = /[\s-.()]+/

default template: https://github.com/dotnet/docfx/blob/bd56627e35cd5d0194e115c3e09835b93e84f9a1/templates/default/src/search-worker.js#L9

modern template: https://github.com/dotnet/docfx/blob/bd56627e35cd5d0194e115c3e09835b93e84f9a1/templates/modern/src/search-worker.ts#L46

Furthermore, when you build the project, the output includes _site/public/search-worker.min.js, which contains two assignments: t.tokenizer.separator=/[\s\-]+/ and w.default.tokenizer.separator=/[\s\-.()]+/

I think neither is correct, and the regex should be the same in both cases: /[\s\-\.\(\)]+/

When you modify _site/public/search-worker.min.js and change the first assignment to t.tokenizer.separator=/[\s\-\.\(\)]+/, search terms with dots work.
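To illustrate the difference between the two patterns, here is a minimal sketch that splits a sample string with each separator. The `tokenize` helper is a hypothetical stand-in built on `String.prototype.split`; it is not lunr's tokenizer and does not model how lunr matches query terms against the index, so treat it only as a way to see which characters each pattern splits on.

```javascript
// Separators copied from the text of this issue.
const firstSeparator = /[\s\-]+/;          // first assignment found in the built file
const proposedSeparator = /[\s\-\.\(\)]+/; // pattern proposed in this issue

// Hypothetical stand-in for a tokenizer (NOT lunr's implementation):
// split on the separator, drop empty strings, lowercase each token.
function tokenize(text, separator) {
  return text
    .split(separator)
    .filter(Boolean)
    .map((token) => token.toLowerCase());
}

const sampleText = "open filename.jpg (draft)";

// Splits on whitespace and dashes only, so the dot and brackets stay in the tokens.
console.log(tokenize(sampleText, firstSeparator));

// Also splits on dots and round brackets.
console.log(tokenize(sampleText, proposedSeparator));
```

With the first pattern, `filename.jpg` stays a single token; with the proposed pattern it is split into `filename` and `jpg`.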

To Reproduce

Steps to reproduce the behavior:

  1. run docfx init
  2. _enableSearch should be set to true
  3. open docs/getting-started.md and add some text like: filename.jpg
  4. run docfx build and serve the page
  5. if you search for "filename" it will find your text, but not if you type "filename.jpg".
  6. examine the generated search-worker.min.js file and check the regex pattern.
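For the last step, a small script can list the separator assignments instead of searching the minified file by hand. This is only a sketch: `findSeparatorAssignments` is a hypothetical helper, the sample string is shaped like the fragments quoted in this issue rather than taken from a real build, and the matching regex assumes the assignments look like `x.tokenizer.separator=/.../`.

```javascript
// Hypothetical helper: find every `<target>.tokenizer.separator = /regex/` assignment
// in a JavaScript source string and return the target and the regex literal as text.
function findSeparatorAssignments(source) {
  const pattern = /([\w.$]*tokenizer\.separator)\s*=\s*(\/(?:\\.|[^\/\\\n])+\/[a-z]*)/g;
  return Array.from(source.matchAll(pattern), (match) => ({
    target: match[1],
    regex: match[2],
  }));
}

// Sample input modeled on the fragments quoted in this issue (assumption, not the real file).
// To check a real build, the source could instead be read with
// require("fs").readFileSync("_site/public/search-worker.min.js", "utf8").
const sampleSource =
  "t.tokenizer.separator=/[\\s\\-]+/;w.default.tokenizer.separator=/[\\s\\-.()]+/;";

for (const { target, regex } of findSeparatorAssignments(sampleSource)) {
  console.log(`${target} = ${regex}`);
}
```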

[Screenshots: docfx-separator-withdot, docfx-separator]

Expected behavior

When entering filename.jpg in the search form field, it should tokenize at the dot. The generated _site/public/search-worker.min.js file should define the pattern as t.tokenizer.separator=/[\s\-\.\(\)]+/, and the default should be fixed by escaping the characters in the regex: w.default.tokenizer.separator=/[\s\-\.\(\)]+/. After that change and clearing the browser cache, a search for "filename.jpg" matches the text.

Context (please complete the following information):

  • OS: Windows
  • Docfx version: 2.75.2+fe673ecea2ac444a4fd480e6cfcf605c78614385


s0ar, Feb 05 '24 15:02