docsify fix(search): clean markdown elements in search contents

Summary

Changes

Process the content before storing it in IndexedDB, so it no longer has to be parsed on every search. TODO (v5+): move this into an async job so it doesn't block the main thread for too long, especially with large content (it looks fine on our site for now).
Adapt to marked v13+ with a completely rewritten renderer:
- Strip the styling of every Markdown element.
- Also remove docsify's ?> and !> helper markers.
Copy the needed functions instead of importing them, to reduce the package size (importing them pushes the file over 500 KB, which blocks the minified build's optimization).
Hardcode ... around the matched content to mark the truncation.
Add test cases for the changes.
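For context, here is a rough sketch of what "cleaning" the content before storing it means. The PR does this with a marked v13+ renderer; the regex-based cleanSearchContent below is a hypothetical, simplified stand-in (not the PR's code) that handles only a few constructs: docsify's ?> and !> prefixes, headings, images, links, asterisk emphasis and inline code. Running it once at indexing time is what lets each query skip the parsing.

```typescript
// Hypothetical, simplified stand-in for the PR's marked-renderer cleanup.
// It only handles a few common constructs, one line at a time; a real
// implementation should rely on the Markdown parser instead of regexes.
function cleanSearchContent(markdown: string): string {
  return markdown
    .split("\n")
    .map((line) =>
      line
        // docsify helper prefixes: "?> tip" and "!> warning"
        .replace(/^\s*[?!]>\s?/, "")
        // ATX headings: "## Title" -> "Title"
        .replace(/^\s{0,3}#{1,6}\s+/, "")
        // images: ![alt](src) -> alt (must run before the link rule)
        .replace(/!\[([^\]]*)\]\([^)]*\)/g, "$1")
        // links: [text](url) -> text
        .replace(/\[([^\]]*)\]\([^)]*\)/g, "$1")
        // bold, then italic (asterisk form only)
        .replace(/\*\*(.+?)\*\*/g, "$1")
        .replace(/\*(.+?)\*/g, "$1")
        // inline code: `code` -> code
        .replace(/`([^`]*)`/g, "$1")
    )
    .join("\n")
    .trim();
}
```

The regexes only keep the sketch self-contained; a parser-based renderer like the one in the PR handles nesting and edge cases (underscore emphasis, escaped characters, fenced code) that these patterns miss.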

Snapshot (before -> after)

Screenshot1 Screenshot2

Related issue, if any:

What kind of change does this PR introduce?

Bugfix

For any code change,

[x] Related documentation has been updated, if needed
[x] Related tests have been added or updated, if needed

Does this PR introduce a breaking change?

Yes No

Tested in the following browsers:

[x] Chrome
[ ] Firefox
[ ] Safari
[ ] Edge

Jun 27 '24 14:06 Koooooo-7

The latest updates on your projects.

Name	Status	Updated (UTC)
docsify-preview	✅ Ready	Sep 19, 2024 5:33am

Jun 27 '24 14:06 vercel[bot]

Thanks so much @Koooooo-7 for working on this! I've also tried the latest Preview in this thread, which I assume has this change included, and noticed two things:

The matching text snippet does not have ellipses (...) at the start of the text when truncated, which can make it harder for readers to understand that truncation is happening. For example, the fifth result returned when searching for "class" displayed: erty 'classList' of null (#1527) (d6df2b8), closes...
The highlighting of the found text is no longer happening, which may be an understandable side effect of the merged v5 style updates etc.

I hope the above helps. Paul

Jul 19 '24 22:07 paulhibbitts

Hi @jhildenbiddle @paulhibbitts, thanks for the points. I agree the performance issue needs to be resolved. Once the #2464 storage layer change is merged, I will sync with @sy-records to adapt to the new storage and refine the performance.

Jul 20 '24 08:07 Koooooo-7

This appears to be a result of storing search data unmodified, then doing a lot of text processing every time a search query is performed. Why not do the processing while retrieving the search data and store the result, so we only have to do basic text matching when search queries are performed?

Makes sense. I think we could store the formatted data in storage and only do simple formatting of the search content when retrieving results, instead of formatting it every time.
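A minimal sketch of that approach, assuming a hypothetical index shape (RawPage, SearchEntry, buildIndex and search are illustrative names, not docsify's API): the expensive clean step runs once while building the index, and each query is reduced to a lowercase substring check.

```typescript
// Hypothetical shapes for illustration; docsify's real index format differs.
interface RawPage {
  slug: string;
  title: string;
  markdown: string;
}

interface SearchEntry {
  slug: string;
  title: string;
  body: string; // already cleaned when the index was built
}

// Expensive formatting happens once, when pages are indexed and stored.
function buildIndex(pages: RawPage[], clean: (md: string) => string): SearchEntry[] {
  return pages.map((page) => ({
    slug: page.slug,
    title: clean(page.title),
    body: clean(page.markdown),
  }));
}

// Each query then only needs a cheap case-insensitive substring check.
function search(index: SearchEntry[], query: string): SearchEntry[] {
  const q = query.trim().toLowerCase();
  if (q === "") return [];
  return index.filter(
    (entry) =>
      entry.title.toLowerCase().indexOf(q) !== -1 ||
      entry.body.toLowerCase().indexOf(q) !== -1
  );
}
```

The trade-off is a slower first indexing pass in exchange for fast queries, which is why moving the indexing into an async job is listed as a later TODO.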

Jul 20 '24 08:07 Koooooo-7

@paulhibbitts

Thanks for the nice catch; noted the styling issue. 👌

Jul 20 '24 08:07 Koooooo-7

Empty ellipses (......) are being displayed when searching for items that match only a heading with no content immediately below it. For example, search for "Headings", which is on the UI Kit page. If there is no content within the ellipses, perhaps don't display the ellipses/content at all?

Nice catch! I wasn't aware that the search content could be empty. I'll update it so that no ... is displayed when the content is empty.
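A hypothetical makeSnippet helper sketching both snippet fixes discussed in this thread: a leading ... when the text before the match was cut, and no ellipses at all when the content is empty (e.g. a heading with nothing below it). Unlike the PR, which hardcodes ... on both sides, this sketch adds each ellipsis only when that side was actually truncated; that is an illustrative design choice, not the PR's behavior.

```typescript
// Hypothetical helper: cut a window of `radius` characters around the first
// match and add "..." on each side that was actually truncated.
// Returns "" when there is no content or no match, so the UI can skip the
// snippet entirely instead of rendering an empty "......".
function makeSnippet(content: string, keyword: string, radius = 20): string {
  const text = content.trim();
  const kw = keyword.trim();
  if (text === "" || kw === "") return "";

  // Searching a lowercased copy assumes lowercasing keeps the string length,
  // which holds for ASCII but not for every Unicode character.
  const start = text.toLowerCase().indexOf(kw.toLowerCase());
  if (start === -1) return "";

  const from = Math.max(0, start - radius);
  const to = Math.min(text.length, start + kw.length + radius);

  const prefix = from > 0 ? "..." : "";
  const suffix = to < text.length ? "..." : "";
  return prefix + text.slice(from, to) + suffix;
}
```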

Update:

Should we include Markdown image paths/names? For example, search for "icon.svg"?

For now, I keep image paths and name/title metadata searchable, even though they are not visible in the content directly.
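A sketch of keeping image metadata searchable but hidden, assuming a hypothetical splitImageMeta helper that separates the displayed text from search-only metadata. The PR's renderer-based approach may store this differently; the point is only that a query such as "icon.svg" can still match a page whose snippet never shows the path.

```typescript
// Hypothetical helper: drop Markdown images from the displayed text but keep
// their alt text, path and title in a separate, search-only field.
interface ImageSplit {
  visible: string; // text shown in the result snippet
  hidden: string; // searchable metadata that is never displayed
}

function splitImageMeta(markdown: string): ImageSplit {
  const meta: string[] = [];
  // Matches ![alt](path) and ![alt](path "title"); reference-style images
  // and paths containing spaces are not handled.
  const imageRe = /!\[([^\]]*)\]\(\s*(\S+?)(?:\s+"([^"]*)")?\s*\)/g;
  const visible = markdown.replace(
    imageRe,
    (_match: string, alt: string, path: string, title?: string) => {
      if (alt) meta.push(alt);
      meta.push(path);
      if (title) meta.push(title);
      return "";
    }
  );
  return { visible, hidden: meta.join(" ") };
}
```

Matching would then run against visible + " " + hidden, while only visible is used to build the snippet.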

Aug 01 '24 15:08 Koooooo-7

Awesome @Koooooo-7, looks good! Thank you very much 🙏🏼

Aug 01 '24 16:08 paulhibbitts

There seems to be a problem with diacritics.

Sep 18 '24 04:09 sy-records

I checked the previous behavior: it has differed from the v4 result since last year, and it highlights a completely wrong result for "cafe".

The current behavior in this PR looks more like a "patch" that corrects the search content, but we still need to figure out when and why the search content changed.

Update:

There is a potential issue with postContent and handlePostContent: after formatting, handlePostContent may be longer than postContent (e.g. " becomes &quot;, six characters instead of one), which puts the substring in the wrong place. I will fix it. Since the content formatting functions have changed, the behavior will still differ from v4.
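To illustrate the offset problem described above, here is a hypothetical sketch (escapeHtml, snippetBuggy and snippetFixed are illustrative names, not docsify functions). It computes a match index on the raw text and applies it to the escaped text, then shows one way to avoid that: slice the raw text first and escape only the slice.

```typescript
// Minimal HTML escaping; "&" must be replaced first.
function escapeHtml(s: string): string {
  return s
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&#39;");
}

// Buggy: the index is computed on the raw text but applied to the escaped
// text. Each escaped character before the match shifts the real position
// to the right, so the slice lands too early.
function snippetBuggy(raw: string, keyword: string, radius: number): string {
  const i = raw.indexOf(keyword);
  if (i === -1) return "";
  const escaped = escapeHtml(raw);
  return escaped.slice(Math.max(0, i - radius), i + keyword.length + radius);
}

// Fixed: slice the raw text first, then escape only the slice for display.
function snippetFixed(raw: string, keyword: string, radius: number): string {
  const i = raw.indexOf(keyword);
  if (i === -1) return "";
  return escapeHtml(raw.slice(Math.max(0, i - radius), i + keyword.length + radius));
}
```

With no characters that need escaping, both versions agree, which is why the bug only shows up on content containing quotes, ampersands or angle brackets.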

Sep 18 '24 07:09 Koooooo-7

ping @sy-records

Sep 19 '24 05:09 Koooooo-7