BookStack OCR for images pasted in pages

Describe the feature you'd like

It would be awesome if BookStack could perform some basic OCR on any images pasted into pages and store the extracted text in the database to improve search results, similar to how the page HTML is converted to plain text and stored for searching.
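To make the idea concrete, here is a rough sketch in Python of what I have in mind. It is not how BookStack actually indexes pages (BookStack is PHP, and I have not looked at its indexing code); `build_search_text` and the `ocr_image` callback are hypothetical names I made up for illustration. The OCR engine is passed in as a plain function so the sketch does not assume any particular OCR library.

```python
from html.parser import HTMLParser
from typing import Callable, List


class _PageTextParser(HTMLParser):
    """Collects visible text and <img> sources from a page's HTML."""

    def __init__(self) -> None:
        super().__init__()
        self.text_parts: List[str] = []
        self.image_sources: List[str] = []

    def handle_starttag(self, tag, attrs):
        # Remember every image so it can be handed to the OCR step later.
        if tag == "img":
            src = dict(attrs).get("src")
            if src:
                self.image_sources.append(src)

    def handle_data(self, data):
        # Keep only non-empty text nodes, with surrounding whitespace trimmed.
        stripped = data.strip()
        if stripped:
            self.text_parts.append(stripped)


def build_search_text(page_html: str, ocr_image: Callable[[str], str]) -> str:
    """Return the page's plain text followed by the text OCR'd from its images.

    `ocr_image` is a hypothetical callback: it takes an image source (URL or
    path) and returns whatever text the OCR engine recognised in that image.
    """
    parser = _PageTextParser()
    parser.feed(page_html)
    parser.close()

    ocr_parts = [ocr_image(src).strip() for src in parser.image_sources]
    # Drop images where OCR found nothing, then join everything for indexing.
    return " ".join(parser.text_parts + [part for part in ocr_parts if part])
```

The combined string would then be stored wherever the existing plain-text version of the page is stored, so a search for text that only appears in a screenshot would still match the page.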

Describe the benefits this would bring to existing BookStack users

This would improve search results, especially since a lot of our documentation contains screenshots whose text doesn't appear anywhere in the page HTML content itself.

Can the goal of this request already be achieved via other means?

Aside from being meticulous with organization and things like tagging, not really. You could manually run OCR on the images, but there's nowhere to put the resulting text for the search engine to see it other than in the page content itself.

Have you searched for an existing open/closed issue?

[X] I have searched for existing issues and none cover my fundamental request

How long have you been using BookStack?

5+ years

Additional context

This is similar to #3767 but different in that I'm not looking to index attachments. This would index content that is already part of the page itself and that people would reasonably expect to be able to search for.

Oct 01 '24 19:10 c0shea

I think paperless-ng would be what you are looking for, if you haven't tried it :)

Oct 02 '24 07:10 kazyka

This would be kind of handy, because I sometimes upload photographs of pages of books as references to my wiki.

https://github.com/naptha/tesseract.js would be an ideal way of implementing it.

Oct 02 '24 18:10 virtadpt