MS Office document updates written to NC from OnlyOffice are not indexed correctly
Hi, I'd like to build a setup with Nextcloud + OnlyOffice + ES to have full text search and collaborative editing.
My basic full text search setup works. I've set up ES 7.6 and the ingestion plug-in as documented in your wiki. I'm using the "documentserver_community" app for OnlyOffice (which saves me from having to install OnlyOffice separately). If I run occ fulltextsearch:index for the first time, the content of binary office documents is indexed correctly.
However, let's say I update the content of a docx file in the OnlyOffice editor in the browser. OnlyOffice is then supposed to write the changes back to the file. This should normally happen automatically, but doesn't - see this known bug. The workaround is to run occ documentserver:flush as a cron job, which causes the file to be replaced. Oddly, the change is not attributed to my own user: the activity tab of the Files app lists it as Changed by "remote user", whoever that is.
In any case, if I then run occ fulltextsearch:index I see output such as this:
┌─ Indexing ────
│ Action: compareWithCurrentIndex
│ Provider: Files Account: myuseraccount
│ Document: 219
│ Info: application/vnd.openxmlformats-officedocument.wordprocessingml.document
│ Title: First test.docx
│ Content size: 0
│ Chunk: 3/3
│ Progress: all/1
└──
┌─ Results ────
│ Result: 1/1
│ Index: files:219
│ Status: ok
│ Message: {"_index":"fts","_type":"standard","_id":"files:219","_version":5,"result":"updated","_shards":{"total":2,"successful":1,"failed":0},"_seq_no":6,"_primary_term":1}
│
│
└──
┌─ Errors ────
│ Error: 0/0
│ Index:
│ Exception:
│ Message:
│
│
└──
Notice the Content size: 0 part. When I then look at the ES documents, I find that the ES document content key is now an empty string.
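For anyone who wants to reproduce the check, here is a minimal Python sketch of how the stored document can be inspected. The index name fts and the ID files:219 are taken from the output above; the helper names and the Elasticsearch address are assumptions for illustration, not part of any Nextcloud or Elasticsearch tooling.

```python
import json
import urllib.parse
import urllib.request


def doc_url(base_url: str, index: str, doc_id: str) -> str:
    """Build the GET URL for a single document; the ID is percent-encoded."""
    encoded_id = urllib.parse.quote(doc_id, safe="")
    return f"{base_url.rstrip('/')}/{index}/_doc/{encoded_id}"


def content_is_empty(es_response: dict) -> bool:
    """Return True if the `content` key is missing or an empty string."""
    source = es_response.get("_source", {})
    return not source.get("content")


def fetch_doc(base_url: str, index: str, doc_id: str) -> dict:
    """Fetch one document from Elasticsearch (hypothetical helper, no auth)."""
    with urllib.request.urlopen(doc_url(base_url, index, doc_id)) as resp:
        return json.load(resp)
```

Assuming Elasticsearch is reachable at http://localhost:9200 without authentication, content_is_empty(fetch_doc("http://localhost:9200", "fts", "files:219")) should return True for the affected document after the OnlyOffice edit.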
However, if I download the docx file, change its content in a local word processor, and upload it again (dropping it in the browser in the Files app), replacing the existing docx file, and then run occ fulltextsearch:index again, then everything works as expected. The output is as above, but Content size is larger than 0, and searching for terms in the document works, too.
My current workaround is to run occ fulltextsearch:reset followed by occ fulltextsearch:index, which is of course not very efficient because it rebuilds the entire index.
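In case it helps others, this is roughly how the workaround could be scripted. It is only a sketch: the occ path is a placeholder, the helper names are made up, and the commands usually have to run as the web server user. Depending on the app version, occ fulltextsearch:reset may ask for confirmation, in which case it cannot run unattended as written.

```python
import subprocess
from typing import List


def occ_command(occ_path: str, *args: str, php: str = "php") -> List[str]:
    """Build the argument list for one occ invocation."""
    return [php, occ_path, *args]


def reindex_commands(occ_path: str) -> List[List[str]]:
    """Return the two commands from the workaround, in order."""
    return [
        occ_command(occ_path, "fulltextsearch:reset"),
        occ_command(occ_path, "fulltextsearch:index"),
    ]


def run_all(commands: List[List[str]]) -> None:
    """Run each command in turn and stop at the first failure."""
    for cmd in commands:
        subprocess.run(cmd, check=True)
```

Calling run_all(reindex_commands("/var/www/nextcloud/occ")) would run both steps; the path shown is hypothetical and needs to match the local installation.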
I've just run into the same issue. If I upload an office document, or change it with MS Office on an external device, everything is fine and no problem appears.
But after editing it in OnlyOffice, the document disappears from the index completely, and it doesn't reappear even after occ fulltextsearch:index.
The only way to get it back is to run occ fulltextsearch:reset and then occ fulltextsearch:index again.
Same behaviour for me. Also, if I create a document with OnlyOffice, it doesn't seem to be indexed.
Could it make a difference whether I use the OnlyOffice integrated into Nextcloud in NC18+ or an external document server?
Have you installed the attachment plugin for Elasticsearch?
https://www.elastic.co/guide/en/elasticsearch/plugins/current/ingest-attachment.html
This issue still seems to be outstanding. Has anyone found a good way to fix or work around it?