Neither fulltextsearch:live nor cron indexes content in Group Folders
I have the same issue as https://github.com/nextcloud/fulltextsearch/issues/431
Plugin versions:
- files_fulltextsearch: 2.0.0
- fulltextsearch: 2.0.0
- fulltextsearch_elasticsearch: 2.0.0
with a local Elasticsearch instance at version 7.9.2-1
If I copy a file from one Group Folder to another, the new file location doesn't appear in search results, even after an hour. I tried both cron and fulltextsearch:live for this:
The system cron job doesn't output any errors and runs as expected every 5 minutes:
Nov 4 09:25:01 nc CROND[448822]: (apache) CMD (php -f /var/www/html/nextcloud/cron.php)
Nov 4 09:30:01 nc CROND[448864]: (apache) CMD (php -f /var/www/html/nextcloud/cron.php)
Nov 4 09:35:01 nc CROND[448924]: (apache) CMD (php -f /var/www/html/nextcloud/cron.php)
fulltextsearch:live is stuck at "Action: waiting" and never moves on from there.
Here is my run of fulltextsearch:test, where everything seems to be working fine:
.Testing your current setup:
Creating mocked content provider. ok
Testing mocked provider: get indexable documents. (2 items) ok
Loading search platform. (Elasticsearch) ok
Testing search platform. ok
Locking process ok
Removing test. ok
Pausing 3 seconds 1 2 3 ok
Initializing index mapping. ok
Indexing generated documents. ok
Pausing 3 seconds 1 2 3 ok
Retreiving content from a big index (license). (size: 32386) ok
Comparing document with source. ok
Searching basic keywords:
- 'test' (result: 1, expected: ["simple"]) ok
- 'document is a simple test' (result: 2, expected: ["simple","license"]) ok
- '"document is a test"' (result: 0, expected: []) ok
- '"document is a simple test"' (result: 1, expected: ["simple"]) ok
- 'document is a simple -test' (result: 1, expected: ["license"]) ok
- 'document is a simple +test' (result: 1, expected: ["simple"]) ok
- '-document is a simple test' (result: 0, expected: []) ok
- 'document is a simple +test +testing' (result: 1, expected: ["simple"]) ok
- 'document is a simple +test -testing' (result: 0, expected: []) ok
- 'document is a +simple -test -testing' (result: 0, expected: []) ok
- '+document is a simple -test -testing' (result: 1, expected: ["license"]) ok
- 'document is a +simple -license +testing' (result: 1, expected: ["simple"]) ok
Updating documents access. ok
Pausing 3 seconds 1 2 3 ok
Searching with group access rights:
- 'license' - [] - (result: 0, expected: []) ok
- 'license' - ["group_1"] - (result: 1, expected: ["license"]) ok
- 'license' - ["group_1","group_2"] - (result: 1, expected: ["license"]) ok
- 'license' - ["group_3","group_2"] - (result: 1, expected: ["license"]) ok
- 'license' - ["group_3"] - (result: 0, expected: []) ok
Searching with share rights:
- 'license' - notuser - (result: 0, expected: []) ok
- 'license' - user2 - (result: 1, expected: ["license"]) ok
- 'license' - user3 - (result: 1, expected: ["license"]) ok
Removing test. ok
Unlocking process ok
This is the status shown for hours when running fulltextsearch:live:
sudo -u apache /usr/bin/php /var/www/html/nextcloud/occ fulltextsearch:live
Memory: 10 MB
┌─ Indexing ────
│ Action: waiting
│ Provider: Account:
│ Document:
│ Info:
│ Title:
│ Content size:
└──
┌─ Results ────
│ Result: 0/0
│ Index:
│ Status:
│ Message:
│
│
└──
┌─ Errors ────
│ Error: 7/7
│ Index: files:8732
│ Exception: Elasticsearch\Common\Exceptions\BadRequest400Exception
│ Message: field [content] not present as part of path [attachment.content]
│
│
└──
## x:first result ## c/v:prec/next result ## b:last result
## f:first error ## h/j:prec/next error ## d:delete error ## l:last error
## q:quit ## p:pause
Adding a file, or editing one with OnlyOffice, doesn't change anything. The file also isn't indexed when I just let the Nextcloud cron run.
Right now, the only way for us to index files is to run fulltextsearch:index every 30 minutes from the system cron, which is quite inefficient.
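For anyone wanting to reproduce the workaround, a minimal sketch of such an entry in the apache user's crontab might look like the following. It assumes the same PHP and occ paths as the commands above. The flock wrapper and the lock file path are illustrative additions (not part of the setup described here), intended only to keep a slow index run from overlapping with the next one.

```
# Hypothetical entry, edited with: crontab -u apache -e
# Runs a full index every 30 minutes; flock -n skips the run if the
# previous one still holds the (assumed) lock file.
*/30 * * * * flock -n /tmp/nc-fulltextsearch-index.lock /usr/bin/php /var/www/html/nextcloud/occ fulltextsearch:index
```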
Furthermore, Group Folders seem to be indexed multiple times. That wouldn't be much of a problem, and would even make sense, if cron and/or fulltextsearch:live worked. Since they don't, the workaround above carries a big performance hit and indexing takes a long time.
Would it be possible to add a dedicated index function that scans only Group Folders and, when a user performs a search, shows only the results they have access to?