
Neither fulltextsearch:live nor cron indexes content in Group Folders

Open fourstepper opened this issue 5 years ago • 3 comments

I have the same issue as https://github.com/nextcloud/fulltextsearch/issues/431

Plugin versions:

  - files_fulltextsearch: 2.0.0
  - fulltextsearch: 2.0.0
  - fulltextsearch_elasticsearch: 2.0.0

with a local Elasticsearch instance at version 7.9.2-1.

If I copy a file from one Group Folder to another, the new file location doesn't appear in search even after an hour. I tried both cron and fulltextsearch:live for this:

System cron doesn't output any error and runs as expected every 5 minutes:

Nov  4 09:25:01 nc CROND[448822]: (apache) CMD (php -f /var/www/html/nextcloud/cron.php)
Nov  4 09:30:01 nc CROND[448864]: (apache) CMD (php -f /var/www/html/nextcloud/cron.php)
Nov  4 09:35:01 nc CROND[448924]: (apache) CMD (php -f /var/www/html/nextcloud/cron.php)

fulltextsearch:live is stuck at "Action: waiting" and never moves on.

Here is my run of fulltextsearch:test, where everything seems to be working fine:

.Testing your current setup:
Creating mocked content provider. ok
Testing mocked provider: get indexable documents. (2 items) ok
Loading search platform. (Elasticsearch) ok
Testing search platform. ok
Locking process ok
Removing test. ok
Pausing 3 seconds 1 2 3 ok
Initializing index mapping. ok
Indexing generated documents. ok
Pausing 3 seconds 1 2 3 ok
Retreiving content from a big index (license). (size: 32386) ok
Comparing document with source. ok
Searching basic keywords:
 - 'test' (result: 1, expected: ["simple"]) ok
 - 'document is a simple test' (result: 2, expected: ["simple","license"]) ok
 - '"document is a test"' (result: 0, expected: []) ok
 - '"document is a simple test"' (result: 1, expected: ["simple"]) ok
 - 'document is a simple -test' (result: 1, expected: ["license"]) ok
 - 'document is a simple +test' (result: 1, expected: ["simple"]) ok
 - '-document is a simple test' (result: 0, expected: []) ok
 - 'document is a simple +test +testing' (result: 1, expected: ["simple"]) ok
 - 'document is a simple +test -testing' (result: 0, expected: []) ok
 - 'document is a +simple -test -testing' (result: 0, expected: []) ok
 - '+document is a simple -test -testing' (result: 1, expected: ["license"]) ok
 - 'document is a +simple -license +testing' (result: 1, expected: ["simple"]) ok
Updating documents access. ok
Pausing 3 seconds 1 2 3 ok
Searching with group access rights:
 - 'license' - [] -  (result: 0, expected: []) ok
 - 'license' - ["group_1"] -  (result: 1, expected: ["license"]) ok
 - 'license' - ["group_1","group_2"] -  (result: 1, expected: ["license"]) ok
 - 'license' - ["group_3","group_2"] -  (result: 1, expected: ["license"]) ok
 - 'license' - ["group_3"] -  (result: 0, expected: []) ok
Searching with share rights:
 - 'license' - notuser -  (result: 0, expected: []) ok
 - 'license' - user2 -  (result: 1, expected: ["license"]) ok
 - 'license' - user3 -  (result: 1, expected: ["license"]) ok
Removing test. ok
Unlocking process ok

fourstepper avatar Nov 04 '20 08:11 fourstepper

This has been the status for hours when running fulltextsearch:live:

sudo -u apache /usr/bin/php /var/www/html/nextcloud/occ fulltextsearch:live

Memory: 10 MB
┌─ Indexing  ────
│ Action: waiting
│ Provider:                      Account:
│ Document:
│ Info:
│ Title:
│ Content size:
└──
┌─ Results ────
│ Result:      0/0
│ Index:
│ Status:
│ Message:
│
│
└──
┌─ Errors ────
│ Error:      7/7
│ Index: files:8732
│ Exception: Elasticsearch\Common\Exceptions\BadRequest400Exception
│ Message: field [content] not present as part of path [attachment.content]
│
│
└──
## x:first result ## c/v:prec/next result ## b:last result
## f:first error ## h/j:prec/next error ## d:delete error ## l:last error
## q:quit ## p:pause

Adding a file or editing it with OnlyOffice doesn't change anything. The file also doesn't get indexed when I just let the Nextcloud cron run.

Right now the only way for us to index files is to run fulltextsearch:index every 30 minutes from the system cron, which is quite inefficient.
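For reference, the 30-minute workaround could be wrapped roughly like this. It is only a sketch: the script name, the lock directory, and the run_index helper are hypothetical, and the lock is an assumption added so that a slow index run is not started twice. The php and occ paths are the ones from the command above.

```shell
#!/bin/sh
# Hypothetical wrapper around the periodic full index run.
# Assumed crontab entry for the apache user (script name is made up):
#   */30 * * * * /usr/local/bin/fts-index.sh

# Paths taken from the occ command shown earlier; override via environment.
PHP_BIN="${PHP_BIN:-/usr/bin/php}"
OCC_PATH="${OCC_PATH:-/var/www/html/nextcloud/occ}"
# Assumed lock location; mkdir is atomic, so the directory acts as a lock.
LOCK_DIR="${LOCK_DIR:-/tmp/fts-index.lock}"

run_index() {
    # Skip this run if a previous index run still holds the lock.
    if ! mkdir "$LOCK_DIR" 2>/dev/null; then
        echo "previous index run still active, skipping" >&2
        return 0
    fi
    rc=0
    "$PHP_BIN" "$OCC_PATH" fulltextsearch:index || rc=$?
    # Release the lock whether or not the index run succeeded.
    rmdir "$LOCK_DIR"
    return "$rc"
}
```

The script would end with a call to run_index. Skipping an overlapping run only avoids stacking full index runs on top of each other; it does nothing about the underlying problem that cron and fulltextsearch:live don't pick up the changes.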

fourstepper avatar Nov 12 '20 09:11 fourstepper

Furthermore, it seems that Group Folders are indexed multiple times. This wouldn't be much of a problem, and would even make sense, if cron and/or fulltextsearch:live worked. Since they don't, the method above takes a big performance hit and indexing takes a long time.

fourstepper avatar Nov 12 '20 09:11 fourstepper

Would it be possible to add a special index function that scans only Group Folders? Then, when a user performs a search, it would show only the results they have access to.

Zegorax avatar Nov 24 '20 10:11 Zegorax