Trigger OCR if file was created or updated is not working

Open lodzen opened this issue 2 years ago • 7 comments

Describe the bug

Trigger OCR if file was created or updated is not working

System

  • App version: 1.28
  • Nextcloud version: 28.0.1
  • PHP version: 8.2.13
  • Environment: Docker Compose (Linuxserver image) on Linux 6.1.0-rpi7-rpi-2712 aarch64
  • ocrmypdf version: 14.1.0-r1

How to reproduce

Steps to reproduce the behavior: Configure the workflow as described in the manual. OCR is not triggered by Nextcloud when a file is added or modified.

Screenshots

Conversion only works for tags. The OCR tag is added multiple times because the automated tagging workflow is used on top of it. image

Server log

Please paste relevant content of your nextcloud.log file here. It might make sense to first decrease the Loglevel. Also, since the OCR process runs asynchronously, run your cron.php before copying the logs here.

Nothing visible in the server log files

lodzen avatar Jan 19 '24 09:01 lodzen

Thanks for reporting this. Unfortunately I cannot reproduce the issue. Here is what I did:

  1. Create a fresh NC 28 Docker instance
  2. Install the Workflow OCR app from the Appstore
  3. Install ocrmypdf inside of the container
  4. Configure a personal flow like this (to ensure it will always be processed via OCR): image
  5. Upload the following test file: ocr-test.pdf via NC UI
  6. Trigger the NC cron: sudo -u www-data php cron.php

Result: a new file version is created as expected, and the text inside the document is selectable.

image

Please use our troubleshooting guide and repeat your process. If you decreased your logging level as described, there must be some server logs. Those are mandatory for us to understand the problem.

Thanks for your help

R0Wi avatar Jan 23 '24 08:01 R0Wi

Hello,

I set up the flow exactly as in your screenshot: image

The cron is configured to run every 5 minutes: image

Test PDF: image

Even after 15 minutes the file was not analyzed: image

image

The difference is that it's not a personal flow on my end; it's a global one.

lodzen avatar Jan 23 '24 13:01 lodzen

OK, but even with a personal flow the job is not executed.

lodzen avatar Jan 23 '24 13:01 lodzen

Your frontend configuration looks correct. However, without additional backend logs it will be impossible to find the error. As described here, please decrease your NC loglevel, repeat the process (don't forget to execute the cron manually) and post your logs here.

R0Wi avatar Jan 23 '24 19:01 R0Wi

I have now created the logs and tried to prefilter them as well as possible: flow.log nextcloud.log

lodzen avatar Jan 25 '24 09:01 lodzen

There are two interesting lines in your nextcloud.log, one is logged by the workflowengine itself and the other is logged by this app (workflow_ocr):

{"reqId":"jiLiFxRBg9xJAAFAsV97","level":0,"time":"2024-01-25T09:32:59+00:00","remoteAddr":"79.249.68.60","user":"daniel","app":"workflowengine","method":"GET","url":"/core/preview?fileId=11909&x=250&y=250","message":"No flow configurations is going to run OCR-Datei","userAgent":"Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36","version":"28.0.1.1","data":{"app":"workflowengine","level":"0"},"id":196}

{"reqId":"jiLiFxRBg9xJAAFAsV97","level":0,"time":"2024-01-25T09:32:59+00:00","remoteAddr":"79.249.68.60","user":"daniel","app":"workflow_ocr","method":"GET","url":"/core/preview?fileId=11909&x=250&y=250","message":"Not processing event because IRuleMatcher->getFlows did not return anything","userAgent":"Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36","version":"28.0.1.1","data":{"app":"workflow_ocr"},"id":197}

I think the interesting bit here is "No flow configurations is going to run OCR-Datei", which suggests that there might be some misconfiguration in your workflow. The second line tells us basically the same: "Not processing event because IRuleMatcher->getFlows did not return anything".
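To pull entries like these out of a large nextcloud.log, a small filter can help. The following is a minimal sketch, assuming the log holds one JSON object per line (as the two entries above suggest) and that the fields app, reqId and message are present. The function name and the shortened sample entries are illustrative only.

```python
import json
from collections import defaultdict

# Apps whose entries matter for this issue (taken from the two log lines above).
RELEVANT_APPS = {"workflowengine", "workflow_ocr"}


def collect_flow_messages(lines, apps=RELEVANT_APPS):
    """Group log messages of the given apps by request ID.

    `lines` is any iterable of strings, e.g. an open nextcloud.log file.
    Lines that are not valid JSON objects are skipped.
    """
    by_request = defaultdict(list)
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            entry = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip truncated or non-JSON lines
        if not isinstance(entry, dict):
            continue
        if entry.get("app") in apps:
            by_request[entry.get("reqId")].append(
                (entry["app"], entry.get("message", ""))
            )
    return dict(by_request)


# Shortened sample entries modelled on the log lines quoted above.
sample = [
    '{"reqId":"jiLiFxRBg9xJAAFAsV97","level":0,"app":"workflowengine",'
    '"message":"No flow configurations is going to run OCR-Datei"}',
    '{"reqId":"jiLiFxRBg9xJAAFAsV97","level":0,"app":"workflow_ocr",'
    '"message":"Not processing event because IRuleMatcher->getFlows did not return anything"}',
    '{"reqId":"other","level":1,"app":"core","message":"unrelated"}',
    "not json",
]

result = collect_flow_messages(sample)
for req_id, messages in result.items():
    for app, message in messages:
        print(f"{req_id} [{app}] {message}")
```

With a real log, the open file object can be passed directly, for example `with open("nextcloud.log") as f: collect_flow_messages(f)` (the path is an assumption and depends on the installation). Grouping by reqId makes it easy to see that both messages belong to the same request.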

image

At the moment I have no idea why it behaves like this for you but it doesn't seem to be a general problem with Nextcloud 28 since I can't reproduce the problem. I think further investigation is needed here.

If you set up a workflow with the same conditions (file created/updated, mimetype is PDF) and use the "Workflow Tagging" instead, does that one work? Does it tag your PDF files correctly?

R0Wi avatar Jan 30 '24 09:01 R0Wi

Some technical details:

  • The "Flow activation: rules were requested for operation OCR-Datei" log message is written here
  • The "No flow configurations is going to run OCR-Datei" message is written here

Both log messages are produced by Nextcloud's workflowengine, which contains the core logic for workflow apps. In this case the getFlows method is called by our workflow_ocr app, and the core logic of Nextcloud tells the app to "not run".
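The interaction described above can be pictured with a simplified model. This is a sketch only, not the actual Nextcloud or workflow_ocr code: the names RuleMatcher, get_flows and handle_event and the check format are hypothetical, and the real rule evaluation in the workflowengine is more involved.

```python
from dataclasses import dataclass, field


@dataclass
class RuleMatcher:
    """Hypothetical stand-in for the core rule matcher."""

    # Each flow is a list of checks; a check is a (key, expected_value) pair.
    flows: list = field(default_factory=list)

    def get_flows(self, event):
        # A flow matches only if every one of its checks passes for the event.
        return [
            flow
            for flow in self.flows
            if all(event.get(key) == expected for key, expected in flow)
        ]


def handle_event(matcher, event):
    """Hypothetical app-side handler: queue OCR only if a flow matched."""
    if not matcher.get_flows(event):
        return "Not processing event because no flow matched"
    return "OCR job queued"


# One configured flow with a single check on the MIME type.
matcher = RuleMatcher(flows=[[("mimetype", "application/pdf")]])

pdf_result = handle_event(matcher, {"mimetype": "application/pdf"})
png_result = handle_event(matcher, {"mimetype": "image/png"})
print(pdf_result)
print(png_result)
```

In the reported case the core returned no flows to the app, which corresponds to the first branch of handle_event in this model: the app does not process the event.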

R0Wi avatar Jan 30 '24 09:01 R0Wi

Seems to be an NC core related issue. Feel free to raise this here. Closed due to lack of feedback.

R0Wi avatar Jun 02 '24 13:06 R0Wi