Teedy importer fails to upload documents - 400 error
I tried the binaries from the latest release on both Windows and Linux and got the same result.
All documents fail to upload with a "null" message.
Enabling NODE_DEBUG=request shows me that the PUT returns a 400 for each document. I tried accessing my instance both from its public URL (via Cloudflare) and from its local IP on the host machine, in case Cloudflare was causing the issue:
✖ Upload failed for /home/savvasdalkitsis/Papers/Papers/Financial/Bank/Lloyds/Statement 5 years till 01-2016/Statement 2.pdf: null
⠋ Importing: /home/savvasdalkitsis/Papers/Papers/Financial/Bank/Lloyds/Statement 5 years till 01-2016/Statement 3.pdfREQUEST {
jar: true,
url: 'http://172.18.0.41:8080/api/tag/list',
callback: [Function],
method: 'GET'
}
REQUEST make request http://172.18.0.41:8080/api/tag/list
REQUEST onRequestResponse http://172.18.0.41:8080/api/tag/list 200 {
connection: 'close',
'cache-control': 'no-cache',
expires: '0',
'content-type': 'application/json',
'content-length': '350',
server: 'Jetty(9.4.36.v20210114)'
}
REQUEST reading response's body
REQUEST finish init function http://172.18.0.41:8080/api/tag/list
REQUEST response end http://172.18.0.41:8080/api/tag/list 200 {
connection: 'close',
'cache-control': 'no-cache',
expires: '0',
'content-type': 'application/json',
'content-length': '350',
server: 'Jetty(9.4.36.v20210114)'
}
REQUEST end event http://172.18.0.41:8080/api/tag/list
REQUEST has body http://172.18.0.41:8080/api/tag/list 350
REQUEST emitting complete http://172.18.0.41:8080/api/tag/list
REQUEST {
jar: true,
url: 'http://172.18.0.41:8080/api/document',
form: 'title=Statement%203.pdf&language=eng&tags=',
callback: [Function],
method: 'PUT'
}
REQUEST make request http://172.18.0.41:8080/api/document
REQUEST onRequestResponse http://172.18.0.41:8080/api/document 400 {
connection: 'close',
'cache-control': 'no-cache',
expires: '0',
'content-type': 'application/json',
'content-length': '50',
server: 'Jetty(9.4.36.v20210114)'
}
More info: I checked out the project locally and ran it from source after modifying this line:
https://github.com/sismics/docs/blob/d51dfd6636ba676a8deb9e8d23bd4ce9667e3c7a/docs-importer/main.js#L454
so that it prints response.body instead of the error. I now see these messages:
Upload failed for FILE : {"type":"TagNotFound","message":"Tag not found: "}
It looks like not selecting tags for the documents causes the upload to fail?
I can verify that when I select a label to apply to all docs, the upload completes.
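The "null" in the original failure message is consistent with how callbacks in the request library behave: the error argument is only set for connection-level failures, so an HTTP 400 arrives with a null error and the details sit in the response. A minimal sketch of a failure message that covers both cases follows. The helper name describeUploadFailure and the message layout are hypothetical and are not taken from main.js.

```javascript
// Hypothetical helper: name and message layout are illustrative, not from main.js.
function describeUploadFailure(file, error, response) {
  // A connection-level failure (DNS, refused connection, ...) populates `error`.
  if (error) {
    return `Upload failed for ${file}: ${error.message || error}`;
  }
  // An HTTP error status leaves `error` null, so report the status and body instead.
  if (response && response.statusCode >= 400) {
    return `Upload failed for ${file}: HTTP ${response.statusCode} ${response.body}`;
  }
  // Nothing went wrong, so there is nothing to report.
  return null;
}
```

Called with a null error and the 400 response from above, this would surface the TagNotFound body rather than "null".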
@savvasdalkitsis Thanks for looking into the code. If you have time to fix #619 and #602, that would be awesome! (Both are related to the importer.)
Hm, this issue seems to be with the backend not accepting an empty tag list when adding files. I can take a stab at fixing this, but I don't have a wide picture of the backend app and I'm not sure whether that would break anything.
@savvasdalkitsis It might be because the importer is passing an empty string in the parameter tags and the backend interprets it as a list of 1 tag with no ID (which of course doesn't exist and leads to "TagNotFound").
We would need to inspect what is sent to the API exactly.
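As a rough illustration of that hypothesis, the form body from the debug log can be parsed with Node's built-in URLSearchParams: the trailing "tags=" yields a list containing one empty string, not an empty list. The sketch below also shows one possible importer-side workaround, which is to leave the tags parameter out entirely when nothing is selected. The function buildDocumentForm is hypothetical, and sending one tags parameter per tag ID is an assumption about what the API expects.

```javascript
// Hypothetical helper, not the importer's actual code.
// Assumption: the API accepts one `tags` form parameter per tag ID.
function buildDocumentForm(title, language, tagIds) {
  const params = new URLSearchParams();
  params.append('title', title);
  params.append('language', language);
  // Only emit a `tags` parameter for real, non-empty IDs.
  for (const id of tagIds.filter((t) => t !== '')) {
    params.append('tags', id);
  }
  return params.toString();
}

// The body seen in the debug log parses to a one-element list holding an empty string.
const logged = new URLSearchParams('title=Statement%203.pdf&language=eng&tags=');
const loggedTags = logged.getAll('tags');

// With no tags selected, the sketch omits the parameter altogether.
const noTagsBody = buildDocumentForm('Statement 3.pdf', 'eng', []);
```

Whether the fix belongs in the importer, in the backend, or in both is a separate question. This only shows what the server would receive in each case.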
Would something like this work? I cannot run the project locally for some reason (I need to spend some time on that), so I am not sure what type of integration testing (if any) is in place or whether I broke anything. I guess the PR will tell us.
It might be interesting to reproduce the issue (and confirm that it's fixed) by adding a test around here https://github.com/sismics/docs/blob/master/docs-web/src/test/java/com/sismics/docs/rest/TestDocumentResource.java#L64
That's the test I had problems running. It requires external dependencies to be installed and I didn't have a lot of time to investigate. I will do that once I have some spare time.
Yes, you are right, it requires these dependencies: https://github.com/sismics/docs/blob/master/.github/workflows/build-deploy.yml#L22
ffmpeg mediainfo tesseract-ocr tesseract-ocr-deu