[BETA] Import files and metadata from Paperless
Import script in https://github.com/eikek/docspell/blob/master/tools/import-paperless/
Data transferred:
- files
- metadata
  - title
  - tags
  - correspondent as organization
  - creation date
Please be aware that the metadata of documents that already exist in Docspell will be overwritten. If a document is in both Docspell and Paperless, the file is not added again (it is matched by checksum), but its attributes (title, tags, ...) are overwritten!
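To make the checksum matching above a bit more concrete, here is a minimal sketch of computing a file checksum in shell. It assumes the checksum is a SHA-256 hash and that `sha256sum` from GNU coreutils is installed; `file_checksum` is a hypothetical helper name, not a function taken from the import script.

```shell
#!/bin/sh
# Hypothetical helper (not from the import script): print the SHA-256
# checksum of a file as a hex string, without the file name.
# Assumes GNU coreutils' sha256sum is installed.
file_checksum() {
  sha256sum "$1" | cut -d ' ' -f 1
}
```

Two files with identical content yield the same checksum regardless of their names, which is why a document can be recognized as already present even if Paperless stored it under a different file name.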
After using Paperless for quite a while, I found that there is some room for improvement but only little work is still being done on the project, which is totally fine, as it is a private, open-source project!
Then I came across Docspell and found that it has quite some potential, especially with its growing AI and AI-like features. However, I wanted to transfer the tagging and structure from Paperless to Docspell rather than only import the files and start the whole organizing process over again.
That is why I put my dirty bash scripting skills to use and wrote a script that reads the files from Paperless's internal documents folder, extracts tags and correspondents from Paperless, and imports everything into Docspell through the official API, so no dirty DB writes or anything like that!
Everybody who is also coming from Paperless, please try out this script! If you need help, just ask here or on Gitter. And if you have suggestions for improvement, tell me or, even better, open a pull request :-)
I will leave this ticket open for discussion.
List of improvements:
- [x] Transfer title and creation date of document
- [ ] Parameter for skipping existing documents, so that no metadata is updated for them
- [ ] Log (non-fatal) errors, maybe warnings, and list them at the end
- [ ] Provide as Docker image, to make sure all dependencies are met
- [ ] Exec directly into the Paperless container to be path-independent
So, for those of us using Paperless with Postgres, is there a way to do this as well?
I can't help with that, I'm afraid. The script was written by totti4ever a long time ago, and he has since left the project. The db schema is the same, so you can probably just replace the sqlite calls in the script with the appropriate psql calls.
Also, looking at #1241, there seem to be some problems with this script right now.
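The suggestion above to swap the sqlite calls for psql calls could look roughly like the following. This is only a sketch, not code from the import script: `run_query` and the variables `PAPERLESS_DB_KIND`, `PAPERLESS_SQLITE_FILE` and `PAPERLESS_PG_URL` are made-up names, and the actual queries in the script would have to be passed through unchanged.

```shell
#!/bin/sh
# Hypothetical wrapper: run one SQL query and print one row per line with
# '|' between columns, regardless of which database backs Paperless.
# All variable names here are assumptions made for this sketch.
run_query() {
  query=$1
  case "${PAPERLESS_DB_KIND:-sqlite}" in
    sqlite)
      # sqlite3 prints '|'-separated rows by default (list mode).
      sqlite3 "$PAPERLESS_SQLITE_FILE" "$query"
      ;;
    postgres)
      # -A: unaligned output, -t: rows only, -F: field separator,
      # -c: run this command and exit. The connection URL goes last.
      psql -A -t -F '|' -c "$query" "$PAPERLESS_PG_URL"
      ;;
    *)
      echo "unknown database kind: $PAPERLESS_DB_KIND" >&2
      return 1
      ;;
  esac
}
```

With a wrapper like this, each direct `sqlite3` call in the script would become `run_query "..."`, and the choice of database is made once through the environment. Credentials for Postgres would still need to be provided, for example in the connection URL or a `~/.pgpass` file.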