Support Docx via PDF conversion
Docx documents can easily be converted (or printed) to PDF.
The advantage of this approach is that the printing process generates proper layout and visualization components such as pages, bounding boxes, etc. The disadvantage is that it is somewhat slower than reading the native Docx format, and that all the semantic content (e.g. section headers) must be re-inferred.
Reading Docx via PDF conversion will be one of the possible ways of using a Docx document as input. See #105 for the native fast parsing.
> Docx documents can easily be converted (or printed) to PDF.
To my knowledge, there are essentially two ways to convert Docx to PDF. One is a tool like docx2pdf, which uses an installed MS Word application to do the conversion, so it will not work on, e.g., Linux servers. The other is headless LibreOffice, but that is also quite a heavy dependency. Just out of curiosity, is there a better solution than these two?
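For reference, here is a minimal sketch of the headless LibreOffice route, assuming the `soffice` binary is installed and on the `PATH`. The helper names (`build_convert_command`, `expected_pdf_path`, `convert_docx_to_pdf`) and the 120-second timeout are made up for illustration and are not part of this project.

```python
import shutil
import subprocess
from pathlib import Path


def build_convert_command(docx_path, out_dir, soffice="soffice"):
    """Build the argument list for a headless LibreOffice Docx-to-PDF conversion."""
    return [
        soffice,
        "--headless",            # run without a GUI
        "--convert-to", "pdf",   # target format
        "--outdir", str(out_dir),
        str(docx_path),
    ]


def expected_pdf_path(docx_path, out_dir):
    """LibreOffice writes <input stem>.pdf into the output directory."""
    return Path(out_dir) / (Path(docx_path).stem + ".pdf")


def convert_docx_to_pdf(docx_path, out_dir, soffice="soffice", timeout=120):
    """Convert one Docx file to PDF and return the path of the resulting PDF.

    Hypothetical helper: assumes LibreOffice is installed locally.
    """
    if shutil.which(soffice) is None:
        raise FileNotFoundError(f"{soffice!r} not found; is LibreOffice installed?")
    Path(out_dir).mkdir(parents=True, exist_ok=True)
    subprocess.run(
        build_convert_command(docx_path, out_dir, soffice),
        check=True,        # raise if LibreOffice exits with an error
        timeout=timeout,   # do not hang forever on a broken document
    )
    pdf_path = expected_pdf_path(docx_path, out_dir)
    if not pdf_path.exists():
        raise RuntimeError(f"conversion finished but {pdf_path} was not created")
    return pdf_path
```

Calling `convert_docx_to_pdf("report.docx", "out")` would then be expected to produce `out/report.pdf`. This does not remove the heavy dependency mentioned above; it only shows what the call looks like on a machine where LibreOffice is available.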