docconv
Converts PDF, DOC, DOCX, XML, HTML, RTF, etc. to plain text.
Hello, could anyone help me? How do I use this library on a Windows machine, given that it needs its dependencies installed? Is there a tutorial? Thank you.
Those are the defaults for docd, but they don't apply to library usage, which causes the problem seen in issue #78. This PR can have a drawback if you intentionally...
`.tif` files should map to the `image/tiff` MIME type. List of official MIME types: https://www.iana.org/assignments/media-types/media-types.xhtml From the MDN Web Docs: https://developer.mozilla.org/en-US/docs/Web/HTTP/Basics_of_HTTP/MIME_types/Common_types
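A minimal sketch of the extension-to-MIME lookup this fix implies, assuming a hand-maintained table. `extToMIME` and `mimeTypeForExt` are hypothetical names, not docconv's actual code; the point is that both `.tif` and `.tiff` resolve to `image/tiff` regardless of case.

```go
package main

import (
	"path/filepath"
	"strings"
)

// extToMIME is a hypothetical lookup table, not docconv's own. Both TIFF
// extensions map to image/tiff, the type registered with IANA.
var extToMIME = map[string]string{
	".pdf":  "application/pdf",
	".tif":  "image/tiff",
	".tiff": "image/tiff",
}

// mimeTypeForExt returns the MIME type for a file name or bare extension,
// ignoring case, and falls back to application/octet-stream when the
// extension is unknown.
func mimeTypeForExt(name string) string {
	ext := strings.ToLower(filepath.Ext(name))
	if t, ok := extToMIME[ext]; ok {
		return t
	}
	return "application/octet-stream"
}
```

Lowercasing the extension before the lookup keeps the table small: `scan.TIF` and `scan.tif` hit the same entry.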
I use this as a library and want to be able to send all output to my own logger, so I propose this change.
Lines 102 and 103 in doc.go cause a deadlock, mainly because the goroutine implemented above fails to send valid data to the channel: ``` body :=
Hi, sometimes the case in Content-Types.xml and the zipped file names do not match. This may not be an issue on Windows, but it is an issue when such a file is searched in...
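A minimal sketch of a case-insensitive lookup for zip entries using the standard `archive/zip` package. `openCaseInsensitive`, `readEntry` and `buildZip` are hypothetical helpers, not docconv functions; an exact match is preferred, and `strings.EqualFold` is used only as a fallback.

```go
package main

import (
	"archive/zip"
	"bytes"
	"fmt"
	"io"
	"strings"
)

// openCaseInsensitive finds name in the archive. An exact match wins;
// otherwise the first entry whose name matches ignoring case is returned,
// so "word/document.xml" still finds an entry stored as "word/Document.xml".
func openCaseInsensitive(zr *zip.Reader, name string) (*zip.File, bool) {
	var fold *zip.File
	for _, f := range zr.File {
		if f.Name == name {
			return f, true
		}
		if fold == nil && strings.EqualFold(f.Name, name) {
			fold = f
		}
	}
	return fold, fold != nil
}

// readEntry returns the contents of the entry matching name.
func readEntry(zr *zip.Reader, name string) (string, error) {
	f, ok := openCaseInsensitive(zr, name)
	if !ok {
		return "", fmt.Errorf("%s not found in archive", name)
	}
	rc, err := f.Open()
	if err != nil {
		return "", err
	}
	defer rc.Close()
	data, err := io.ReadAll(rc)
	if err != nil {
		return "", err
	}
	return string(data), nil
}

// buildZip builds an in-memory archive, for demonstration only.
func buildZip(files map[string]string) (*zip.Reader, error) {
	var buf bytes.Buffer
	zw := zip.NewWriter(&buf)
	for name, content := range files {
		w, err := zw.Create(name)
		if err != nil {
			return nil, err
		}
		if _, err := io.WriteString(w, content); err != nil {
			return nil, err
		}
	}
	if err := zw.Close(); err != nil {
		return nil, err
	}
	return zip.NewReader(bytes.NewReader(buf.Bytes()), int64(buf.Len()))
}
```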
I was trying to parse content from multiple document formats, and it turns out that it works for formats such as `pdf` and `doc` but, for some reason, not for HTML files. Below...
https://github.com/gabriel-vasile/mimetype
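The linked `mimetype` library detects a file's type from its content rather than its extension. The sketch below illustrates the same idea with only the standard library's `http.DetectContentType`, which recognizes far fewer formats; it does not use the `mimetype` API, and `sniffMIME` is a hypothetical name.

```go
package main

import (
	"mime"
	"net/http"
)

// sniffMIME guesses a document's media type from its leading bytes.
// http.DetectContentType looks at no more than the first 512 bytes and
// returns "application/octet-stream" when nothing matches.
func sniffMIME(data []byte) string {
	ct := http.DetectContentType(data)
	// Strip parameters such as "; charset=utf-8" so callers can switch on
	// the bare media type.
	mt, _, err := mime.ParseMediaType(ct)
	if err != nil {
		return ct
	}
	return mt
}
```

Routing on the sniffed type means a file whose extension is missing or wrong can still reach the matching converter.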
Hi, I was having issues building the code on macOS for a Linux target, so I created a script to build the code inside the Docker image instead, with...
Hi, when the XML is not UTF-8 encoded, the decoder requires a charset reader. Credits: https://stackoverflow.com/questions/6002619/unmarshal-an-iso-8859-1-xml-input-in-go/32224438#32224438
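`encoding/xml` refuses input whose declaration names a non-UTF-8 encoding unless `Decoder.CharsetReader` is set. The PR's exact code is not shown here; below is a stdlib-only sketch that handles ISO-8859-1 alone, with hypothetical names `latin1CharsetReader`, `note` and `decodeNote`. For many encodings, `charset.NewReaderLabel` from `golang.org/x/net/html/charset` can be assigned to `CharsetReader` instead.

```go
package main

import (
	"bytes"
	"encoding/xml"
	"fmt"
	"io"
	"strings"
)

// latin1CharsetReader is a minimal CharsetReader that understands only
// ISO-8859-1. Each ISO-8859-1 byte equals the Unicode code point with the
// same value, so every byte converts directly to a rune.
func latin1CharsetReader(label string, input io.Reader) (io.Reader, error) {
	switch strings.ToLower(label) {
	case "iso-8859-1", "latin1", "latin-1":
		raw, err := io.ReadAll(input)
		if err != nil {
			return nil, err
		}
		runes := make([]rune, len(raw))
		for i, b := range raw {
			runes[i] = rune(b)
		}
		return strings.NewReader(string(runes)), nil
	default:
		return nil, fmt.Errorf("unsupported charset %q", label)
	}
}

// note is a hypothetical document shape used for the example.
type note struct {
	Body string `xml:"body"`
}

// decodeNote parses XML that may declare ISO-8859-1. The decoder calls
// CharsetReader when it reads the encoding in the XML declaration.
func decodeNote(data []byte) (string, error) {
	d := xml.NewDecoder(bytes.NewReader(data))
	d.CharsetReader = latin1CharsetReader
	var n note
	if err := d.Decode(&n); err != nil {
		return "", err
	}
	return n.Body, nil
}
```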