Docx-to-HTML

Images not included

Open KillerSquid opened this issue 8 years ago • 2 comments

I am attempting to use this class to convert an uploaded docx file to HTML on the fly for a web app I'm working on. The HTML conversion is working fine, although the output is a little clunky.

I tried importing a Word file that has embedded images (almost all of the docx files the app will handle have images in them), and the images don't show up in the HTML output.

I've looked through the code for the class and it doesn't seem to have a case for images, and for the life of me I can't write one that works. Any thoughts?

KillerSquid, Mar 23 '17 16:03

I think I originally tried to get it to pull inline images in, but I was under time constraints and couldn't figure out a good way to do it. Maybe snoop around in the .docx file (it's just a regular old ZIP file), see if you can find the image binaries in there, and correlate them with whatever tag the .xml uses to reference them? It's also possible that the image's binary data is embedded in the document itself, in which case you might need to read up on the format MS uses to encode images so you can pull the data out and save it as a separate file.

xylude, Mar 24 '17 12:03
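For anyone following the "snoop around in the ZIP" suggestion, here is a minimal sketch in Python (the thread doesn't show the class's own code, so the language choice is an assumption). It assumes a standard transitional-OOXML .docx, where images are stored as separate parts under `word/media/`, each picture in `word/document.xml` appears as an `<a:blip r:embed="rIdN"/>` element, and `word/_rels/document.xml.rels` maps that `rIdN` to the media file. The function names `extract_images` and `img_tag` are hypothetical and are not part of Docx-to-HTML.

```python
import base64
import io
import mimetypes
import posixpath
import zipfile
import xml.etree.ElementTree as ET

# Namespace URIs from ECMA-376 (transitional Office Open XML).
A_NS = "http://schemas.openxmlformats.org/drawingml/2006/main"
R_NS = "http://schemas.openxmlformats.org/officeDocument/2006/relationships"
PKG_REL_NS = "http://schemas.openxmlformats.org/package/2006/relationships"


def _part_name(target):
    """Resolve a relationship Target (relative to word/) to a path inside the ZIP."""
    if target.startswith("/"):
        return target.lstrip("/")
    return posixpath.normpath(posixpath.join("word", target))


def extract_images(docx_bytes):
    """Hypothetical helper: return (rId, part name, bytes) for each picture, in body order."""
    with zipfile.ZipFile(io.BytesIO(docx_bytes)) as z:
        # Step 1: map relationship ids to embedded (non-external) image parts.
        rels = ET.fromstring(z.read("word/_rels/document.xml.rels"))
        targets = {}
        for rel in rels.iter(f"{{{PKG_REL_NS}}}Relationship"):
            if rel.get("Type", "").endswith("/image") and rel.get("TargetMode") != "External":
                targets[rel.get("Id")] = _part_name(rel.get("Target"))
        # Step 2: each picture in the body points at its image via <a:blip r:embed="...">.
        body = ET.fromstring(z.read("word/document.xml"))
        found = []
        for blip in body.iter(f"{{{A_NS}}}blip"):
            rid = blip.get(f"{{{R_NS}}}embed")
            if rid in targets:
                found.append((rid, targets[rid], z.read(targets[rid])))
        return found


def img_tag(part_name, data):
    """Hypothetical helper: inline the image as a data: URI instead of saving a file."""
    mime = mimetypes.guess_type(part_name)[0] or "application/octet-stream"
    encoded = base64.b64encode(data).decode("ascii")
    return f'<img src="data:{mime};base64,{encoded}">'
```

Inlining the bytes as a `data:` URI sidesteps saving a separate file for an on-the-fly web conversion; writing `data` to disk and pointing `src` at it would work just as well. This sketch ignores legacy VML pictures (`v:imagedata`) and externally linked images.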

Thanks. I'll give it a try and see what I can figure out. I'll post on here if I find anything useful.

KillerSquid, Mar 24 '17 14:03