Can pictures read from Excel be identified as hidden objects or not?

Open RexH0 opened this issue 1 year ago • 13 comments

List<Drawings.Picture> pictures = reader.listPictures(); Is there any way to tell whether the pictures read this way are hidden objects?

RexH0 avatar Jan 06 '25 08:01 RexH0

How do you hide a picture? Could you give me the steps so I can test locally before proposing a solution?

wangguanquan avatar Jan 06 '25 09:01 wangguanquan

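I have sent the file to your email. When I import it and read it with List<Drawings.Picture> pictures = reader.listPictures(); the program hangs and never proceeds. I don't know what is wrong with the Excel file; it is also very slow to open in WPS. The file was created by a customer. Is there a way to report that the file is problematic?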

RexH0 avatar Jan 08 '25 06:01 RexH0

[image]
It hangs because the reader found well over a hundred thousand pictures, none of which are visible in Excel.

RexH0 avatar Jan 08 '25 06:01 RexH0

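Currently, reading pictures parses all of them into a list held in memory at once. Hundreds of thousands of pictures exhaust the heap and trigger frequent GC, which drives CPU usage up and takes the server down.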
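Since everything is parsed into memory up front, one cheap guard is to inspect the workbook's drawing parts before calling listPictures() at all. The sketch below is not part of EEC: the class name DrawingSizeCheck is hypothetical, and it assumes the drawing parts live under xl/drawings/ (the conventional location, although the package format does not require it). It only reads the ZIP directory, so it costs almost nothing even for a very large file.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Path;
import java.util.Enumeration;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;

// Hypothetical helper, not an EEC class.
final class DrawingSizeCheck {

    /**
     * Returns the largest uncompressed size in bytes among the
     * xl/drawings/*.xml entries of the workbook, or 0 if there are none.
     */
    static long largestDrawingPart(Path xlsx) {
        long max = 0;
        try (ZipFile zip = new ZipFile(xlsx.toFile())) {
            Enumeration<? extends ZipEntry> entries = zip.entries();
            while (entries.hasMoreElements()) {
                ZipEntry entry = entries.nextElement();
                String name = entry.getName();
                // Assumption: drawing parts are stored under xl/drawings/
                if (name.startsWith("xl/drawings/") && name.endsWith(".xml")) {
                    // getSize() is -1 when unknown; Math.max keeps the result >= 0
                    max = Math.max(max, entry.getSize());
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return max;
    }
}
```

A caller could compare the result against a limit of its own choosing and reject or flag the upload before any picture is parsed. The right limit depends on the available heap, so none is suggested here.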

wangguanquan avatar Jan 08 '25 07:01 wangguanquan

This spreadsheet is probably broken. I can't see where these pictures are in the sheet either. Is there a way for the program to locate them?

RexH0 avatar Jan 08 '25 07:01 RexH0

I'll look at the original file this evening and get back to you.

wangguanquan avatar Jan 08 '25 08:01 wangguanquan

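I can't get this file to open locally. Using EEC I found that drawing.xml contains a large number of duplicate nodes. The file is nearly 90 MB, and EEC reads pictures with dom4j in its ordinary (full DOM) mode, which is slow on a file this size, hence the apparent hang. The pictures are not displayed because they are hidden.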
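To see how many pictures a drawing part declares, and how many of them are hidden, without building a DOM tree, the part can be scanned with the JDK's streaming StAX parser. This is a minimal sketch and not EEC code: the class name DrawingScan is hypothetical, and it assumes a picture counts as hidden when the cNvPr element inside xdr:pic carries hidden="1" or hidden="true".

```java
import java.io.Reader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical helper, not an EEC class.
final class DrawingScan {
    static final String XDR_NS =
        "http://schemas.openxmlformats.org/drawingml/2006/spreadsheetDrawing";

    /** Returns {total pictures, hidden pictures} found in the drawing XML. */
    static int[] countPictures(Reader xml) {
        int total = 0, hidden = 0;
        boolean inPic = false;
        try {
            XMLInputFactory factory = XMLInputFactory.newFactory();
            factory.setProperty(XMLInputFactory.SUPPORT_DTD, Boolean.FALSE);
            XMLStreamReader r = factory.createXMLStreamReader(xml);
            try {
                while (r.hasNext()) {
                    int event = r.next();
                    if (event == XMLStreamConstants.START_ELEMENT
                            && XDR_NS.equals(r.getNamespaceURI())) {
                        if ("pic".equals(r.getLocalName())) {
                            inPic = true;
                            total++;
                        } else if (inPic && "cNvPr".equals(r.getLocalName())) {
                            // Assumption: "1" or "true" marks a hidden picture
                            String h = r.getAttributeValue(null, "hidden");
                            if ("1".equals(h) || "true".equals(h)) hidden++;
                        }
                    } else if (event == XMLStreamConstants.END_ELEMENT
                            && XDR_NS.equals(r.getNamespaceURI())
                            && "pic".equals(r.getLocalName())) {
                        inPic = false;
                    }
                }
            } finally {
                r.close();
            }
        } catch (XMLStreamException e) {
            throw new IllegalStateException("Malformed drawing XML", e);
        }
        return new int[] { total, hidden };
    }
}
```

Because the parser only keeps the current event in memory, the cost grows with the file size but not with the depth or number of nodes retained, which is the main difference from loading the whole document.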

wangguanquan avatar Jan 08 '25 10:01 wangguanquan

Is there a way, then, to determine whether the XML file contains duplicate nodes?
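One possible way to check for duplicates, assuming "duplicate nodes" means many anchors that reference the same embedded image, is to count how often each r:embed relationship id occurs on a:blip elements. The sketch below is an illustration only: EmbedUsage is a hypothetical class, not an EEC API, and it uses the JDK's StAX parser so the file is never fully loaded.

```java
import java.io.Reader;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical helper, not an EEC class.
final class EmbedUsage {
    static final String A_NS =
        "http://schemas.openxmlformats.org/drawingml/2006/main";
    static final String R_NS =
        "http://schemas.openxmlformats.org/officeDocument/2006/relationships";

    /** Maps each r:embed id to the number of a:blip elements that reference it. */
    static Map<String, Integer> countByEmbedId(Reader xml) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        try {
            XMLInputFactory factory = XMLInputFactory.newFactory();
            factory.setProperty(XMLInputFactory.SUPPORT_DTD, Boolean.FALSE);
            XMLStreamReader r = factory.createXMLStreamReader(xml);
            try {
                while (r.hasNext()) {
                    if (r.next() == XMLStreamConstants.START_ELEMENT
                            && "blip".equals(r.getLocalName())
                            && A_NS.equals(r.getNamespaceURI())) {
                        String embed = r.getAttributeValue(R_NS, "embed");
                        if (embed != null) counts.merge(embed, 1, Integer::sum);
                    }
                }
            } finally {
                r.close();
            }
        } catch (XMLStreamException e) {
            throw new IllegalStateException("Malformed drawing XML", e);
        }
        return counts;
    }
}
```

An id whose count is far above 1 points at an image that is anchored many times over. Whether that is legitimate reuse or a damaged file is a judgment the caller has to make.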

RexH0 avatar Jan 08 '25 13:01 RexH0

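The following approach greatly improves read speed:

  1. Define a custom XMLDrawings and override its parseDrawings method to handle "hidden" pictures and duplicate nodes
  2. Define a custom ExcelReader and override its init method so that it uses the custom XMLDrawings from step 1
  3. Use the custom ExcelReader
// Custom XMLDrawings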
public static class MyXMLDrawings extends XMLDrawings {

    public MyXMLDrawings(ExcelReader reader) {
        super(reader);
    }

    // Parse drawings.xml
    @Override
    protected List<Picture> parseDrawings(ZipFile zipFile, ZipEntry entry, Path imagesPath) {
        int i = entry.getName().lastIndexOf('/');
        String relsKey;
        if (i > 0)
            relsKey = entry.getName().substring(0, i) + "/_rels" + entry.getName().substring(i);
        else if ((i = entry.getName().lastIndexOf('\\')) > 0)
            relsKey = entry.getName().substring(0, i) + "\\_rels" + entry.getName().substring(i);
        else relsKey = entry.getName();
        String key = relsKey + ".rels";
        ZipEntry entry1 = getEntry(zipFile, key);
        if (entry1 == null) return null; //throw new ExcelReadException("The file format is incorrect or corrupted. [" + key + "]");
        SAXReader reader = SAXReader.createDefault();
        Document document;
        try {
            document = reader.read(zipFile.getInputStream(entry1));
        } catch (DocumentException | IOException e) {
            throw new ExcelReadException("The file format is incorrect or corrupted. [" + key + "]");
        }
        List<Element> list = document.getRootElement().elements();
        Relationship[] rels = new Relationship[list.size()];
        i = 0;
        for (Element e : list) {
            rels[i++] = new Relationship(e.attributeValue("Id"), e.attributeValue("Target"), e.attributeValue("Type"));
        }
        RelManager relManager = RelManager.of(rels);

        try {
            document = reader.read(zipFile.getInputStream(entry));
        } catch (DocumentException | IOException e) {
            throw new ExcelReadException("The file format is incorrect or corrupted. [" + entry.getName() + "]");
        }

        Element root = document.getRootElement();
        Namespace xdr = root.getNamespaceForPrefix("xdr"), a = root.getNamespaceForPrefix("a");

        List<Element> elements = root.elements();
        List<Picture> pictures = new ArrayList<>(elements.size());
        // Cache of already-extracted images, to cope with large numbers of duplicate nodes
        Map<String, Path> localPathMap = new HashMap<>(Math.min(1 << 10, elements.size()));
        for (Element e : elements) {
            Element pic = e.element(QName.get("pic", xdr));
            // Not a picture
            if (pic == null) continue;

            Element blipFill = pic.element(QName.get("blipFill", xdr));
            if (blipFill == null) continue;

            Element blip = blipFill.element(QName.get("blip", a));
            if (blip == null) continue;

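            // FIXME Check whether the picture is hidden; business logic should decide whether "hidden" pictures need to be read
            Element nvPicPr = pic.element(QName.get("nvPicPr", xdr));
            if (nvPicPr != null) {
                Element cNvPr = nvPicPr.element(QName.get("cNvPr", xdr));
                // Hidden pictures are skipped by default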
                if (cNvPr != null && "1".equals(cNvPr.attributeValue("hidden"))) {
                    continue;
                }
            }

            Namespace r = blip.getNamespaceForPrefix("r");
            String embed = blip.attributeValue(QName.get("embed", r));
            Relationship rel = relManager.getById(embed);
            if (rel != null && Const.Relationship.IMAGE.equals(rel.getType())) {
                Picture picture = new Picture();
                pictures.add(picture);
                // Copy image to tmp path
                String target = toZipPath(rel.getTarget());
                // FIXME Change: look in the cache first; if this image was already extracted, reuse the cached path
                Path targetPath = localPathMap.get(target);
                if (targetPath == null && (entry = getEntry(zipFile, "xl/" + target)) != null) {
                    // Copy image to tmp path
                    try {
                        targetPath = imagesPath.resolve(rel.getTarget());
                        Files.copy(zipFile.getInputStream(entry), targetPath, StandardCopyOption.REPLACE_EXISTING);
                        localPathMap.put(target, targetPath);
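                    } catch (IOException ioException) {
                        // Copy failed: the path is not cached, so the next reference retries;
                        // note that localPath may then point to a missing or partial file
                    }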
                }
                picture.localPath = targetPath;

                int[][] ft = parseDimension(e, xdr);
                picture.dimension = new Dimension(ft[0][2] + 1, (short) (ft[0][0] + 1), ft[1][2] + 1, (short) (ft[1][0] + 1));
                picture.padding = new short[] { (short) ft[0][3], (short) ft[1][1], (short) ft[1][3], (short) ft[0][1] };
                String editAs = e.attributeValue("editAs");
                int property = -1;
                if (StringUtil.isNotEmpty(editAs)) {
                    switch (editAs) {
                        case "twoCell" : property = 0; break;
                        case "oneCell" : property = 1; break;
                        case "absolute": property = 2; break;
                        default:
                    }
                }
                picture.property = property;
                Element spPr = pic.element(QName.get("spPr", xdr));
                if (spPr != null) {
                    Element xfrm = spPr.element(QName.get("xfrm", a));
                    String rot;
                    if (xfrm != null && StringUtil.isNotBlank(rot = xfrm.attributeValue("rot"))) {
                        try {
                            picture.revolve = Integer.parseInt(rot) / 60000;
                        } catch (Exception ex) {
                            // Ignore
                        }
                    }

                    // TODO Attach picture effects
                }

                Element extLst = blip.element(QName.get("extLst", a));
                if (extLst == null) continue;

                for (Element ext : extLst.elements()) {
                    Element srcUrl = ext.element("picAttrSrcUrl");
                    // hyperlink
                    if (srcUrl != null) {
                        rel = relManager.getById(srcUrl.attributeValue(QName.get("id", r)));
                        if (rel != null && Const.Relationship.HYPERLINK.equals(rel.getType())) {
                            picture.srcUrl = rel.getTarget();
                        }
                    }
                }
            }
        }
        return !pictures.isEmpty() ? pictures : null;
    }
}
// Custom ExcelReader
public static class MyExcelReader extends ExcelReader {
    public MyExcelReader(Path path) throws IOException {
        super(path);
    }

    public MyExcelReader(InputStream stream) throws IOException {
        super(stream);
    }

    @Override
    protected ExcelReader init(Path path) throws IOException {
        super.init(path);
        // FIXME Use MyXMLDrawings
        if (drawings != null) {
            drawings = new MyXMLDrawings(this);
            for (Sheet sheet : sheets) {
                ((XMLSheet) sheet).setDrawings(drawings);
            }
        }
        return this;
    }
}
// Use the custom ExcelReader
@Test public void testMyExcelReader() throws IOException {
    // FIXME Use the custom MyExcelReader
    try (ExcelReader reader = new MyExcelReader(Paths.get("F:/excel/1836921628.xlsx"))) {
        List<Drawings.Picture> list = reader.listPictures();
    }
}
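The core of the speed-up in parseDrawings above is that each distinct image target is copied out of the archive once and every later reference reuses the cached path. Isolated from EEC, that idea can be sketched as follows. CopyOnceCache and its members are hypothetical names used only for illustration, and unlike the code above this sketch rethrows a failed copy instead of swallowing it.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.HashMap;
import java.util.Map;
import java.util.function.Supplier;

// Hypothetical helper, not an EEC class.
final class CopyOnceCache {
    private final Map<String, Path> copied = new HashMap<>();
    private final Path dir;
    int copyCount; // number of real copies performed, exposed for inspection

    CopyOnceCache(Path dir) {
        this.dir = dir;
    }

    /** Creates a cache that writes into a fresh temporary directory. */
    static CopyOnceCache inTempDir() {
        try {
            return new CopyOnceCache(Files.createTempDirectory("pictures"));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Copies the source to dir/target on first use; later calls return the cached path. */
    Path get(String target, Supplier<InputStream> source) {
        Path cached = copied.get(target);
        if (cached != null) return cached; // the source is not opened again

        Path dest = dir.resolve(target).normalize();
        if (!dest.startsWith(dir)) {
            throw new IllegalArgumentException("Target escapes output directory: " + target);
        }
        try (InputStream in = source.get()) {
            Files.createDirectories(dest.getParent());
            Files.copy(in, dest, StandardCopyOption.REPLACE_EXISTING);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        copyCount++;
        copied.put(target, dest);
        return dest;
    }
}
```

With a drawing part that references the same target thousands of times, only the first reference pays for the disk write. The remaining cost is then dominated by parsing the XML itself.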

wangguanquan avatar Jan 08 '25 13:01 wangguanquan

I'll try your approach first.

RexH0 avatar Jan 09 '25 02:01 RexH0

I'll try your approach first.

Did the approach above work?

wangguanquan avatar Jan 10 '25 01:01 wangguanquan

Haven't tried it yet; I'm busy with other requirements. I'll report back once I have.

RexH0 avatar Jan 10 '25 08:01 RexH0

v0.5.22 now filters out hidden and duplicate pictures.

wangguanquan avatar Feb 23 '25 15:02 wangguanquan