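Can pictures read from Excel be identified as hidden objects?
List<Drawings.Picture> pictures = reader.listPictures(); Is there a way to tell whether the pictures this returns are hidden objects?
How do you hide a picture? Please give me step-by-step instructions so I can test locally before proposing a solution.
I have sent the file to your email. When I import it and read it with List<Drawings.Picture> pictures = reader.listPictures(); the program hangs and never proceeds. I don't know what is wrong with the Excel file; it also opens very slowly in WPS. The file was made by a customer. Is there a way to report that the file is problematic?
It hangs because it reads more than a hundred thousand pictures, none of which are visible in Excel.
Currently, reading pictures parses them all at once into a list held in memory. Hundreds of thousands of pictures exhaust memory and trigger frequent GC, which spikes CPU usage and takes the server down.
There must be something wrong with this spreadsheet. I can't see where these pictures are in the sheet either. Can the program locate them?
I'll look at the original file tonight and get back to you.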
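I can't get this file to open locally. Using EEC I found that drawing.xml contains a large number of duplicate nodes. The file is nearly 90M, and EEC reads pictures with dom4j in its ordinary (full DOM) mode, which is slow on a file this size, so the program appears to hang. The pictures are not displayed because they are hidden.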
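As a cheap pre-check before calling listPictures(), the uncompressed size of each drawing part can be read from the zip directory without parsing any XML. This is only a sketch: the class name DrawingSizeCheck, the method oversizedDrawings and the threshold are hypothetical, it is not part of EEC, and it assumes the oversized part is one of the xl/drawings/*.xml entries.

```java
import java.io.IOException;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Enumeration;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;

// Hypothetical helper, not part of EEC.
final class DrawingSizeCheck {
    /**
     * Returns the names of xl/drawings/*.xml entries whose uncompressed size
     * (as recorded in the zip directory) is larger than maxBytes.
     */
    static List<String> oversizedDrawings(Path xlsx, long maxBytes) throws IOException {
        List<String> oversized = new ArrayList<>();
        try (ZipFile zip = new ZipFile(xlsx.toFile())) {
            Enumeration<? extends ZipEntry> entries = zip.entries();
            while (entries.hasMoreElements()) {
                ZipEntry entry = entries.nextElement();
                String name = entry.getName();
                // The ".xml" suffix also excludes xl/drawings/_rels/*.rels
                if (name.startsWith("xl/drawings/") && name.endsWith(".xml")
                        && entry.getSize() > maxBytes) {
                    oversized.add(name);
                }
            }
        }
        return oversized;
    }
}
```

A caller could reject or warn about the upload when the returned list is not empty, for example with a limit of a few megabytes. The recorded size comes from zip metadata, so this is a heuristic and not a guarantee.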
Then is there a way to detect that the XML file contains duplicate nodes?
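One possible way to check this without building a DOM is to stream the drawing XML with the JDK's StAX parser and count pictures, hidden pictures and distinct r:embed ids. The sketch below is an assumption-laden illustration, not EEC code: DrawingScan and Stats are hypothetical names, and distinct embed ids are only a proxy for distinct images, since mapping ids to image targets would also need the .rels part.

```java
import java.io.InputStream;
import java.util.HashSet;
import java.util.Set;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical helper, not part of EEC.
final class DrawingScan {
    static final String REL_NS =
        "http://schemas.openxmlformats.org/officeDocument/2006/relationships";

    static final class Stats {
        int pictures;                                   // number of <xdr:pic> elements
        int hidden;                                     // pictures whose cNvPr has hidden="1" or "true"
        final Set<String> distinctEmbeds = new HashSet<>(); // distinct r:embed ids
    }

    static Stats scan(InputStream in) throws XMLStreamException {
        XMLInputFactory factory = XMLInputFactory.newInstance();
        factory.setProperty(XMLInputFactory.SUPPORT_DTD, false);
        XMLStreamReader xml = factory.createXMLStreamReader(in);
        Stats stats = new Stats();
        boolean inPic = false;
        try {
            while (xml.hasNext()) {
                int event = xml.next();
                if (event == XMLStreamConstants.START_ELEMENT) {
                    String name = xml.getLocalName();
                    if ("pic".equals(name)) {
                        inPic = true;
                        stats.pictures++;
                    } else if (inPic && "cNvPr".equals(name)) {
                        // A null namespace matches the attribute in any namespace
                        String hidden = xml.getAttributeValue(null, "hidden");
                        if ("1".equals(hidden) || "true".equals(hidden)) stats.hidden++;
                    } else if (inPic && "blip".equals(name)) {
                        String embed = xml.getAttributeValue(REL_NS, "embed");
                        if (embed != null) stats.distinctEmbeds.add(embed);
                    }
                } else if (event == XMLStreamConstants.END_ELEMENT
                        && "pic".equals(xml.getLocalName())) {
                    inPic = false;
                }
            }
        } finally {
            xml.close();
        }
        return stats;
    }
}
```

If pictures is far larger than distinctEmbeds.size(), or hidden is close to pictures, the file is probably of the kind described in this thread. Memory use stays proportional to the number of distinct ids, not to the size of the XML.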
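The following approach greatly speeds up reading:
- Define a custom XMLDrawings and override the parseDrawings method to handle "hidden" pictures and duplicate nodes
- Define a custom ExcelReader and override the init method so that it uses the custom XMLDrawings from the first step
- Use the custom ExcelReader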
// Custom XMLDrawings
public static class MyXMLDrawings extends XMLDrawings {
    public MyXMLDrawings(ExcelReader reader) {
        super(reader);
    }
    // Parse drawings.xml
    protected List<Picture> parseDrawings(ZipFile zipFile, ZipEntry entry, Path imagesPath) {
        int i = entry.getName().lastIndexOf('/');
        String relsKey;
        if (i > 0)
            relsKey = entry.getName().substring(0, i) + "/_rels" + entry.getName().substring(i);
        else if ((i = entry.getName().lastIndexOf('\\')) > 0)
            relsKey = entry.getName().substring(0, i) + "\\_rels" + entry.getName().substring(i);
        else relsKey = entry.getName();
        String key = relsKey + ".rels";
        ZipEntry entry1 = getEntry(zipFile, key);
        if (entry1 == null) return null; //throw new ExcelReadException("The file format is incorrect or corrupted. [" + key + "]");
        SAXReader reader = SAXReader.createDefault();
        Document document;
        try {
            document = reader.read(zipFile.getInputStream(entry1));
        } catch (DocumentException | IOException e) {
            throw new ExcelReadException("The file format is incorrect or corrupted. [" + key + "]");
        }
        List<Element> list = document.getRootElement().elements();
        Relationship[] rels = new Relationship[list.size()];
        i = 0;
        for (Element e : list) {
            rels[i++] = new Relationship(e.attributeValue("Id"), e.attributeValue("Target"), e.attributeValue("Type"));
        }
        RelManager relManager = RelManager.of(rels);
        try {
            document = reader.read(zipFile.getInputStream(entry));
        } catch (DocumentException | IOException e) {
            throw new ExcelReadException("The file format is incorrect or corrupted. [" + entry.getName() + "]");
        }
        Element root = document.getRootElement();
        Namespace xdr = root.getNamespaceForPrefix("xdr"), a = root.getNamespaceForPrefix("a");
        List<Element> elements = root.elements();
        List<Picture> pictures = new ArrayList<>(elements.size());
        // Cache of already-copied images (target -> local path), to handle the large number of duplicate nodes
        Map<String, Path> localPathMap = new HashMap<>(Math.min(1 << 10, elements.size()));
        for (Element e : root.elements()) {
            Element pic = e.element(QName.get("pic", xdr));
            // Not a picture
            if (pic == null) continue;
            Element blipFill = pic.element(QName.get("blipFill", xdr));
            if (blipFill == null) continue;
            Element blip = blipFill.element(QName.get("blip", a));
            if (blip == null) continue;
            // FIXME Check whether the picture is hidden; whether "hidden" pictures should be read is a business decision
            Element nvPicPr = pic.element(QName.get("nvPicPr", xdr));
            if (nvPicPr != null) {
                Element cNvPr = nvPicPr.element(QName.get("cNvPr", xdr));
                // Hidden pictures are skipped by default
                if (cNvPr != null && "1".equals(cNvPr.attributeValue("hidden"))) {
                    continue;
                }
            }
            Namespace r = blip.getNamespaceForPrefix("r");
            String embed = blip.attributeValue(QName.get("embed", r));
            Relationship rel = relManager.getById(embed);
            if (rel != null && Const.Relationship.IMAGE.equals(rel.getType())) {
                Picture picture = new Picture();
                pictures.add(picture);
                // Copy image to tmp path
                String target = toZipPath(rel.getTarget());
                // FIXME Change: look in the cache first; if the image has already been copied, reuse the cached path
                Path targetPath = localPathMap.get(target);
                if (targetPath == null && (entry = getEntry(zipFile, "xl/" + target)) != null) {
                    // Copy image to tmp path
                    try {
                        targetPath = imagesPath.resolve(rel.getTarget());
                        Files.copy(zipFile.getInputStream(entry), targetPath, StandardCopyOption.REPLACE_EXISTING);
                        localPathMap.put(target, targetPath);
                    } catch (IOException ioException) { }
                }
                picture.localPath = targetPath;
                int[][] ft = parseDimension(e, xdr);
                picture.dimension = new Dimension(ft[0][2] + 1, (short) (ft[0][0] + 1), ft[1][2] + 1, (short) (ft[1][0] + 1));
                picture.padding = new short[] { (short) ft[0][3], (short) ft[1][1], (short) ft[1][3], (short) ft[0][1] };
                String editAs = e.attributeValue("editAs");
                int property = -1;
                if (StringUtil.isNotEmpty(editAs)) {
                    switch (editAs) {
                        case "twoCell" : property = 0; break;
                        case "oneCell" : property = 1; break;
                        case "absolute": property = 2; break;
                        default:
                    }
                }
                picture.property = property;
                Element spPr = pic.element(QName.get("spPr", xdr));
                if (spPr != null) {
                    Element xfrm = spPr.element(QName.get("xfrm", a));
                    String rot;
                    if (xfrm != null && StringUtil.isNotBlank(rot = xfrm.attributeValue("rot"))) {
                        try {
                            picture.revolve = Integer.parseInt(rot) / 60000;
                        } catch (Exception ex) {
                            // Ignore
                        }
                    }
                    // TODO Attach picture effects
                }
                Element extLst = blip.element(QName.get("extLst", a));
                if (extLst == null) continue;
                for (Element ext : extLst.elements()) {
                    Element srcUrl = ext.element("picAttrSrcUrl");
                    // hyperlink
                    if (srcUrl != null) {
                        rel = relManager.getById(srcUrl.attributeValue(QName.get("id", r)));
                        if (rel != null && Const.Relationship.HYPERLINK.equals(rel.getType())) {
                            picture.srcUrl = rel.getTarget();
                        }
                    }
                }
            }
        }
        return !pictures.isEmpty() ? pictures : null;
    }
}
// Custom ExcelReader
public static class MyExcelReader extends ExcelReader {
    public MyExcelReader(Path path) throws IOException {
        super(path);
    }
    public MyExcelReader(InputStream stream) throws IOException {
        super(stream);
    }
    @Override
    protected ExcelReader init(Path path) throws IOException {
        super.init(path);
        // FIXME Use MyXMLDrawings
        if (drawings != null) {
            drawings = new MyXMLDrawings(this);
            for (Sheet sheet : sheets) {
                ((XMLSheet) sheet).setDrawings(drawings);
            }
        }
        return this;
    }
}
// Use the custom ExcelReader
@Test public void testMyExcelReader() throws IOException {
    // FIXME Use the custom MyExcelReader
    try (ExcelReader reader = new MyExcelReader(Paths.get("F:/excel/1836921628.xlsx"))) {
        List<Drawings.Picture> list = reader.listPictures();
    }
}
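The speed-up for duplicate nodes comes from copying each image target only once and reusing the local path afterwards. Isolated from EEC, that idea might look like the sketch below; ImageCopyCache, Source and the copies counter are hypothetical names introduced only for illustration.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper, not part of EEC.
final class ImageCopyCache {
    /** Opens the image bytes lazily, so nothing is read on a cache hit. */
    interface Source {
        InputStream open() throws IOException;
    }

    private final Map<String, Path> copied = new HashMap<>();
    private final Path dir;
    int copies; // number of real file copies performed

    ImageCopyCache(Path dir) {
        this.dir = dir;
    }

    /** Returns the local path for target, copying the image only on first use. */
    Path resolve(String target, Source source) throws IOException {
        Path cached = copied.get(target);
        if (cached != null) return cached;
        Path out = dir.resolve(target);
        Path parent = out.getParent();
        if (parent != null) Files.createDirectories(parent);
        try (InputStream in = source.open()) {
            Files.copy(in, out, StandardCopyOption.REPLACE_EXISTING);
        }
        copies++;
        copied.put(target, out);
        return out;
    }
}
```

With a hundred thousand nodes that all point at the same few image targets, only a few copies are performed, and every later lookup is a map hit.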
I'll try your approach first.
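Did the approach above work?
Haven't tried it yet, I'm busy with other requirements. I'll report back once I have.
v0.5.22 now filters out hidden and duplicate pictures.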