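The problem
When extracting pictures from a Word file, I ran into a problem: some of the pictures had been cropped in Word, but extracting the picture directly returns the original, uncropped image. How can the cropped version be obtained?
The solution
After some research, it turns out that the document XML records the crop information. Read that information and crop the original image by the same proportions. The steps are as follows:
Reading the Word file
Read the file with Apache POI's XWPFDocument and locate the paragraph that contains the picture. The XML of that paragraph looks like this: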
<xml-fragment w14:paraId="70863DC9" w14:textId="14D2A409" w:rsidR="00A84881" w:rsidRDefault="00A84881" xmlns:wpc="http://schemas.microsoft.com/office/word/2010/wordprocessingCanvas" xmlns:cx="http://schemas.microsoft.com/office/drawing/2014/chartex" xmlns:cx1="http://schemas.microsoft.com/office/drawing/2015/9/8/chartex" xmlns:cx2="http://schemas.microsoft.com/office/drawing/2015/10/21/chartex" xmlns:cx3="http://schemas.microsoft.com/office/drawing/2016/5/9/chartex" xmlns:cx4="http://schemas.microsoft.com/office/drawing/2016/5/10/chartex" xmlns:cx5="http://schemas.microsoft.com/office/drawing/2016/5/11/chartex" xmlns:cx6="http://schemas.microsoft.com/office/drawing/2016/5/12/chartex" xmlns:cx7="http://schemas.microsoft.com/office/drawing/2016/5/13/chartex" xmlns:cx8="http://schemas.microsoft.com/office/drawing/2016/5/14/chartex" xmlns:mc="http://schemas.openxmlformats.org/markup-compatibility/2006" xmlns:aink="http://schemas.microsoft.com/office/drawing/2016/ink" xmlns:am3d="http://schemas.microsoft.com/office/drawing/2017/model3d" xmlns:o="urn:schemas-microsoft-com:office:office" xmlns:r="http://schemas.openxmlformats.org/officeDocument/2006/relationships" xmlns:m="http://schemas.openxmlformats.org/officeDocument/2006/math" xmlns:v="urn:schemas-microsoft-com:vml" xmlns:wp14="http://schemas.microsoft.com/office/word/2010/wordprocessingDrawing" xmlns:wp="http://schemas.openxmlformats.org/drawingml/2006/wordprocessingDrawing" xmlns:w10="urn:schemas-microsoft-com:office:word" xmlns:w="http://schemas.openxmlformats.org/wordprocessingml/2006/main" xmlns:w14="http://schemas.microsoft.com/office/word/2010/wordml" xmlns:w15="http://schemas.microsoft.com/office/word/2012/wordml" xmlns:w16cex="http://schemas.microsoft.com/office/word/2018/wordml/cex" xmlns:w16cid="http://schemas.microsoft.com/office/word/2016/wordml/cid" xmlns:w16="http://schemas.microsoft.com/office/word/2018/wordml" xmlns:w16se="http://schemas.microsoft.com/office/word/2015/wordml/symex" 
xmlns:wpg="http://schemas.microsoft.com/office/word/2010/wordprocessingGroup" xmlns:wpi="http://schemas.microsoft.com/office/word/2010/wordprocessingInk" xmlns:wne="http://schemas.microsoft.com/office/word/2006/wordml" xmlns:wps="http://schemas.microsoft.com/office/word/2010/wordprocessingShape">
<w:r>
<w:rPr>
<w:noProof/>
</w:rPr>
<w:drawing>
<wp:inline distT="0" distB="0" distL="0" distR="0" wp14:anchorId="35CB9A82" wp14:editId="733F3916">
<wp:extent cx="4521200" cy="2438400"/>
<wp:effectExtent l="0" t="0" r="0" b="0"/>
<wp:docPr id="1" name="图片 1" descr="狗和人躺在地上
描述已自动生成"/>
<wp:cNvGraphicFramePr>
<a:graphicFrameLocks noChangeAspect="1" xmlns:a="http://schemas.openxmlformats.org/drawingml/2006/main"/>
</wp:cNvGraphicFramePr>
<a:graphic xmlns:a="http://schemas.openxmlformats.org/drawingml/2006/main">
<a:graphicData uri="http://schemas.openxmlformats.org/drawingml/2006/picture">
<pic:pic xmlns:pic="http://schemas.openxmlformats.org/drawingml/2006/picture">
<pic:nvPicPr>
<pic:cNvPr id="1" name="图片 1" descr="狗和人躺在地上
描述已自动生成"/>
<pic:cNvPicPr/>
</pic:nvPicPr>
<pic:blipFill rotWithShape="1">
<a:blip r:embed="rId4">
<a:extLst>
<a:ext uri="{28A0092B-C50C-407E-A947-70E740481C1C}">
<a14:useLocalDpi val="0" xmlns:a14="http://schemas.microsoft.com/office/drawing/2010/main"/>
</a:ext>
</a:extLst>
</a:blip>
<a:srcRect l="16133" t="1806" r="-1853" b="63522"/>
<a:stretch/>
</pic:blipFill>
<pic:spPr bwMode="auto">
<a:xfrm>
<a:off x="0" y="0"/>
<a:ext cx="4521200" cy="2438400"/>
</a:xfrm>
<a:prstGeom prst="rect">
<a:avLst/>
</a:prstGeom>
<a:ln>
<a:noFill/>
</a:ln>
<a:extLst>
<a:ext uri="{53640926-AAD7-44D8-BBD7-CCE9431645EC}">
<a14:shadowObscured xmlns:a14="http://schemas.microsoft.com/office/drawing/2010/main"/>
</a:ext>
</a:extLst>
</pic:spPr>
</pic:pic>
</a:graphicData>
</a:graphic>
</wp:inline>
</w:drawing>
</w:r>
</xml-fragment>
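In the fragment above, the crop rectangle is the a:srcRect element inside pic:blipFill. As a rough sketch of reading its four attributes without any POI types, the JDK's DOM parser can be pointed at the XML text. The class name SrcRectReader and the method readSrcRect are hypothetical names chosen for this illustration, and the sketch assumes the values are written as plain integers, as in the sample (an omitted attribute is treated as 0).

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical helper: reads the first a:srcRect found in an XML string.
final class SrcRectReader {
    static final String DRAWINGML_NS =
            "http://schemas.openxmlformats.org/drawingml/2006/main";

    /** Returns {l, t, r, b} in 1/1000 of a percent, or null if there is no srcRect. */
    static int[] readSrcRect(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true); // required for getElementsByTagNameNS
            Document doc = factory.newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            NodeList nodes = doc.getElementsByTagNameNS(DRAWINGML_NS, "srcRect");
            if (nodes.getLength() == 0) {
                return null; // the picture is not cropped
            }
            Element rect = (Element) nodes.item(0);
            return new int[] {attr(rect, "l"), attr(rect, "t"), attr(rect, "r"), attr(rect, "b")};
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }
    }

    // Assumption: values are plain integers; a missing attribute counts as 0.
    private static int attr(Element element, String name) {
        String value = element.getAttribute(name); // "" when the attribute is absent
        return value.isEmpty() ? 0 : Integer.parseInt(value);
    }
}
```

With POI, the XML text of a paragraph can be obtained from its underlying CTP object, for example via paragraph.getCTP().xmlText().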
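Extracting the crop information
The crop information is stored in the srcRect element, which has four attributes:
- t: top, the proportion cropped from the top of the picture
- b: bottom, the proportion cropped from the bottom of the picture
- l: left, the proportion cropped from the left of the picture
- r: right, the proportion cropped from the right of the picture
Each attribute gives the crop proportion for its own side. The values are in thousandths of a percent, so dividing a value by 1000 gives the percentage cropped (and dividing by 100000 gives a fraction). Note that each percentage is measured from its own edge, as a share of the picture's full width (l, r) or height (t, b). A positive value crops inward; a negative value extends outward, which enlarges the picture area.
For example:
t:0,b:25000
means that nothing is cropped from the top and 25% is cropped from the bottom (the top 75% is kept).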
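The arithmetic behind this example can be sketched in plain Java. The class SrcRectMath and its methods are hypothetical names used only for illustration; the sketch assumes the four values have already been read as integers in thousandths of a percent and that none of them is negative.

```java
import java.util.Arrays;

// Hypothetical helper: converts srcRect values into a pixel crop box.
final class SrcRectMath {
    /** srcRect values are in 1/1000 of a percent, so 100000 means 100%. */
    static double toFraction(int thousandthsOfPercent) {
        return thousandthsOfPercent / 100000.0;
    }

    /** Returns {x, y, width, height} of the region that remains after cropping. */
    static int[] cropBox(int width, int height, int l, int t, int r, int b) {
        int x = (int) Math.round(width * toFraction(l));            // left edge
        int y = (int) Math.round(height * toFraction(t));           // top edge
        int right = (int) Math.round(width * (1 - toFraction(r)));  // right edge
        int bottom = (int) Math.round(height * (1 - toFraction(b))); // bottom edge
        return new int[] {x, y, right - x, bottom - y};
    }

    public static void main(String[] args) {
        // t:0, b:25000 on an 800x600 image keeps the top 75%.
        System.out.println(Arrays.toString(cropBox(800, 600, 0, 0, 0, 25000)));
        // prints [0, 0, 800, 450]
    }
}
```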
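Cropping the image
Once the crop information has been extracted, the rest is simple: crop the image accordingly. Sample code:
import java.awt.image.BufferedImage;
import java.io.FileInputStream;
import javax.imageio.ImageIO;
import org.apache.poi.xwpf.usermodel.XWPFDocument;
import org.openxmlformats.schemas.drawingml.x2006.main.CTRelativeRect;

String filePath = "cropped_image.docx";
XWPFDocument xwpfDocument = new XWPFDocument(new FileInputStream(filePath));
// Read the crop information (srcRect); getSrcRect() returns null when the picture is not cropped
CTRelativeRect ctRelativeRect = xwpfDocument.getParagraphs().get(0).getRuns().get(0).getEmbeddedPictures().get(0).getCTPicture().getBlipFill().getSrcRect();
// Load the original image; change the part id to the r:embed value of your own picture
BufferedImage image = ImageIO.read(xwpfDocument.getPartById("rId4").getInputStream());
int width = image.getWidth();
int height = image.getHeight();
// getL()/getT()/getR()/getB() are used here as int values in 1/1000 of a percent (older POI schema jars).
// On POI 5.x they return Object; POIXMLUnits.parsePercent(ctRelativeRect.xgetL()) can be used instead.
// Top-left corner of the crop, clamped to 0 because negative values point outside the image
int x = Math.max(0, (int) (width * ctRelativeRect.getL() / 100000.0));
int y = Math.max(0, (int) (height * ctRelativeRect.getT() / 100000.0));
// Width and height after cropping, clamped so that the rectangle stays inside the image
int w = Math.min(width, (int) (width * (1 - ctRelativeRect.getR() / 100000.0))) - x;
int h = Math.min(height, (int) (height * (1 - ctRelativeRect.getB() / 100000.0))) - y;
// Crop the image; getSubimage throws RasterFormatException if the rectangle leaves the image bounds
BufferedImage croppedImage = image.getSubimage(x, y, w, h);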
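The sample XML earlier carries a negative value (r="-1853"), which means the visible area extends past the right edge of the original image, and getSubimage cannot return pixels that do not exist. One possible way to honour negative values is to draw the original onto a new canvas of the target size and leave the extended area transparent. This is only a sketch under that assumption: PaddedCrop and cropWithPadding are hypothetical names, and how Word itself fills the extended area is not covered here.

```java
import java.awt.Graphics2D;
import java.awt.image.BufferedImage;

// Hypothetical helper: crops by srcRect values and pads with transparency
// wherever a negative value extends beyond the original image.
final class PaddedCrop {
    static BufferedImage cropWithPadding(BufferedImage src, int l, int t, int r, int b) {
        int width = src.getWidth();
        int height = src.getHeight();
        // Edges of the visible area in source pixel coordinates (may lie outside the image).
        int left = (int) Math.round(width * (l / 100000.0));
        int top = (int) Math.round(height * (t / 100000.0));
        int right = (int) Math.round(width * (1 - r / 100000.0));
        int bottom = (int) Math.round(height * (1 - b / 100000.0));
        int w = right - left;
        int h = bottom - top;
        if (w <= 0 || h <= 0) {
            throw new IllegalArgumentException("srcRect leaves no visible area");
        }
        // A new ARGB image starts fully transparent.
        BufferedImage out = new BufferedImage(w, h, BufferedImage.TYPE_INT_ARGB);
        Graphics2D g = out.createGraphics();
        try {
            // Shift the source so that (left, top) lands on the canvas origin.
            g.drawImage(src, -left, -top, null);
        } finally {
            g.dispose();
        }
        return out;
    }
}
```

With only non-negative values this produces the same region as getSubimage, but as an independent copy rather than a view that shares the original's pixel data.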