Extracting Cropped Images from Word with Apache POI

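Problem

While extracting images from a Word file, I ran into a problem: some of the images had been cropped in Word, but extracting an image directly returns the original, uncropped picture. How can the cropped version be obtained?

Solution

After some research, it turned out that the document's XML stores the crop information. Reading that information and cropping the original image by the same proportions is enough. The steps are as follows:

Read the Word file

Use Apache POI's XWPFDocument to read the file and get the paragraph that contains the picture. Its XML looks like this: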

<xml-fragment w14:paraId="70863DC9" w14:textId="14D2A409" w:rsidR="00A84881" w:rsidRDefault="00A84881" xmlns:wpc="http://schemas.microsoft.com/office/word/2010/wordprocessingCanvas" xmlns:cx="http://schemas.microsoft.com/office/drawing/2014/chartex" xmlns:cx1="http://schemas.microsoft.com/office/drawing/2015/9/8/chartex" xmlns:cx2="http://schemas.microsoft.com/office/drawing/2015/10/21/chartex" xmlns:cx3="http://schemas.microsoft.com/office/drawing/2016/5/9/chartex" xmlns:cx4="http://schemas.microsoft.com/office/drawing/2016/5/10/chartex" xmlns:cx5="http://schemas.microsoft.com/office/drawing/2016/5/11/chartex" xmlns:cx6="http://schemas.microsoft.com/office/drawing/2016/5/12/chartex" xmlns:cx7="http://schemas.microsoft.com/office/drawing/2016/5/13/chartex" xmlns:cx8="http://schemas.microsoft.com/office/drawing/2016/5/14/chartex" xmlns:mc="http://schemas.openxmlformats.org/markup-compatibility/2006" xmlns:aink="http://schemas.microsoft.com/office/drawing/2016/ink" xmlns:am3d="http://schemas.microsoft.com/office/drawing/2017/model3d" xmlns:o="urn:schemas-microsoft-com:office:office" xmlns:r="http://schemas.openxmlformats.org/officeDocument/2006/relationships" xmlns:m="http://schemas.openxmlformats.org/officeDocument/2006/math" xmlns:v="urn:schemas-microsoft-com:vml" xmlns:wp14="http://schemas.microsoft.com/office/word/2010/wordprocessingDrawing" xmlns:wp="http://schemas.openxmlformats.org/drawingml/2006/wordprocessingDrawing" xmlns:w10="urn:schemas-microsoft-com:office:word" xmlns:w="http://schemas.openxmlformats.org/wordprocessingml/2006/main" xmlns:w14="http://schemas.microsoft.com/office/word/2010/wordml" xmlns:w15="http://schemas.microsoft.com/office/word/2012/wordml" xmlns:w16cex="http://schemas.microsoft.com/office/word/2018/wordml/cex" xmlns:w16cid="http://schemas.microsoft.com/office/word/2016/wordml/cid" xmlns:w16="http://schemas.microsoft.com/office/word/2018/wordml" xmlns:w16se="http://schemas.microsoft.com/office/word/2015/wordml/symex" 
xmlns:wpg="http://schemas.microsoft.com/office/word/2010/wordprocessingGroup" xmlns:wpi="http://schemas.microsoft.com/office/word/2010/wordprocessingInk" xmlns:wne="http://schemas.microsoft.com/office/word/2006/wordml" xmlns:wps="http://schemas.microsoft.com/office/word/2010/wordprocessingShape">
  <w:r>
    <w:rPr>
      <w:noProof/>
    </w:rPr>
    <w:drawing>
      <wp:inline distT="0" distB="0" distL="0" distR="0" wp14:anchorId="35CB9A82" wp14:editId="733F3916">
        <wp:extent cx="4521200" cy="2438400"/>
        <wp:effectExtent l="0" t="0" r="0" b="0"/>
        <wp:docPr id="1" name="图片 1" descr="狗和人躺在地上

描述已自动生成"/>
        <wp:cNvGraphicFramePr>
          <a:graphicFrameLocks noChangeAspect="1" xmlns:a="http://schemas.openxmlformats.org/drawingml/2006/main"/>
        </wp:cNvGraphicFramePr>
        <a:graphic xmlns:a="http://schemas.openxmlformats.org/drawingml/2006/main">
          <a:graphicData uri="http://schemas.openxmlformats.org/drawingml/2006/picture">
            <pic:pic xmlns:pic="http://schemas.openxmlformats.org/drawingml/2006/picture">
              <pic:nvPicPr>
                <pic:cNvPr id="1" name="图片 1" descr="狗和人躺在地上

描述已自动生成"/>
                <pic:cNvPicPr/>
              </pic:nvPicPr>
              <pic:blipFill rotWithShape="1">
                <a:blip r:embed="rId4">
                  <a:extLst>
                    <a:ext uri="{28A0092B-C50C-407E-A947-70E740481C1C}">
                      <a14:useLocalDpi val="0" xmlns:a14="http://schemas.microsoft.com/office/drawing/2010/main"/>
                    </a:ext>
                  </a:extLst>
                </a:blip>
                <a:srcRect l="16133" t="1806" r="-1853" b="63522"/>
                <a:stretch/>
              </pic:blipFill>
              <pic:spPr bwMode="auto">
                <a:xfrm>
                  <a:off x="0" y="0"/>
                  <a:ext cx="4521200" cy="2438400"/>
                </a:xfrm>
                <a:prstGeom prst="rect">
                  <a:avLst/>
                </a:prstGeom>
                <a:ln>
                  <a:noFill/>
                </a:ln>
                <a:extLst>
                  <a:ext uri="{53640926-AAD7-44D8-BBD7-CCE9431645EC}">
                    <a14:shadowObscured xmlns:a14="http://schemas.microsoft.com/office/drawing/2010/main"/>
                  </a:ext>
                </a:extLst>
              </pic:spPr>
            </pic:pic>
          </a:graphicData>
        </a:graphic>
      </wp:inline>
    </w:drawing>
  </w:r>
</xml-fragment>

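Extract the crop information

The crop information is stored in the srcRect element, which has four attributes:

  1. t: top, the proportion cropped from the top of the image
  2. b: bottom, the proportion cropped from the bottom of the image
  3. l: left, the proportion cropped from the left of the image
  4. r: right, the proportion cropped from the right of the image

Each attribute gives the crop proportion for one edge. Dividing the attribute value by 1000 gives the percentage that is cropped. Note that each percentage is measured from its own edge, as a share of the original image's width (l and r) or height (t and b). A positive value crops inward, into the image. A negative value crops outward, which extends the visible area beyond the original image.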
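If only the XML text of the picture is at hand, the four attributes can also be read without POI's typed classes. The following is a minimal sketch using the JDK's DOM parser. The class name SrcRectReader and its read method are hypothetical names for illustration, and the sketch assumes the attribute values are plain integers, as in the fragment above.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

/** Hypothetical helper: reads the srcRect attributes from DrawingML XML text. */
final class SrcRectReader {
    private static final String DRAWINGML_NS =
            "http://schemas.openxmlformats.org/drawingml/2006/main";

    /** Returns {l, t, r, b} in 1/1000 of a percent; a missing attribute counts as 0. */
    static int[] read(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            // Namespace awareness is required so that a:srcRect can be found by namespace URI.
            factory.setNamespaceAware(true);
            Document doc = factory.newDocumentBuilder().parse(
                    new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            NodeList nodes = doc.getElementsByTagNameNS(DRAWINGML_NS, "srcRect");
            if (nodes.getLength() == 0) {
                // No srcRect element: treat the picture as not cropped.
                return new int[] {0, 0, 0, 0};
            }
            Element rect = (Element) nodes.item(0);
            return new int[] {
                attr(rect, "l"), attr(rect, "t"), attr(rect, "r"), attr(rect, "b")
            };
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not read srcRect from XML", e);
        }
    }

    private static int attr(Element element, String name) {
        // getAttribute returns an empty string when the attribute is absent.
        String value = element.getAttribute(name);
        return value.isEmpty() ? 0 : Integer.parseInt(value);
    }
}
```

Called on the XML fragment shown earlier, read should return the values 16133, 1806, -1853 and 63522 for l, t, r and b.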

For example:

t: 0, b: 25000

means that the top is not cropped and 25% is cropped from the bottom of the image (the top 75% remains).
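The arithmetic behind this example can be sketched in a few lines of Java. CropMath and toPixelRect are hypothetical names for illustration. Clamping the result to the image bounds is one possible way to handle negative values and is an assumption of this sketch, not something the Word format prescribes.

```java
import java.awt.Rectangle;

/** Hypothetical helper: converts srcRect values into a pixel rectangle inside the image. */
final class CropMath {

    /**
     * @param width  original image width in pixels
     * @param height original image height in pixels
     * @param l      srcRect values in 1/1000 of a percent (25000 = 25%),
     * @param t      each measured inward from its own edge
     * @param r      (negative values point outside the image)
     * @param b
     */
    static Rectangle toPixelRect(int width, int height, int l, int t, int r, int b) {
        // Dividing before multiplying keeps the computation in double and avoids int overflow.
        int left = clamp((int) (width * (l / 100000.0)), 0, width);
        int top = clamp((int) (height * (t / 100000.0)), 0, height);
        int right = clamp((int) (width * (1 - r / 100000.0)), 0, width);
        int bottom = clamp((int) (height * (1 - b / 100000.0)), 0, height);
        return new Rectangle(left, top, right - left, bottom - top);
    }

    // Assumption of this sketch: edges outside the image are pulled back to the border.
    private static int clamp(int value, int min, int max) {
        return Math.max(min, Math.min(max, value));
    }
}
```

For an 800 x 600 image with t = 0 and b = 25000, toPixelRect(800, 600, 0, 0, 0, 25000) gives a rectangle at (0, 0) that is 800 wide and 450 high, which is the top 75% of the image.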

Crop the image

Once the crop information has been extracted, the rest is simple: crop the image according to it. Sample code:

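    import java.awt.image.BufferedImage;
    import java.io.FileInputStream;
    import javax.imageio.ImageIO;
    import org.apache.poi.xwpf.usermodel.XWPFDocument;
    import org.openxmlformats.schemas.drawingml.x2006.main.CTRelativeRect;

    String filePath = "cropped_image.docx";
    XWPFDocument xwpfDocument = new XWPFDocument(new FileInputStream(filePath));
    // Extract the crop information (srcRect); getSrcRect() returns null if the picture has no srcRect
    CTRelativeRect ctRelativeRect = xwpfDocument.getParagraphs().get(0).getRuns().get(0).getEmbeddedPictures().get(0).getCTPicture().getBlipFill().getSrcRect();
    // Read the image data; change the part id to the id of your own image (the r:embed value in the XML)
    BufferedImage image = ImageIO.read(xwpfDocument.getPartById("rId4").getInputStream());
    int width = image.getWidth();
    int height = image.getHeight();
    // Assumes getL()/getT()/getR()/getB() return int, as this code was written for;
    // if they return Object in your schema version, convert the values to int first.
    // Crop origin (top-left corner); clamped to 0 because negative values point outside the image
    int x = Math.max(0, (int) (width * (ctRelativeRect.getL() / 100000.0)));
    int y = Math.max(0, (int) (height * (ctRelativeRect.getT() / 100000.0)));
    // Width and height of the cropped area; the right and bottom edges are clamped to the image bounds,
    // otherwise getSubimage throws a RasterFormatException for negative r or b values
    int w = Math.min(width, (int) (width * (1 - ctRelativeRect.getR() / 100000.0))) - x;
    int h = Math.min(height, (int) (height * (1 - ctRelativeRect.getB() / 100000.0))) - y;
    // Crop the image
    BufferedImage croppedImage = image.getSubimage(x, y, w, h);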
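BufferedImage.getSubimage can only return pixels that lie inside the source image, so it cannot reproduce a negative srcRect value such as r="-1853" in the fragment above, where the visible area extends past the original picture. One possible alternative, sketched below, is to draw the source onto a new transparent canvas of the target size. PaddedCrop and its crop method are hypothetical names, and filling the extended area with transparency is an assumption of this sketch.

```java
import java.awt.Graphics2D;
import java.awt.image.BufferedImage;

/** Hypothetical helper: crops with srcRect values and pads with transparency where needed. */
final class PaddedCrop {

    /** l, t, r, b are srcRect values in 1/1000 of a percent; negative values extend the image. */
    static BufferedImage crop(BufferedImage src, int l, int t, int r, int b) {
        int width = src.getWidth();
        int height = src.getHeight();
        // Integer arithmetic with floorDiv rounds consistently for negative values too.
        int left = (int) Math.floorDiv((long) width * l, 100_000L);
        int top = (int) Math.floorDiv((long) height * t, 100_000L);
        int right = (int) Math.floorDiv((long) width * (100_000L - r), 100_000L);
        int bottom = (int) Math.floorDiv((long) height * (100_000L - b), 100_000L);
        int w = right - left;
        int h = bottom - top;
        if (w <= 0 || h <= 0) {
            throw new IllegalArgumentException("srcRect leaves no visible area");
        }
        // A new ARGB image starts fully transparent; areas outside the source stay that way.
        BufferedImage out = new BufferedImage(w, h, BufferedImage.TYPE_INT_ARGB);
        Graphics2D g = out.createGraphics();
        try {
            // Shifting the source by (-left, -top) places the crop origin at (0, 0).
            g.drawImage(src, -left, -top, null);
        } finally {
            g.dispose();
        }
        return out;
    }
}
```

With positive values this produces the same pixels as getSubimage, while negative values add a transparent margin on the affected side instead of throwing an exception.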