Integrating Tess4J with Spring Boot to Recognize Text in Images (OCR)


Integrating Tess4J with Spring Boot

1. Add the dependency to pom.xml

<dependency>
    <groupId>net.sourceforge.tess4j</groupId>
    <artifactId>tess4j</artifactId>
    <version>4.1.0</version>
</dependency>

2. Put the language data files you need into a folder. This article recognizes English text, so it uses eng.traineddata.

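The language passed to setLanguage has to match a .traineddata file in that folder. A minimal sketch for listing which languages a folder actually provides (the class name TessdataLanguages is hypothetical, not part of Tess4J) simply strips the ".traineddata" suffix from matching file names:

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical helper: lists the language codes available in a tessdata folder,
// i.e. every "<code>.traineddata" file name with the suffix removed.
final class TessdataLanguages {
    private static final String SUFFIX = ".traineddata";

    private TessdataLanguages() {}

    static List<String> list(Path tessdataDir) throws IOException {
        List<String> langs = new ArrayList<>();
        // Glob "*.traineddata" skips unrelated files such as READMEs.
        try (DirectoryStream<Path> files = Files.newDirectoryStream(tessdataDir, "*" + SUFFIX)) {
            for (Path file : files) {
                String name = file.getFileName().toString();
                langs.add(name.substring(0, name.length() - SUFFIX.length()));
            }
        }
        Collections.sort(langs); // stable, readable order for logging
        return langs;
    }
}
```

For a folder holding only eng.traineddata this returns [eng], so "eng" is the value to pass to setLanguage; Tesseract also accepts several codes joined with "+", such as "eng+chi_sim", provided each file is present.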

3. Code example

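import java.awt.image.BufferedImage;
import java.io.File;
import javax.imageio.ImageIO;
import net.sourceforge.tess4j.Tesseract;

BufferedImage image = ImageIO.read(new File("path/to/image.png"));
Tesseract tesseract = new Tesseract();
tesseract.setDatapath("D:\\tessdata"); // escape the backslash; "\t" would become a tab
tesseract.setLanguage("eng");
String text = tesseract.doOCR(image); // throws net.sourceforge.tess4j.TesseractException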
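One pitfall in the snippet above: ImageIO.read does not throw for an unsupported image format, it returns null, and passing null to doOCR will most likely fail later with a less obvious error. A small sketch of a loader that fails early (OcrImageLoader is a hypothetical name, not a Tess4J class):

```java
import java.awt.image.BufferedImage;
import java.io.File;
import java.io.IOException;
import javax.imageio.ImageIO;

// Hypothetical helper: loads an image for OCR and turns ImageIO's
// "return null when no reader understands the file" into an exception.
final class OcrImageLoader {
    private OcrImageLoader() {}

    static BufferedImage load(File file) throws IOException {
        if (!file.isFile()) {
            throw new IOException("Image not found: " + file.getAbsolutePath());
        }
        BufferedImage image = ImageIO.read(file);
        if (image == null) {
            // No installed ImageReader recognized the file contents.
            throw new IOException("Unsupported image format: " + file.getName());
        }
        return image;
    }
}
```

The result can be passed straight to tesseract.doOCR(...). Checking up front keeps "bad input file" errors separate from genuine OCR failures.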

Pitfalls on Linux

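Tess4J runs as-is on Windows, because the jar bundles the Windows native libraries, but on Linux you have to set up the native environment yourself.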

Tess4J is a wrapper around Tesseract, and Tesseract in turn depends on Leptonica, so Tesseract and Leptonica must be installed first.
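Because Tess4J only wraps these native libraries, a rough pre-check is to look for their files on disk. The sketch below (NativeLibFinder is a hypothetical name) uses System.mapLibraryName to get the platform-specific file name, for example libtesseract.so on Linux or tesseract.dll on Windows, and searches a list of candidate directories; it does not reproduce the exact search order the JVM or JNA uses at load time.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Optional;

// Hypothetical pre-check: looks for a native library's platform-specific
// file name in the given directories and returns the first match.
final class NativeLibFinder {
    private NativeLibFinder() {}

    static Optional<Path> find(String baseName, String... dirs) {
        String fileName = System.mapLibraryName(baseName); // e.g. "libtesseract.so" on Linux
        for (String dir : dirs) {
            Path candidate = Paths.get(dir).resolve(fileName);
            // isRegularFile follows symlinks, so libtesseract.so -> libtesseract.so.4... counts.
            if (Files.isRegularFile(candidate)) {
                return Optional.of(candidate);
            }
        }
        return Optional.empty();
    }
}
```

With the install prefixes used in this article, the calls would be find("tesseract", "/usr/local/tesseract/lib") and find("lept", "/usr/local/leptonica/lib"); the base name "lept" assumes Leptonica 1.79 installs its library as liblept.so.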

1. Install the basic build dependencies

yum install autoconf automake libtool libjpeg-devel libpng-devel libtiff-devel zlib-devel gcc gcc-c++

2. Download the Tesseract and Leptonica source archives and put them in /usr/local:

leptonica-1.79.0.tar.gz

tesseract-4.1.1.tar.gz

3. Build and install Leptonica

cd /usr/local
mkdir  /usr/local/leptonica
tar -xzvf leptonica-1.79.0.tar.gz
cd leptonica-1.79.0
./configure --prefix=/usr/local/leptonica  && make  && make install

4. Configure the environment variables for Leptonica and Tesseract

Run vim /etc/profile and append the following to the end of the file:

PKG_CONFIG_PATH=$PKG_CONFIG_PATH:/usr/local/leptonica/lib/pkgconfig
export PKG_CONFIG_PATH
CPLUS_INCLUDE_PATH=$CPLUS_INCLUDE_PATH:/usr/local/leptonica/include/leptonica
export CPLUS_INCLUDE_PATH
C_INCLUDE_PATH=$C_INCLUDE_PATH:/usr/local/leptonica/include/leptonica
export C_INCLUDE_PATH
LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local/leptonica/lib
export LD_LIBRARY_PATH
LIBRARY_PATH=$LIBRARY_PATH:/usr/local/leptonica/lib
export LIBRARY_PATH
LIBLEPT_HEADERSDIR=/usr/local/leptonica/include/leptonica
export LIBLEPT_HEADERSDIR

PATH=$PATH:/usr/local/tesseract/bin
export PATH
export TESSDATA_PREFIX=/usr/local/xxx/xxxx  ## Note: this must be the directory that contains the .traineddata files
export PATH=$PATH:$TESSDATA_PREFIX

Apply the configuration with: source /etc/profile
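Variables in /etc/profile are read by login shells, so a JVM started some other way (for example as a systemd service) may not see them. A minimal sketch for checking what the Java process actually received; OcrEnvCheck and its methods are hypothetical names, and the colon separator assumes Linux-style path lists:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical helper: reports which expected variables are missing or empty
// in the environment the JVM was started with.
final class OcrEnvCheck {
    private OcrEnvCheck() {}

    static List<String> missing(Map<String, String> env, String... names) {
        List<String> result = new ArrayList<>();
        for (String name : names) {
            String value = env.get(name);
            if (value == null || value.trim().isEmpty()) {
                result.add(name);
            }
        }
        return result;
    }

    // True if a colon-separated list such as LD_LIBRARY_PATH contains dir.
    // Empty entries (e.g. from "$LD_LIBRARY_PATH:" when it was unset) are ignored.
    static boolean pathListContains(String pathList, String dir) {
        if (pathList == null) {
            return false;
        }
        for (String entry : pathList.split(":")) {
            if (entry.equals(dir)) {
                return true;
            }
        }
        return false;
    }
}
```

Logging OcrEnvCheck.missing(System.getenv(), "TESSDATA_PREFIX", "LD_LIBRARY_PATH") at startup, together with pathListContains(System.getenv("LD_LIBRARY_PATH"), "/usr/local/leptonica/lib"), shows quickly whether the settings above reached the application.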

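5. Build and install Tesseract (its configure step locates Leptonica through PKG_CONFIG_PATH, which is why the variables above must be in effect first)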

cd /usr/local
mkdir /usr/local/tesseract
tar -xzvf tesseract-4.1.1.tar.gz
cd tesseract-4.1.1
./autogen.sh
./configure --prefix=/usr/local/tesseract  && make && make install

6. Verify the installation

Run: tesseract --version

If the console prints both the Tesseract version and the Leptonica version, the installation succeeded.
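The same check can be run from inside the application, which also confirms that the JVM's own PATH reaches /usr/local/tesseract/bin. A sketch under the assumption that the output contains lines like "tesseract 4.1.1" and "leptonica-1.79.0" (TesseractVersionCheck is a hypothetical name; the regular expressions are my own, not part of Tesseract):

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: runs "tesseract --version" and extracts the
// Tesseract and Leptonica version numbers from the text it prints.
final class TesseractVersionCheck {
    // Some builds print "tesseract v5...", so the "v" is optional.
    private static final Pattern TESSERACT = Pattern.compile("tesseract\\s+v?(\\d+(?:\\.\\d+)*)");
    private static final Pattern LEPTONICA = Pattern.compile("leptonica-(\\d+(?:\\.\\d+)*)");

    private TesseractVersionCheck() {}

    static String tesseractVersion(String output) {
        return firstGroup(TESSERACT, output);
    }

    static String leptonicaVersion(String output) {
        return firstGroup(LEPTONICA, output);
    }

    private static String firstGroup(Pattern pattern, String text) {
        Matcher m = pattern.matcher(text);
        return m.find() ? m.group(1) : null; // null when that line is absent
    }

    // Runs whichever "tesseract" is on the JVM's PATH; stderr is merged into
    // stdout in case a version prints its banner there.
    static String runVersionCommand() throws IOException, InterruptedException {
        Process process = new ProcessBuilder("tesseract", "--version")
                .redirectErrorStream(true)
                .start();
        StringBuilder out = new StringBuilder();
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(process.getInputStream(), StandardCharsets.UTF_8))) {
            String line;
            while ((line = reader.readLine()) != null) {
                out.append(line).append('\n');
            }
        }
        process.waitFor();
        return out.toString();
    }
}
```

Typical use: call runVersionCommand() once at startup and log both parsed values; a null Leptonica version, or an IOException because the command is not found, points back to steps 3 to 5.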


7. Spring Boot project setup: copy the files in /usr/local/tesseract/lib into the Spring Boot project.



Put eng.traineddata in the same directory as the jar, and comment out the line that sets the data path:

BufferedImage image = ImageIO.read(new File("path/to/image.png"));
Tesseract tesseract = new Tesseract();
// tesseract.setDatapath("D:\\tessdata"); // not set on the Linux server
tesseract.setLanguage("eng");
String text = tesseract.doOCR(image);
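Leaving setDatapath unset only works if Tess4J's default lookup finds eng.traineddata. As far as I can tell, Tess4J 4.x falls back to the TESSDATA_PREFIX environment variable and otherwise to "./", which is the process's working directory rather than necessarily the jar's directory; treat that as an assumption and confirm it against the Tess4J source for your version. A startup check built on that assumption (DefaultTessdataCheck is a hypothetical name):

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Map;

// Hypothetical startup check. ASSUMPTION: without setDatapath, Tess4J looks in
// $TESSDATA_PREFIX if it is set, otherwise in "./" (the working directory).
final class DefaultTessdataCheck {
    private DefaultTessdataCheck() {}

    static Path expectedDir(Map<String, String> env, String workingDir) {
        String prefix = env.get("TESSDATA_PREFIX");
        if (prefix != null && !prefix.trim().isEmpty()) {
            return Paths.get(prefix.trim());
        }
        return Paths.get(workingDir);
    }

    static boolean modelPresent(Map<String, String> env, String workingDir, String lang) {
        return Files.isRegularFile(expectedDir(env, workingDir).resolve(lang + ".traineddata"));
    }
}
```

Calling DefaultTessdataCheck.modelPresent(System.getenv(), System.getProperty("user.dir"), "eng") at startup catches the common case where the jar is launched from a different directory than the one holding eng.traineddata.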