Integrating Tess4J with Spring Boot
1. Add the dependency to pom.xml
<dependency>
<groupId>net.sourceforge.tess4j</groupId>
<artifactId>tess4j</artifactId>
<version>4.1.0</version>
</dependency>
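2. Put the language files you need into a folder. This article recognizes English text, so eng.traineddata is the one used here
3. Code example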
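import java.awt.image.BufferedImage;
import java.io.File;
import javax.imageio.ImageIO;
import net.sourceforge.tess4j.Tesseract;
BufferedImage image = ImageIO.read(new File("path/to/image")); // throws IOException
Tesseract tesseract = new Tesseract();
tesseract.setDatapath("D:\\tessdata"); // folder holding the language files; the backslash must be escaped
tesseract.setLanguage("eng");
String text = tesseract.doOCR(image); // throws TesseractException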
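A note on the Windows data path: inside a Java string literal a single backslash starts an escape sequence, so a path written as "D:\tessdata" is compiled as D:, a tab character, then essdata, and Tesseract ends up pointed at a folder that does not exist. The small sketch below uses only the standard library to make the difference visible. The class name PathLiteralDemo is made up for illustration.

```java
// Illustrative only: PathLiteralDemo is a hypothetical class name, not part of Tess4J.
final class PathLiteralDemo {

    /** Returns true when the string contains a tab character. */
    static boolean containsTab(String s) {
        return s.indexOf('\t') >= 0;
    }

    public static void main(String[] args) {
        String unescaped = "D:\tessdata";  // "\t" is parsed as a tab character
        String escaped = "D:\\tessdata";   // "\\" is one literal backslash

        System.out.println(containsTab(unescaped)); // true
        System.out.println(containsTab(escaped));   // false
        System.out.println(escaped);                // D:\tessdata
    }
}
```

Writing the path with forward slashes is another way to avoid the escape problem on the Java side.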
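Pitfalls when running on Linux
Tess4J runs out of the box on Windows, but on Linux the native environment has to be set up first.
Tess4J is a wrapper around tesseract, and tesseract in turn depends on leptonica, so both tesseract and leptonica have to be installed first.
1. Install the basic build dependencies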
yum install autoconf automake libtool libjpeg-devel libpng-devel libtiff-devel zlib-devel gcc gcc-c++
2. Download the tesseract and leptonica source archives and put them in /usr/local
3. Build and install leptonica
cd /usr/local
mkdir /usr/local/leptonica
tar -xzvf leptonica-1.79.0.tar.gz
cd leptonica-1.79.0
./configure --prefix=/usr/local/leptonica && make && make install
4. Configure the environment variables for leptonica and tesseract
Run vim /etc/profile and append the following to the end of the file
PKG_CONFIG_PATH=$PKG_CONFIG_PATH:/usr/local/leptonica/lib/pkgconfig
export PKG_CONFIG_PATH
CPLUS_INCLUDE_PATH=$CPLUS_INCLUDE_PATH:/usr/local/leptonica/include/leptonica
export CPLUS_INCLUDE_PATH
C_INCLUDE_PATH=$C_INCLUDE_PATH:/usr/local/leptonica/include/leptonica
export C_INCLUDE_PATH
LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local/leptonica/lib
export LD_LIBRARY_PATH
LIBRARY_PATH=$LIBRARY_PATH:/usr/local/leptonica/lib
export LIBRARY_PATH
LIBLEPT_HEADERSDIR=/usr/local/leptonica/include/leptonica
export LIBLEPT_HEADERSDIR
PATH=$PATH:/usr/local/tesseract/bin
export PATH
export TESSDATA_PREFIX=/usr/local/xxx/xxxx ## Note: this is the directory that contains the trained data (.traineddata) files
export PATH=$PATH:$TESSDATA_PREFIX
Apply the configuration by running source /etc/profile
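Keep in mind that source /etc/profile only changes the shell it is run in, so the Java process has to be started from a shell that actually carries these variables; a service launched by some other mechanism may not see them. As a rough sanity check, something like the following could be run on the server in the same way the application is started. EnvCheck and its missing method are hypothetical helpers, and the two variable names checked in main are simply the ones from the profile snippet that seem most relevant at runtime.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;

// Illustrative only: EnvCheck is a hypothetical helper, not part of Tess4J.
final class EnvCheck {

    /** Returns the names that are unset or blank in the given environment. */
    static List<String> missing(Map<String, String> env, List<String> names) {
        List<String> result = new ArrayList<>();
        for (String name : names) {
            String value = env.get(name);
            if (value == null || value.trim().isEmpty()) {
                result.add(name);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Assumption: these are the variables worth checking from inside the JVM.
        List<String> names = Arrays.asList("TESSDATA_PREFIX", "LD_LIBRARY_PATH");
        List<String> missing = missing(System.getenv(), names);
        if (missing.isEmpty()) {
            System.out.println("All checked variables are visible to the JVM");
        } else {
            System.out.println("Not visible to the JVM: " + missing);
        }
    }
}
```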
5. Build and install tesseract
cd /usr/local
mkdir /usr/local/tesseract
tar -xzvf tesseract-4.1.1.tar.gz
cd tesseract-4.1.1
./autogen.sh
./configure --prefix=/usr/local/tesseract && make && make install
6. Check that the installation succeeded
Run tesseract --version
If the console prints both the tesseract and the leptonica version numbers, the installation succeeded.
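The same check can be made from Java, which also confirms that the tesseract binary is on the PATH of the user running the application. This is only a sketch: TesseractVersionProbe is a hypothetical class, and it assumes nothing about the output format beyond the words tesseract and leptonica appearing in it, as described above. If the binary cannot be found, ProcessBuilder.start throws an IOException.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.Locale;

// Illustrative only: TesseractVersionProbe is a hypothetical helper.
final class TesseractVersionProbe {

    /** Runs the command and returns its combined stdout and stderr. */
    static String run(String... command) throws IOException, InterruptedException {
        Process process = new ProcessBuilder(command)
                .redirectErrorStream(true) // merge stderr into stdout
                .start();
        StringBuilder output = new StringBuilder();
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(process.getInputStream(), StandardCharsets.UTF_8))) {
            String line;
            while ((line = reader.readLine()) != null) {
                output.append(line).append('\n');
            }
        }
        process.waitFor();
        return output.toString();
    }

    /** True when the text mentions both tesseract and leptonica (case-insensitive). */
    static boolean mentionsBoth(String output) {
        String lower = output.toLowerCase(Locale.ROOT);
        return lower.contains("tesseract") && lower.contains("leptonica");
    }

    public static void main(String[] args) throws Exception {
        String output = run("tesseract", "--version");
        System.out.print(output);
        System.out.println(mentionsBoth(output)
                ? "tesseract and leptonica both reported"
                : "tesseract or leptonica missing from the output");
    }
}
```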
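7. Spring Boot project configuration: copy the files under /usr/local/tesseract/lib into the Spring Boot project
Put eng.traineddata in the same directory as the jar, and comment out the line that sets the data path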
BufferedImage image = ImageIO.read(new File("path/to/image"));
Tesseract tesseract = new Tesseract();
// tesseract.setDatapath("D:\\tessdata");
tesseract.setLanguage("eng");
String text = tesseract.doOCR(image);
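With the data path line commented out, OCR only works if Tess4J can still find eng.traineddata by itself. As far as I can tell, and this should be treated as an assumption to verify against the Tess4J version in use, it falls back to the TESSDATA_PREFIX environment variable when that is set and otherwise to the current working directory, which would explain why the file goes next to the jar and the jar is started from that directory. The sketch below mirrors that assumed lookup with the standard library only, so the situation can be checked before calling doOCR. TessdataLocator and locate are hypothetical names, not part of Tess4J.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Optional;

// Illustrative only: TessdataLocator is a hypothetical helper, not a Tess4J class.
final class TessdataLocator {

    /**
     * Picks the directory from tessdataPrefix when it is set, otherwise workingDir,
     * and returns it only if it contains "<language>.traineddata".
     */
    static Optional<Path> locate(String tessdataPrefix, Path workingDir, String language) {
        Path dir = (tessdataPrefix != null && !tessdataPrefix.isEmpty())
                ? Paths.get(tessdataPrefix)
                : workingDir;
        Path dataFile = dir.resolve(language + ".traineddata");
        return Files.isRegularFile(dataFile) ? Optional.of(dir) : Optional.empty();
    }

    public static void main(String[] args) {
        Path workingDir = Paths.get("").toAbsolutePath();
        Optional<Path> dir = locate(System.getenv("TESSDATA_PREFIX"), workingDir, "eng");
        System.out.println(dir
                .map(p -> "Found eng.traineddata in " + p)
                .orElse("eng.traineddata not found"));
    }
}
```

If the file is not found, passing the directory explicitly through setDatapath remains the more predictable option.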