Installing Tesseract-OCR 4.1 on CentOS 7


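1. Install the Leptonica dependency

Switching to the root user with su root before installing is recommended, to avoid permission errors during the build and install steps.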

wget http://www.leptonica.org/source/leptonica-1.78.0.tar.gz
tar -xzvf leptonica-1.78.0.tar.gz
cd leptonica-1.78.0
./configure
make && make install
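
Since this build installs Leptonica under the default /usr/local prefix, the Tesseract ./configure step in the next section may fail to locate it unless pkg-config also searches /usr/local/lib/pkgconfig. The following is a minimal sketch under that assumption; the helper name add_pkgconfig_dir is made up for illustration and is not part of either project.

```shell
#!/bin/sh
# Hypothetical helper: prepend a directory to PKG_CONFIG_PATH, but only once.
add_pkgconfig_dir() {
    case ":${PKG_CONFIG_PATH:-}:" in
        *":$1:"*) ;;  # already listed, nothing to do
        *) PKG_CONFIG_PATH="$1${PKG_CONFIG_PATH:+:$PKG_CONFIG_PATH}" ;;
    esac
    export PKG_CONFIG_PATH
}

# Assumption: Leptonica was installed with the default /usr/local prefix.
add_pkgconfig_dir /usr/local/lib/pkgconfig

# Report the Leptonica version if pkg-config can see it.
if command -v pkg-config >/dev/null 2>&1; then
    pkg-config --modversion lept 2>/dev/null \
        || echo "lept.pc not found via PKG_CONFIG_PATH"
fi
```

If the version is printed, pkg-config can see the library. Source the snippet in the same shell session that will run ./configure for Tesseract so that the exported variable is still in effect.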

2. Install Tesseract-OCR

As before, building as the root user is recommended.

wget https://codeload.github.com/tesseract-ocr/tesseract/tar.gz/4.1.0
tar -xvf 4.1.0
cd tesseract-4.1.0/
./autogen.sh
./configure
make && make install
sudo ldconfig
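
Running ldconfig only helps if the directory holding the new libraries is on the dynamic linker's search path. As a hedged sketch, assuming the libraries were installed to /usr/local/lib (the default prefix) and that the linker configuration lives in /etc/ld.so.conf and /etc/ld.so.conf.d/, the check below reports whether that directory is listed. The helper name ld_conf_lists and the file name usr-local.conf are illustrative only.

```shell
#!/bin/sh
# Hypothetical helper: succeed if the directory appears as a whole line
# in any of the given linker configuration files.
ld_conf_lists() {
    dir="$1"; shift
    grep -qsxF -- "$dir" "$@"
}

# Assumption: libtesseract and liblept were installed into /usr/local/lib.
if ld_conf_lists /usr/local/lib /etc/ld.so.conf /etc/ld.so.conf.d/*.conf; then
    echo "/usr/local/lib is already listed in the linker configuration"
else
    # Printed rather than executed, so it can be reviewed before running as root.
    echo "suggested: echo /usr/local/lib > /etc/ld.so.conf.d/usr-local.conf && ldconfig"
fi
```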

The installation itself is straightforward. Depending on the machine's hardware and network conditions, it may take 30-60 minutes.
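
Much of that time is spent compiling on a single core. As a sketch, assuming GNU make and the coreutils nproc command are available, the compile step can be spread across all CPU cores. The wrapper name build_parallel is hypothetical.

```shell
#!/bin/sh
# One job per CPU core; fall back to a single job if nproc is unavailable.
njobs=$(nproc 2>/dev/null || echo 1)

# Hypothetical wrapper: the same make steps as above, compiled in parallel.
build_parallel() {
    make -j"$njobs" && make install
}

echo "build_parallel will use $njobs job(s)"
```

Call build_parallel from inside the leptonica-1.78.0 or tesseract-4.1.0 source tree after ./configure has finished.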

3. Possible errors

  1. Error when running ./autogen.sh
./autogen.sh: line 59: bail_out: command not found
./autogen.sh: line 82: aclocal: command not found

Solution

yum install automake -y
yum install libtool -y
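
To catch this before running ./autogen.sh, the presence of the autotools programs can be checked up front. This is a minimal sketch; the list of tool names is an assumption about what autogen.sh needs, and missing_tools is an illustrative helper name.

```shell
#!/bin/sh
# Hypothetical helper: print each named command that is not found on PATH.
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || echo "$tool"
    done
}

# Assumption: these are the autotools programs autogen.sh relies on.
missing=$(missing_tools aclocal autoconf automake libtoolize)
if [ -n "$missing" ]; then
    echo "missing build tools:" $missing
else
    echo "all autotools programs found"
fi
```

On CentOS 7, aclocal and automake are provided by the automake package, and libtoolize by the libtool package.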
  2. Error when running make for Tesseract
libtool: Version mismatch error.  This is libtool 2.4.6, but the
libtool: definition of this LT_INIT comes from libtool 2.4.2.
libtool: You should recreate aclocal.m4 with macros from libtool 2.4.6
libtool: and run autoconf again.

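Solution

Run autoreconf -ivf to regenerate the build files with the installed libtool version, then re-run ./configure and make.
  3. Error when running tesseract after installation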
$ tesseract 13.jpg result -l chi_sim

Error in pixReadMemTiff: function not present
Error in pixReadMem: tiff: no pix returned
Error in pixaGenerateFontFromString: pix not made
Error in bmfCreate: font pixa not made
Error opening data file /usr/local/share/tessdata/eng.traineddata
Please make sure the TESSDATA_PREFIX environment variable is set to your "tessdata" directory.
Failed loading language 'eng'
Tesseract couldn't load any languages!
Could not initialize tesseract.

Solution:

1. Download the pre-trained language data files
2. Place them in the /usr/local/share/tessdata directory

Download: https://github.com/tesseract-ocr/tessdata
chi_sim.traineddata  Chinese (Simplified)
eng.traineddata      English
enm.traineddata      Middle English (1100-1500)
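
The two steps above can be sketched as a small check that lists which language files are still missing and prints a download command for each. The base URL assumes the repository serves raw files from a branch named main, and missing_langs is an illustrative helper name. The directory comes from the error message and can be overridden through the TESSDATA_DIR variable, which is also made up for this sketch.

```shell
#!/bin/sh
# Directory taken from the error message; override via TESSDATA_DIR if needed.
TESSDATA_DIR="${TESSDATA_DIR:-/usr/local/share/tessdata}"

# Hypothetical helper: print each language whose .traineddata file
# is absent from the given directory.
missing_langs() {
    dir="$1"; shift
    for lang in "$@"; do
        [ -f "$dir/$lang.traineddata" ] || echo "$lang"
    done
}

# Assumption: raw files are served from the repository's main branch.
BASE_URL="https://github.com/tesseract-ocr/tessdata/raw/main"

for lang in $(missing_langs "$TESSDATA_DIR" eng chi_sim); do
    # Printed rather than executed, so the commands can be reviewed first.
    echo "wget -P $TESSDATA_DIR $BASE_URL/$lang.traineddata"
done
```

Once the files are in place, tesseract --list-langs should include eng and chi_sim. If the files are kept somewhere else, the error message indicates that TESSDATA_PREFIX should point at that tessdata directory.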