Scenario
Add the dependency
<dependency>
    <groupId>net.sourceforge.tess4j</groupId>
    <artifactId>tess4j</artifactId>
    <version>4.1.1</version>
</dependency>
Chinese language data (chi_sim.traineddata) download: click here
Test
import net.sourceforge.tess4j.Tesseract;
import net.sourceforge.tess4j.TesseractException;
import java.io.File;

/**
 * @author 飞宇千虹
 * @date 2023-06-28 20:38
 */
// Recognize the text in an image
public class tess4j {
    public static void main(String[] args) throws TesseractException {
        // Create an instance
        Tesseract tesseract = new Tesseract();
        // Set the tessdata directory that holds the language data
        // (backslashes must be escaped in Java string literals)
        tesseract.setDatapath("E:\\桌面\\learning-files\\code\\toutiao\\day4\\tessdata");
        // Set the recognition language (Simplified Chinese)
        tesseract.setLanguage("chi_sim");
        // Recognize the image
        File file = new File("E:\\桌面\\learning-files\\code\\toutiao\\day4\\text.png");
        // replaceAll treats its first argument as a regex: every CR or LF becomes "-"
        String s = tesseract.doOCR(file).replaceAll("\\r|\\n", "-");
        System.out.println(s);
    }
}
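A side note on the line-break handling: `String.replace` treats its arguments as literal text, not as a regular expression, so `replace("\r|\n", "-")` only looks for the exact three-character sequence CR, `|`, LF and normally leaves OCR output unchanged. To turn line breaks into `-`, `replaceAll` with a regex is needed. A minimal check (the class and method names are just for illustration):

```java
public class ReplaceVsReplaceAll {

    // Regex version: every carriage return or line feed becomes "-"
    static String joinLines(String text) {
        return text.replaceAll("\\r|\\n", "-");
    }

    public static void main(String[] args) {
        String ocr = "line1\r\nline2\nline3";
        // replace() looks for the literal sequence CR, '|', LF,
        // which does not occur in ocr, so nothing changes
        System.out.println(ocr.replace("\r|\n", "-").equals(ocr)); // true
        // replaceAll() interprets "\\r|\\n" as the regex "CR or LF"
        System.out.println(joinLines(ocr)); // line1--line2-line3
    }
}
```

A Windows CRLF pair produces two dashes with this pattern; `\\r?\\n` would map each line break to a single `-` instead.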
Error output
Error opening data file E:\桌面\learning-files\code\toutiao\day4\tessdata/chi_sim.traineddata
Please make sure the TESSDATA_PREFIX environment variable is set to your "tessdata" directory.
Failed loading language 'chi_sim'
Tesseract couldn't load any languages!
Exception in thread "main" java.lang.Error: Invalid memory access
at com.sun.jna.Native.invokePointer(Native Method)
at com.sun.jna.Function.invokePointer(Function.java:470)
at com.sun.jna.Function.invoke(Function.java:404)
at com.sun.jna.Function.invoke(Function.java:315)
at com.sun.jna.Library$Handler.invoke(Library.java:212)
at com.sun.proxy.$Proxy0.TessBaseAPIGetUTF8Text(Unknown Source)
at net.sourceforge.tess4j.Tesseract.getOCRText(Tesseract.java:495)
at net.sourceforge.tess4j.Tesseract.doOCR(Tesseract.java:358)
at net.sourceforge.tess4j.Tesseract.doOCR(Tesseract.java:227)
at net.sourceforge.tess4j.Tesseract.doOCR(Tesseract.java:194)
at tess4j.main(tess4j.java:22)
Process finished with exit code 1
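Solution
The directory that holds chi_sim.traineddata (E:\桌面\...\tessdata) has Chinese characters in its path. Tesseract's native library fails to open the data file from that path, so loading 'chi_sim' fails and the subsequent OCR call crashes with "Invalid memory access". Moving the tessdata folder to a path containing only ASCII characters, and pointing setDatapath at the new location, fixes the error.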
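To catch this before Tesseract is called, the data path can be checked for non-ASCII characters and, if needed, the language files copied to an ASCII-only directory. The sketch below uses only the JDK; the class name, the `safeDatapath` helper and the fallback directory `D:\tessdata` are hypothetical, and it assumes the `.traineddata` files sit directly inside the tessdata folder.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;
import java.util.stream.Stream;

public class TessdataPathCheck {

    // True if the path string contains only ASCII characters
    static boolean isAscii(String path) {
        return StandardCharsets.US_ASCII.newEncoder().canEncode(path);
    }

    // Copies the regular files directly inside srcDir (e.g. chi_sim.traineddata) into dstDir
    static void copyTessdata(Path srcDir, Path dstDir) throws IOException {
        Files.createDirectories(dstDir);
        try (Stream<Path> entries = Files.list(srcDir)) {
            for (Path entry : (Iterable<Path>) entries::iterator) {
                if (Files.isRegularFile(entry)) {
                    Files.copy(entry, dstDir.resolve(entry.getFileName()),
                            StandardCopyOption.REPLACE_EXISTING);
                }
            }
        }
    }

    // Hypothetical helper: returns the original directory if its path is ASCII-only,
    // otherwise copies the language data into asciiFallbackDir and returns that
    static String safeDatapath(String tessdataDir, String asciiFallbackDir) throws IOException {
        if (isAscii(tessdataDir)) {
            return tessdataDir;
        }
        if (!isAscii(asciiFallbackDir)) {
            throw new IllegalArgumentException("Fallback directory must be ASCII-only: " + asciiFallbackDir);
        }
        copyTessdata(Paths.get(tessdataDir), Paths.get(asciiFallbackDir));
        return asciiFallbackDir;
    }

    public static void main(String[] args) {
        String original = "E:\\桌面\\learning-files\\code\\toutiao\\day4\\tessdata";
        System.out.println(isAscii(original));       // false
        System.out.println(isAscii("D:\\tessdata")); // true
    }
}
```

In the test program this would be used as `tesseract.setDatapath(TessdataPathCheck.safeDatapath(original, "D:\\tessdata"));`. Simply moving the folder by hand, as described above, achieves the same result.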