Please make sure the TESSDATA_PREFIX environment variable is set


Scenario

Add the dependency

<dependency>
    <groupId>net.sourceforge.tess4j</groupId>
    <artifactId>tess4j</artifactId>
    <version>4.1.1</version>
</dependency>

Chinese language data (chi_sim.traineddata) download: click here

Test

import net.sourceforge.tess4j.Tesseract;
import net.sourceforge.tess4j.TesseractException;

import java.io.File;

/**
 * @author 飞宇千虹
 * @date 2023-06-28 20:38
 */
// Recognize the text in an image
public class tess4j {

    public static void main(String[] args) throws TesseractException {
        // Create the Tesseract instance
        Tesseract tesseract = new Tesseract();
        // Set the tessdata directory (note the Chinese characters in this path)
        tesseract.setDatapath("E:\\桌面\\learning-files\\code\\toutiao\\day4\\tessdata");
        // Set the recognition language (Simplified Chinese)
        tesseract.setLanguage("chi_sim");
        // Run OCR on the image
        File file = new File("E:\\桌面\\learning-files\\code\\toutiao\\day4\\text.png");
        // replaceAll takes a regex; replace would search for the literal text CR|LF
        String s = tesseract.doOCR(file).replaceAll("\\r|\\n", "-");
        System.out.println(s);
    }
}

Error message

Error opening data file E:\桌面\learning-files\code\toutiao\day4\tessdata/chi_sim.traineddata
Please make sure the TESSDATA_PREFIX environment variable is set to your "tessdata" directory.
Failed loading language 'chi_sim'
Tesseract couldn't load any languages!
Exception in thread "main" java.lang.Error: Invalid memory access
	at com.sun.jna.Native.invokePointer(Native Method)
	at com.sun.jna.Function.invokePointer(Function.java:470)
	at com.sun.jna.Function.invoke(Function.java:404)
	at com.sun.jna.Function.invoke(Function.java:315)
	at com.sun.jna.Library$Handler.invoke(Library.java:212)
	at com.sun.proxy.$Proxy0.TessBaseAPIGetUTF8Text(Unknown Source)
	at net.sourceforge.tess4j.Tesseract.getOCRText(Tesseract.java:495)
	at net.sourceforge.tess4j.Tesseract.doOCR(Tesseract.java:358)
	at net.sourceforge.tess4j.Tesseract.doOCR(Tesseract.java:227)
	at net.sourceforge.tess4j.Tesseract.doOCR(Tesseract.java:194)
	at tess4j.main(tess4j.java:22)

Process finished with exit code 1

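Solution

The directory that holds chi_sim.traineddata has Chinese characters in its path (桌面), so Tesseract cannot open the file. Moving the tessdata directory to a path without Chinese characters fixes the error.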
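
To catch this before the native library crashes with "Invalid memory access", the data path can be checked up front. The following is a minimal sketch, not part of Tess4J: `TessdataPathCheck`, `isAsciiOnly`, and `describeProblem` are hypothetical names made up for illustration, and it assumes that any non-ASCII character in the path is a problem, which is stricter than the Chinese-character case described above.

```java
import java.io.File;

// Hypothetical helper (not a Tess4J class): sanity-checks a tessdata
// directory before it is passed to Tesseract.setDatapath(...).
final class TessdataPathCheck {

    // Returns true if every character in the path is 7-bit ASCII.
    static boolean isAsciiOnly(String path) {
        for (int i = 0; i < path.length(); i++) {
            if (path.charAt(i) > 127) {
                return false;
            }
        }
        return true;
    }

    // Returns null when the path looks usable, otherwise a short
    // description of the first problem found.
    static String describeProblem(String datapath, String language) {
        if (!isAsciiOnly(datapath)) {
            return "datapath contains non-ASCII characters: " + datapath;
        }
        // Tesseract expects <datapath>/<language>.traineddata to exist.
        File data = new File(datapath, language + ".traineddata");
        if (!data.isFile()) {
            return "missing file: " + data.getPath();
        }
        return null;
    }
}
```

Calling `TessdataPathCheck.describeProblem(datapath, "chi_sim")` before `setDatapath` and stopping when the result is not null gives a readable message instead of the JNA stack trace.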