How to extract text from an image with tesseract.js


No preamble, straight to the method.

tesseract.js is a library that recognizes the text in an image (OCR).

GitHub: github.com/naptha/tess…

First, install tesseract.js with npm:

npm i tesseract.js

Then use it as follows:

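import { createWorker } from 'tesseract.js';

(async () => {
  // Create a worker with the English language data loaded
  const worker = await createWorker('eng');
  // Run OCR on the image (a URL here)
  const ret = await worker.recognize('https://tesseract.projectnaptha.com/img/eng_bw.png');
  // Print the recognized text
  console.log(ret.data.text);
  // Release the worker when done
  await worker.terminate();
})();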

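The code above only recognizes English, as the eng argument shows. To recognize Chinese instead, change that argument to chi_sim (Simplified Chinese).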

For text that mixes Chinese and English, change it to eng+chi_sim.
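As a rough sketch of the language switch described above, the snippet below wraps the same createWorker / recognize / terminate calls in a small function that takes a list of language codes. The names buildLangs and recognizeText and the image path are hypothetical, made up for illustration. It assumes a tesseract.js version where createWorker accepts the language string directly, as in the code earlier in this post.

```javascript
// Hypothetical helper (not part of tesseract.js): joins language codes
// into the "eng+chi_sim" form used above.
function buildLangs(codes) {
  if (!Array.isArray(codes) || codes.length === 0) {
    throw new Error('at least one language code is required');
  }
  return codes.join('+');
}

// Hypothetical wrapper: recognizes the text in `image` with the given languages.
async function recognizeText(image, codes) {
  // Loaded lazily, so buildLangs can be used even where the library is not installed.
  const { createWorker } = await import('tesseract.js');
  const worker = await createWorker(buildLangs(codes));
  try {
    const ret = await worker.recognize(image);
    return ret.data.text;
  } finally {
    // Always release the worker, even if recognition throws.
    await worker.terminate();
  }
}

// Example usage (the image path is a placeholder):
// recognizeText('./mixed.png', ['eng', 'chi_sim']).then(console.log);
```

Putting terminate in a finally block is one way to avoid leaving a worker running when recognition fails; whether that matters depends on how long the process lives.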

The sample image is shown below:

image.png

Output of the code:

Mild Splendour of the various-vested Night! Mother of wildly-working visions! hail I watch thy gliding, while with watery light Thy weak eye glimmers through a fleecy veil; And when thou lovest thy pale orb to shroud Behind the gather’d blackness lost on high; And when thou dartest from the wind-rent cloud Thy placid lightning o’er the awaken’d sky.