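Still frustrated by scanned PDFs that you can't copy text from or annotate? Then OCRmyPDF is the tool for you: it adds a searchable text layer to scanned PDFs.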
Install
Pick the command that matches your system. Docker is supported too.
| Operating system | Install command |
|---|---|
| Debian, Ubuntu | apt install ocrmypdf |
| Windows Subsystem for Linux | apt install ocrmypdf |
| Fedora | dnf install ocrmypdf |
| macOS (Homebrew) | brew install ocrmypdf |
| macOS (MacPorts) | port install ocrmypdf |
| macOS (nix) | nix-env -i ocrmypdf |
| LinuxBrew | brew install ocrmypdf |
| FreeBSD | pkg install py-ocrmypdf |
| Conda | conda install ocrmypdf |
| Ubuntu Snap | snap install ocrmypdf |
Languages
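OCRmyPDF uses Tesseract for optical character recognition (OCR) and relies on Tesseract's language packs. On Linux (and on macOS with Homebrew), you can usually find the packages that provide those language packs like this: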
# Display a list of all Tesseract language packs
apt-cache search tesseract-ocr
# Debian/Ubuntu users
apt-get install tesseract-ocr-chi-sim # Example: Install Chinese Simplified language pack
# Arch Linux users
pacman -S tesseract-data-eng tesseract-data-deu # Example: Install the English and German language packs
# macOS (Homebrew) users
brew install tesseract-lang
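Expect a long wait after this command, since tesseract-lang pulls in the data for all supported languages.
Verify the installation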
ocrmypdf -h
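If the help text does not appear, it can help to check which of the two programs is actually missing from `PATH`. The following is a minimal sketch in POSIX shell; `check_tool` is a hypothetical helper name introduced here for illustration, not something OCRmyPDF provides.

```shell
# check_tool is a hypothetical helper: it reports whether a command
# can be found on PATH and returns non-zero when it cannot.
check_tool() {
  if command -v "$1" >/dev/null 2>&1; then
    printf '%s: found\n' "$1"
  else
    printf '%s: missing\n' "$1"
    return 1
  fi
}

# OCRmyPDF needs both its own CLI and the Tesseract binary.
for tool in ocrmypdf tesseract; do
  check_tool "$tool" || true
done
```

A "missing" line for `tesseract` would suggest that the OCR engine itself still needs to be installed.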
List the supported languages
tesseract --list-langs
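Before starting a long conversion, it may be worth confirming that every language you plan to pass to `-l` is actually installed. Here is a rough sketch, assuming `tesseract --list-langs` prints one language code per line after a header line (some older versions write the list to stderr, hence the `2>&1`). `missing_langs` is a hypothetical helper, not part of Tesseract or OCRmyPDF.

```shell
# missing_langs SPEC INSTALLED
# Prints each language from a "+"-joined spec (e.g. "chi_sim+eng") that
# does not appear as a whole line in the INSTALLED list.
# Hypothetical helper, written for illustration only.
missing_langs() {
  printf '%s\n' "$1" | tr '+' '\n' | while IFS= read -r lang; do
    # -x: whole-line match, -F: fixed string, -q: no output
    if ! printf '%s\n' "$2" | grep -qxF -- "$lang"; then
      printf '%s\n' "$lang"
    fi
  done
}

# Only query Tesseract when it is installed.
if command -v tesseract >/dev/null 2>&1; then
  missing_langs "chi_sim+eng" "$(tesseract --list-langs 2>&1)"
fi
```

Any code it prints still needs its language pack installed. No output means all requested languages were found.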
Convert a scanned PDF
ocrmypdf -l chi_sim+eng --force-ocr ./大纲.pdf ./大纲ocr.pdf
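If you have a whole folder of scans, the same command can be wrapped in a loop. This is a sketch under assumptions rather than a recommended workflow: `ocr_output_name` and `ocr_dir` are hypothetical helper names, and the `ocr.pdf` suffix simply mirrors the 大纲.pdf / 大纲ocr.pdf naming used in the command above.

```shell
# Derive the output path by inserting "ocr" before the ".pdf" suffix.
# Hypothetical helper, for illustration only.
ocr_output_name() {
  printf '%s\n' "${1%.pdf}ocr.pdf"
}

# ocr_dir DIR [LANGS]
# Runs OCR on every PDF in DIR with the same flags as the single-file
# command. Files already ending in "ocr.pdf" are skipped, which also
# skips any source file that happens to end that way.
ocr_dir() {
  dir=$1
  langs=${2:-chi_sim+eng}
  for src in "$dir"/*.pdf; do
    [ -e "$src" ] || continue              # no match: the glob stays literal
    case $src in *ocr.pdf) continue ;; esac
    ocrmypdf -l "$langs" --force-ocr "$src" "$(ocr_output_name "$src")"
  done
}

# Example call (the directory name is an assumption):
# ocr_dir ./scans chi_sim+eng
```

Writing to a new file instead of overwriting the input keeps the original scan around in case the OCR result is not what you wanted.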
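Summary
Overall, the process went fairly smoothly. With a bit of care and patience, any problems that come up are easy to solve. Many thanks to the author of OCRmyPDF for the contribution.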