Automating Image Text Recognition: 3 Ways to Turn Screenshots into Text

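Have you run into these at work: screenshots you cannot copy from, scans you cannot edit, invoice details you have to type in by hand? This article walks through 3 ways to recognize text in images automatically with Python, so you can stop retyping!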

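📌 Pain points

A colleague sends a screenshot, but the text cannot be copied
You receive a scanned contract and need to extract the key details
Paper invoices have to be typed into a system by hand
Online-course screenshots pile up, and turning them into notes takes forever

If any of these sound familiar, this article is for you!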

🏆 Method comparison

| Method | Accuracy | Speed | Cost | Best for |
|------|--------|------|------|----------|
| Tesseract | ⭐⭐⭐ | Fast | Free | English, simple Chinese |

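🔧 Method 1: Tesseract (free and open source)

Installation

    # macOS (tesseract-lang adds extra language data such as chi_sim)
    brew install tesseract tesseract-lang

    # Ubuntu (the chi-sim package adds Simplified Chinese data)
    sudo apt install tesseract-ocr tesseract-ocr-chi-sim

    # Python libraries
    pip install pytesseract pillow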
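
Before running any Python, it can help to confirm that the Simplified Chinese data is really installed, because `lang='chi_sim+eng'` fails without it. The sketch below shells out to `tesseract --list-langs` and parses the result. The helper names `parse_list_langs` and `installed_languages` are made up for this example, and the parser assumes the usual output layout: one header line followed by one language code per line.

```python
import shutil
import subprocess


def parse_list_langs(output):
    """Parse `tesseract --list-langs` output into a list of language codes.

    Assumes the first non-empty line is a header and the rest are codes.
    """
    lines = [line.strip() for line in output.splitlines() if line.strip()]
    return lines[1:]


def installed_languages():
    """Return the installed language codes, or None if tesseract is not on PATH."""
    exe = shutil.which('tesseract')
    if exe is None:
        return None
    proc = subprocess.run([exe, '--list-langs'], capture_output=True, text=True)
    # Older versions print the list to stderr, so fall back to that stream
    return parse_list_langs(proc.stdout or proc.stderr)


if __name__ == '__main__':
    langs = installed_languages()
    if langs is None:
        print('tesseract not found on PATH')
    else:
        missing = {'chi_sim', 'eng'} - set(langs)
        print('missing language data:', sorted(missing) or 'none')
```

If `chi_sim` shows up as missing, install the language data before moving on.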

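Code

    import pytesseract
    from PIL import Image


    def ocr_tesseract(image_path):
        """Recognize text in an image with Tesseract."""
        # Optional: point pytesseract at the tesseract binary if it is not on PATH
        # pytesseract.pytesseract.tesseract_cmd = r'/usr/local/bin/tesseract'

        # Recognize text; chi_sim+eng needs both language data files installed
        with Image.open(image_path) as img:
            text = pytesseract.image_to_string(img, lang='chi_sim+eng')
        return text


    # Usage example
    result = ocr_tesseract('screenshot.png')
    print(result)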
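
Screenshots often hold a single block or a single line of text, and Tesseract's page segmentation mode can be tuned for that. The sketch below passes `--psm` and `--oem` through pytesseract's `config` argument. `build_config` and `ocr_with_psm` are illustrative names, and the default of PSM 6 (treat the image as one uniform block of text) is an assumption to adjust for your own images.

```python
def build_config(psm=6, oem=3):
    """Build a Tesseract config string from a page segmentation and engine mode."""
    if not 0 <= psm <= 13:
        raise ValueError('psm must be between 0 and 13')
    if not 0 <= oem <= 3:
        raise ValueError('oem must be between 0 and 3')
    return f'--oem {oem} --psm {psm}'


def ocr_with_psm(image_path, lang='chi_sim+eng', psm=6):
    """Run Tesseract with an explicit page segmentation mode."""
    # Imported lazily so build_config works without the OCR stack installed
    import pytesseract
    from PIL import Image

    with Image.open(image_path) as img:
        return pytesseract.image_to_string(img, lang=lang, config=build_config(psm))
```

For a one-line crop such as a window title, `ocr_with_psm('title.png', psm=7)` would be a reasonable first try.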

💡 Going further: image preprocessing for better accuracy

    import pytesseract
    from PIL import Image, ImageEnhance, ImageFilter


    def preprocess_image(image_path):
        """Preprocess an image to improve recognition accuracy."""
        img = Image.open(image_path)
        # Convert to grayscale
        img = img.convert('L')
        # Increase contrast
        enhancer = ImageEnhance.Contrast(img)
        img = enhancer.enhance(2)
        # Reduce noise with a 3x3 median filter
        img = img.filter(ImageFilter.MedianFilter(size=3))
        return img


    # Preprocess first, then recognize
    img = preprocess_image('screenshot.png')
    text = pytesseract.image_to_string(img, lang='chi_sim+eng')
    print(text)
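
Binarization is another preprocessing step worth trying once the image is grayscale. It is not part of the pipeline above; the sketch below is one possible addition that uses Otsu's method to pick a threshold from a 256-bin histogram, such as the list returned by Pillow's `img.histogram()` on a mode `'L'` image. The function name `otsu_threshold` is hypothetical.

```python
def otsu_threshold(histogram):
    """Return the gray level that maximizes between-class variance (Otsu's method).

    Pixels at or below the returned level count as background.
    """
    total = sum(histogram)
    if total == 0:
        raise ValueError('histogram is empty')
    weighted_total = sum(level * count for level, count in enumerate(histogram))

    best_threshold, best_variance = 0, -1.0
    background_count, background_sum = 0, 0
    for level, count in enumerate(histogram):
        background_count += count
        background_sum += level * count
        foreground_count = total - background_count
        if background_count == 0:
            continue  # no background pixels yet
        if foreground_count == 0:
            break  # every pixel is already in the background class
        mean_bg = background_sum / background_count
        mean_fg = (weighted_total - background_sum) / foreground_count
        variance = background_count * foreground_count * (mean_bg - mean_fg) ** 2
        if variance > best_variance:
            best_threshold, best_variance = level, variance
    return best_threshold
```

With Pillow, something like `t = otsu_threshold(img.histogram())` followed by `img = img.point(lambda p: 255 if p > t else 0)` would apply the threshold to the grayscale image returned by `preprocess_image`.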

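📊 Results

Source image: a work screenshot (mixed Chinese and English)

Recognized text: Python自动化办公实战 - 提高效率300% ("Python office automation in practice - 300% more efficient")

Pros: free and open source, no API key needed, works offline

Cons: only moderate accuracy on complex Chinese layouts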
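
Tesseract's Chinese output can contain stray spaces between characters and runs of blank lines. A small cleanup pass often makes the text easier to reuse. The sketch below is one way to do it with the standard library; `clean_ocr_text` is a hypothetical helper, and it only treats the basic CJK range (U+4E00 to U+9FFF) as Chinese.

```python
import re

_CJK = r'\u4e00-\u9fff'  # basic CJK Unified Ideographs range


def clean_ocr_text(text):
    """Tidy raw OCR output: fix CJK spacing, trailing spaces, and blank-line runs."""
    # Drop spaces and tabs that sit between two CJK characters
    text = re.sub(rf'(?<=[{_CJK}])[ \t]+(?=[{_CJK}])', '', text)
    # Strip trailing whitespace from every line
    text = '\n'.join(line.rstrip() for line in text.splitlines())
    # Collapse three or more newlines into a single blank line
    text = re.sub(r'\n{3,}', '\n\n', text)
    return text.strip()
```

Spaces between a Chinese character and a Latin word are left alone, so mixed-language lines keep their word boundaries.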

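🌐 Method 2: OCR.space (free API)

Getting an API key

Visit ocr.space/ocrapi
Register an account
Get a free API key (the free tier is capped, quoted as 5,000 requests per day; check the site for the current limit)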

Code

    import base64
    import mimetypes

    import requests


    def ocr_space(image_path, api_key='helloworld'):
        """Recognize text in an image with the OCR.space API."""
        with open(image_path, 'rb') as f:
            img_base64 = base64.b64encode(f.read()).decode('utf-8')
        # Match the data URI to the real file type instead of assuming PNG
        mime = mimetypes.guess_type(image_path)[0] or 'image/png'

        payload = {
            'base64Image': f'data:{mime};base64,{img_base64}',
            'language': 'chs',  # Simplified Chinese
            'isOverlayRequired': False,
            'detectOrientation': True,
            'scale': True,
            'OCREngine': 2,  # engine 2 usually gives better results
        }
        headers = {'apikey': api_key}

        response = requests.post(
            'https://api.ocr.space/parse/image',
            data=payload,
            headers=headers,
            timeout=60,
        )
        response.raise_for_status()
        result = response.json()

        # ParsedResults is missing or empty when processing fails
        parsed = result.get('ParsedResults')
        if parsed:
            return parsed[0]['ParsedText']
        return None


    # Usage example
    text = ocr_space('invoice.png', api_key='your_api_key')
    print(text)
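
Returning `None` on failure hides the reason a request went wrong. OCR.space responses carry an `IsErroredOnProcessing` flag and an `ErrorMessage` field, so a slightly stricter parser can surface the cause. This is a minimal sketch that works on the decoded JSON dictionary; `OCRSpaceError` and `extract_parsed_text` are names invented here, and joining every page with a newline is a design choice rather than anything the API requires.

```python
class OCRSpaceError(RuntimeError):
    """Raised when OCR.space reports a processing error."""


def extract_parsed_text(result):
    """Return the text of all pages, or raise OCRSpaceError on failure."""
    if result.get('IsErroredOnProcessing'):
        message = result.get('ErrorMessage') or 'unknown error'
        # The API may return a single string or a list of strings
        if isinstance(message, (list, tuple)):
            message = '; '.join(str(m) for m in message)
        raise OCRSpaceError(message)
    pages = result.get('ParsedResults') or []
    return '\n'.join(page.get('ParsedText', '').strip() for page in pages)
```

Inside `ocr_space`, the final `if`/`return` lines could then be replaced with `return extract_parsed_text(result)`.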

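📊 Results

Source image: a VAT invoice (complex table layout)

Recognized text:

    发票代码：144031900110
    发票号码：01234567
    金额：壹仟元整 ¥1000.00

(The three fields are the invoice code, the invoice number, and the amount of one thousand yuan.)

Pros: supports Chinese, the free quota is enough for light use, no complicated setup

Cons: needs a network connection, free usage is limited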
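
Because this method depends on the network, transient failures are worth planning for. One common pattern is to retry with exponential backoff. The sketch below is generic and not tied to OCR.space; `with_retries` and its parameters are illustrative, and the `sleep` argument exists only so the delay can be swapped out in tests.

```python
import time


def with_retries(func, attempts=3, base_delay=1.0, sleep=time.sleep,
                 retry_on=(Exception,)):
    """Call func(), retrying on failure with exponentially growing delays."""
    if attempts < 1:
        raise ValueError('attempts must be at least 1')
    for attempt in range(attempts):
        try:
            return func()
        except retry_on:
            if attempt == attempts - 1:
                raise  # out of attempts: let the last error propagate
            sleep(base_delay * 2 ** attempt)  # 1s, 2s, 4s, ...
```

A call might look like `with_retries(lambda: ocr_space('invoice.png', api_key='your_api_key'), retry_on=(requests.RequestException,))`, which retries network errors but lets other exceptions through immediately.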

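🇨🇳 Method 3: Baidu AI text recognition (high accuracy)

Getting API access

Visit login.bce.baidu.com/
Create an application to get an API Key and a Secret Key
Free quota: quoted as 50,000 calls per day (quotas change over time, so confirm the current figure in the Baidu console)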

Code

    import base64

    import requests


    class BaiduOCR:
        def __init__(self, api_key, secret_key):
            self.api_key = api_key
            self.secret_key = secret_key
            self.token = self._get_token()

        def _get_token(self):
            """Fetch an access token."""
            response = requests.get(
                'https://aip.baidubce.com/oauth/2.0/token',
                params={
                    'grant_type': 'client_credentials',
                    'client_id': self.api_key,
                    'client_secret': self.secret_key,
                },
                timeout=30,
            )
            return response.json()['access_token']

        def recognize(self, image_path, language_type='CHN_ENG'):
            """General text recognition."""
            with open(image_path, 'rb') as f:
                img = base64.b64encode(f.read()).decode('utf-8')

            url = 'https://aip.baidubce.com/rest/2.0/ocr/v1/general_basic'
            headers = {'Content-Type': 'application/x-www-form-urlencoded'}
            data = {'image': img, 'language_type': language_type}
            response = requests.post(
                url,
                params={'access_token': self.token},
                data=data,
                headers=headers,
                timeout=30,
            )
            result = response.json()

            # Join the recognized lines; error responses carry no words_result
            if 'words_result' in result:
                return '\n'.join(w['words'] for w in result['words_result'])
            return None


    # Usage example
    ocr = BaiduOCR(api_key='your_api_key', secret_key='your_secret_key')
    text = ocr.recognize('screenshot.png')
    print(text)
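
When a Baidu request fails, for example because of an invalid token or an exhausted quota, the JSON body contains `error_code` and `error_msg` instead of `words_result`. Raising an exception in that case makes failures visible instead of silently returning `None`. The sketch below works on the decoded response dictionary; `BaiduOCRError` and `extract_words` are names introduced for this example.

```python
class BaiduOCRError(RuntimeError):
    """Raised when the Baidu OCR API returns an error payload."""

    def __init__(self, code, message):
        super().__init__(f'Baidu OCR error {code}: {message}')
        self.code = code


def extract_words(result):
    """Join recognized lines, or raise BaiduOCRError for an error response."""
    if 'error_code' in result:
        raise BaiduOCRError(result['error_code'], result.get('error_msg', ''))
    return '\n'.join(item['words'] for item in result.get('words_result', []))
```

In `recognize`, the last few lines could become `return extract_words(result)`, and callers can inspect `exc.code` to decide whether a retry makes sense.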

🚀 Advanced feature: table recognition

Add the following method to the BaiduOCR class:

        def recognize_table(self, image_path):
            """Table recognition: returns structured data."""
            with open(image_path, 'rb') as f:
                img = base64.b64encode(f.read()).decode('utf-8')

            # Endpoint, parameters, and response fields can differ between
            # API versions; check Baidu's current table-recognition docs
            url = 'https://aip.baidubce.com/rest/2.0/ocr/v1/tableRecognize'
            headers = {'Content-Type': 'application/x-www-form-urlencoded'}
            data = {'image': img, 'table_locale': 'CHN_ENG'}
            response = requests.post(
                url,
                params={'access_token': self.token},
                data=data,
                headers=headers,
                timeout=30,
            )
            result = response.json()

            # Return the table data when present
            if 'result' in result:
                return result['result']['data']
            return None


    # Usage example
    table_data = ocr.recognize_table('invoice.png')
    print(table_data)
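
Once the table has been reduced to rows of cell strings, it is straightforward to save it in a form that spreadsheets can open. The sketch below assumes, purely for illustration, that the data has already been converted to a list of rows where each row is a list of strings. The real response layout may differ and would need its own conversion step. `rows_to_csv` is a hypothetical helper.

```python
import csv
import io


def rows_to_csv(rows):
    """Serialize a list of rows (lists of cell strings) to CSV text."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator='\n')
    writer.writerows(rows)
    return buffer.getvalue()
```

Writing the returned string to a file with `encoding='utf-8-sig'` generally helps Excel display Chinese text correctly.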

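📊 Results

Source image: a scanned contract (with a table)

Recognized output:

    ┌─────────────┬────────────┬────────────┐
    │ 甲乙双方 │ 甲方：XXX │ 乙方：YYY │
    ├─────────────┼────────────┼────────────┤
    │ 合同金额 │ ¥100,000 │ 壹拾万元整 │
    └─────────────┴────────────┴────────────┘

(Row 1 lists the parties, Party A: XXX and Party B: YYY. Row 2 gives the contract amount in digits and in formal Chinese numerals.)

Pros: the highest accuracy of the three, covers many scenarios, strong table recognition

Cons: requires applying for API access, free quota is limited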
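
One practical detail: `BaiduOCR` requests a fresh access token every time an instance is created. Baidu's token response includes an `expires_in` field (in seconds) next to `access_token`, so the token can be cached and reused until shortly before it expires. The sketch below shows the caching logic on its own; `TokenCache`, the injected `fetch` callable, and the 60-second safety margin are all assumptions made for this example.

```python
import time


class TokenCache:
    """Cache an access token and refresh it shortly before it expires."""

    def __init__(self, fetch, clock=time.time, margin=60):
        self._fetch = fetch    # callable returning (token, expires_in_seconds)
        self._clock = clock    # injectable clock, handy for testing
        self._margin = margin  # refresh this many seconds before expiry
        self._token = None
        self._expires_at = 0.0

    def get(self):
        """Return a valid token, fetching a new one only when needed."""
        now = self._clock()
        if self._token is None or now >= self._expires_at - self._margin:
            token, expires_in = self._fetch()
            self._token = token
            self._expires_at = now + expires_in
        return self._token
```

In practice `fetch` would wrap the token request and return `(data['access_token'], data['expires_in'])` from the JSON response.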

📈 Real-world use cases

Case 1: Turning a batch of screenshots into notes

    from pathlib import Path


    def batch_ocr(input_folder, output_file):
        """Recognize every PNG image in a folder."""
        ocr = BaiduOCR(api_key='your_key', secret_key='your_secret')
        results = []

        for img_path in Path(input_folder).glob('*.png'):
            print(f'Recognizing: {img_path.name}')
            text = ocr.recognize(str(img_path))
            results.append(f'\n\n=== {img_path.name} ===\n{text}')

        # Save everything as a single notes file
        with open(output_file, 'w', encoding='utf-8') as f:
            f.write('\n'.join(results))
        print(f'✅ Done! Saved to {output_file}')


    # Usage
    batch_ocr('screenshots/', 'notes.txt')

Result: 100 screenshots recognized in about 3 minutes 📚
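
In a long batch, one unreadable image or one failed request should not discard the work already done, and screenshots are not always PNG files. The variant below is a sketch under those assumptions: it accepts any recognizer callable (for example `ocr.recognize`), skips non-image files, and collects failures instead of stopping. `batch_ocr_safe` and `IMAGE_SUFFIXES` are names chosen for this example.

```python
from pathlib import Path

IMAGE_SUFFIXES = {'.png', '.jpg', '.jpeg', '.bmp'}


def batch_ocr_safe(input_folder, output_file, recognize):
    """Recognize every image in a folder; return a list of (name, error) failures."""
    sections, failed = [], []
    for img_path in sorted(Path(input_folder).iterdir()):
        if img_path.suffix.lower() not in IMAGE_SUFFIXES:
            continue  # skip anything that is not an image
        try:
            text = recognize(str(img_path)) or ''
        except Exception as exc:  # keep going; report the failure at the end
            failed.append((img_path.name, str(exc)))
            continue
        sections.append(f'=== {img_path.name} ===\n{text}')
    Path(output_file).write_text('\n\n'.join(sections), encoding='utf-8')
    return failed
```

Sorting the paths keeps the notes in a stable order, and the returned list makes it easy to rerun only the files that failed.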

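Case 2: Automatic invoice entry

    import json
    from datetime import datetime


    def invoice_ocr_system(invoice_path):
        """Recognize an invoice and extract its key fields."""
        ocr = BaiduOCR(api_key='your_key', secret_key='your_secret')

        # Recognize the invoice
        result = ocr.recognize(invoice_path)

        # Key fields to extract (simplified example)
        invoice_data = {
            '识别时间': datetime.now().strftime('%Y-%m-%d %H:%M:%S'),  # recognition time
            '发票代码': '',  # invoice code
            '发票号码': '',  # invoice number
            '金额': '',      # amount
            '税额': '',      # tax
        }

        # Regular expressions can be added here to fill the fields from result
        # ... (customize for your own needs)
        return invoice_data


    # Usage
    data = invoice_ocr_system('invoice.png')
    print(json.dumps(data, ensure_ascii=False, indent=2))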
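
The regular-expression step left open above could look something like the following. This is only a sketch: the patterns assume labels of the form `发票代码：` and `发票号码：`, a 10 to 12 digit invoice code, an 8 digit invoice number, and an amount written after a yuan sign with two decimals. Real invoices vary, so treat `FIELD_PATTERNS` and `extract_invoice_fields` as hypothetical starting points.

```python
import re

# Assumed label and number formats; adjust them to the invoices you handle
FIELD_PATTERNS = {
    '发票代码': re.compile(r'发票代码[:：]\s*(\d{10,12})'),  # invoice code
    '发票号码': re.compile(r'发票号码[:：]\s*(\d{8})'),      # invoice number
    '金额': re.compile(r'[¥￥]\s*([\d,]+\.\d{2})'),          # amount
}


def extract_invoice_fields(text):
    """Pull known fields out of OCR text; missing fields come back as ''."""
    fields = {}
    for name, pattern in FIELD_PATTERNS.items():
        match = pattern.search(text)
        fields[name] = match.group(1) if match else ''
    return fields
```

Inside `invoice_ocr_system`, calling `invoice_data.update(extract_invoice_fields(result or ''))` would fill in whatever the patterns find and leave the rest empty.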

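🎯 Which one to choose

| Scenario | Recommended option |
|------|----------|
| Simple English screenshots | Tesseract (free) |
| Occasional Chinese text | OCR.space (free quota) |
| Production use, high accuracy | Baidu AI (paid, but worth it) |
| Tables and invoices | Baidu AI (dedicated endpoints) |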
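
The options in this table can also be chained: try the cheapest engine first and fall back to the next one when it fails or returns nothing. The sketch below shows the idea with plain callables, so it does not depend on any particular OCR library; `ocr_with_fallback` is a hypothetical helper, and treating an empty string as a failure is an assumption.

```python
def ocr_with_fallback(image_path, engines):
    """Try each (name, recognizer) pair in order; return (name, text) of the first success."""
    errors = []
    for name, engine in engines:
        try:
            text = engine(image_path)
        except Exception as exc:
            errors.append(f'{name}: {exc}')
            continue
        if text and text.strip():
            return name, text
        errors.append(f'{name}: empty result')
    raise RuntimeError('all OCR engines failed: ' + '; '.join(errors))
```

A call such as `ocr_with_fallback('screenshot.png', [('tesseract', ocr_tesseract), ('ocr.space', ocr_space)])` would only touch the network when the local engine comes back empty.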

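💰 Efficiency gains

| Task | Manual time | Automated time | Speedup |
|------|----------|------------|------|
| 10 screenshots to text | 30 minutes | 1 minute | 30x |
| 100 invoices entered | 5 hours | 10 minutes | 30x |
| 1 set of online-course notes | 2 hours | 5 minutes | 24x |

Do the math: saving 5 hours a week adds up to about 250 hours a year (over roughly 50 working weeks), which is about 31 eight-hour workdays, or more than a month of working time!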
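
The arithmetic behind these figures is easy to check. The snippet below recomputes the speedups from the table and the yearly total. The 50 working weeks per year and the 8-hour workday are assumptions used to turn hours into workdays.

```python
def speedup(manual_minutes, automated_minutes):
    """Return how many times faster the automated run is."""
    return manual_minutes / automated_minutes


# (manual minutes, automated minutes), taken from the table
TASKS = {
    '10 screenshots to text': (30, 1),
    '100 invoices entered': (5 * 60, 10),
    '1 set of course notes': (2 * 60, 5),
}

HOURS_SAVED_PER_WEEK = 5
WORKING_WEEKS_PER_YEAR = 50  # assumption: about two weeks off per year
HOURS_PER_WORKDAY = 8        # assumption: standard eight-hour day

hours_per_year = HOURS_SAVED_PER_WEEK * WORKING_WEEKS_PER_YEAR
workdays_per_year = hours_per_year / HOURS_PER_WORKDAY

for task, (manual, automated) in TASKS.items():
    print(f'{task}: {speedup(manual, automated):.0f}x')
print(f'{hours_per_year} hours/year = {workdays_per_year:.2f} workdays')
```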

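📦 Complete project code

I have packaged the code above into a complete, ready-to-use project:

    # Project structure
    ├── main.py            # main program
    ├── tesseract_ocr.py   # Tesseract recognition
    ├── ocr_space.py       # OCR.space API
    ├── baidu_ocr.py       # Baidu AI recognition
    ├── batch.py           # batch processing
    └── requirements.txt   # dependencies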

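⚡ Coming up next

"Automating Batch File Renaming: Organize 1,000 Files in 3 Seconds"

You will learn how to use Python to:

Rename files in bulk by date, type, or content
Sort files into categories automatically
Match names precisely with regular expressions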

💬 Join the discussion

What do you use OCR for most often: screenshots, invoice entry, or something else? Tell us in the comments!