Image text recognition with PaddleOCR in Visual Studio C++


PaddleOCR

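PaddleOCR is an open-source optical character recognition (OCR) toolkit built on the PaddlePaddle deep learning framework. It aims to provide efficient, accurate text recognition and supports multiple languages and a range of application scenarios.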

CMake

Download and install CMake. Download page: cmake.org/download/

Environment setup

Dependencies and versions used in this article for PaddleOCR C++:

  • OpenCV (4.10.0)
  • Paddle Inference (2.6.2)
  • PaddleOCR inference models (V4)
  • PaddleOCR (2.8.1)
  • CMake (3.31.0-rc2)

Installing OpenCV

Download the package from the official site, opencv.org/releases/, and extract it to a directory of your choice. You will need this path later.

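Installing the Paddle Inference library

Open the official site: www.paddlepaddle.org.cn/

Click "Documentation" (文档) in the navigation bar and choose Paddle Inference from the menu that appears. Then, from the left-hand navigation, download the inference library for your system. I downloaded the Windows build.

After downloading, extract it to a directory of your choice. You will need this path later.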

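Downloading the inference models

Open the model list on GitHub, github.com/PaddlePaddl… , and download the models that suit your needs. I downloaded a detection model and a recognition model; the configuration later in this article uses ch_PP-OCRv4_det_server_infer and ch_PP-OCRv4_rec_server_infer.

Extract them to a directory of your choice. You will need this path later.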

Quick environment configuration with CMake

Official tutorial: github.com/PaddlePaddl…

  1. Change the auto-log repository address.

Open deploy/cpp_infer/external-cmake/auto-log.cmake and change the Git repository address to:

GIT_REPOSITORY https://gitee.com/Double_V/AutoLog

  2. Download dirent.h: paddleocr.bj.bcebos.com/deploy/cpp_…

Copy it into the Visual Studio include folder (substitute the corresponding directory of your own installation), for example:

D:\Program Files\Microsoft Visual Studio\2022\Professional\VC\Auxiliary\VS\include

  3. Open CMake and select the source directory:

D:\code\PaddleOCR-2.8.1\deploy\cpp_infer

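Then configure CMake. In CPU mode you only need to set the OpenCV directory (OPENCV_DIR) and the Paddle Inference library directory (PADDLE_LIB).

Then click Configure, Generate, and Open Project in turn, waiting for each step to finish before clicking the next.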

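Manual environment configuration

  1. Open Visual Studio and create a new project. Copy the source files from D:\code\PaddleOCR-2.8.1\deploy\cpp_infer\src into the new project and add them as source files.

  2. Copy the include directories: right-click the project -> Properties -> C/C++ -> General -> Additional Include Directories, and copy the value over from the CMake-generated project.

  3. Change the runtime library to MT (/MT), because PaddleOCR does not support Debug mode.

  4. Copy the library directories: right-click the project -> Properties -> Linker -> General -> Additional Library Directories, and copy the value over in the same way.

  5. Copy the dependent .lib files: right-click the project -> Properties -> Linker -> Input -> Additional Dependencies, and copy the value over.

  6. Copy the post-build event script: right-click the project -> Properties -> Build Events -> Post-Build Event, and copy the command over.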
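
Step 3 is the setting that is easiest to get wrong without noticing. As a sanity check, the following sketch reports which runtime library variant a source file was compiled with, based on MSVC's predefined macros _DEBUG and _DLL. The helper name describe_runtime_library is made up for illustration and is not part of PaddleOCR.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: reports the MSVC runtime library option in effect
// for this translation unit, derived from the compiler's predefined macros.
//   _DLL   is defined for /MD and /MDd (DLL runtime)
//   _DEBUG is defined for /MTd, /MDd and /LDd (debug runtime)
std::string describe_runtime_library() {
#if !defined(_MSC_VER)
    return "not MSVC";   // the macros below are MSVC-specific
#elif defined(_DLL) && defined(_DEBUG)
    return "/MDd";
#elif defined(_DLL)
    return "/MD";
#elif defined(_DEBUG)
    return "/MTd";
#else
    return "/MT";        // static, non-debug runtime: what step 3 asks for
#endif
}
```

Printing the result once at program start (for example with std::cout) should show /MT if the project property was applied to the active configuration.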

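Writing the test code

The official .cpp sample recognizes images using an inference library and input that are configured through directories. In a real program we need to choose the image to recognize ourselves, so we adapt the official main.cpp. First, find the main function: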

int main(int argc, char** argv) {
    // Parsing command-line
    google::ParseCommandLineFlags(&argc, &argv, true);
    check_params();

    if (!Utility::PathExists(FLAGS_image_dir)) {
        std::cerr << "[ERROR] image path not exist! image_dir: " << FLAGS_image_dir
            << std::endl;
        exit(1);
    }

    std::vector<cv::String> cv_all_img_names;
    cv::glob(FLAGS_image_dir, cv_all_img_names);
    std::cout << "total images num: " << cv_all_img_names.size() << std::endl;

    if (!Utility::PathExists(FLAGS_output)) {
        Utility::CreateDir(FLAGS_output);
    }
    if (FLAGS_type == "ocr") {
        ocr(cv_all_img_names);
    }
    else if (FLAGS_type == "structure") {
        structure(cv_all_img_names);
    }
    else {
        std::cout << "only value in ['ocr','structure'] is supported" << std::endl;
    }
}

As this shows, the sample configures itself by parsing command-line arguments. The flags are defined in args.cpp; open that file:

// common args
DEFINE_bool(use_gpu, false, "Infering with GPU or CPU.");
DEFINE_bool(use_tensorrt, false, "Whether use tensorrt.");
DEFINE_int32(gpu_id, 0, "Device id of GPU to execute.");
DEFINE_int32(gpu_mem, 4000, "GPU id when infering with GPU.");
DEFINE_int32(cpu_threads, 10, "Num of threads with CPU.");
DEFINE_bool(enable_mkldnn, false, "Whether use mkldnn with CPU.");
DEFINE_string(precision, "fp32", "Precision be one of fp32/fp16/int8");
DEFINE_bool(benchmark, false, "Whether use benchmark.");
DEFINE_string(output, "./output/", "Save benchmark log path.");
DEFINE_string(image_dir, "", "Dir of input image.");
DEFINE_string(
	type, "ocr",
	"Perform ocr or structure, the value is selected in ['ocr','structure'].");
// detection related
DEFINE_string(det_model_dir, "", "Path of det inference model.");
DEFINE_string(limit_type, "max", "limit_type of input image.");
DEFINE_int32(limit_side_len, 960, "limit_side_len of input image.");
DEFINE_double(det_db_thresh, 0.3, "Threshold of det_db_thresh.");
DEFINE_double(det_db_box_thresh, 0.6, "Threshold of det_db_box_thresh.");
DEFINE_double(det_db_unclip_ratio, 1.5, "Threshold of det_db_unclip_ratio.");
DEFINE_bool(use_dilation, false, "Whether use the dilation on output map.");
DEFINE_string(det_db_score_mode, "slow", "Whether use polygon score.");
DEFINE_bool(visualize, true, "Whether show the detection results.");
// classification related
DEFINE_bool(use_angle_cls, false, "Whether use use_angle_cls.");
DEFINE_string(cls_model_dir, "", "Path of cls inference model.");
DEFINE_double(cls_thresh, 0.9, "Threshold of cls_thresh.");
DEFINE_int32(cls_batch_num, 1, "cls_batch_num.");
// recognition related
DEFINE_string(rec_model_dir, "", "Path of rec inference model.");
DEFINE_int32(rec_batch_num, 6, "rec_batch_num.");
DEFINE_string(rec_char_dict_path, "../../ppocr/utils/ppocr_keys_v1.txt",
	"Path of dictionary.");
DEFINE_int32(rec_img_h, 48, "rec image height");
DEFINE_int32(rec_img_w, 320, "rec image width");

// layout model related
DEFINE_string(layout_model_dir, "", "Path of table layout inference model.");
DEFINE_string(layout_dict_path,
	"../../ppocr/utils/dict/layout_dict/layout_publaynet_dict.txt",
	"Path of dictionary.");
DEFINE_double(layout_score_threshold, 0.5, "Threshold of score.");
DEFINE_double(layout_nms_threshold, 0.5, "Threshold of nms.");
// structure model related
DEFINE_string(table_model_dir, "", "Path of table struture inference model.");
DEFINE_int32(table_max_len, 488, "max len size of input image.");
DEFINE_int32(table_batch_num, 1, "table_batch_num.");
DEFINE_bool(merge_no_span_structure, true,
	"Whether merge <td> and </td> to <td></td>");
DEFINE_string(table_char_dict_path,
	"../../ppocr/utils/dict/table_structure_dict_ch.txt",
	"Path of dictionary.");

// ocr forward related
DEFINE_bool(det, true, "Whether use det in forward.");
DEFINE_bool(rec, true, "Whether use rec in forward.");
DEFINE_bool(cls, false, "Whether use cls in forward.");
DEFINE_bool(table, false, "Whether use table structure in forward.");
DEFINE_bool(layout, false, "Whether use layout analysis in forward.");

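As you can see, each flag definition can carry a default value. We need to copy the following files from the source tree into the program's root directory. First create a directory named config under the source root.

  1. Dictionary configuration. Copy the dictionary .txt files from the source tree into the config directory and change the flags:

    DEFINE_string(rec_char_dict_path, "./config/ppocr_keys_v1.txt", "Path of the recognition dictionary.");
    DEFINE_string(layout_dict_path, "./config/layout_publaynet_dict.txt", "Path of the layout dictionary.");
    DEFINE_string(table_char_dict_path, "./config/table_structure_dict_ch.txt", "Path of the table structure dictionary.");

Only the recognition dictionary actually needs to be configured here. I did not find inference models for the other two on the official site, so you would have to train them yourself.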

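  2. Copy the downloaded model directories into config and point the flags at them:

    DEFINE_string(det_model_dir, "./config/ch_PP-OCRv4_det_server_infer/", "Path of the detection inference model.");
    DEFINE_string(cls_model_dir, "./config/ch_ppocr_mobile_v2.0_cls_slim_infer/", "Path of the classification inference model.");
    DEFINE_string(rec_model_dir, "./config/ch_PP-OCRv4_rec_server_infer/", "Path of the recognition inference model.");

  3. Set the flags that control which stages run:

    DEFINE_bool(det, true, "Whether to use detection in the forward pass.");
    DEFINE_bool(rec, true, "Whether to use recognition in the forward pass.");
    DEFINE_bool(cls, false, "Whether to use classification in the forward pass.");
    DEFINE_bool(table, false, "Whether to use table structure in the forward pass.");
    DEFINE_bool(layout, false, "Whether to use layout analysis in the forward pass.");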
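
Before running, it can help to confirm that the files copied into config are where the flags say they are. The sketch below is a hypothetical helper, not PaddleOCR code. It assumes each model directory contains inference.pdmodel and inference.pdiparams, which is the usual layout of Paddle 2.x inference models; adjust the file names if your download differs.

```cpp
#include <cassert>
#include <fstream>
#include <string>
#include <vector>

// True if the file at `path` can be opened for reading.
bool file_readable(const std::string& path) {
    std::ifstream in(path.c_str(), std::ios::binary);
    return in.good();
}

// Hypothetical helper: returns every expected file that cannot be opened.
// ASSUMPTION: each model directory holds inference.pdmodel and
// inference.pdiparams (typical for Paddle 2.x inference models).
std::vector<std::string> missing_config_files(
        const std::vector<std::string>& model_dirs,
        const std::string& dict_path) {
    std::vector<std::string> missing;
    for (const std::string& dir : model_dirs) {
        const std::string expected[] = {dir + "/inference.pdmodel",
                                        dir + "/inference.pdiparams"};
        for (const std::string& file : expected) {
            if (!file_readable(file)) missing.push_back(file);
        }
    }
    if (!file_readable(dict_path)) missing.push_back(dict_path);
    return missing;
}
```

Called with the detection and recognition model directories and the dictionary path configured above, an empty result means every expected file could be opened; otherwise the returned paths are the ones to fix.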

Modify main: remove the command-line parsing and the parameter check, hard-code the image path, and then print the detection results.

    int main(int argc, char** argv) {
    	// Parsing command-line
    	//google::ParseCommandLineFlags(&argc, &argv, true);
    	//check_params();

    	//if (!Utility::PathExists(FLAGS_image_dir)) {
    	//    std::cerr << "[ERROR] image path not exist! image_dir: " << FLAGS_image_dir
    	//        << std::endl;
    	//    exit(1);
    	//}

    	std::vector<cv::String> cv_all_img_names;
    	//cv::glob(FLAGS_image_dir, cv_all_img_names);
    	//std::cout << "total images num: " << cv_all_img_names.size() << std::endl;

    	cv_all_img_names.push_back("D:\\1123.bmp");

    	if (!Utility::PathExists(FLAGS_output)) {
    		Utility::CreateDir(FLAGS_output);
    	}
    	if (FLAGS_type == "ocr") {
    		ocr(cv_all_img_names);
    	}
    	else if (FLAGS_type == "structure") {
    		structure(cv_all_img_names);
    	}
    	else {
    		std::cout << "only value in ['ocr','structure'] is supported" << std::endl;
    	}
    }
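
Hard-coding the path is fine for a first test. One possible next step, sketched here under assumptions, is to take candidate paths from somewhere else (for example argv or a file dialog) and keep only those that look like images. The helper names and the extension list are illustrative and do not come from PaddleOCR.

```cpp
#include <algorithm>
#include <cassert>
#include <cctype>
#include <string>
#include <vector>

// True if `path` ends with one of a few common image extensions,
// compared case-insensitively. The extension list is an assumption.
bool has_image_extension(const std::string& path) {
    const std::size_t dot = path.find_last_of('.');
    if (dot == std::string::npos) return false;
    std::string ext = path.substr(dot);
    std::transform(ext.begin(), ext.end(), ext.begin(), [](unsigned char c) {
        return static_cast<char>(std::tolower(c));
    });
    return ext == ".bmp" || ext == ".png" || ext == ".jpg" || ext == ".jpeg";
}

// Hypothetical helper: keeps only candidates that look like image files,
// preserving their original order.
std::vector<std::string> select_image_paths(
        const std::vector<std::string>& candidates) {
    std::vector<std::string> selected;
    for (const std::string& path : candidates) {
        if (has_image_extension(path)) selected.push_back(path);
    }
    return selected;
}
```

In main, the selected paths could be pushed into cv_all_img_names in place of the hard-coded one, since cv::String is an alias for std::string in OpenCV 4.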

A lightweight model is enough; the server model is too slow.
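
To put a number on "too slow" for your own images, a small timing wrapper is enough. This is a generic sketch using std::chrono, with time_ms as a made-up helper name; wrapping the call to ocr(cv_all_img_names) from main is an assumption about how you would use it.

```cpp
#include <cassert>
#include <chrono>

// Hypothetical helper: runs `work` once and returns the elapsed
// wall-clock time in milliseconds, measured with a monotonic clock.
template <typename Work>
double time_ms(Work work) {
    const auto start = std::chrono::steady_clock::now();
    work();
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(stop - start).count();
}
```

For example, double ms = time_ms([&] { ocr(cv_all_img_names); }); measures one full run, model loading included, so the same image can be compared across the lightweight and server models.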