Hand-Rolling PP-OCRv5 Text Recognition in Pure C++: No OpenCV, From Zero to a Working Pipeline

Ne0inhk

23 Mar 2026 — 14 min read

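Ever been here? You want to add OCR to a C++ project, and just installing OpenCV eats half a day. This post shows how to implement the full PP-OCRv5 pipeline (detection + recognition) in pure C++ with zero OpenCV dependency, using Paddle Inference + stb_image. The code runs as-is!

1. Results First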

    cd /home/michah/桌面/paddle_inference && ./build/ocr_demo build/640.png --text-only
    cd /home/michah/桌面/paddle_inference && ./build/ocr_demo build/640.png

The input is a screenshot of a ZEEKLOG profile page. The recognized text:

IP属地：陕西省加入ZEEKLOG时间：2020-01-31 查看详细资料 weixin_46244623 码龄6年 131566总访问196原创248粉丝60关注 公众号·懒人程序

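Detection took 665 ms and recognition took 1453 ms, on CPU only, with no GPU.

2. Preparation

2.1 Download the Paddle Inference C++ Library

Download the C++ inference library from the Paddle Inference website (the Linux CPU build is enough) and extract it:

    wget https://paddle-inference-lib.bj.bcebos.com/3.0.0/cxx_c/Linux/CPU/gcc8.2_avx_mkl/paddle_inference.tgz
    tar -xf paddle_inference.tgz

After extraction the directory layout looks like this:

    paddle_inference/
    ├── paddle/
    │   ├── include/     # headers
    │   └── lib/         # .so shared libraries
    ├── third_party/     # third-party dependencies (MKL, oneDNN, etc.)
    └── version.txt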

2.2 Download the PP-OCRv5 Server Models

PP-OCRv5 comes in a mobile and a server variant; the server variant is more accurate. You need two models, one for detection and one for recognition:

    cd paddle_inference/models

    # Detection model
    wget https://paddle-model-ecology.bj.bcebos.com/paddlex/official_inference_model/paddle3.0.0/PP-OCRv5_server_det_infer.tar
    tar xf PP-OCRv5_server_det_infer.tar

    # Recognition model
    wget https://paddle-model-ecology.bj.bcebos.com/paddlex/official_inference_model/paddle3.0.0/PP-OCRv5_server_rec_infer.tar
    tar xf PP-OCRv5_server_rec_infer.tar

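After extraction, each model directory contains three files:

inference.json: model structure in PIR format (the new format used by PP-OCRv5)
inference.pdiparams: model parameters
inference.yml: preprocessing configuration

2.3 Extract the Dictionary File

The recognition model's dictionary is embedded in inference.yml. Extract it with Python:

    import yaml

    with open('models/PP-OCRv5_server_rec_infer/inference.yml', encoding='utf-8') as f:
        d = yaml.safe_load(f)

    chars = d['PostProcess']['character_dict']
    with open('models/ppocr_v5_dict.txt', 'w', encoding='utf-8') as f:
        for c in chars:
            f.write(str(c) + '\n')

    print(f'Dictionary: {len(chars)} characters')  # 18383 characters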

2.4 Download the stb Headers

We use stb in place of OpenCV to load and resize images. Only two header files are needed:

    mkdir -p third_party/stb
    cd third_party/stb

    # stb_image.h (image loading)
    wget https://gitee.com/mirrors/stb/raw/master/stb_image.h

    # stb_image_resize2.h (image resizing)
    wget https://gitee.com/mirrors/stb/raw/master/stb_image_resize2.h

3. CMakeLists.txt

    cmake_minimum_required(VERSION 3.10)
    project(PaddleInferenceDemo LANGUAGES CXX)

    set(CMAKE_CXX_STANDARD 17)
    set(CMAKE_CXX_STANDARD_REQUIRED ON)

    # Paddle Inference SDK path (the current directory is the SDK root)
    set(PADDLE_INFER_DIR "${CMAKE_CURRENT_SOURCE_DIR}")
    set(PADDLE_INCLUDE_DIR "${PADDLE_INFER_DIR}/paddle/include")
    set(PADDLE_LIB_DIR "${PADDLE_INFER_DIR}/paddle/lib")
    set(THIRD_PARTY_DIR "${PADDLE_INFER_DIR}/third_party")

    # Include directories
    include_directories(${PADDLE_INCLUDE_DIR})
    include_directories(${THIRD_PARTY_DIR}/install/mklml/include)
    include_directories(${THIRD_PARTY_DIR}/install/onednn/include)
    include_directories(${THIRD_PARTY_DIR}/install/gflags/include)
    include_directories(${THIRD_PARTY_DIR}/install/glog/include)
    include_directories(${THIRD_PARTY_DIR}/install/xxhash/include)
    include_directories(${THIRD_PARTY_DIR}/install/cryptopp/include)
    include_directories(${THIRD_PARTY_DIR}/install/utf8proc/include)
    include_directories(${THIRD_PARTY_DIR}/install/yaml-cpp/include)
    include_directories(${THIRD_PARTY_DIR}/install/protobuf/include)
    include_directories(${THIRD_PARTY_DIR}/stb)
    include_directories(${PADDLE_INFER_DIR})

    # Library directories
    link_directories(${PADDLE_LIB_DIR})
    link_directories(${THIRD_PARTY_DIR}/install/mklml/lib)
    link_directories(${THIRD_PARTY_DIR}/install/onednn/lib)
    link_directories(${THIRD_PARTY_DIR}/install/openvino/intel64)
    link_directories(${THIRD_PARTY_DIR}/install/tbb/lib)

    # oneDNN ships only libdnnl.so.3, so fall back to an absolute path
    find_library(DNNL_LIB NAMES dnnl PATHS ${THIRD_PARTY_DIR}/install/onednn/lib NO_DEFAULT_PATH)
    if(NOT DNNL_LIB)
      set(DNNL_LIB "${THIRD_PARTY_DIR}/install/onednn/lib/libdnnl.so.3")
    endif()

    # Build target
    add_executable(ocr_demo ocr_demo.cpp)

    # Use RPATH (not RUNPATH) so that indirect dependencies (e.g. OpenVINO) can also find their .so files
    set(CMAKE_EXE_LINKER_FLAGS "-Wl,--disable-new-dtags")
    set(ALL_RPATH "${PADDLE_LIB_DIR};${THIRD_PARTY_DIR}/install/mklml/lib;${THIRD_PARTY_DIR}/install/onednn/lib;${THIRD_PARTY_DIR}/install/openvino/intel64;${THIRD_PARTY_DIR}/install/tbb/lib")

    target_link_libraries(ocr_demo
      paddle_inference
      phi_core
      phi
      common
      ${DNNL_LIB}
      mklml_intel
      iomp5
      tbb
      pthread
      dl
      rt
      m
    )

    set_target_properties(ocr_demo PROPERTIES
      BUILD_RPATH "${ALL_RPATH}"
      INSTALL_RPATH "${ALL_RPATH}"
    )

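A few pitfalls up front:

C++17 is required: the code uses structured bindings (auto [cx, cy] = ...), which do not compile under C++14.
Use RPATH, not RUNPATH: Paddle Inference depends indirectly on OpenVINO's shared libraries, and RUNPATH is not consulted when resolving indirect dependencies, so you must pass --disable-new-dtags to make the linker emit RPATH.
oneDNN ships no libdnnl.so: only libdnnl.so.3 exists, so linking with -ldnnl fails and the library has to be given by absolute path.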

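4. Core Code: ocr_demo.cpp

The full code is about 470 lines and implements: image loading → detection preprocessing → DB text detection → DB post-processing → cropping text regions → recognition preprocessing → CTC decoding → printing results

4.1 Headers and Structs

    // PP-OCRv5 C++ inference demo (detection + recognition, full pipeline)
    // No OpenCV dependency; images are loaded with stb_image
    #define STB_IMAGE_IMPLEMENTATION
    #include "stb_image.h"
    #define STB_IMAGE_RESIZE_IMPLEMENTATION
    #include "stb_image_resize2.h"

    #include <algorithm>
    #include <chrono>
    #include <cmath>
    #include <cstdint>   // uint8_t
    #include <cstdio>    // printf
    #include <cstdlib>   // std::atoi
    #include <cstring>
    #include <fstream>
    #include <iostream>
    #include <numeric>
    #include <queue>
    #include <sstream>
    #include <string>
    #include <utility>   // std::pair
    #include <vector>

    #include "paddle/include/paddle_inference_api.h"

    struct Box {
        int x0, y0, x1, y1;  // coordinates in the original image
        float score;
    };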

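4.2 Loading the Dictionary

    static std::vector<std::string> LoadDict(const std::string& path) {
        std::vector<std::string> dict;
        std::ifstream ifs(path, std::ios::in);
        if (!ifs.is_open()) {
            std::cerr << "无法打开字典: " << path << std::endl;  // "cannot open dictionary"
            return dict;
        }
        std::string line;
        while (std::getline(ifs, line)) {
            // Strip a trailing CR in case the file has Windows line endings
            if (!line.empty() && line.back() == '\r') line.pop_back();
            dict.push_back(line);
        }
        // Append the space character as the last class. The CTC blank (class 0)
        // is not stored here; CTCDecode accounts for it with a "- 1" offset.
        dict.push_back(" ");
        return dict;
    }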

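4.3 Detection Preprocessing

The PP-OCRv5 detection model expects its input as follows:

BGR channel order
ImageNet normalization: mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]
Long side limited to 1280 (larger images are shrunk, smaller ones are not enlarged), with width and height rounded up to multiples of 32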
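As a worked illustration of the sizing rule in the list above, the following sketch computes the network input size for a given image. `DetSize` and `ComputeDetSize` are hypothetical names introduced here for illustration. The sketch assumes the long side is only ever shrunk to the limit, never enlarged, and that both sides are then rounded up to a multiple of 32.

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical helper type: the size fed to the detection network.
struct DetSize {
    int w;
    int h;
};

// Shrink so the long side is at most `limit` (never enlarge),
// then round each side up to the next multiple of 32.
inline DetSize ComputeDetSize(int w, int h, int limit = 1280) {
    assert(w > 0 && h > 0 && limit > 0);
    float ratio = 1.0f;
    const int long_side = std::max(w, h);
    if (long_side > limit) {
        ratio = static_cast<float>(limit) / long_side;
    }
    int new_w = static_cast<int>(w * ratio);
    int new_h = static_cast<int>(h * ratio);
    new_w = std::max(32, (new_w + 31) / 32 * 32);
    new_h = std::max(32, (new_h + 31) / 32 * 32);
    return {new_w, new_h};
}
```

For the 661x316 screenshot used in this article, `ComputeDetSize(661, 316)` gives 672x320. A 2560x1440 image is first halved to 1280x720 and then rounded up to 1280x736. Because the two sides are rounded independently, the aspect ratio can shift slightly, which is why the detection code keeps separate width and height ratios for mapping boxes back to the original image.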

    struct DetInput {
        std::vector<float> data;  // CHW, normalized
        int resized_w, resized_h;
        int orig_w, orig_h;
        float ratio_w, ratio_h;
    };

    static bool PrepareDetInput(const unsigned char* img_rgb, int w, int h,
                                int resize_long, DetInput& out) {
        out.orig_w = w;
        out.orig_h = h;

        // Shrink only when the long side exceeds resize_long
        float ratio = 1.0f;
        if (std::max(w, h) > resize_long) {
            ratio = static_cast<float>(resize_long) / std::max(w, h);
        }
        int new_w = static_cast<int>(w * ratio);
        int new_h = static_cast<int>(h * ratio);
        // Round both sides up to multiples of 32
        new_w = std::max(32, (new_w + 31) / 32 * 32);
        new_h = std::max(32, (new_h + 31) / 32 * 32);

        out.resized_w = new_w;
        out.resized_h = new_h;
        // Scale factors for mapping detections back to the original image
        out.ratio_w = static_cast<float>(w) / new_w;
        out.ratio_h = static_cast<float>(h) / new_h;

        std::vector<unsigned char> resized(new_w * new_h * 3);
        stbir_resize_uint8_linear(img_rgb, w, h, 0,
                                  resized.data(), new_w, new_h, 0, STBIR_RGB);

        const float mean[3] = {0.485f, 0.456f, 0.406f};
        const float std_[3] = {0.229f, 0.224f, 0.225f};
        out.data.resize(3 * new_h * new_w);
        // HWC uint8 → CHW float, swapping channels and normalizing
        for (int c = 0; c < 3; ++c) {
            int src_c = 2 - c;  // RGB→BGR
            for (int y = 0; y < new_h; ++y) {
                for (int x = 0; x < new_w; ++x) {
                    float pixel = resized[(y * new_w + x) * 3 + src_c] / 255.0f;
                    pixel = (pixel - mean[c]) / std_[c];
                    out.data[c * new_h * new_w + y * new_w + x] = pixel;
                }
            }
        }
        return true;
    }

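4.4 DB Post-processing

The detection model outputs a probability map, which needs post-processing: binarization → connected-component labeling → bounding-box extraction → unclip expansion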
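To make the final "unclip" step concrete, here is a small sketch of the arithmetic. `SimpleUnclip` mirrors the simplified rule in this article's `DBPostProcess`, which grows an axis-aligned box in proportion to its own width and height. `OffsetUnclipDistance` is included only for comparison: it assumes the `area * ratio / perimeter` offset form, which is my understanding of what PaddleOCR's reference Python post-processing does on polygons, and should be treated as an assumption rather than as something this demo implements. Both function names and the `Expand` struct are hypothetical.

```cpp
#include <cassert>

// Hypothetical helper type: per-side expansion, in pixels.
struct Expand {
    float x;  // added on the left and on the right
    float y;  // added on the top and on the bottom
};

// Simplified rule (as in this article's DBPostProcess):
// each side grows in proportion to the box's own extent.
inline Expand SimpleUnclip(int w, int h, float unclip_ratio) {
    assert(w > 0 && h > 0);
    return {w * (unclip_ratio - 1.0f) * 0.5f,
            h * (unclip_ratio - 1.0f) * 0.5f};
}

// Comparison only (assumed form): a single offset distance,
// area * ratio / perimeter, applied equally on every side.
inline float OffsetUnclipDistance(int w, int h, float unclip_ratio) {
    assert(w > 0 && h > 0);
    const float area = static_cast<float>(w) * h;
    const float perimeter = 2.0f * (w + h);
    return area * unclip_ratio / perimeter;
}
```

For a 100x20 box with `unclip_ratio = 1.5`, the simplified rule adds 25 px on the left and right but only 5 px on the top and bottom, while the offset form would add 12.5 px on every side. In other words, for long, thin text lines the simplified rule expands much less vertically, which may be one reason the recognition step adds its own padding when cropping.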

    // Connected-component labeling via BFS (4-connectivity)
    static void FindConnectedComponents(const std::vector<uint8_t>& binary, int w, int h,
                                        std::vector<int>& labels, int& num_labels) {
        labels.assign(w * h, 0);
        num_labels = 0;
        const int dx[] = {1, -1, 0, 0};
        const int dy[] = {0, 0, 1, -1};
        for (int y = 0; y < h; ++y) {
            for (int x = 0; x < w; ++x) {
                if (binary[y * w + x] == 0 || labels[y * w + x] != 0) continue;
                ++num_labels;
                std::queue<std::pair<int, int>> q;
                q.push({x, y});
                labels[y * w + x] = num_labels;
                while (!q.empty()) {
                    auto [cx, cy] = q.front();
                    q.pop();
                    for (int d = 0; d < 4; ++d) {
                        int nx = cx + dx[d], ny = cy + dy[d];
                        if (nx >= 0 && nx < w && ny >= 0 && ny < h &&
                            binary[ny * w + nx] != 0 && labels[ny * w + nx] == 0) {
                            labels[ny * w + nx] = num_labels;
                            q.push({nx, ny});
                        }
                    }
                }
            }
        }
    }

    static std::vector<Box> DBPostProcess(const float* prob_map, int map_w, int map_h,
                                          float thresh, float box_thresh, float unclip_ratio,
                                          float ratio_w, float ratio_h,
                                          int orig_w, int orig_h) {
        // 1) Binarize
        std::vector<uint8_t> binary(map_w * map_h);
        for (int i = 0; i < map_w * map_h; ++i) {
            binary[i] = (prob_map[i] >= thresh) ? 1 : 0;
        }

        // 2) Connected components
        std::vector<int> labels;
        int num_labels = 0;
        FindConnectedComponents(binary, map_w, map_h, labels, num_labels);

        // 3) Axis-aligned bbox and mean score for each component
        std::vector<Box> boxes;
        if (num_labels == 0) return boxes;

        std::vector<int> min_x(num_labels + 1, map_w);
        std::vector<int> min_y(num_labels + 1, map_h);
        std::vector<int> max_x(num_labels + 1, 0);
        std::vector<int> max_y(num_labels + 1, 0);
        std::vector<float> sum_score(num_labels + 1, 0);
        std::vector<int> count(num_labels + 1, 0);

        for (int y = 0; y < map_h; ++y) {
            for (int x = 0; x < map_w; ++x) {
                int l = labels[y * map_w + x];
                if (l == 0) continue;
                min_x[l] = std::min(min_x[l], x);
                min_y[l] = std::min(min_y[l], y);
                max_x[l] = std::max(max_x[l], x);
                max_y[l] = std::max(max_y[l], y);
                sum_score[l] += prob_map[y * map_w + x];
                count[l]++;
            }
        }

        for (int l = 1; l <= num_labels; ++l) {
            if (count[l] < 3) continue;  // drop tiny components
            float avg_score = sum_score[l] / count[l];
            if (avg_score < box_thresh) continue;

            // 4) Unclip: expand the box, then map it back to original-image coordinates
            int bw = max_x[l] - min_x[l];
            int bh = max_y[l] - min_y[l];
            float expand_x = bw * (unclip_ratio - 1.0f) * 0.5f;
            float expand_y = bh * (unclip_ratio - 1.0f) * 0.5f;

            Box box;
            box.x0 = std::max(0, static_cast<int>((min_x[l] - expand_x) * ratio_w));
            box.y0 = std::max(0, static_cast<int>((min_y[l] - expand_y) * ratio_h));
            box.x1 = std::min(orig_w - 1, static_cast<int>((max_x[l] + expand_x) * ratio_w));
            box.y1 = std::min(orig_h - 1, static_cast<int>((max_y[l] + expand_y) * ratio_h));
            box.score = avg_score;
            if (box.x1 - box.x0 < 3 || box.y1 - box.y0 < 3) continue;
            boxes.push_back(box);
        }

        // Sort top to bottom
        std::sort(boxes.begin(), boxes.end(),
                  [](const Box& a, const Box& b) { return a.y0 < b.y0; });
        return boxes;
    }

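4.5 Recognition Preprocessing

The recognition model expects its input as follows:

BGR channel order
Normalization: mean=[0.5, 0.5, 0.5], std=[0.5, 0.5, 0.5]
Height fixed at 48; width scaled proportionally from the cropped region (capped at 320)
Padding added on all four sides when cropping, which improves recognition of characters at the edges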
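The cropping and sizing rules in the list above can be checked with a small geometry-only sketch. `RecGeometry` and `ComputeRecGeometry` are hypothetical names introduced here. The padding rule (`max(2, height / 4)`) and the 48 / 320 limits are taken from this article's `PrepareRecInput`; nothing here touches pixels.

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical result type: padded crop rectangle and recognizer input width.
struct RecGeometry {
    int x0, y0, x1, y1;  // padded crop; x1 and y1 are exclusive
    int target_w;        // width after resizing the crop to height rec_h
};

// Pad the detected box on all four sides, clamp to the image,
// then derive the input width from the padded crop's aspect ratio.
inline RecGeometry ComputeRecGeometry(int img_w, int img_h,
                                      int bx0, int by0, int bx1, int by1,
                                      int rec_h = 48, int max_w = 320) {
    assert(bx1 > bx0 && by1 > by0);
    const int pad = std::max(2, (by1 - by0) / 4);
    RecGeometry g;
    g.x0 = std::max(0, bx0 - pad);
    g.y0 = std::max(0, by0 - pad);
    g.x1 = std::min(img_w, bx1 + pad);
    g.y1 = std::min(img_h, by1 + pad);
    const float wh_ratio = static_cast<float>(g.x1 - g.x0) / (g.y1 - g.y0);
    g.target_w = std::min(max_w, std::max(1, static_cast<int>(rec_h * wh_ratio)));
    return g;
}
```

Taking the first box from the sample run, (0,93)-(289,103) in a 661x316 image, the padded crop is 291x14. Keeping its aspect ratio at height 48 would need a width of 997, so the 320 cap squeezes the line horizontally by roughly 3x. If long lines come back with low confidence, that cap may be worth revisiting.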

    static std::vector<float> PrepareRecInput(const unsigned char* img_rgb, int img_w, int img_h,
                                              const Box& box, int rec_h, int& rec_w) {
        int crop_w = box.x1 - box.x0;
        int crop_h = box.y1 - box.y0;
        if (crop_w <= 0 || crop_h <= 0) return {};

        // Pad the crop on all four sides, clamped to the image bounds
        int pad = std::max(2, crop_h / 4);
        int x0 = std::max(0, box.x0 - pad);
        int y0 = std::max(0, box.y0 - pad);
        int x1 = std::min(img_w, box.x1 + pad);
        int y1 = std::min(img_h, box.y1 + pad);
        crop_w = x1 - x0;
        crop_h = y1 - y0;
        if (crop_w <= 0 || crop_h <= 0) return {};

        // Copy the padded region out of the source image
        std::vector<unsigned char> crop(crop_w * crop_h * 3);
        for (int y = 0; y < crop_h; ++y) {
            for (int x = 0; x < crop_w; ++x) {
                int src_idx = ((y0 + y) * img_w + (x0 + x)) * 3;
                int dst_idx = (y * crop_w + x) * 3;
                crop[dst_idx + 0] = img_rgb[src_idx + 0];
                crop[dst_idx + 1] = img_rgb[src_idx + 1];
                crop[dst_idx + 2] = img_rgb[src_idx + 2];
            }
        }

        // Resize to height rec_h; width follows the aspect ratio, capped at rec_w
        float wh_ratio = static_cast<float>(crop_w) / crop_h;
        int target_w = std::min(rec_w, std::max(1, static_cast<int>(rec_h * wh_ratio)));
        std::vector<unsigned char> resized(target_w * rec_h * 3);
        stbir_resize_uint8_linear(crop.data(), crop_w, crop_h, 0,
                                  resized.data(), target_w, rec_h, 0, STBIR_RGB);

        const float mean[3] = {0.5f, 0.5f, 0.5f};
        const float std_[3] = {0.5f, 0.5f, 0.5f};
        std::vector<float> data(3 * rec_h * target_w);
        // HWC uint8 → CHW float, swapping channels and normalizing
        for (int c = 0; c < 3; ++c) {
            int src_c = 2 - c;  // RGB→BGR
            for (int y = 0; y < rec_h; ++y) {
                for (int x = 0; x < target_w; ++x) {
                    float pixel = resized[(y * target_w + x) * 3 + src_c] / 255.0f;
                    pixel = (pixel - mean[c]) / std_[c];
                    data[c * rec_h * target_w + y * target_w + x] = pixel;
                }
            }
        }
        rec_w = target_w;  // report the width actually used
        return data;
    }

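4.6 CTC Decoding

The PP-OCRv5 recognition model's output is already a post-softmax probability distribution, so do not apply softmax again. This is an easy trap to fall into: if you do, every confidence comes out as 0.00: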

    static std::string CTCDecode(const float* output, int time_steps, int class_num,
                                 const std::vector<std::string>& dict, float& confidence) {
        std::string result;
        int prev = 0;
        float total_conf = 0.0f;
        int char_count = 0;
        for (int t = 0; t < time_steps; ++t) {
            // Greedy decoding: take the argmax class at each time step
            const float* probs = output + t * class_num;
            int best_idx = 0;
            float best_val = probs[0];
            for (int c = 1; c < class_num; ++c) {
                if (probs[c] > best_val) {
                    best_val = probs[c];
                    best_idx = c;
                }
            }
            // Skip the blank (class 0) and collapse consecutive repeats
            if (best_idx != 0 && best_idx != prev) {
                int dict_idx = best_idx - 1;
                if (dict_idx >= 0 && dict_idx < static_cast<int>(dict.size())) {
                    result += dict[dict_idx];
                }
                total_conf += best_val;  // use the probability directly; do NOT softmax again!
                char_count++;
            }
            prev = best_idx;
        }
        confidence = (char_count > 0) ? (total_conf / char_count) : 0.0f;
        return result;
    }
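To see why a second softmax drives every confidence to 0.00, the sketch below builds one synthetic probability row and passes it through softmax again. `PeakAfterSecondSoftmax` is a hypothetical helper, and the row is made up for illustration: one class holds `peak` and the rest share the remainder evenly. The class count of 18385 used in the note afterwards is an assumption derived from this article's numbers (18383 dictionary entries, plus the appended space, plus the CTC blank).

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Numerically stable softmax over one time step.
inline std::vector<float> Softmax(const std::vector<float>& v) {
    assert(!v.empty());
    const float max_v = *std::max_element(v.begin(), v.end());
    std::vector<float> out(v.size());
    float sum = 0.0f;
    for (std::size_t i = 0; i < v.size(); ++i) {
        out[i] = std::exp(v[i] - max_v);
        sum += out[i];
    }
    for (float& x : out) x /= sum;
    return out;
}

// Hypothetical illustration: build a probability row with `peak` on class 0
// and the remainder spread evenly, then apply softmax a second time and
// return what is left of the peak.
inline float PeakAfterSecondSoftmax(int class_num, float peak) {
    assert(class_num > 1 && peak > 0.0f && peak <= 1.0f);
    std::vector<float> probs(class_num, (1.0f - peak) / (class_num - 1));
    probs[0] = peak;
    return Softmax(probs)[0];
}
```

Since probabilities all lie in [0, 1], exponentiating them yields values between 1 and e, so a second softmax is nearly uniform over the classes. With 18385 classes, even a row whose peak is 0.99 ends up on the order of 1e-4 after the second pass, which `%.2f` prints as 0.00.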

4.7 Main Function (Wiring the Pipeline Together)

    int main(int argc, char* argv[]) {
        std::string image_path = "models/test.jpg";
        std::string det_model = "models/PP-OCRv5_server_det_infer/inference.json";
        std::string det_params = "models/PP-OCRv5_server_det_infer/inference.pdiparams";
        std::string rec_model = "models/PP-OCRv5_server_rec_infer/inference.json";
        std::string rec_params = "models/PP-OCRv5_server_rec_infer/inference.pdiparams";
        std::string dict_path = "models/ppocr_v5_dict.txt";
        int num_threads = 4;
        bool text_only = false;

        // Parse arguments: flags start with "--", everything else is positional
        for (int i = 1; i < argc; ++i) {
            if (std::string(argv[i]) == "--text-only") text_only = true;
        }
        std::vector<std::string> pos_args;
        for (int i = 1; i < argc; ++i) {
            if (std::string(argv[i]).substr(0, 2) != "--") pos_args.push_back(argv[i]);
        }
        if (pos_args.size() >= 1) image_path = pos_args[0];
        if (pos_args.size() >= 2) dict_path = pos_args[1];
        if (pos_args.size() >= 3) num_threads = std::atoi(pos_args[2].c_str());

        std::cout << " PP-OCRv5 C++ 推理 Demo" << std::endl;
        std::cout << " Paddle: " << paddle_infer::GetVersion() << std::endl;

        // ========== Load the image ==========
        int img_w, img_h, img_c;
        unsigned char* img_rgb = stbi_load(image_path.c_str(), &img_w, &img_h, &img_c, 3);
        if (!img_rgb) {
            std::cerr << "无法加载图片: " << image_path << std::endl;  // "cannot load image"
            return -1;
        }

        // ========== Load the dictionary ==========
        auto dict = LoadDict(dict_path);
        if (dict.empty()) return -1;

        // ========== 1. Text detection ==========
        DetInput det_in;
        PrepareDetInput(img_rgb, img_w, img_h, 1280, det_in);

        paddle_infer::Config config;
        config.SetModel(det_model, det_params);
        config.DisableGpu();
        config.SetCpuMathLibraryNumThreads(num_threads);
        config.SwitchIrOptim(true);
        config.EnableMKLDNN();
        config.EnableNewIR(true);  // required for PP-OCRv5's PIR-format models!

        auto predictor = paddle_infer::CreatePredictor(config);
        auto input_names = predictor->GetInputNames();
        auto input_t = predictor->GetInputHandle(input_names[0]);
        input_t->Reshape({1, 3, det_in.resized_h, det_in.resized_w});
        input_t->CopyFromCpu(det_in.data.data());

        auto t0 = std::chrono::high_resolution_clock::now();
        predictor->Run();
        auto t1 = std::chrono::high_resolution_clock::now();
        double det_ms = std::chrono::duration<double, std::milli>(t1 - t0).count();

        auto output_names = predictor->GetOutputNames();
        auto output_t = predictor->GetOutputHandle(output_names[0]);
        auto shape = output_t->shape();
        int map_h = shape[2], map_w = shape[3];
        std::vector<float> prob_map(map_h * map_w);
        output_t->CopyToCpu(prob_map.data());

        auto boxes = DBPostProcess(prob_map.data(), map_w, map_h, 0.3f, 0.6f, 1.5f,
                                   det_in.ratio_w, det_in.ratio_h, img_w, img_h);

        // ========== 2. Text recognition ==========
        paddle_infer::Config rec_config;
        rec_config.SetModel(rec_model, rec_params);
        rec_config.DisableGpu();
        rec_config.SetCpuMathLibraryNumThreads(num_threads);
        rec_config.SwitchIrOptim(true);
        rec_config.EnableMKLDNN();
        rec_config.EnableNewIR(true);
        auto rec_predictor = paddle_infer::CreatePredictor(rec_config);

        const int REC_H = 48;
        const int REC_W = 320;
        for (size_t i = 0; i < boxes.size(); ++i) {
            auto& box = boxes[i];
            int actual_w = REC_W;  // in: width cap, out: width actually used
            auto rec_data = PrepareRecInput(img_rgb, img_w, img_h, box, REC_H, actual_w);
            if (rec_data.empty()) continue;

            auto rec_input_names = rec_predictor->GetInputNames();
            auto rec_input_t = rec_predictor->GetInputHandle(rec_input_names[0]);
            rec_input_t->Reshape({1, 3, REC_H, actual_w});
            rec_input_t->CopyFromCpu(rec_data.data());
            rec_predictor->Run();

            auto rec_output_names = rec_predictor->GetOutputNames();
            auto rec_out_t = rec_predictor->GetOutputHandle(rec_output_names[0]);
            auto rec_shape = rec_out_t->shape();
            int time_steps = rec_shape[1];
            int class_num = rec_shape[2];
            std::vector<float> rec_output(time_steps * class_num);
            rec_out_t->CopyToCpu(rec_output.data());

            float confidence = 0.0f;
            std::string text = CTCDecode(rec_output.data(), time_steps, class_num, dict, confidence);
            if (!text.empty()) {
                if (text_only) {
                    std::cout << text << std::endl;
                } else {
                    printf(" [%zu] (%d,%d)-(%d,%d) conf=%.2f \"%s\"\n",
                           i + 1, box.x0, box.y0, box.x1, box.y1, confidence, text.c_str());
                }
            }
        }

        stbi_image_free(img_rgb);
        return 0;
    }

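5. Build and Run

5.1 Build

    mkdir build && cd build
    cmake .. -DCMAKE_BUILD_TYPE=Release
    make -j$(nproc)

5.2 Run

    # Go back to the project root before running (model paths are relative)
    cd ..

    # Verbose mode (coordinates + confidence + text)
    ./build/ocr_demo your_image.png

    # Text-only mode (recognized text only)
    ./build/ocr_demo your_image.png --text-only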

5.3 Sample Output

The program prints its own log labels in Chinese (图片 = image, 字典 = dictionary, 字符 = characters, 文本检测 = text detection, 文本识别 = text recognition, 耗时 = elapsed time, 总耗时 = total time):

     PP-OCRv5 C++ 推理 Demo
     Paddle: version: 3.0.0
    图片: test.png (661x316)
    字典: 18384 字符
    [1/2] 文本检测...
    检测到 18 个文本区域 (耗时: 665 ms)
    [2/2] 文本识别...
     [1] (0,93)-(289,103) conf=0.82 "131566总访问196原创248粉丝60关注"
     [2] (0,115)-(295,126) conf=0.86 "IP属地：陕西省加入ZEEKLOG时间：2020-01-31"
     [3] (0,136)-(108,148) conf=0.96 "查看详细资料"
     [4] (0,225)-(184,235) conf=0.93 "weixin_46244623 码龄6年"
     [5] (458,282)-(660,295) conf=0.99 "公众号·懒人程序"
    检测耗时: 665 ms
    识别耗时: 1453 ms (共 18 个区域)
    总耗时: 2118 ms

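6. Pitfalls Summary

Pitfall	Symptom	Fix
C++ standard	`auto [cx, cy]` fails to compile	Set `CMAKE_CXX_STANDARD 17` in CMakeLists.txt
RPATH vs RUNPATH	`libopenvino.so.2500 not found` at runtime	Add `-Wl,--disable-new-dtags`
oneDNN linking	`-ldnnl` cannot be found	Only `libdnnl.so.3` exists; link it by absolute path
PIR-format model	Predictor creation fails	PP-OCRv5 uses the `.json` format; `EnableNewIR(true)` is required
Confidence is always 0.00	Abnormal confidence from CTC decoding	The model output is already softmax probabilities; do not apply softmax again
Edge characters lost	First or last characters of a line are cut off	Add padding on all four sides when cropping
Model paths	`NotFoundError`	Run from the project root, not from inside build/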
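The last pitfall in the table above, relative model paths, is easy to catch early with a small pre-flight check. The sketch below is not part of the original demo: `MissingFiles` is a hypothetical helper that uses C++17 `std::filesystem` to report which of the required files cannot be found from the current working directory, so the program can fail with a clear message before a predictor is created.

```cpp
#include <filesystem>
#include <string>
#include <system_error>
#include <vector>

// Hypothetical helper: returns every path in `paths` that is not an existing
// regular file, resolved against the current working directory.
inline std::vector<std::string> MissingFiles(const std::vector<std::string>& paths) {
    std::vector<std::string> missing;
    for (const std::string& p : paths) {
        std::error_code ec;  // non-throwing overload: any error counts as "missing"
        if (!std::filesystem::is_regular_file(p, ec)) {
            missing.push_back(p);
        }
    }
    return missing;
}
```

In `main`, one could call `MissingFiles({det_model, det_params, rec_model, rec_params, dict_path})` right after argument parsing and, if the result is non-empty, print each entry together with `std::filesystem::current_path()` before returning. Note that on GCC 8 `std::filesystem` may additionally require linking with `-lstdc++fs`.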

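7. mobile vs server: Which One to Pick?

	PP-OCRv5 mobile	PP-OCRv5 server
Model size	Small	Large
CPU time	~600ms	~2100ms
Recognition accuracy	Fair (small text is error-prone)	High (ZEEKLOG recognized in full)
Best suited for	Mobile / real-time scenarios	Server-side / accuracy first

Pick mobile if you want speed and server if you want accuracy. Switching is just a matter of changing the model paths; the code is exactly the same.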
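Since switching variants only means changing paths, the four path strings in `main` could be produced by one small function. The sketch below is hypothetical: `ModelPaths` and `MakeModelPaths` are names introduced here, and the mobile directory names (`PP-OCRv5_mobile_det_infer`, `PP-OCRv5_mobile_rec_infer`) are an assumption made by analogy with the server ones used in this article, so check them against the archives you actually download.

```cpp
#include <cassert>
#include <string>

// Hypothetical bundle of the four model file paths used by the demo.
struct ModelPaths {
    std::string det_model;
    std::string det_params;
    std::string rec_model;
    std::string rec_params;
};

// Build paths for the "server" or "mobile" variant under `root`.
// Directory naming for "mobile" is assumed, not confirmed by the article.
inline ModelPaths MakeModelPaths(const std::string& variant,
                                 const std::string& root = "models") {
    assert(variant == "server" || variant == "mobile");
    const std::string det_dir = root + "/PP-OCRv5_" + variant + "_det_infer";
    const std::string rec_dir = root + "/PP-OCRv5_" + variant + "_rec_infer";
    return {det_dir + "/inference.json", det_dir + "/inference.pdiparams",
            rec_dir + "/inference.json", rec_dir + "/inference.pdiparams"};
}
```

A command-line flag such as a hypothetical `--mobile` could then select the variant and feed the result into the two `SetModel` calls.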

8. Ready-to-Run Bundle on Baidu Netdisk (Linux Only)

Link: https://pan.baidu.com/s/19NZ7UeJzAoO7sXM9hWFVKA?pwd=hzei Extraction code: hzei
