腾讯 HunyuanOCR 1B 模型本地部署与测试指南

介绍腾讯混元 HunyuanOCR 1B 模型的本地部署流程。涵盖环境准备（Ubuntu, CUDA, Python）、依赖安装（vLLM, PyTorch）、服务启动及客户端接入测试。解决了 vLLM 编译报错及 CUDA 环境变量冲突问题，提供了官方提示词参考及 Python 代码接入示例。

月光旅人发布于 2026/3/27更新于 2026/4/174 浏览

简介

HunyuanOCR 是由腾讯开发的轻量级端到端 OCR 专家视觉语言模型 (VLM)，基于 Hunyuan 的原生多模态架构。该模型仅包含 1B 参数，却在多个行业基准测试中达到了最先进的水平，适用于复杂多语言文档解析、文本定位、开放字段信息提取、视频字幕提取和照片翻译等任务。

部署环境

官方环境要求：

🖥️ 操作系统：Linux
🐍 Python 版本：3.12+（推荐）
⚡ CUDA 版本：12.9
🔥 PyTorch 版本：2.7.1
🎮 GPU：支持 CUDA 的 NVIDIA 显卡
🧠 GPU 显存：20GB (for vLLM)
💾 磁盘空间：6GB

实际环境：

环境	版本
ubuntu-24.04.3 Server	release 10.0
Cuda	12.8
显卡 RTX 2080 Ti 22G	驱动 NVIDIA-Linux-x86_64-580.105.08
uv	0.9.13
内存	32G

下载

这里选择从 modelscope 进行下载。

pip install modelscope
modelscope download --model Tencent-Hunyuan/HunyuanOCR --cache_dir '/home/qy/models/'

文章配图

uv 环境

# 安装 UV
curl -LsSf https://astral.sh/uv/install.sh | sh
# 查看 python 版本
uv python list
# 创建虚拟环境，并指定 python 版本
uv venv hunyuanocr --python 3.12
cd hunyuanocr
# 激活环境，激活后，括号中显示 hunyuanocr 表示已经切了环境
source hunyuanocr/bin/activate
# 配置 PyPI 仓库为国内源
vim ~/.config/uv/uv.toml
[registries.pypi]
index = "https://mirrors.aliyun.com/pypi/simple/"
default = true

下载推理源码

git clone https://github.com/Tencent-Hunyuan/HunyuanOCR.git
cd /home/qy/hunyuan/HunyuanOCR-main
uv pip install -r requirements.txt

安装 vLLM

uv pip install -U "aiohttp<4"
uv pip install -U vllm --extra-index-url https://wheels.vllm.ai/nightly

任务	中文提示词	英文提示词
文字检测识别	检测并识别图片中的文字，将文本坐标格式化输出。	Detect and recognize text in the image, and output the text coordinates in a formatted manner.
文档解析	• 识别图片中的公式，用 LaTeX 格式表示。 • 把图中的表格解析为 HTML。 • 解析图中的图表，对于流程图使用 Mermaid 格式表示，其他图表使用 Markdown 格式表示。 • 提取文档图片中正文的所有信息用 markdown 格式表示，其中页眉、页脚部分忽略，表格用 html 格式表达，文档中公式用 latex 格式表示，按照阅读顺序组织进行解析。	• Identify the formula in the image and represent it using LaTeX format. • Parse the table in the image into HTML. • Parse the chart in the image; use Mermaid format for flowcharts and Markdown for other charts. • Extract all information from the main body of the document image and represent it in markdown format, ignoring headers and footers. Tables should be expressed in HTML format, formulas in the document should be represented using LaTeX format, and the parsing should be organized according to the reading order.
通用文字提取	• 提取图中的文字。	• Extract the text in the image.
信息抽取	• 输出 Key 的值。 • 提取图片中的：['key1','key2', …] 的字段内容，并按照 JSON 格式返回。 • 提取图片中的字幕。	• Output the value of Key. • Extract the content of the fields: ['key1','key2', …] from the image and return it in JSON format. • Extract the subtitles from the image.
翻译	先提取文字，再将文字内容翻译为英文。若是文档，则其中页眉、页脚忽略。公式用 latex 格式表示，表格用 html 格式表示。	First extract the text, then translate the text content into English. If it is a document, ignore the header and footer. Formulas should be represented in LaTeX format, and tables should be represented in HTML format.

腾讯 HunyuanOCR 1B 模型本地部署与测试指南

简介

部署环境

下载

uv 环境

下载推理源码

安装 vLLM

更多推荐文章

相关免费在线工具

安装 cuda-compat

检查环境

启动 vLLM 服务

客户端接入

10.1 Cherry Studio 接入测试

10.2 python 代码接入

腾讯 HunyuanOCR 1B 模型本地部署与测试指南

简介

部署环境

下载

uv 环境

下载推理源码

安装 vLLM

微信扫一扫，关注极客日志

更多推荐文章

相关免费在线工具

安装 cuda-compat

检查环境

启动 vLLM 服务

客户端接入

10.1 Cherry Studio 接入测试

10.2 python 代码接入