llama.cpp 与 llama-server 安装部署指南

在 Ubuntu 22.04 系统上编译安装 llama.cpp 及 llama-server 的完整流程。包括系统依赖安装、源码克隆构建、GGUF 模型准备以及服务启动与接口测试。通过 curl 命令验证健康状态及对话接口，确保本地大语言模型服务正常运行。

神经兮兮发布于 2026/4/6更新于 2026/4/179 浏览

llama.cpp + llama-server 的安装部署验证

环境准备

Ubuntu 22.04.5 LTS (Jammy Jellyfish) —— 这是一个长期支持（LTS）且完全受支持的现代 Linux 发行版，非常适合部署 llama.cpp + llama-server。Ubuntu 22.04 自带较新的 GCC（11+）、CMake（3.22+）和 Python 3.10+，无需手动升级工具链，部署过程非常顺畅。

一、安装系统依赖

sudo apt update
sudo apt install -y git build-essential cmake libssl-dev ninja-build

二、克隆并编译 llama.cpp

1. 克隆仓库

git clone https://github.com/ggerganov/llama.cpp
cd llama.cpp

2. 构建 server

使用 CMake 构建 server：

mkdir build && cd build
cmake .. -DCMAKE_BUILD_TYPE=Release -DLLAMA_BUILD_SERVER=ON
make -j $(nproc) llama-server

三、准备 GGUF 模型

下载量化模型（以 Llama-3-8B-Instruct Q6_K 为例），将模型文件放置到指定目录 /mnt/data/。

文章配图

四、启动服务

1. 前台启动示例

/mnt/workspace/llama.cpp/build/bin/llama-server -m /mnt/data/Llama-3-8B-Instruct-Coder.Q6_K.gguf --port 8080 --host 0.0.0.0 --ctx-size 2048 --threads 8

2. 后台启动

nohup /mnt/workspace/llama.cpp/build/bin/llama-server -m /mnt/data/Llama-3-8B-Instruct-Coder.Q6_K.gguf --port 8080 --host 0.0.0.0 --ctx-size 8192 --threads 8 > /mnt/workspace/llama-server.log 2>&1 &

五、验证与测试

1. 健康检查

curl http://localhost:8080/health

2. 查看日志

tail -f /mnt/workspace/llama-server.log

llama.cpp 与 llama-server 安装部署指南

llama.cpp + llama-server 的安装部署验证

环境准备

一、安装系统依赖

二、克隆并编译 llama.cpp

1. 克隆仓库

2. 构建 server

三、准备 GGUF 模型

四、启动服务

1. 前台启动示例

2. 后台启动

五、验证与测试

1. 健康检查

2. 查看日志

更多推荐文章

相关免费在线工具

3. 停止服务

4. 接口测试

llama.cpp 与 llama-server 安装部署指南

llama.cpp + llama-server 的安装部署验证

环境准备

一、安装系统依赖

二、克隆并编译 llama.cpp

1. 克隆仓库

2. 构建 server

三、准备 GGUF 模型

四、启动服务

1. 前台启动示例

2. 后台启动

五、验证与测试

1. 健康检查

2. 查看日志

微信扫一扫，关注极客日志

更多推荐文章

相关免费在线工具

3. 停止服务

4. 接口测试