1. Introduction to LLaMA-Factory

LLaMA-Factory is a simple, easy-to-use, and efficient training framework for large language models, supporting more than a hundred model families. This article walks through the complete workflow of fine-tuning the Qwen2.5 series with LLaMA-Factory (the Qwen1.5 series works the same way): environment setup, custom dataset preparation, and the configuration files and parameter explanations for full-parameter, LoRA, and QLoRA fine-tuning. It also covers merging model weights, an inference test script, multi-GPU distributed training, and memory-optimization options. For more features, see the project's GitHub page: https://github.com/hiyouga/LLaMA-Factory .

To avoid version incompatibilities caused by upstream updates, we install a historical snapshot of the project below.
# Enable the network proxy (AutoDL-specific; skip this on other platforms)
source /etc/network_turbo
cd /root/autodl-tmp
git clone --depth 1 https://github.com/Jiangnanjiezi/LLaMA-Factory.git
cd LLaMA-Factory
pip install -e ".[torch,metrics]" -i https://mirrors.aliyun.com/pypi/simple/
The training data should be saved as a JSON file named qwen_dataset.json and placed under autodl-tmp/LLaMA-Factory/data. Its contents look like this (note that strict JSON does not allow a trailing comma after the last record):
[
  {
    "instruction": "请提取以下内容中的摘要信息",
    "input": "保持身体健康的五个方法:\n\n1. 每天至少饮用 8 杯水,促进新陈代谢\n2. 每周进行 150 分钟中等强度运动,如快走或游泳\n3. 保证 7-9 小时高质量睡眠,避免熬夜\n4. 饮食中增加蔬菜水果比例,减少油炸食品\n5. 定期体检,监测血压、血糖等指标",
    "output": "多喝水、规律运动、充足睡眠、均衡饮食、定期体检"
  },
  {
    "instruction": "请提取以下内容中的摘要信息",
    "input": "提高学习效率的三个技巧:\n\n1. 使用番茄工作法,每 25 分钟专注后休息 5 分钟\n2. 建立思维导图整理知识框架\n3. 睡前复习重点内容加强记忆",
    "output": "番茄工作法、思维导图、睡前复习"
  },
  {
    "instruction": "请提取以下内容中的摘要信息",
    "input": "旅行必备物品清单:\n1. 护照/身份证原件及复印件\n2. 便携充电宝和转换插头\n3. 常用药品(退烧药、创可贴)\n4. 轻便折叠雨伞\n5. 分装洗漱用品",
    "output": "证件、充电设备、药品、雨具、洗漱包"
  },
  {
    "instruction": "请提取以下内容中的摘要信息",
    "input": "职场沟通四大原则:\n① 明确沟通目标\n② 使用金字塔表达结构\n③ 注意非语言信号(眼神/姿态)\n④ 及时确认信息理解度",
    "output": "目标明确、结构化表达、非语言交流、信息确认"
  }
]
Register the custom dataset in the data/dataset_info.json file under the LLaMA-Factory folder by adding the following entry:
"qwen_dataset": {"file_name": "qwen_dataset.json"},
Next, download the base model with ModelScope:
pip install modelscope
mkdir -p /root/autodl-tmp/LLaMA-Factory/models/Qwen2.5-7B
cd /root/autodl-tmp/LLaMA-Factory/models/Qwen2.5-7B
# Download the model
# modelscope download --model Qwen/Qwen2.5-7B --local_dir ./
# The 7B model is slow to download and needs more GPU memory to fine-tune,
# so we use the 1.5B model for this demonstration
modelscope download --model Qwen/Qwen2.5-1.5B --local_dir ./
Full-parameter fine-tuning: in the LLaMA-Factory folder, create a configuration file named qwen2.5-7b-full-sft.yaml with the settings for full-parameter training.
### Model
# Local path or HuggingFace model ID of the pretrained model
model_name_or_path: /root/autodl-tmp/LLaMA-Factory/models/Qwen2.5-7B
# Must be enabled to load models that ship custom code (e.g. Qwen, ChatGLM)
trust_remote_code: true

### Method
# Fine-tuning stage: supervised fine-tuning
stage: sft
# Whether to run the training phase
do_train: true
# Fine-tuning type: full-parameter (options: full/freeze/lora)
finetuning_type: full
# DeepSpeed configuration file (ZeRO Stage 3 optimization)
deepspeed: /root/autodl-tmp/LLaMA-Factory/examples/deepspeed/ds_z3_config.json

### Dataset
# Dataset name (must match an entry registered in data/dataset_info.json)
dataset: qwen_dataset
# Prompt template (must match the model architecture)
template: qwen
# Maximum input sequence length in tokens
cutoff_len: 1024
# Overwrite existing cache files (recommended after changing the dataset)
overwrite_cache: true
# Number of parallel preprocessing workers (roughly 50-70% of CPU cores)
preprocessing_num_workers: 16

### Output
# Output directory for checkpoints and logs
output_dir: saves/qwen2.5-7b/full
# Log every 10 training steps
logging_steps: 10
# Save a checkpoint every 100 training steps
save_steps: 100
# Plot the training loss curve
plot_loss: true
# Overwrite an existing output directory (recommended for a fresh run)
overwrite_output_dir: true

### Training
# Batch size per GPU (effective batch size = this * gradient_accumulation_steps * number of GPUs)
per_device_train_batch_size: 1
# Gradient accumulation steps (simulates a larger batch size)
gradient_accumulation_steps: 16
# Initial learning rate (a typical value for 7B-scale full fine-tuning)
learning_rate: 1.0e-5
# Total number of training epochs
num_train_epochs: 1.0
# Learning rate schedule (cosine annealing)
lr_scheduler_type: cosine
# Warmup ratio (the first 10% of steps use linear warmup)
warmup_ratio: 0.1
# Enable BF16 mixed precision (requires an Ampere-or-newer GPU)
bf16: true
# Distributed training timeout (a very large value to avoid timeouts)
ddp_timeout: 180000000

### Evaluation
# Fraction of the training set held out for validation
val_size: 0.1
# Evaluation batch size per GPU
per_device_eval_batch_size: 1
# Evaluation strategy (evaluate every N training steps)
eval_strategy: steps
# Evaluate every 500 training steps
eval_steps: 500
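To relate the batch-size settings above to the step counts the trainer reports, the effective batch size and the number of optimization steps can be computed directly. The small helper below is illustrative, not part of the framework; the sample values follow the training log in this article (54 examples, total batch size 32, which implies two GPUs with per-device batch 1 and accumulation 16):

```python
import math

def training_steps(num_examples: int, per_device_batch: int,
                   grad_accum: int, num_gpus: int, epochs: float = 1.0) -> int:
    """Optimization steps for one run: the effective batch size is
    per_device_batch * grad_accum * num_gpus, and the trainer takes
    ceil(examples / effective_batch) steps per epoch."""
    effective_batch = per_device_batch * grad_accum * num_gpus
    return math.ceil(num_examples / effective_batch * epochs)

# 54 training examples, batch 1, accumulation 16, 2 GPUs -> 2 steps
print(training_steps(54, 1, 16, 2))  # → 2
```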
The DeepSpeed configuration (ds_z3_config.json) is shown below with explanatory comments; strict JSON does not allow comments, so the actual file must not contain them:
{
  // Global train batch size (auto-computed as micro_batch * num_gpus * gradient_accumulation)
  "train_batch_size": "auto",
  // Micro batch size per GPU (auto-adjusted to the available memory)
  "train_micro_batch_size_per_gpu": "auto",
  // Gradient accumulation steps (auto-matched to the micro-batch configuration)
  "gradient_accumulation_steps": "auto",
  // Gradient clipping threshold ("auto" disables it or uses the default of 1.0)
  "gradient_clipping": "auto",
  // Allow optimizers not officially tested with ZeRO (enable with care)
  "zero_allow_untested_optimizer": true,
  // FP16 mixed-precision settings
  "fp16": {
    "enabled": "auto",          // enabled automatically based on hardware support
    "loss_scale": 0,            // dynamic loss scaling (0 = automatic)
    "loss_scale_window": 1000,  // window size for scale adjustment (1000 iterations)
    "initial_scale_power": 16,  // initial scale of 2^16
    "hysteresis": 2,            // tolerance before lowering the scale
    "min_loss_scale": 1         // minimum loss scale
  },
  // BF16 mixed-precision settings (mutually exclusive with FP16)
  "bf16": {
    "enabled": "auto"           // enabled automatically on GPUs that support BF16
  },
  // ZeRO optimization (full Stage 3 configuration)
  "zero_optimization": {
    // Highest optimization stage: shards parameters, gradients, and optimizer states
    "stage": 3,
    // Offload optimizer states to CPU memory
    "offload_optimizer": {
      "device": "cpu",
      "pin_memory": true        // pinned (page-locked) memory speeds up transfers
    },
    // Offload model parameters to CPU memory
    "offload_param": {
      "device": "cpu",
      "pin_memory": true
    },
    // Disable communication/computation overlap (improves stability)
    "overlap_comm": false,
    // Keep gradients contiguous in memory (reduces fragmentation)
    "contiguous_gradients": true,
    // Maximum size of a parameter subgroup (1e9 effectively disables grouping)
    "sub_group_size": 1e9,
    // AllReduce bucket size (auto-tuned)
    "reduce_bucket_size": "auto",
    // Parameter prefetch bucket size (auto-tuned)
    "stage3_prefetch_bucket_size": "auto",
    // Threshold below which parameters stay resident on the GPU (auto-tuned)
    "stage3_param_persistence_threshold": "auto",
    // Maximum number of parameters kept live on the GPU
    "stage3_max_live_parameters": 1e9,
    // Maximum reuse distance for parameter caching
    "stage3_max_reuse_distance": 1e9,
    // Gather 16-bit weights when saving the model
    "stage3_gather_16bit_weights_on_model_save": true
  }
}
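As a rough aid for choosing ZeRO settings, the per-GPU cost of model states can be estimated with the commonly cited 16 bytes per parameter for mixed-precision Adam. This is a back-of-the-envelope sketch only; real usage also includes activations, buffers, and fragmentation, and the CPU offload configured above moves much of the model state off the GPU entirely:

```python
def zero3_memory_per_gpu_gb(num_params: float, num_gpus: int,
                            bytes_per_param: float = 16.0) -> float:
    """Estimate per-GPU memory (GB) for model states under ZeRO Stage 3.

    Mixed-precision Adam holds ~16 bytes/parameter of model state
    (2 for bf16 weights, 2 for gradients, 12 for fp32 optimizer states);
    Stage 3 shards all of it evenly across GPUs.
    """
    return num_params * bytes_per_param / num_gpus / 1024**3

# e.g. a 1.5B-parameter model sharded over 2 GPUs, without offloading:
print(round(zero3_memory_per_gpu_gb(1.5e9, 2), 1))  # → 11.2
```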
Start training: switch to the directory containing qwen2.5-7b-full-sft.yaml and run the command below.
# Environment variable:
# - FORCE_TORCHRUN=1 : force LLaMA-Factory to launch distributed training via
#   PyTorch's torchrun (useful when automatic detection fails or when you need
#   explicit control over the distributed setup)
# Command structure:
# llamafactory-cli : main entry point of LLaMA-Factory
# train            : subcommand that runs a training job
# qwen2.5-7b-full-sft.yaml : training configuration file (model/data/training parameters)
FORCE_TORCHRUN=1 llamafactory-cli train qwen2.5-7b-full-sft.yaml
The training output (abridged to the key lines):
***** Running training *****
  Num examples = 54
  Num Epochs = 1
  Instantaneous batch size per device = 1
  Total train batch size (w. parallel, distributed & accumulation) = 32
  Gradient Accumulation steps = 16
  Total optimization steps = 2
  Number of trainable parameters = 1,543,714,304
100%|██████████| 2/2 [00:46<00:00, 23.15s/it]
Saving model checkpoint to saves/qwen2.5-7b/full
Training completed. Do not forget to share your model on huggingface.co/models =)
***** train metrics *****
  epoch                    = 1.0
  train_loss               = 3.8739
  train_runtime            = 0:00:46.30
  train_samples_per_second = 1.166
  train_steps_per_second   = 0.043
***** Running Evaluation *****
  Num examples = 7
  Batch size = 1
***** eval metrics *****
  epoch                   = 1.0
  eval_loss               = 3.5243
  eval_runtime            = 0:00:00.71
  eval_samples_per_second = 9.774
  eval_steps_per_second   = 5.585
LoRA fine-tuning: in the LLaMA-Factory folder, create a configuration file named qwen2.5-7b-lora-sft.yaml with the LoRA settings.
### Model
# Local path or HuggingFace model ID of the pretrained model
model_name_or_path: /root/autodl-tmp/LLaMA-Factory/models/Qwen2.5-7B
# Must be enabled to load models that ship custom code (e.g. Qwen, ChatGLM)
trust_remote_code: true

### Method
# Fine-tuning stage: supervised fine-tuning
stage: sft
# Whether to run the training phase
do_train: true
# Fine-tuning type: LoRA (low-rank adaptation)
finetuning_type: lora
# Layers LoRA is applied to (all = every linear layer)
lora_target: all
# LoRA rank (dimension of the low-rank decomposition)
lora_rank: 16
# LoRA alpha (scaling factor, often set equal to the rank)
lora_alpha: 16
# Dropout rate on LoRA layers (mitigates overfitting)
lora_dropout: 0.05

### Dataset
# Dataset name (here the built-in alpaca_zh_demo registered in data/dataset_info.json)
dataset: alpaca_zh_demo
# Prompt template (must match the model, e.g. qwen/llama/chatglm)
template: qwen
# Maximum input sequence length in tokens
cutoff_len: 1024
# Overwrite existing preprocessing caches
overwrite_cache: true
# Number of parallel preprocessing workers (roughly 50-70% of CPU cores)
preprocessing_num_workers: 16

### Output
# Output directory for checkpoints and logs
output_dir: saves/qwen2.5-7b/lora/sft
# Log every 100 training steps
logging_steps: 100
# Save a checkpoint every 100 training steps
save_steps: 100
# Plot the training loss curve
plot_loss: true
# Overwrite an existing output directory (recommended for a fresh run)
overwrite_output_dir: true

### Training
# Batch size per GPU (effective batch size = this * gradient_accumulation_steps * number of GPUs)
per_device_train_batch_size: 1
# Gradient accumulation steps (effective batch size here is 16 * number of GPUs)
gradient_accumulation_steps: 16
# Initial learning rate (typical LoRA range: 1e-4 to 5e-4)
learning_rate: 1.0e-4
# Total number of training epochs
num_train_epochs: 1.0
# Learning rate schedule (cosine annealing)
lr_scheduler_type: cosine
# Warmup ratio (the first 10% of steps use linear warmup)
warmup_ratio: 0.1
# Enable BF16 mixed precision (requires an Ampere-or-newer GPU, e.g. A100/3090)
bf16: true
# Distributed training timeout (a very large value to avoid timeouts)
ddp_timeout: 180000000

### Evaluation
# Hold out 10% of the training set for validation
val_size: 0.1
# Evaluation batch size per GPU
per_device_eval_batch_size: 1
# Evaluation strategy: evaluate every N training steps
eval_strategy: steps
# Evaluate every 500 training steps
eval_steps: 500
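With lora_target: all and lora_rank: 16, the number of trainable parameters can be derived directly from the model dimensions: each adapted Linear(d_in, d_out) layer gains rank*(d_in + d_out) parameters from its two low-rank matrices. The sketch below uses the published Qwen2.5-1.5B configuration (hidden size 1536, intermediate size 8960, 28 layers, 12 attention heads with 2 KV heads, so head_dim 128 and KV width 256); the result matches the 18,464,768 trainable parameters the trainer reports:

```python
def lora_params(rank: int, linears: list[tuple[int, int]], n_blocks: int) -> int:
    """Each adapted Linear(d_in, d_out) adds rank*(d_in + d_out) parameters
    (matrix A is rank x d_in, matrix B is d_out x rank)."""
    return n_blocks * sum(rank * (d_in + d_out) for d_in, d_out in linears)

# Linear layers in one Qwen2.5-1.5B transformer block
qwen25_1p5b_linears = [
    (1536, 1536),  # q_proj
    (1536, 256),   # k_proj (2 KV heads * head_dim 128)
    (1536, 256),   # v_proj
    (1536, 1536),  # o_proj
    (1536, 8960),  # gate_proj
    (1536, 8960),  # up_proj
    (8960, 1536),  # down_proj
]
print(lora_params(16, qwen25_1p5b_linears, 28))  # → 18464768
```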
Start training:
# llamafactory-cli : main entry point
# train            : subcommand that runs a training job
# qwen2.5-7b-lora-sft.yaml : YAML configuration file with the full set of training parameters
llamafactory-cli train qwen2.5-7b-lora-sft.yaml
The training output (abridged to the key lines):
***** Running training *****
  Num examples = 900
  Num Epochs = 1
  Instantaneous batch size per device = 1
  Total train batch size (w. parallel, distributed & accumulation) = 32
  Gradient Accumulation steps = 16
  Total optimization steps = 29
  Number of trainable parameters = 18,464,768
100%|██████████| 29/29 [01:18<00:00, 2.72s/it]
Saving model checkpoint to saves/qwen2.5-7b/lora/sft
Training completed. Do not forget to share your model on huggingface.co/models =)
***** train metrics *****
  epoch                    = 1.0
  train_loss               = 1.657
  train_runtime            = 0:01:19.00
  train_samples_per_second = 11.391
  train_steps_per_second   = 0.367
***** Running Evaluation *****
  Num examples = 100
  Batch size = 1
***** eval metrics *****
  epoch                   = 1.0
  eval_loss               = 1.6728
  eval_runtime            = 0:00:01.62
  eval_samples_per_second = 61.354
  eval_steps_per_second   = 30.677
QLoRA fine-tuning: in the LLaMA-Factory folder, create a configuration file named qwen2.5-7b-qlora-sft.yaml with the QLoRA settings.
### Model
# Local path or HuggingFace model ID of the pretrained model
model_name_or_path: /root/autodl-tmp/LLaMA-Factory/models/Qwen2.5-7B
# Must be enabled to load models that ship custom code (e.g. Qwen, ChatGLM)
trust_remote_code: true

### Method
# Fine-tuning stage: supervised fine-tuning
stage: sft
# Whether to run the training phase
do_train: true
# Fine-tuning type: QLoRA is LoRA on top of a quantized base model
finetuning_type: lora
# Layers LoRA is applied to (all = every linear layer)
lora_target: all
# Quantization bit width (4-bit)
quantization_bit: 4
# Quantization backend (implemented via the bitsandbytes library)
quantization_method: bitsandbytes
# LoRA rank (dimension of the low-rank decomposition)
lora_rank: 16
# LoRA alpha (scaling factor, often set equal to the rank)
lora_alpha: 16
# Dropout rate on LoRA layers (mitigates overfitting)
lora_dropout: 0.05

### Dataset
# Dataset name (here the built-in alpaca_zh_demo registered in data/dataset_info.json)
dataset: alpaca_zh_demo
# Prompt template (must match the model architecture)
template: qwen
# Maximum input sequence length in tokens
cutoff_len: 1024
# Overwrite existing preprocessing caches (required after changing the dataset)
overwrite_cache: true
# Number of parallel preprocessing workers (roughly 50-70% of CPU cores)
preprocessing_num_workers: 16

### Output
# Output directory for QLoRA checkpoints and logs
output_dir: saves/qwen2.5-7b/qlora/sft
# Log every 100 training steps
logging_steps: 100
# Save a checkpoint every 100 training steps
save_steps: 100
# Plot the training loss curve (saved under output_dir)
plot_loss: true
# Overwrite an existing output directory (recommended for a fresh run)
overwrite_output_dir: true

### Training
# Batch size per GPU (effective batch size = this * gradient_accumulation_steps * number of GPUs)
per_device_train_batch_size: 1
# Gradient accumulation steps (effective batch size here is 16 * number of GPUs)
gradient_accumulation_steps: 16
# Initial learning rate (typical QLoRA range: 1e-4 to 5e-4)
learning_rate: 1.0e-4
# Total number of training epochs
num_train_epochs: 1.0
# Learning rate schedule (cosine annealing)
lr_scheduler_type: cosine
# Warmup ratio (the first 10% of steps use linear warmup)
warmup_ratio: 0.1
# Enable BF16 mixed precision (requires an Ampere-or-newer GPU, e.g. A100/3090)
bf16: true
# Distributed training timeout (a very large value to avoid timeouts)
ddp_timeout: 180000000

### Evaluation
# Hold out 10% of the training set for validation
val_size: 0.1
# Evaluation batch size per GPU
per_device_eval_batch_size: 1
# Evaluation strategy: evaluate every N training steps
eval_strategy: steps
# Evaluate every 500 training steps
eval_steps: 500
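The point of the 4-bit settings above is to shrink the frozen base weights that dominate GPU memory during LoRA training. A rough comparison (illustrative only; NF4 also stores small per-block quantization constants, which are ignored here):

```python
def base_weight_memory_gb(num_params: float, bits: int) -> float:
    """Approximate memory (GB) taken by the frozen base weights at a
    given precision; NF4 quantization costs about 0.5 bytes/parameter."""
    return num_params * bits / 8 / 1024**3

# Comparison for a 1.5B-parameter base model:
print(round(base_weight_memory_gb(1.5e9, 16), 2))  # bf16 weights → 2.79
print(round(base_weight_memory_gb(1.5e9, 4), 2))   # NF4-quantized → 0.7
```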
Start QLoRA training:
# llamafactory-cli : main entry point
# train            : subcommand that runs a training job
# qwen2.5-7b-qlora-sft.yaml : YAML configuration file with the full set of training parameters
llamafactory-cli train qwen2.5-7b-qlora-sft.yaml
The training output (abridged to the key lines):
***** Running training *****
  Num examples = 900
  Num Epochs = 1
  Instantaneous batch size per device = 1
  Total train batch size (w. parallel, distributed & accumulation) = 32
  Gradient Accumulation steps = 16
  Total optimization steps = 29
  Number of trainable parameters = 18,464,768
100%|██████████| 29/29 [01:20<00:00, 2.78s/it]
Saving model checkpoint to saves/qwen2.5-7b/qlora/sft
Training completed. Do not forget to share your model on huggingface.co/models =)
***** train metrics *****
  epoch                    = 1.0
  train_loss               = 1.6572
  train_runtime            = 0:01:20.79
  train_samples_per_second = 11.139
  train_steps_per_second   = 0.359
***** Running Evaluation *****
  Num examples = 100
  Batch size = 1
***** eval metrics *****
  epoch                   = 1.0
  eval_loss               = 1.6738
  eval_runtime            = 0:00:01.61
  eval_samples_per_second = 61.919
  eval_steps_per_second   = 30.96
With the training configurations above, the measured GPU memory usage differs noticeably across the three methods. Memory consumption is closely tied to the training parameters, so adjust them to your actual needs.

When training with LoRA or QLoRA, the script saves only the LoRA adapter weights, which must be merged with the base model before inference; full-parameter training does not need this step. Below we merge the LoRA fine-tuned weights with the pretrained model. Note: weights produced by QLoRA fine-tuning must instead be merged with the pretrained model after it has been quantized with NF4.

The merge command is:
llamafactory-cli export qwen2.5-7b-merge-lora.yaml
where qwen2.5-7b-merge-lora.yaml is configured as follows:
### model
# Path of the pretrained base model
model_name_or_path: /root/autodl-tmp/LLaMA-Factory/models/Qwen2.5-7B
# Path of the LoRA adapter to merge
adapter_name_or_path: /root/autodl-tmp/LLaMA-Factory/saves/qwen2.5-7b/lora/sft
template: qwen
finetuning_type: lora
# Must be enabled
trust_remote_code: true

### export
export_dir: /root/autodl-tmp/LLaMA-Factory/models/qwen2.5-7b-sft-lora-merged
export_size: 2
export_device: cpu
export_legacy_format: false
Selected weight-merge parameters:
| Parameter | Description |
|---|---|
| model_name_or_path | Name or path of the pretrained model |
| template | Prompt template of the model |
| export_dir | Directory to export the merged model to |
| export_size | Maximum size of each exported model shard |
| export_device | Device used for the export |
| export_legacy_format | Whether to export in the legacy (non-safetensors) format |
After the merge completes, the merged model can be tested with the following inference.py script:
import time
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the tokenizer and the merged model
model_path = "/root/autodl-tmp/LLaMA-Factory/models/qwen2.5-7b-sft-lora-merged"
tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_path, device_map="auto", trust_remote_code=True
).eval()

prompt = "你好"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

# Record the generation start time
start_time = time.time()
# Generate text with generate()
outputs = model.generate(
    **inputs, max_new_tokens=128, do_sample=True, temperature=0.3, top_p=0.4
)
# Record the generation end time
end_time = time.time()

# Decode the output
response = tokenizer.decode(outputs[0], skip_special_tokens=True)
print("生成结果:", response)

# Measure the generation speed
num_generated_tokens = outputs.shape[1] - inputs["input_ids"].shape[1]  # newly generated tokens
elapsed_time = end_time - start_time
tokens_per_second = num_generated_tokens / elapsed_time if elapsed_time > 0 else 0
print(f"生成了 {num_generated_tokens} 个 token,用时 {elapsed_time:.2f} 秒,速度约为 {tokens_per_second:.2f} token/s")
The output:
生成结果: 你好,我有一个问题想问。您好,请问有什么问题需要帮助吗?我最近感到很焦虑,有什么方法可以缓解吗?焦虑是一种常见的心理问题,您可以尝试进行深呼吸、冥想、运动、与朋友聊天等方式来缓解焦虑。同时,也可以考虑寻求专业心理咨询师的帮助。
生成了 64 个 token,用时 2.17 秒,速度约为 29.46 token/s
