AI Writes Product Copy Automatically! A Full Walkthrough of LLaMA Vision Fine-Tuning

Contents

1. Environment Setup

2. Loading the Llama 3.2 Vision Model

3. Configuring the LoRA Fine-Tuning Modules

4. Loading the Data

5. Data Preprocessing: Building Conversational Training Samples

6. Testing the Model Before Fine-Tuning

7. Fine-Tuning the Model

8. Comparing Results After Fine-Tuning


In this project we walk through how to use Meta's Llama 3.2 Vision model, together with an open-source dataset and the efficient fine-tuning framework Unsloth, to build a product image → text description system. The focus is on fine-tuning a multimodal large model for a concrete task, tackling the core challenge of turning image content into natural language.

1. Environment Setup

First, open a terminal and install the dependencies:

pip install torch==2.4.0 torchvision==0.19.0 --index-url https://download.pytorch.org/whl/cu121
pip install triton==3.0.0
pip install "unsloth[torch]" --upgrade
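After installation, a quick sanity check can confirm the key packages are importable before moving on. This is my own sketch, not part of the original setup; the package names are taken from the pip commands above.

```python
# Minimal sanity check: confirm the core packages can be found by Python.
import importlib.util

packages = ("torch", "torchvision", "triton", "unsloth")
status = {pkg: importlib.util.find_spec(pkg) is not None for pkg in packages}

for pkg, ok in status.items():
    print(f"{pkg}: {'OK' if ok else 'MISSING'}")
```

If anything prints MISSING, re-run the corresponding pip command before proceeding.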

2. Loading the Llama 3.2 Vision Model

LLaMA 3.2 Vision is Meta's multimodal extension of LLaMA 3.1, able to process images and text together. By adding a vision encoder and cross-modal attention, it can understand image content and generate natural-language descriptions, making it useful for image captioning, visual question answering, document analysis, and visual grounding. It comes in 11B and 90B parameter sizes, performs strongly on standard vision-language benchmarks, and is among the leading open-source vision-language models. In this project we load the Llama-3.2-11B-Vision-Instruct checkpoint provided by Unsloth, which is optimized for fine-tuning and inference. To reduce memory and compute requirements, we load the model with 4-bit quantization.

from unsloth import FastVisionModel
import torch

# Path to the local model folder; adjust for your environment
local_model_path = "/model-202507/Llama-3.2-11B-Vision-Instruct"

model, tokenizer = FastVisionModel.from_pretrained(
    local_model_path,
    load_in_4bit=True,
    use_gradient_checkpointing="unsloth",
)

Output (abridged):

🦥 Unsloth: Will patch your computer to enable 2x faster free finetuning.
🦥 Unsloth Zoo will now patch everything to make training faster!
==((====))==  Unsloth 2025.7.1: Fast Mllama patching. Transformers: 4.53.1.
   \\   /|    NVIDIA GeForce RTX 4090. Num GPUs = 1. Max memory: 23.546 GB. Platform: Linux.
O^O/ \_/ \    Torch: 2.7.0+cu126. CUDA: 8.9. CUDA Toolkit: 12.6. Triton: 3.3.0
\        /    Bfloat16 = TRUE. FA [Xformers = 0.0.30. FA2 = False]
 "-____-"     Free license: http://github.com/unslothai/unsloth
Loading checkpoint shards: 100%|██████████| 5/5 [00:42<00:00,  8.59s/it]

3. Configuring the LoRA Fine-Tuning Modules

To train the model with LoRA, we select and fine-tune specific components: the vision layers, language layers, attention modules, and MLP modules. This lets us adapt the model to the task with minimal architectural changes.

model = FastVisionModel.get_peft_model(
    model,
    finetune_vision_layers = True,
    finetune_language_layers = True,
    finetune_attention_modules = True,
    finetune_mlp_modules = True,
    r = 16,
    lora_alpha = 16,
    lora_dropout = 0,
    bias = "none",
    random_state = 3443,
    use_rslora = False,
    loftq_config = None,
)

Output:

Unsloth: Making `model.base_model.model.model.vision_model.transformer` require gradients
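The idea behind `r` and `lora_alpha` can be illustrated in a few lines of NumPy. This is an illustrative sketch of the LoRA construction, not Unsloth's internals, and the matrix sizes are arbitrary toy values: the frozen weight W is augmented with a low-rank update scaled by alpha/r, so with r = 16 and lora_alpha = 16 the scale is exactly 1.0.

```python
import numpy as np

# Toy dimensions for illustration only
d, k, r, alpha = 64, 64, 16, 16
rng = np.random.default_rng(3443)

W = rng.standard_normal((d, k))         # frozen pretrained weight
A = rng.standard_normal((r, k)) * 0.01  # trainable down-projection
B = np.zeros((d, r))                    # trainable up-projection, starts at zero

# Effective weight seen by the forward pass
W_eff = W + (alpha / r) * (B @ A)

# B = 0 at init, so the adapted model initially behaves like the base model
assert np.allclose(W_eff, W)

lora_params, full_params = A.size + B.size, W.size
print(f"trainable: {lora_params} / {full_params} "
      f"({100 * lora_params / full_params:.1f}%)")
```

At these toy dimensions the saving is modest; on the real 11B model the same construction trains only a fraction of a percent of the parameters, as the trainer log later confirms.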

4. Loading the Data

Load the amazon-product-descriptions-vlm dataset from local disk and take the first 500 samples. The dataset pairs Amazon product images with their descriptions; the goal is a system that generates a description from a product image, helping e-commerce platforms improve user experience and operational efficiency.

from datasets import load_dataset

dataset = load_dataset("/Dataset/amazon-product-descriptions-vlm/",
                       split = "train[0:500]")
dataset

Output:

Generating train split: 100%|██████████| 1345/1345 [00:00<00:00, 9151.79 examples/s]
Dataset({
    features: ['image', 'Uniq Id', 'Product Name', 'Category', 'Selling Price',
               'Model Number', 'About Product', 'Product Specification',
               'Technical Details', 'Shipping Weight', 'Variants', 'Product Url',
               'Is Amazon Seller', 'description'],
    num_rows: 500
})

Inspect the image and description of one product in the dataset:

dataset[100]["image"]

(The product image is displayed here.)

dataset[100]["description"]

Output:

'Unleash the power of the iconic Hot Wheels Monster Trucks Twin Mill! This 1:24 scale die-cast vehicle features incredible detail and is perfect for thrilling stunts and imaginative play. Collect them all!'

5. Data Preprocessing: Building Conversational Training Samples

When instruction-tuning a multimodal model, we typically need to combine image and text into a conversational format the model can understand, mimicking a human-AI interaction. The code below converts each raw image + product-description sample into an OpenAI-style chat structure:

instruction = """
You are an expert Amazon worker who is good at writing product descriptions.
Write the product description accurately by looking at the image.
"""


def convert_to_conversation(sample):
    conversation = [
        {
            "role": "user",
            "content": [
                {"type": "text", "text": instruction},
                {"type": "image", "image": sample["image"]},
            ],
        },
        {
            "role": "assistant",
            "content": [{"type": "text", "text": sample["description"]}],
        },
    ]
    return {"messages": conversation}


converted_dataset = [convert_to_conversation(sample) for sample in dataset]
converted_dataset[100]

Output:

{'messages': [{'role': 'user',
   'content': [{'type': 'text',
     'text': '\nYou are an expert Amazon worker who is good at writing product descriptions. \nWrite the product description accurately by looking at the image.\n'},
    {'type': 'image',
     'image': <PIL.JpegImagePlugin.JpegImageFile image mode=RGB size=500x500>}]},
  {'role': 'assistant',
   'content': [{'type': 'text',
     'text': 'Unleash the power of the iconic Hot Wheels Monster Trucks Twin Mill! This 1:24 scale die-cast vehicle features incredible detail and is perfect for thrilling stunts and imaginative play. Collect them all!'}]}]}
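A small structural check (my own sketch, not from Unsloth) makes the expected message layout explicit. The dummy row below stands in for a real dataset sample, since the check only looks at roles and content types:

```python
def check_sample(converted):
    """Verify one converted sample has the user/assistant layout above."""
    msgs = converted["messages"]
    assert msgs[0]["role"] == "user"
    assert msgs[1]["role"] == "assistant"
    user_types = {c["type"] for c in msgs[0]["content"]}
    assert {"text", "image"} <= user_types
    return True

# Dummy row standing in for dataset[i]; any object can play the image
dummy = {"messages": [
    {"role": "user", "content": [
        {"type": "text", "text": "instruction"},
        {"type": "image", "image": object()},
    ]},
    {"role": "assistant", "content": [{"type": "text", "text": "a description"}]},
]}
print(check_sample(dummy))  # True
```

Running such a check over `converted_dataset` before training catches malformed rows early, before the data collator fails mid-run.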

6. Testing the Model Before Fine-Tuning

We take sample 100 from the dataset and run inference on it, to gauge how well the model writes product descriptions out of the box, before any fine-tuning.

FastVisionModel.for_inference(model)

image = dataset[100]["image"]

messages = [
    {
        "role": "user",
        "content": [
            {"type": "image"},
            {"type": "text", "text": instruction},
        ],
    }
]
input_text = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True
)
inputs = tokenizer(
    image,
    input_text,
    add_special_tokens=False,
    return_tensors="pt",
).to("cuda")

from transformers import TextStreamer

text_streamer = TextStreamer(tokenizer, skip_prompt=True)
_ = model.generate(
    **inputs,
    streamer=text_streamer,
    max_new_tokens=128,
    use_cache=True,
    temperature=1.5,
    min_p=0.1,
)

Output:

The image depicts a toy monster truck, showcasing its vibrant orange body with silver accents. The vehicle features three large, black tires with orange rims, giving it a rugged and off-road-ready appearance. In addition to its impressive wheel arrangement, the truck is equipped with two silver engines visible under the hood, suggesting a dynamic and powerful design. A skull decal on the hood adds a touch of edginess, while a yellow and black logo on the front adds a pop of color. The toy truck appears to be part of a "Hot Wheels" series, as suggested by the "Hot Wheels" logo on the side. The overall design

The description is long but loosely structured, with redundant brand mentions and irrelevant details about the artwork; it needs further refinement.

7. Fine-Tuning the Model

Switch the model to training mode and initialize a supervised fine-tuning (SFT) trainer with a vision data collator, our converted dataset, and a training configuration tuned for efficient fine-tuning.

from unsloth import is_bf16_supported
from unsloth.trainer import UnslothVisionDataCollator
from trl import SFTTrainer, SFTConfig

FastVisionModel.for_training(model)

trainer = SFTTrainer(
    model=model,
    tokenizer=tokenizer,
    data_collator=UnslothVisionDataCollator(model, tokenizer),
    train_dataset=converted_dataset,
    args=SFTConfig(
        per_device_train_batch_size=2,
        gradient_accumulation_steps=4,
        warmup_steps=5,
        max_steps=30,
        learning_rate=2e-4,
        fp16=not is_bf16_supported(),
        bf16=is_bf16_supported(),
        logging_steps=5,
        optim="adamw_8bit",
        weight_decay=0.01,
        lr_scheduler_type="linear",
        seed=3407,
        output_dir="outputs",
        report_to="none",
        remove_unused_columns=False,
        dataset_text_field="",
        dataset_kwargs={"skip_prepare_dataset": True},
        dataset_num_proc=4,
        max_seq_length=2048,
    ),
)

Start training:

trainer_stats = trainer.train()

Output:

==((====))==  Unsloth - 2x faster free finetuning | Num GPUs used = 1
   \\   /|    Num examples = 500 | Num Epochs = 1 | Total steps = 30
O^O/ \_/ \    Batch size per device = 2 | Gradient accumulation steps = 4
\        /    Data Parallel GPUs = 1 | Total batch size (2 x 4 x 1) = 8
 "-____-"     Trainable parameters = 67,174,400 of 10,737,395,235 (0.63% trained)
`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`.
Unsloth: Will smartly offload gradients to save VRAM!

[30/30 05:42, Epoch 0/1]

Step | Training Loss
   5 | 3.354900
  10 | 2.174600
  15 | 1.310900
  20 | 1.113800
  25 | 1.023700
  30 | 1.047700

The training loss falls steadily (with only a slight uptick at the final step), indicating the model is learning the mapping between images and descriptions.
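Copying the logged losses into a short script (my own sketch) quantifies the improvement:

```python
# Loss values transcribed from the training log above
steps  = [5, 10, 15, 20, 25, 30]
losses = [3.3549, 2.1746, 1.3109, 1.1138, 1.0237, 1.0477]

# Relative drop from the first to the last logged step
drop = (losses[0] - losses[-1]) / losses[0]
print(f"loss fell {drop:.0%} over {steps[-1]} steps")  # → loss fell 69% over 30 steps
```

The small rise between steps 25 and 30 is ordinary step-to-step noise at this batch size, not a sign of divergence.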

8. Comparing Results After Fine-Tuning

Run inference again, this time on sample 45 of the dataset (the code below uses index 45, a different product from the one tested in Section 6):

FastVisionModel.for_inference(model)

image = dataset[45]["image"]

messages = [
    {
        "role": "user",
        "content": [
            {"type": "image"},
            {"type": "text", "text": instruction},
        ],
    }
]
input_text = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True
)
inputs = tokenizer(
    image,
    input_text,
    add_special_tokens=False,
    return_tensors="pt",
).to("cuda")

from transformers import TextStreamer

text_streamer = TextStreamer(tokenizer, skip_prompt=True)
_ = model.generate(
    **inputs,
    streamer=text_streamer,
    max_new_tokens=128,
    use_cache=True,
    temperature=1.5,
    min_p=0.1,
)

Output:

Add the iconic Alex Bowman 88 1:24 die-cast car to your collection. This 1:24 scale NASCAR 88 88 NATIONWIDE CHEVY SS is an Official Lionel Collectibles product, capturing the sleek Chevrolet Camaro design. Perfect for collectors and enthusiasts!<|eot_id|>

The result improves markedly: the generated text is more precise and reads like a real product description. Some redundancy remains (e.g. the repeated "88"), so for best results, continue training on the full dataset for 3-5 epochs.
