Llama-3.2V-11B-COT 部署教程：NVIDIA A10/A100/V100 多卡 GPU 适配

Llama-3.2V-11B-COT 部署教程：NVIDIA A10/A100/V100 多卡 GPU 适配 | 极客日志

# 检查 NVIDIA 驱动版本
nvidia-smi
# 检查 CUDA 版本（如果已安装）
nvcc --version

conda create -n llama3v python=3.10 -y
conda activate llama3v

# CUDA 11.8
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118
# CUDA 12.1
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu121

pip install transformers accelerate bitsandbytes pillow

from transformers import AutoProcessor, AutoModelForCausalLM
import torch

model_path = "./Llama-3.2V-11B-cot"

processor = AutoProcessor.from_pretrained(model_path)
model = AutoModelForCausalLM.from_pretrained(
    model_path,
    torch_dtype=torch.float16,
    device_map="auto",
    trust_remote_code=True
)
print(f"模型已加载到设备：{model.device}")

export CUDA_VISIBLE_DEVICES=0,1

accelerate config

from accelerate import Accelerator

accelerator = Accelerator()
model = AutoModelForCausalLM.from_pretrained(
    model_path,
    torch_dtype=torch.float16,
    device_map=None,
    trust_remote_code=True
)
model = accelerator.prepare(model)

from transformers import AutoProcessor, AutoModelForCausalLM
from PIL import Image
import torch

model_path = "./Llama-3.2V-11B-cot"
processor = AutoProcessor.from_pretrained(model_path)
model = AutoModelForCausalLM.from_pretrained(
    model_path,
    torch_dtype=torch.float16,
    device_map="auto",
    trust_remote_code=True
)

image_path = "test_image.jpg"
image = Image.open(image_path).convert('RGB')
question = "Describe what is happening in this image and explain why."

prompt = f"A chat between a curious human and an AI assistant. The assistant gives helpful, detailed, and polite answers to the human's questions. USER: <image>\n{question} ASSISTANT:"
inputs = processor(images=image, text=prompt, return_tensors="pt").to(model.device)

print("正在生成回答，请稍候…")
with torch.no_grad():
    generated_ids = model.generate(**inputs, max_new_tokens=300, do_sample=True, temperature=0.7)
    generated_text = processor.batch_decode(generated_ids, skip_special_tokens=True)[0]

answer_start = generated_text.find("ASSISTANT:") + len("ASSISTANT: ")
print(generated_text[answer_start:])

Llama-3.2V-11B-COT 部署教程：NVIDIA A10/A100/V100 多卡 GPU 适配

Llama-3.2V-11B-COT 部署教程：NVIDIA A10/A100/V100 多卡 GPU 适配

1. 环境准备与快速部署

1.1 系统与驱动检查

1.2 创建并激活 Conda 环境

1.3 安装 PyTorch 与依赖

2. 基础概念与模型加载

2.1 模型是如何'看图思考'的？

2.2 单 GPU 加载模型

3. 多卡 GPU 部署与优化实践

3.1 方法一：使用 `device_map="auto"`

3.2 方法二：使用 `accelerate` 高级配置

3.3 针对不同 GPU 的优化建议

4. 快速上手与效果测试

5. 常见问题与排错指南

6. 总结

更多推荐文章

相关免费在线工具

Llama-3.2V-11B-COT 部署教程：NVIDIA A10/A100/V100 多卡 GPU 适配

Llama-3.2V-11B-COT 部署教程：NVIDIA A10/A100/V100 多卡 GPU 适配

1. 环境准备与快速部署

1.1 系统与驱动检查

1.2 创建并激活 Conda 环境

1.3 安装 PyTorch 与依赖

2. 基础概念与模型加载

2.1 模型是如何'看图思考'的？

2.2 单 GPU 加载模型

3. 多卡 GPU 部署与优化实践

3.1 方法一：使用 device_map="auto"

3.2 方法二：使用 accelerate 高级配置

3.3 针对不同 GPU 的优化建议

4. 快速上手与效果测试

5. 常见问题与排错指南

6. 总结

微信扫一扫，关注极客日志

更多推荐文章

相关免费在线工具

3.1 方法一：使用 `device_map="auto"`

3.2 方法二：使用 `accelerate` 高级配置