Whisper Speech Recognition Quick Start: From Installation to Usage


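Component	Recommended spec	Notes
GPU	NVIDIA RTX 4090 D / A100 / H100	At least 23 GB of VRAM is recommended. This is the ideal environment for running the large-v3 model.
Memory	16 GB or more	Ensures the system has enough RAM for audio loading and model computation.
Storage	10 GB free	Space is needed for the model files (about 3 GB) and system files.
OS	Ubuntu 24.04 LTS	Or another Linux distribution compatible with CUDA 12.4. This is the most stable and best-supported environment.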
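Before installing anything, it can help to compare the machine against the disk and memory rows of the table. The following is a minimal sketch using only the Python standard library; the function name `check_resources` and its thresholds are illustrative assumptions taken from the table, not part of the project. It does not check the GPU.

```python
import os
import shutil

GIB = 1024 ** 3


def check_resources(path=".", min_disk_gib=10, min_ram_gib=16):
    """Return measured disk/RAM values and pass/fail flags (hypothetical helper)."""
    free_disk_gib = shutil.disk_usage(path).free / GIB
    # These sysconf names exist on Linux; on other systems RAM is reported as None.
    try:
        total_ram_gib = os.sysconf("SC_PAGE_SIZE") * os.sysconf("SC_PHYS_PAGES") / GIB
    except (ValueError, OSError, AttributeError):
        total_ram_gib = None
    return {
        "free_disk_gib": free_disk_gib,
        "disk_ok": free_disk_gib >= min_disk_gib,
        "total_ram_gib": total_ram_gib,
        "ram_ok": total_ram_gib is not None and total_ram_gib >= min_ram_gib,
    }


if __name__ == "__main__":
    print(check_resources())
```

The operating system usually reports slightly less RAM than the nominal amount installed, so a machine sold as 16 GB may fail a strict 16 GiB threshold; treat the `ram_ok` flag as advisory.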

pip install -r requirements.txt

sudo apt-get update && sudo apt-get install -y ffmpeg
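To confirm that the FFmpeg installation succeeded before starting the app, a short check such as the one below may be useful. It is a sketch that relies only on the standard library; `ffmpeg_version` is a hypothetical helper name, not something the project provides.

```python
import shutil
import subprocess


def ffmpeg_version(executable="ffmpeg"):
    """Return the first line of `<executable> -version`, or None if it is not on PATH."""
    path = shutil.which(executable)
    if path is None:
        return None
    result = subprocess.run(
        [path, "-version"], capture_output=True, text=True, check=False
    )
    lines = result.stdout.splitlines()
    return lines[0] if lines else None


if __name__ == "__main__":
    print(ffmpeg_version() or "ffmpeg was not found on PATH")
```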

python3 app.py

Running on local URL: http://0.0.0.0:7860
Running on public URL: http://<your server IP address>:7860
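If the page does not load in a browser, one quick way to tell whether the service is actually listening on port 7860 is a plain TCP connection test. The sketch below assumes the default port shown in the startup output; `is_listening` is an illustrative name.

```python
import socket


def is_listening(host="127.0.0.1", port=7860, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, unreachable host, and timeouts.
        return False


if __name__ == "__main__":
    print("service reachable:", is_listening())
```

Run it on the server with the default host to test the local URL, or pass the server's IP address from another machine to test the public one. A `False` result from a remote machine while the local check passes often points to a firewall rule rather than to the app itself.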

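import whisper

# whisper.load_model() accepts no dtype argument; FP16 is requested per call to transcribe().
model = whisper.load_model("large-v3", device="cuda")
result = model.transcribe("audio.wav", fp16=True)  # "audio.wav" is a placeholder path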
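To see why half precision helps, a rough estimate of the memory taken by the weights alone is enough. The sketch below assumes about 1.55 billion parameters for the large models, the figure listed in the Whisper README; activations, the decoding cache, and CUDA overhead come on top and are not modeled here.

```python
# Bytes used to store one parameter in each precision.
BYTES_PER_PARAM = {"float32": 4, "float16": 2}

# Approximate parameter count of the large models (assumption from the Whisper README).
LARGE_PARAMS = 1_550_000_000


def weight_gb(num_params, dtype):
    """Estimate the size of the model weights in GB (10^9 bytes) for a given dtype."""
    return num_params * BYTES_PER_PARAM[dtype] / 1e9


if __name__ == "__main__":
    for dtype in ("float32", "float16"):
        print(f"{dtype}: about {weight_gb(LARGE_PARAMS, dtype):.1f} GB of weights")
```

Under these assumptions the weights take roughly 6.2 GB in FP32 and 3.1 GB in FP16, which is in line with the model file size of about 3 GB mentioned in the requirements.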

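Symptom	Likely cause	Solution
Error `ffmpeg not found`	FFmpeg is not installed on the system	Run `sudo apt-get install -y ffmpeg` to install it.
Program crashes during processing with `CUDA out of memory`	The GPU has run out of VRAM	1. Try the FP16 half-precision mode mentioned above.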
