Whisper: Installation and Usage Guide for OpenAI's Open-Source Speech Recognition Tool | 极客日志

Whisper relies on `ffmpeg` to decode audio, so `ffmpeg` must be installed and reachable through `PATH`.

# Windows (Chocolatey)
choco install ffmpeg

# macOS (Homebrew)
brew install ffmpeg

# Debian/Ubuntu
sudo apt update && sudo apt install ffmpeg
# CentOS/RHEL (ffmpeg is not in the base repositories; enable one that provides it, such as RPM Fusion, first)
sudo yum install ffmpeg ffmpeg-devel
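
Whether the installation worked can also be checked from Python before Whisper is run. This is a minimal sketch that only looks for an `ffmpeg` executable on `PATH`; the function names `find_ffmpeg` and `describe_ffmpeg` are illustrative and not part of Whisper.

```python
import shutil


def find_ffmpeg():
    """Return the full path of the ffmpeg executable, or None if it is not on PATH."""
    return shutil.which("ffmpeg")


def describe_ffmpeg():
    """Return a short human-readable status message about ffmpeg."""
    path = find_ffmpeg()
    if path is None:
        return "ffmpeg not found on PATH; install it and reopen the terminal"
    return f"ffmpeg found at {path}"


print(describe_ffmpeg())
```

If the message reports that ffmpeg is missing right after installation, opening a new terminal usually picks up the updated `PATH`.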

pip install -U openai-whisper

pip show openai-whisper
# Prints the installed package version (the whisper command has no --version flag)

whisper audio.mp3 --device mps

pip install torch torchaudio -U --pre --extra-index-url https://download.pytorch.org/whl/nightly/cpu

import torch
print(torch.cuda.is_available())
# Should print True when a CUDA-capable GPU and a CUDA build of PyTorch are available

pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu117
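
The device check can be folded into a small helper that picks a value for Whisper's `device` option. This is a minimal sketch that assumes PyTorch may or may not be installed and falls back to the CPU when in doubt; the name `pick_device` is illustrative and not part of Whisper or PyTorch.

```python
def pick_device():
    """Return "cuda", "mps", or "cpu" depending on what PyTorch reports."""
    try:
        import torch
    except ImportError:
        # Without PyTorch there is nothing to accelerate; fall back to CPU.
        return "cpu"

    if torch.cuda.is_available():
        return "cuda"
    # Older PyTorch releases have no torch.backends.mps, so look it up defensively.
    mps = getattr(torch.backends, "mps", None)
    if mps is not None and mps.is_available():
        return "mps"
    return "cpu"


print(pick_device())
```

The returned string can be passed on, for example as `whisper.load_model("medium", device=pick_device())`.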

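Model	Parameters	Memory usage	Suitable for
`tiny`	39M	~1 GB	Fast transcription, low accuracy
`base`	74M	~1.5 GB	Balance of speed and accuracy
`small`	244M	~2.5 GB	Medium accuracy, multilingual support
`medium`	769M	~5 GB	High accuracy, complex audio
`large`	1550M	~10 GB	Highest accuracy, professional use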
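
The table can be turned into a simple rule for choosing a model from the memory that is available. This is a minimal sketch using the approximate figures listed above; the names `MODEL_MEMORY_GB` and `pick_model` are illustrative, and real memory use also depends on audio length and precision.

```python
# Approximate memory needs in GB, copied from the model table.
MODEL_MEMORY_GB = {
    "tiny": 1.0,
    "base": 1.5,
    "small": 2.5,
    "medium": 5.0,
    "large": 10.0,
}


def pick_model(available_gb):
    """Return the largest model whose listed memory need fits, or None."""
    fitting = [name for name, need in MODEL_MEMORY_GB.items() if need <= available_gb]
    # The dict is ordered from smallest to largest, so the last fit is the largest.
    return fitting[-1] if fitting else None


print(pick_model(6))
```

Leaving some headroom below the real amount of free memory is a reasonable precaution, since the listed figures are rough.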

whisper [audio file path] --model [model name] --language [language code]

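# Transcribe English audio with the medium model. --output_format accepts a single value, so use all to get TXT and SRT (plus the other formats)
whisper lecture.mp3 --model medium --language en --output_format all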

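Parameter	Description
`--model`	Model to use (default `small` in older releases, `turbo` in recent ones)
`--language`	Language of the audio (e.g. `zh`, `en`, `ja`); detected automatically if omitted
`--task`	`transcribe` (transcription) or `translate` (translation into English)
`--output_format`	Output format: `txt`, `srt`, `vtt`, `tsv`, `json`, or `all` (default `all`, which writes every format)
`--output_dir`	Output directory (defaults to the current directory)
`--fp16`	Run inference in FP16 for speed (requires GPU support; enabled by default)
`--device`	Compute device: `cpu`, `cuda`, `mps` (Apple Silicon)
`--temperature`	Sampling temperature (0-1; 0 gives deterministic output)
`--best_of`	Number of candidates sampled when the temperature is above 0 (trades speed for accuracy)
`--beam_size`	Number of beams in beam search, which is used only when the temperature is 0
`--word_timestamps`	Generate a timestamp for each word (applies to `json` and `srt` output)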
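
How `--temperature`, `--beam_size`, and `--best_of` interact is easy to misread. The sketch below models the rule as I understand it from the reference implementation: beam search applies at temperature 0, and `best_of` sampling applies at any temperature above 0. The function `effective_search_options` is a hypothetical helper for illustration, not a Whisper API.

```python
def effective_search_options(temperature, beam_size=None, best_of=None):
    """Return the search options that actually apply at a given temperature."""
    if temperature == 0:
        # Deterministic decoding: beam search is used, best_of is ignored.
        return {"temperature": temperature, "beam_size": beam_size}
    # Sampling: best_of candidates are drawn, beam_size is ignored.
    return {"temperature": temperature, "best_of": best_of}


print(effective_search_options(0, beam_size=5, best_of=5))
print(effective_search_options(0.2, beam_size=5, best_of=5))
```

In practice this means that combining a non-zero temperature with `--beam_size` does not enable beam search.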

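whisper audio.mp3 --initial_prompt "以下是关于量子力学的讲座。"
# Provide a context prompt in the language of the audio (here Chinese for "The following is a lecture on quantum mechanics.")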

whisper audio.mp4 --task translate --output_format srt
# Translate the speech and write English subtitles

whisper long_audio.wav --model large --language en
# The whisper command has no --split_duration option; it processes long audio internally in 30-second windows
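
When a long recording is split outside of Whisper, for example with ffmpeg so that the chunks can be processed separately, the cut points have to be computed first. This is a minimal sketch assuming fixed-length chunks of 300 seconds; the function name `chunk_ranges` is illustrative and not part of Whisper.

```python
def chunk_ranges(total_seconds, chunk_seconds=300):
    """Return (start, end) pairs in seconds that cover the whole recording."""
    if chunk_seconds <= 0:
        raise ValueError("chunk_seconds must be positive")
    ranges = []
    start = 0
    while start < total_seconds:
        # The last chunk is shortened so that it ends with the recording.
        end = min(start + chunk_seconds, total_seconds)
        ranges.append((start, end))
        start = end
    return ranges


print(chunk_ranges(700))  # [(0, 300), (300, 600), (600, 700)]
```

Cutting at fixed offsets can split a word in half, so cutting at pauses gives cleaner results when that matters.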

import whisper

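# Load the model
model = whisper.load_model("medium")

# Transcribe the audio (fp16=False forces FP32, which is what CPUs support)
result = model.transcribe("audio.mp3", language="zh", fp16=False)

# Print the full transcript
print(result["text"])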

# Print each segment with its start and end time in seconds
for segment in result["segments"]:
    print(f"[{segment['start']:.2f}-{segment['end']:.2f}s] {segment['text']}")
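
The segments can also be written out as subtitles by hand, which helps when the layout needs to be customized. This is a minimal sketch assuming each segment is a dict with `start`, `end`, and `text` keys, as in the loop above; `format_srt_timestamp` and `segments_to_srt` are illustrative names, not Whisper functions.

```python
def format_srt_timestamp(seconds):
    """Convert seconds (float) to the HH:MM:SS,mmm form used by SRT files."""
    total_ms = round(seconds * 1000)
    hours, rest = divmod(total_ms, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    secs, millis = divmod(rest, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{millis:03d}"


def segments_to_srt(segments):
    """Render dicts with "start", "end", and "text" keys as SRT text."""
    blocks = []
    for index, segment in enumerate(segments, start=1):
        start = format_srt_timestamp(segment["start"])
        end = format_srt_timestamp(segment["end"])
        blocks.append(f"{index}\n{start} --> {end}\n{segment['text'].strip()}\n")
    # A blank line separates consecutive subtitle blocks.
    return "\n".join(blocks)


example = [{"start": 0.0, "end": 2.5, "text": " Hello"}]
print(segments_to_srt(example))
```

With a real transcription, `segments_to_srt(result["segments"])` would produce the text to save as an `.srt` file.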

result = model.transcribe(
    "audio.wav",
    language="en",
    temperature=0.0,  # beam search only applies at temperature 0
    beam_size=5,
    word_timestamps=True,
    initial_prompt="This is a podcast about climate change.",
)

whisper audio1.mp3 audio2.wav --model small --output_dir ./outputs/

model = whisper.load_model("medium", device="cuda", in_memory=True)

whisper audio.mp3 --model /path/to/custom_model.pt

import whisper
import pyaudio
import wave

# Record audio and save it to a file
CHUNK = 1024
FORMAT = pyaudio.paInt16
CHANNELS = 1
RATE = 16000
RECORD_SECONDS = 5

p = pyaudio.PyAudio()
stream = p.open(format=FORMAT, channels=CHANNELS, rate=RATE, input=True, frames_per_buffer=CHUNK)
frames = []

print("Recording...")
for _ in range(0, int(RATE / CHUNK * RECORD_SECONDS)):
    data = stream.read(CHUNK)
    frames.append(data)

stream.stop_stream()
stream.close()
p.terminate()

# Save the recording as a WAV file
with wave.open("temp.wav", 'wb') as wf:
    wf.setnchannels(CHANNELS)
    wf.setsampwidth(p.get_sample_size(FORMAT))
    wf.setframerate(RATE)
    wf.writeframes(b''.join(frames))

# Transcribe the recording with Whisper
model = whisper.load_model("base")
result = model.transcribe("temp.wav")
print(result["text"])

# Loop over all MP3 files in the current directory
for file in *.mp3; do
    whisper "$file" --model small --output_dir ./transcripts/
done
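
The same batch job can be written in Python, which loads the model once instead of once per file. This is a minimal sketch; the helper names `find_audio_files` and `transcribe_directory` are illustrative, and the choice of the `small` model and of the `.mp3` and `.wav` extensions are assumptions.

```python
from pathlib import Path


def find_audio_files(directory, extensions=(".mp3", ".wav")):
    """Return audio files directly inside `directory`, sorted by name."""
    return sorted(
        path
        for path in Path(directory).iterdir()
        if path.is_file() and path.suffix.lower() in extensions
    )


def transcribe_directory(directory, model_name="small"):
    """Transcribe every audio file and return {file name: transcript text}."""
    import whisper  # imported here so find_audio_files works without Whisper installed

    model = whisper.load_model(model_name)  # load once, reuse for all files
    return {
        path.name: model.transcribe(str(path))["text"]
        for path in find_audio_files(directory)
    }
```

Calling `transcribe_directory(".")` would return the transcripts for the current directory; writing them to files is left to the caller.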

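1. Installing Whisper

1.1 System dependencies

1.2 Installing Whisper

1.3 GPU acceleration (optional)

2. Models in detail

2.1 Model types

2.2 Model download

3. Command-line usage

3.1 Basic commands

3.2 Core parameters

3.3 Advanced usage

4. Python API usage

API parameters

5. Performance optimization

5.1 Speed-up tips

5.2 Handling insufficient memory

6. Troubleshooting common problems

6.1 Error: `FileNotFoundError: [Errno 2] No such file or directory: 'ffmpeg'`

6.2 Error: `ERROR: Could not find model file`

6.3 Inaccurate recognition results

7. Extended applications

7.1 Real-time speech recognition

7.2 Integration with other tools

8. Notes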
