Python Clustering in Practice: The OPTICS Algorithm and a Complete Visualization Workflow | 极客日志

!pip install numpy matplotlib scikit-learn

import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import make_circles, make_blobs
from sklearn.cluster import OPTICS

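# Configure a Matplotlib font for Chinese text (only needed if plot labels contain Chinese characters)
plt.rcParams["font.sans-serif"] = ["SimHei"] # Windows
# plt.rcParams["font.sans-serif"] = ["Arial Unicode MS"] # macOS
plt.rcParams["axes.unicode_minus"] = False # keep the minus sign rendering correctly with this font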

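# 1. Generate two concentric rings (a non-linear distribution that mimics complex structure)
X1, _ = make_circles(
    n_samples=1000,      # total number of samples
    factor=0.2,          # scale factor between the inner and outer circle
    noise=0.05,          # standard deviation of the Gaussian noise added to the points
    random_state=5       # random seed, so the result is reproducible
)

# 2. Generate two separate spherical clusters (mimicking locally dense structure)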
X2, _ = make_blobs(
    n_samples=200,
    n_features=2,
    centers=[[1, 2]],
    cluster_std=[0.1],
    random_state=5
)
X3, _ = make_blobs(
    n_samples=300,
    n_features=2,
    centers=[[-0.5, -1.2]],
    cluster_std=[0.1],
    random_state=5
)

# 3. Merge all the data (labels are discarded to mimic an unsupervised setting)
X = np.concatenate((X1, X2, X3))

# 4. Visualize the raw data
plt.figure(figsize=(8, 6))
plt.scatter(X[:, 0], X[:, 1], marker="*", c="gray")
plt.title("Raw data: mixed ring and blob structure")
plt.xlabel("Feature 1")
plt.ylabel("Feature 2")
plt.show()

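# Initialize the OPTICS model (core parameters)
model = OPTICS(
    min_samples=15,           # neighborhood size required for a point to count as a core point
    xi=0.05,                  # minimum steepness on the reachability plot that marks a cluster boundary
    min_cluster_size=0.05,    # minimum cluster size as a fraction of all samples (filters small clusters/noise)
    cluster_method="xi"       # cluster extraction method, set explicitly ("xi" is also the default)
)
# Fit the model (unsupervised, so no labels are needed)
model.fit(X)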
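The `min_samples` comment above relies on the notion of a core distance. As a minimal sketch, assuming a small stand-in dataset (`X_demo` and `demo_model` are illustrative names, not the article's variables), the core distance can be recomputed by hand as the distance to the `min_samples`-th nearest neighbour, counting the point itself, and compared with what the fitted model stores in `core_distances_`:

```python
import numpy as np
from sklearn.cluster import OPTICS
from sklearn.datasets import make_blobs
from sklearn.neighbors import NearestNeighbors

# Small illustrative dataset (an assumption, not the article's data) so the check runs quickly.
X_demo, _ = make_blobs(
    n_samples=200, centers=[[0, 0], [3, 3]], cluster_std=0.3, random_state=5
)

min_samples = 15
demo_model = OPTICS(min_samples=min_samples).fit(X_demo)

# Core distance: distance from each point to its min_samples-th nearest neighbour.
# Passing X_demo explicitly to kneighbors means each point is its own first neighbour.
nn = NearestNeighbors(n_neighbors=min_samples).fit(X_demo)
distances, _ = nn.kneighbors(X_demo)
core_dist_manual = distances[:, -1]

# Compare the hand-computed values with the ones stored on the fitted model.
matches = np.allclose(core_dist_manual, demo_model.core_distances_)
print("manual core distances match model.core_distances_:", matches)
```

A point in a dense region has a small core distance, and a point in a sparse region has a large one, which is what lets OPTICS compare densities without a fixed `eps`.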

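# Get the sample ordering and reachability distances produced by OPTICS
ordering = model.ordering_              # sample indices in cluster order (the order in which OPTICS processed them)
reachability = model.reachability_[ordering]  # reachability distances rearranged into that order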

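# Plot the reachability curve (valleys correspond to dense clusters)
plt.figure(figsize=(10, 4))
plt.plot(reachability, marker=".", linestyle="none", color="#1f77b4")
plt.xlabel("Sample position in OPTICS cluster order")
plt.ylabel("Reachability distance")
plt.title("OPTICS reachability plot")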
plt.grid(alpha=0.3)
plt.show()
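The reachability values can also be cut at a fixed height to obtain DBSCAN-style labels without refitting. The sketch below uses scikit-learn's `cluster_optics_dbscan` for this; the dataset and the threshold `eps_cut = 0.5` are assumptions chosen for illustration and would need adjusting for the article's data.

```python
import numpy as np
from sklearn.cluster import OPTICS, cluster_optics_dbscan
from sklearn.datasets import make_blobs

# Small illustrative dataset (an assumption): two well-separated blobs.
X_demo, _ = make_blobs(
    n_samples=300, centers=[[0, 0], [4, 4]], cluster_std=0.3, random_state=5
)
demo_model = OPTICS(min_samples=15).fit(X_demo)

# Hypothetical cut height: points whose reachability stays below it fall into one cluster.
eps_cut = 0.5
labels_cut = cluster_optics_dbscan(
    reachability=demo_model.reachability_,
    core_distances=demo_model.core_distances_,
    ordering=demo_model.ordering_,
    eps=eps_cut,
)

# Label -1 marks noise; every other value is a cluster id.
n_clusters = len(set(labels_cut)) - (1 if -1 in labels_cut else 0)
n_noise = int(np.sum(labels_cut == -1))
print(f"eps={eps_cut}: {n_clusters} clusters, {n_noise} noise points")
```

Trying a few cut heights against the plotted curve is a cheap way to see how the valleys map to clusters, since the expensive ordering step runs only once.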

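# Get the cluster labels (indexed like X, so they line up with the scatter coordinates; -1 marks noise)
labels = model.labels_

# Visualize the clustering result
plt.figure(figsize=(8, 6))
plt.scatter(X[:, 0], X[:, 1], c=labels, cmap="tab10", s=50, alpha=0.8)
plt.title("OPTICS clustering result: rings and blobs separated")
plt.xlabel("Feature 1")
plt.ylabel("Feature 2")
plt.colorbar(label="Cluster label")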
plt.show()

# Lower min_samples to 10 (more sensitive to small dense regions)
model_tuned = OPTICS(
    min_samples=10,
    xi=0.05,
    min_cluster_size=0.05,
    cluster_method="xi"
)
model_tuned.fit(X)
# Get the cluster labels after tuning
labels_tuned = model_tuned.labels_

# Visualize the tuned result
plt.figure(figsize=(8, 6))
plt.scatter(X[:, 0], X[:, 1], c=labels_tuned, cmap="tab10", s=50, alpha=0.8)
plt.title("OPTICS result after tuning (min_samples=10)")
plt.xlabel("Feature 1")
plt.ylabel("Feature 2")
plt.colorbar(label="Cluster label")
plt.show()
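Comparing two scatter plots by eye only goes so far. A rough sketch of a small parameter sweep is shown here, assuming a stand-in dataset and an arbitrary list of candidate `min_samples` values; it records the number of clusters and the share of points labelled as noise for each setting.

```python
import numpy as np
from sklearn.cluster import OPTICS
from sklearn.datasets import make_blobs

# Small illustrative dataset (an assumption, not the article's data).
X_demo, _ = make_blobs(
    n_samples=300, centers=[[0, 0], [4, 4]], cluster_std=0.3, random_state=5
)

summary = {}
for ms in (5, 10, 15, 20):  # candidate values chosen for illustration only
    lab = OPTICS(min_samples=ms, xi=0.05, min_cluster_size=0.05).fit(X_demo).labels_
    n_clusters = len(np.unique(lab[lab != -1]))   # distinct labels, excluding noise (-1)
    noise_ratio = float(np.mean(lab == -1))       # fraction of points labelled as noise
    summary[ms] = (n_clusters, noise_ratio)
    print(f"min_samples={ms:>2}: clusters={n_clusters}, noise={noise_ratio:.1%}")
```

A setting where the cluster count is stable across neighbouring values and the noise share is modest is usually a reasonable starting point, though the final choice should still be checked against the plots.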

model = OPTICS(
    min_samples=15,
    xi=0.05,
    min_cluster_size=0.05,
    cluster_method="xi"  # key parameter for avoiding the HTML visualization error (note: "xi" is also the default value)
)
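If the error comes from the HTML estimator diagram that a notebook renders when a cell ends with an estimator (for example `model.fit(X)`), one alternative worth trying is to ask scikit-learn for a plain-text representation instead. This is a swapped-in technique rather than the fix described above, and it assumes the HTML representation is the actual cause; `model_demo` is an illustrative name.

```python
from sklearn import get_config, set_config
from sklearn.cluster import OPTICS

# Show estimators as plain text instead of an HTML diagram (applies to the whole session).
set_config(display="text")

model_demo = OPTICS(min_samples=15, xi=0.05, min_cluster_size=0.05)

# The plain-text form is what a notebook cell would now display for this estimator.
print(repr(model_demo))
print("display mode:", get_config()["display"])
```

Ending the cell with an assignment or a semicolon, so that the estimator is never displayed at all, is another low-effort option.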

from sklearn.cluster import KMeans, DBSCAN

# 1. K-Means clustering
kmeans = KMeans(n_clusters=4, random_state=5)
kmeans_labels = kmeans.fit_predict(X)

# 2. DBSCAN clustering
dbscan = DBSCAN(eps=0.2, min_samples=5)
dbscan_labels = dbscan.fit_predict(X)

# 3. Side-by-side comparison
fig, axes = plt.subplots(1, 3, figsize=(18, 5))

# K-Means result
axes[0].scatter(X[:, 0], X[:, 1], c=kmeans_labels, cmap="tab10")
axes[0].set_title("K-Means clustering result")

# DBSCAN result
axes[1].scatter(X[:, 0], X[:, 1], c=dbscan_labels, cmap="tab10")
axes[1].set_title("DBSCAN clustering result")

# OPTICS result (labels must be in the same order as X, i.e. model.labels_)
axes[2].scatter(X[:, 0], X[:, 1], c=labels, cmap="tab10")
axes[2].set_title("OPTICS clustering result")

plt.tight_layout()
plt.show()

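Algorithm	Strengths	Weaknesses	Result on this data
OPTICS	Finds clusters of arbitrary shape; exposes the density structure	More parameters; longer training time	Excellent (separates the rings)
K-Means	Fast; simple to implement	Favors roughly spherical clusters; requires choosing K	Poor (cannot separate the rings)
DBSCAN	Robust to noise; no need to specify the number of clusters	Sensitive to eps and min_samples	Fairly good (needs careful tuning)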
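The comparison above is visual. Because the data is synthetic, the generator labels can be kept and used for a numeric check as well. The sketch below is one way to do that under stated assumptions: it uses the same generators with fewer samples so it runs quickly, keeps the ground-truth labels that the walkthrough discards, and scores each algorithm with the adjusted Rand index (ARI). The variable names and sample counts are illustrative.

```python
import numpy as np
from sklearn.cluster import DBSCAN, KMeans, OPTICS
from sklearn.datasets import make_blobs, make_circles
from sklearn.metrics import adjusted_rand_score

# Same generators as the walkthrough, but smaller, and with ground-truth labels kept.
Xc, yc = make_circles(n_samples=400, factor=0.2, noise=0.05, random_state=5)
Xb1, _ = make_blobs(n_samples=100, centers=[[1, 2]], cluster_std=[0.1], random_state=5)
Xb2, _ = make_blobs(n_samples=100, centers=[[-0.5, -1.2]], cluster_std=[0.1], random_state=5)
X_demo = np.concatenate((Xc, Xb1, Xb2))
# Rings keep their labels 0 and 1; the two blobs get labels 2 and 3.
y_true = np.concatenate((yc, np.full(100, 2), np.full(100, 3)))

predictions = {
    "K-Means": KMeans(n_clusters=4, n_init=10, random_state=5).fit_predict(X_demo),
    "DBSCAN": DBSCAN(eps=0.2, min_samples=5).fit_predict(X_demo),
    "OPTICS": OPTICS(min_samples=15, xi=0.05, min_cluster_size=0.05).fit_predict(X_demo),
}

# ARI is 1.0 for a perfect match and near 0 for random labelling.
# Noise points (label -1) are treated here as one extra group.
scores = {name: adjusted_rand_score(y_true, pred) for name, pred in predictions.items()}
for name, score in scores.items():
    print(f"{name:<8} ARI = {score:.3f}")
```

Scores from this reduced dataset will not necessarily match the ranking in the table, since density-based methods are sensitive to sample count and to the chosen parameters.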

# Environment setup
import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import make_circles, make_blobs
from sklearn.cluster import OPTICS

# Font configuration for Chinese text (only needed if plot labels contain Chinese characters)
plt.rcParams["font.sans-serif"] = ["SimHei"]
plt.rcParams["axes.unicode_minus"] = False

# 1. Generate the data
X1, _ = make_circles(n_samples=1000, factor=0.2, noise=0.05, random_state=5)
X2, _ = make_blobs(n_samples=200, n_features=2, centers=[[1,2]], cluster_std=[0.1], random_state=5)
X3, _ = make_blobs(n_samples=300, n_features=2, centers=[[-0.5,-1.2]], cluster_std=[0.1], random_state=5)
X = np.concatenate((X1, X2, X3))

# 2. Visualize the raw data
plt.figure(figsize=(8,6))
plt.scatter(X[:,0], X[:,1], marker="*", c="gray")
plt.title("Raw data: mixed ring and blob structure")
plt.xlabel("Feature 1")
plt.ylabel("Feature 2")
plt.show()

# 3. Fit the OPTICS model
model = OPTICS(min_samples=15, xi=0.05, min_cluster_size=0.05, cluster_method="xi")
model.fit(X)

# 4. Plot the reachability curve
ordering = model.ordering_
reachability = model.reachability_[ordering]
plt.figure(figsize=(10,4))
plt.plot(reachability, marker=".", linestyle="none", color="#1f77b4")
plt.xlabel("Sample position in OPTICS cluster order")
plt.ylabel("Reachability distance")
plt.title("OPTICS reachability plot")
plt.grid(alpha=0.3)
plt.show()

# 5. Visualize the clustering result (labels stay in the same order as X)
labels = model.labels_
plt.figure(figsize=(8,6))
plt.scatter(X[:,0], X[:,1], c=labels, cmap="tab10", s=50, alpha=0.8)
plt.title("OPTICS clustering result: rings and blobs separated")
plt.xlabel("Feature 1")
plt.ylabel("Feature 2")
plt.colorbar(label="Cluster label")
plt.show()

# 6. Parameter tuning
model_tuned = OPTICS(min_samples=10, xi=0.05, min_cluster_size=0.05, cluster_method="xi")
model_tuned.fit(X)
labels_tuned = model_tuned.labels_
plt.figure(figsize=(8,6))
plt.scatter(X[:,0], X[:,1], c=labels_tuned, cmap="tab10", s=50, alpha=0.8)
plt.title("OPTICS result after tuning (min_samples=10)")
plt.xlabel("Feature 1")
plt.ylabel("Feature 2")
plt.colorbar(label="Cluster label")
plt.show()

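Python Clustering in Practice: The OPTICS Algorithm and a Complete Visualization Workflow

1. Introduction: The "Density Hierarchy Expert" Among Clustering Algorithms

2. Environment Setup and Dependency Installation

2.1 Core Libraries

2.2 Imports and Chinese Font Configuration

3. Generating Experimental Data: Simulating a Non-linear Distribution

3.1 Data Generation Code

4. OPTICS Core Principles and Implementation

4.1 Core Concepts of OPTICS

4.2 OPTICS Model Training Code

4.3 Visualizing the Reachability Curve

4.4 Visualizing the Clustering Result

5. Parameter Tuning: Improving the OPTICS Clustering Result

5.1 Training Code for the Tuned Model

5.2 Clustering Result After Tuning

5.3 OPTICS Parameter Selection Guide

6. Troubleshooting in Practice: Handling the Jupyter Visualization Error

6.1 Solution

7. Comparing OPTICS with Other Clustering Algorithms

7.1 Comparison Experiment Code

7.2 Comparison Conclusions

8. Summary and Application Scenarios

8.1 Experiment Summary

8.2 OPTICS Application Scenarios

9. Appendix: Complete Code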

