Principles of Spectral Clustering
Spectral Clustering
The spectral clustering algorithm:
$$
\text{input: } X=\{x_1,x_2,...,x_n\} \qquad \text{output: } C=\{c_1,c_2,...,c_{k_2}\}
$$

(1) Build the similarity matrix $S$ of the samples using the chosen similarity measure.

(2) From the similarity matrix $S$, construct the adjacency matrix $W$ and the degree matrix $D$.

(3) Compute the Laplacian matrix $L$.

(4) Form the normalized Laplacian $D^{-\frac12}LD^{-\frac12}$.

(5) Compute the eigenvectors $f$ corresponding to the $k_1$ smallest eigenvalues of $D^{-\frac12}LD^{-\frac12}$.

(6) Normalize the matrix formed by these eigenvectors $f$, giving the $n\times k_1$ feature matrix $F$.

(7) Treat each row of $F$ as a $k_1$-dimensional sample ($n$ samples in total) and cluster these rows with the chosen clustering method into $k_2$ clusters.

(8) Output the cluster partition $(c_1,...,c_{k_2})$.
In general, $k_1$ is taken equal to $k_2$.
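The eight steps above can be sketched in NumPy. This is a minimal illustration, not the article's reference implementation: it assumes a fully connected graph with a Gaussian similarity, $k_1=k_2=k$, and a small hand-rolled k-means (the function name `spectral_clustering` and the farthest-point initialization are illustrative choices).

```python
import numpy as np

def spectral_clustering(X, k, theta=1.0, n_iter=50):
    """Steps (1)-(8) of the algorithm above, assuming a fully connected
    graph with a Gaussian similarity and k1 = k2 = k."""
    # (1)-(2) similarity / adjacency matrix W and degree vector d
    sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    W = np.exp(-sq / (2 * theta ** 2))
    d = W.sum(axis=1)
    # (3)-(4) Laplacian and its symmetric normalization D^{-1/2} L D^{-1/2}
    L = np.diag(d) - W
    Ds = np.diag(1.0 / np.sqrt(d))
    L_sym = Ds @ L @ Ds
    # (5) eigenvectors of the k smallest eigenvalues (eigh sorts ascending)
    _, vecs = np.linalg.eigh(L_sym)
    F = vecs[:, :k]
    # (6) row-normalize the n x k feature matrix F
    F = F / np.linalg.norm(F, axis=1, keepdims=True)
    # (7)-(8) k-means on the rows of F (farthest-point init + Lloyd steps,
    # a hand-rolled stand-in for any off-the-shelf k-means)
    centers = [F[0]]
    for _ in range(1, k):
        dists = np.min([((F - c) ** 2).sum(-1) for c in centers], axis=0)
        centers.append(F[np.argmax(dists)])
    centers = np.array(centers)
    for _ in range(n_iter):
        labels = np.argmin(((F[:, None] - centers[None]) ** 2).sum(-1), axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = F[labels == j].mean(axis=0)
    return labels

# two well-separated blobs should be split cleanly
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0, 0.1, (20, 2)), rng.normal(5, 0.1, (20, 2))])
labels = spectral_clustering(X, 2)
```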
The Idea of Spectral Clustering
The idea of spectral clustering comes from graph theory: every sample in the dataset is viewed as a vertex of a graph, and the vertices are connected by weighted edges, where the weight of an edge measures the similarity between the two samples. Vertices of the same cluster are highly similar, which in graph terms means that the edges connecting vertices within a cluster carry large weights, while edges connecting vertices of different clusters carry small weights. The goal of spectral clustering is therefore to find a way of cutting the graph such that the edge weights inside each subgraph are large and the edge weights between subgraphs are small.
Graph representation (weighted undirected graph)
$$
\text{Sample data: } X=(x_1,...,x_N)^\top
$$

$$
\text{Undirected graph: } G=\{V,E\}
$$

$$
\text{Vertex set: } V=\{1,2,...,N\},\ \text{one vertex per sample of } X
$$

$$
\text{Edge set: } E,\ \text{encoded by the similarity matrix (affinity matrix)}
$$

$$
W=
\begin{bmatrix}
w_{11} & w_{12} & ... & w_{1N} \\
w_{21} & w_{22} & ... & w_{2N} \\
... & ... & ... & ... \\
w_{N1} & w_{N2} & ... & w_{NN}
\end{bmatrix}
=[w_{ij}],\quad 1\le i,j\le N
$$

where

$$
w_{ij}=
\begin{cases}
K(x_i,x_j)=\exp\left\{-\frac{\|x_i-x_j\|_2^2}{2\theta^2}\right\} & \text{if } (i,j)\in E \\
0 & \text{if } (i,j)\notin E
\end{cases}
$$

$$
\text{Degree of vertex } i:\ d_i=\sum_{j=1}^N w_{ij}
$$

$$
\text{Degree matrix: } D=diag(W\mathbf{1}_N)=
\begin{bmatrix}
d_1 & 0 & ... & 0 \\
0 & d_2 & ... & 0 \\
... & ... & ... & ... \\
0 & 0 & ... & d_N
\end{bmatrix}
=
\begin{bmatrix}
\sum_{j=1}^N w_{1j} & 0 & ... & 0 \\
0 & \sum_{j=1}^N w_{2j} & ... & 0 \\
... & ... & ... & ... \\
0 & 0 & ... & \sum_{j=1}^N w_{Nj}
\end{bmatrix}
$$

$$
\text{Laplacian Matrix: } L=D-W
$$
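As a concrete illustration, $W$, $D$ and $L$ can be built directly on toy data (the data and $\theta=1$ here are assumptions for the sketch). The check at the end uses two standard facts: every row of $L$ sums to zero, and $L$ is positive semi-definite since $x^\top Lx=\frac12\sum_{ij}w_{ij}(x_i-x_j)^2$.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(6, 2))           # N = 6 toy samples
theta = 1.0

# Gaussian-kernel similarity matrix W on a fully connected graph:
# w_ij = exp(-||x_i - x_j||^2 / (2 theta^2))
sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
W = np.exp(-sq / (2 * theta ** 2))

# degree matrix D = diag(W 1_N) and Laplacian L = D - W
d = W.sum(axis=1)                     # d_i = sum_j w_ij
D = np.diag(d)
L = D - W

# each row of L sums to zero, and x^T L x >= 0 for any x
print(np.allclose(L.sum(axis=1), 0))  # True
x = rng.normal(size=6)
print(x @ L @ x >= -1e-12)            # True
```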
Definition:

$$
A\subset V,\ B\subset V,\ A\cap B=\emptyset
\ \Rightarrow\ W(A,B)=\sum_{i\in A,\,j\in B} w_{ij}
$$
Suppose there are $K$ clusters in total:

$$
Cut(V)=Cut(A_1,...,A_K)=\sum_{k=1}^K W(A_k,\overline A_k)=\sum_{k=1}^K W(A_k,V)-\sum_{k=1}^K W(A_k,A_k)
$$
Objective:

$$
\underset{\{A_k\}_{k=1}^K}{\min}\ Ncut(V)
$$
Definition of Ncut:

$$
cut(V)=\sum_{k=1}^K W(A_k,\overline A_k)
$$

$$
\Rightarrow Ncut=\sum_{k=1}^K\frac{W(A_k,\overline A_k)}{\Delta},\qquad
\Delta=degree(A_k)=\sum_{i\in A_k}d_i,\quad d_i=\sum_{j=1}^N w_{ij}
$$

$$
\Rightarrow Ncut=\sum_{k=1}^K\frac{W(A_k,\overline A_k)}{\sum_{i\in A_k}d_i}
=\sum_{k=1}^K\frac{W(A_k,V)-W(A_k,A_k)}{\sum_{i\in A_k}d_i}
=\sum_{k=1}^K\frac{W(A_k,V)-W(A_k,A_k)}{\sum_{i\in A_k}\sum_{j=1}^N w_{ij}}
$$
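The cut identity $W(A_k,\overline A_k)=W(A_k,V)-W(A_k,A_k)$ and the Ncut formula can be checked numerically; `cut_weight` and `ncut` are hypothetical helper names introduced only for this sketch.

```python
import numpy as np

def cut_weight(W, A, B):
    """W(A, B) = sum of w_ij over i in A, j in B."""
    return W[np.ix_(A, B)].sum()

def ncut(W, labels):
    """Ncut = sum_k W(A_k, complement of A_k) / vol(A_k),
    where vol(A_k) = sum_{i in A_k} d_i."""
    d = W.sum(axis=1)
    total = 0.0
    for k in np.unique(labels):
        A = np.where(labels == k)[0]
        Abar = np.where(labels != k)[0]
        total += cut_weight(W, A, Abar) / d[A].sum()
    return total

# toy symmetric similarity matrix and a 2-way partition of V = {0,...,4}
rng = np.random.default_rng(0)
M = rng.random((5, 5))
W = (M + M.T) / 2
labels = np.array([0, 0, 1, 1, 1])

# identity used in the derivation: W(A, complement) = W(A, V) - W(A, A)
A = np.array([0, 1]); Abar = np.array([2, 3, 4]); V = np.arange(5)
lhs = cut_weight(W, A, Abar)
rhs = cut_weight(W, A, V) - cut_weight(W, A, A)
print(np.isclose(lhs, rhs))          # True
val = ncut(W, labels)                # 0 <= Ncut <= number of clusters
```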
Model

$$
\underset{\{A_k\}_{k=1}^K}{\min}\ Ncut(V)=\underset{\{A_k\}_{k=1}^K}{\min}\sum_{k=1}^K\frac{W(A_k,\overline A_k)}{\sum_{i\in A_k}d_i}
$$

$$
\Rightarrow \{A_k\}_{k=1}^K=\arg\underset{\{A_k\}_{k=1}^K}{\min}\ Ncut(V)
$$
Indicator vectors:

$$
\begin{cases}
y_i\in\{0,1\}^K \\
\sum_{j=1}^K y_{ij}=1
\end{cases}
\qquad
y_i=
\begin{bmatrix}
y_{i1} \\ y_{i2} \\ ... \\ y_{iK}
\end{bmatrix}
\qquad
y_{ij}=1 \Leftrightarrow \text{sample } i \text{ belongs to class } j
$$

$$
Y=[y_1,...,y_N]^\top_{N\times K}
$$

Rewriting the problem in terms of $Y$:

$$
\hat Y=\arg\underset{Y}{\min}\ Ncut(V)
$$
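A quick sketch of building this indicator matrix from a label vector (the helper name `indicator_matrix` is illustrative):

```python
import numpy as np

def indicator_matrix(labels, K):
    """One-hot indicator matrix Y (N x K): Y[i, j] = 1 iff sample i is in class j."""
    N = len(labels)
    Y = np.zeros((N, K))
    Y[np.arange(N), labels] = 1
    return Y

labels = np.array([0, 0, 1, 2, 1])
Y = indicator_matrix(labels, 3)
print(Y.sum(axis=1))        # each row sums to 1
print(np.diag(Y.T @ Y))     # class sizes N_k: [2. 2. 1.]
```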
Writing Ncut in matrix form:

$$
Ncut=\sum_{k=1}^K\frac{W(A_k,\overline A_k)}{\sum_{i\in A_k}d_i}
=Tr
\begin{bmatrix}
\frac{W(A_1,\overline A_1)}{\sum_{i\in A_1}d_i} & 0 & ... & 0 \\
0 & \frac{W(A_2,\overline A_2)}{\sum_{i\in A_2}d_i} & ... & 0 \\
... & ... & ... & ... \\
0 & 0 & ... & \frac{W(A_K,\overline A_K)}{\sum_{i\in A_K}d_i}
\end{bmatrix}
$$

$$
=Tr\left(
\begin{bmatrix}
W(A_1,\overline A_1) & 0 & ... & 0 \\
0 & W(A_2,\overline A_2) & ... & 0 \\
... & ... & ... & ... \\
0 & 0 & ... & W(A_K,\overline A_K)
\end{bmatrix}
\begin{bmatrix}
\sum_{i\in A_1}d_i & 0 & ... & 0 \\
0 & \sum_{i\in A_2}d_i & ... & 0 \\
... & ... & ... & ... \\
0 & 0 & ... & \sum_{i\in A_K}d_i
\end{bmatrix}^{-1}
\right)
$$
Denote

$$
O_{K\times K}=
\begin{bmatrix}
W(A_1,\overline A_1) & 0 & ... & 0 \\
0 & W(A_2,\overline A_2) & ... & 0 \\
... & ... & ... & ... \\
0 & 0 & ... & W(A_K,\overline A_K)
\end{bmatrix}
\qquad
P_{K\times K}=
\begin{bmatrix}
\sum_{i\in A_1}d_i & 0 & ... & 0 \\
0 & \sum_{i\in A_2}d_i & ... & 0 \\
... & ... & ... & ... \\
0 & 0 & ... & \sum_{i\in A_K}d_i
\end{bmatrix}
$$

The problem now becomes: $Ncut(V)=Tr(OP^{-1})$.
Given $W$ and $Y$, we solve for $O$ and $P$.

First solve for $P$:
$$
Y^\top Y=[y_1,...,y_N]
\begin{bmatrix}
y_1^\top \\ y_2^\top \\ ... \\ y_N^\top
\end{bmatrix}
=\sum_{i=1}^N y_i y_i^\top=
\begin{bmatrix}
N_1 & 0 & ... & 0 \\
0 & N_2 & ... & 0 \\
... & ... & ... & ... \\
0 & 0 & ... & N_K
\end{bmatrix}_{K\times K}
=
\begin{bmatrix}
\sum_{i\in A_1}1 & 0 & ... & 0 \\
0 & \sum_{i\in A_2}1 & ... & 0 \\
... & ... & ... & ... \\
0 & 0 & ... & \sum_{i\in A_K}1
\end{bmatrix}_{K\times K}
$$
Meaning of $N_k$: the number of samples, out of the $N$ samples, that belong to class $k$.
$$
\sum_{k=1}^K N_k=N,\qquad N_k=|A_k|=\sum_{i\in A_k}1
$$
$$
\sum_{i=1}^N y_i d_i y_i^\top=y_1 d_1 y_1^\top+y_2 d_2 y_2^\top+...+y_N d_N y_N^\top=Y^\top DY
$$

$$
P_{K\times K}=
\begin{bmatrix}
\sum_{i\in A_1}d_i & 0 & ... & 0 \\
0 & \sum_{i\in A_2}d_i & ... & 0 \\
... & ... & ... & ... \\
0 & 0 & ... & \sum_{i\in A_K}d_i
\end{bmatrix}
=Y^\top DY
$$

where

$$
D=
\begin{bmatrix}
d_1 & 0 & ... & 0 \\
0 & d_2 & ... & 0 \\
... & ... & ... & ... \\
0 & 0 & ... & d_N
\end{bmatrix}
=diag(W\mathbf{1}_N)=
\begin{bmatrix}
\sum_{j=1}^N w_{1j} & 0 & ... & 0 \\
0 & \sum_{j=1}^N w_{2j} & ... & 0 \\
... & ... & ... & ... \\
0 & 0 & ... & \sum_{j=1}^N w_{Nj}
\end{bmatrix}
$$
So the $P$ we were solving for is:

$$
P=Y^\top DY,\qquad D=diag(W\mathbf{1}_N)
$$
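On toy data we can confirm numerically that $P=Y^\top DY$ is diagonal with entries $\sum_{i\in A_k}d_i$ (the data here is an assumption for the sketch):

```python
import numpy as np

# numeric check that P = Y^T D Y is diagonal with entries vol(A_k)
rng = np.random.default_rng(0)
M = rng.random((6, 6))
W = (M + M.T) / 2                       # symmetric toy similarity matrix
d = W.sum(axis=1)                       # degrees d_i
D = np.diag(d)

labels = np.array([0, 0, 1, 1, 2, 2])
Y = np.zeros((6, 3)); Y[np.arange(6), labels] = 1

P = Y.T @ D @ Y
vols = np.array([d[labels == k].sum() for k in range(3)])
print(np.allclose(P, np.diag(vols)))    # True
```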
Next, solve for $O$:

$$
O_{K\times K}=
\begin{bmatrix}
W(A_1,\overline A_1) & 0 & ... & 0 \\
0 & W(A_2,\overline A_2) & ... & 0 \\
... & ... & ... & ... \\
0 & 0 & ... & W(A_K,\overline A_K)
\end{bmatrix}
$$

$$
W(A_k,\overline{A_k})=\underbrace{W(A_k,V)}_{\sum_{i\in A_k}d_i}-\underbrace{W(A_k,A_k)}_{\sum_{i\in A_k}\sum_{j\in A_k}w_{ij}}
$$

$$
\Rightarrow O=
\begin{bmatrix}
\sum_{i\in A_1}d_i & 0 & ... & 0 \\
0 & \sum_{i\in A_2}d_i & ... & 0 \\
... & ... & ... & ... \\
0 & 0 & ... & \sum_{i\in A_K}d_i
\end{bmatrix}
-
\begin{bmatrix}
W(A_1,A_1) & 0 & ... & 0 \\
0 & W(A_2,A_2) & ... & 0 \\
... & ... & ... & ... \\
0 & 0 & ... & W(A_K,A_K)
\end{bmatrix}
$$
We already know the first matrix: it is $Y^\top DY$. Now consider the second part:
$$
\begin{bmatrix}
W(A_1,A_1) & 0 & ... & 0 \\
0 & W(A_2,A_2) & ... & 0 \\
... & ... & ... & ... \\
0 & 0 & ... & W(A_K,A_K)
\end{bmatrix}
=
\begin{bmatrix}
\sum_{i\in A_1}\sum_{j\in A_1}w_{ij} & 0 & ... & 0 \\
0 & \sum_{i\in A_2}\sum_{j\in A_2}w_{ij} & ... & 0 \\
... & ... & ... & ... \\
0 & 0 & ... & \sum_{i\in A_K}\sum_{j\in A_K}w_{ij}
\end{bmatrix}
$$
Conjecture: does this second part equal $Y^\top WY$? Let us verify:
$Y^\top WY$ is a $K\times K$ matrix:

$$
Y^\top WY=[y_1,...,y_N]
\begin{bmatrix}
w_{11} & w_{12} & ... & w_{1N} \\
w_{21} & w_{22} & ... & w_{2N} \\
... & ... & ... & ... \\
w_{N1} & w_{N2} & ... & w_{NN}
\end{bmatrix}
\begin{bmatrix}
y_1^\top \\ y_2^\top \\ ... \\ y_N^\top
\end{bmatrix}
$$

$$
=\Big[\sum_{i=1}^N y_i w_{i1},\ ...,\ \sum_{i=1}^N y_i w_{iN}\Big]
\begin{bmatrix}
y_1^\top \\ y_2^\top \\ ... \\ y_N^\top
\end{bmatrix}
=\sum_{i=1}^N\sum_{j=1}^N y_i w_{ij} y_j^\top=\sum_{i=1}^N\sum_{j=1}^N y_i y_j^\top w_{ij}
$$

$$
=
\begin{bmatrix}
\sum_{i\in A_1}\sum_{j\in A_1}w_{ij} & \sum_{i\in A_1}\sum_{j\in A_2}w_{ij} & ... & \sum_{i\in A_1}\sum_{j\in A_K}w_{ij} \\
\sum_{i\in A_2}\sum_{j\in A_1}w_{ij} & \sum_{i\in A_2}\sum_{j\in A_2}w_{ij} & ... & \sum_{i\in A_2}\sum_{j\in A_K}w_{ij} \\
... & ... & ... & ... \\
\sum_{i\in A_K}\sum_{j\in A_1}w_{ij} & \sum_{i\in A_K}\sum_{j\in A_2}w_{ij} & ... & \sum_{i\in A_K}\sum_{j\in A_K}w_{ij}
\end{bmatrix}
$$
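This block structure of $Y^\top WY$ can be confirmed numerically on toy data (the data and labels here are assumptions for the sketch):

```python
import numpy as np

# numeric check that (Y^T W Y)_{kl} = sum_{i in A_k} sum_{j in A_l} w_ij
rng = np.random.default_rng(1)
M = rng.random((6, 6))
W = (M + M.T) / 2                       # symmetric toy similarity matrix
labels = np.array([0, 0, 1, 1, 2, 2])
Y = np.zeros((6, 3)); Y[np.arange(6), labels] = 1

YWY = Y.T @ W @ Y
block = np.array([[W[np.ix_(labels == k, labels == l)].sum() for l in range(3)]
                  for k in range(3)])
print(np.allclose(YWY, block))          # True
```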
Compare this with the second part of $O$:

$$
\begin{bmatrix}
\sum_{i\in A_1}\sum_{j\in A_1}w_{ij} & 0 & ... & 0 \\
0 & \sum_{i\in A_2}\sum_{j\in A_2}w_{ij} & ... & 0 \\
... & ... & ... & ... \\
0 & 0 & ... & \sum_{i\in A_K}\sum_{j\in A_K}w_{ij}
\end{bmatrix}
$$

versus

$$
\begin{bmatrix}
\sum_{i\in A_1}\sum_{j\in A_1}w_{ij} & \sum_{i\in A_1}\sum_{j\in A_2}w_{ij} & ... & \sum_{i\in A_1}\sum_{j\in A_K}w_{ij} \\
\sum_{i\in A_2}\sum_{j\in A_1}w_{ij} & \sum_{i\in A_2}\sum_{j\in A_2}w_{ij} & ... & \sum_{i\in A_2}\sum_{j\in A_K}w_{ij} \\
... & ... & ... & ... \\
\sum_{i\in A_K}\sum_{j\in A_1}w_{ij} & \sum_{i\in A_K}\sum_{j\in A_2}w_{ij} & ... & \sum_{i\in A_K}\sum_{j\in A_K}w_{ij}
\end{bmatrix}
$$
These two matrices have the same diagonal entries, and since we are minimizing a trace, only the diagonal entries matter. Replacing the second part of $O$ by $Y^\top WY$ therefore does not affect the result.
Denote $O'=Y^\top DY-Y^\top WY$. Then $O'P^{-1}$ differs from $OP^{-1}$ only in its off-diagonal entries, so $Tr(OP^{-1})=Tr(O'P^{-1})$.
At this point we have solved for $O$, and shown that replacing $O$ by $O'$ achieves the same goal:

$$
O'=Y^\top DY-Y^\top WY
$$
Our final optimization problem becomes:

$$
\hat Y=\arg\underset{Y}{\min}\ Tr\big(Y^\top(D-W)Y(Y^\top DY)^{-1}\big)
=\arg\underset{Y}{\min}\ Tr\big(Y^\top LY(Y^\top DY)^{-1}\big)
$$

where $L=D-W$ is the Laplacian matrix.
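As a sanity check on this matrix form, the trace expression agrees with Ncut computed directly from the cuts (toy data assumed for the sketch):

```python
import numpy as np

# numeric check that Tr(Y^T L Y (Y^T D Y)^{-1}) equals Ncut computed from cuts
rng = np.random.default_rng(2)
M = rng.random((6, 6))
W = (M + M.T) / 2                       # symmetric toy similarity matrix
d = W.sum(axis=1)
D, L = np.diag(d), np.diag(d) - W

labels = np.array([0, 0, 1, 1, 2, 2])
Y = np.zeros((6, 3)); Y[np.arange(6), labels] = 1

trace_form = np.trace(Y.T @ L @ Y @ np.linalg.inv(Y.T @ D @ Y))
ncut_direct = sum(W[np.ix_(labels == k, labels != k)].sum() / d[labels == k].sum()
                  for k in range(3))
print(np.isclose(trace_form, ncut_direct))  # True
```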
To minimize $Tr(Y^\top LY(Y^\top DY)^{-1})$:
Here $Y\in\mathbb{R}^{N\times K}$, and each row is one-hot, indicating which class the $i$-th sample belongs to. $Y$ has the form:
$$
\begin{bmatrix}
0 & ... & 0 & 1 & 0 & ... & 0 \\
0 & ... & 1 & 0 & 0 & ... & 0 \\
... & ... & ... & ... & ... & ... & ... \\
0 & ... & 0 & 0 & 0 & ... & 1 \\
1 & ... & 0 & 0 & 0 & ... & 0
\end{bmatrix}
$$
Denote:

$$
P=Y^\top DY=diag\Big(\sum_{i\in A_1}d_i,\sum_{i\in A_2}d_i,...,\sum_{i\in A_K}d_i\Big)=diag(p_1,p_2,...,p_K)
$$

Then the objective can be rewritten as

$$
Tr(Y^\top LYP^{-1})=Tr(Y^\top LYP^{-\frac12}P^{-\frac12})=Tr(P^{-\frac12}Y^\top LYP^{-\frac12})
$$
Denote:

$$
H=YP^{-\frac12},\qquad H^\top=P^{-\frac12}Y^\top
$$

$$
H^\top H=P^{-\frac12}Y^\top YP^{-\frac12}=P^{-\frac12}\,diag(N_1,...,N_K)\,P^{-\frac12}
$$

which is not the identity in general, since $Y^\top Y=diag(N_1,...,N_K)$.
The objective becomes:

$$
Tr(H^\top LH)
$$
Theorem 1:
For a positive semi-definite matrix $L$ with eigenvalues $0\le\lambda_1\le\lambda_2\le...\le\lambda_n$ and an orthonormal eigenbasis $\{\overline v_1,\overline v_2,...,\overline v_n\}$ (the eigenvectors after orthonormalization):
when $\mathbf{x}\in\mathbb{R}^N$ and $\mathbf{x}^\top\mathbf{x}=1$, the minimum of $\mathbf{x}^\top L\mathbf{x}$ is attained at $\mathbf{x}=\overline v_1$.
proof:

$\mathbf{x}$ can be expanded in the eigenbasis, since the eigenbasis is orthonormal:

$$
\mathbf{x}=c_1\overline v_1+c_2\overline v_2+...+c_n\overline v_n
$$

$$
L\mathbf{x}=c_1\lambda_1\overline v_1+c_2\lambda_2\overline v_2+...+c_n\lambda_n\overline v_n
$$

$$
\Rightarrow \mathbf{x}^\top L\mathbf{x}=c_1^2\lambda_1+c_2^2\lambda_2+...+c_n^2\lambda_n
$$

Since $\mathbf{x}^\top\mathbf{x}=1$, we have $c_1^2+c_2^2+...+c_n^2=1$, hence

$$
\mathbf{x}^\top L\mathbf{x}=c_1^2\lambda_1+c_2^2\lambda_2+...+c_n^2\lambda_n\ge\lambda_1
$$

Equality holds when $c_1^2=1$ and $c_i=0$ for $i\ne 1$, i.e. $\mathbf{x}=\overline v_1$ or $\mathbf{x}=-\overline v_1$.
Theorem 2:
For a positive semi-definite matrix $L$ with eigenvalues $0\le\lambda_1\le\lambda_2\le...\le\lambda_n$ and an orthonormal eigenbasis $\{\overline v_1,\overline v_2,...,\overline v_n\}$:
when $F\in\mathbb{R}^{N\times K}$ and $F^\top F=I$, the minimum of $Tr(F^\top LF)$ is attained at $F=[\overline v_1,\overline v_2,...,\overline v_K]$.
proof:

Denote $F=[f_1,f_2,...,f_K]$. Then

$$
Tr(F^\top LF)=\sum_{i=1}^K f_i^\top Lf_i
$$

By Theorem 1, $Tr(F^\top LF)$ is minimized when $f_1=\overline v_1,\ f_2=\overline v_2,\ ...,\ f_K=\overline v_K$.
Because $F^\top F=I$, $F$ is an orthonormal matrix, so its columns cannot all be $\overline v_1$; the minimizer takes the $K$ distinct eigenvectors with smallest eigenvalues.
The original objective $Tr(H^\top LH)$ does not come with the property $H^\top H=I$, so Theorem 2 cannot be applied directly; we therefore transform $H$.
$$
H^\top DH=P^{-\frac12}Y^\top DYP^{-\frac12}=P^{-\frac12}PP^{-\frac12}=I
$$

Let $F=D^{\frac12}H$. Then

$$
F^\top F=(D^{\frac12}H)^\top D^{\frac12}H=H^\top D^{\frac12}D^{\frac12}H=H^\top DH=I
$$

so $H=D^{-\frac12}F$, and

$$
Tr(H^\top LH)=Tr(F^\top D^{-\frac12}LD^{-\frac12}F),\qquad F^\top F=I
$$
We thus arrive at the final optimization objective:

$$
\min_F\ Tr(F^\top D^{-\frac12}LD^{-\frac12}F),\qquad F^\top F=I
$$

By Theorem 2, the optimal $F$ is given by the eigenvectors of the $K$ smallest eigenvalues of $D^{-\frac12}LD^{-\frac12}$.
Finally, running k-means once more on the rows of the solved $F$ recovers the cluster assignment $Y$.
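The final eigenvector step can be illustrated on toy data: taking the $K$ smallest eigenvectors of $D^{-\frac12}LD^{-\frac12}$ gives an $F$ with $F^\top F=I$, and the trace objective attains the sum of the $K$ smallest eigenvalues (Theorem 2). The toy matrix here is an assumption for the sketch.

```python
import numpy as np

# F = the K smallest eigenvectors of D^{-1/2} L D^{-1/2};
# then Tr(F^T D^{-1/2} L D^{-1/2} F) = sum of the K smallest eigenvalues
rng = np.random.default_rng(3)
M = rng.random((8, 8))
W = (M + M.T) / 2                       # symmetric toy similarity matrix
d = W.sum(axis=1)
L = np.diag(d) - W
Ds = np.diag(1 / np.sqrt(d))
L_sym = Ds @ L @ Ds

K = 3
vals, vecs = np.linalg.eigh(L_sym)      # eigh returns ascending eigenvalues
F = vecs[:, :K]

print(np.allclose(F.T @ F, np.eye(K)))                        # True: F^T F = I
print(np.isclose(np.trace(F.T @ L_sym @ F), vals[:K].sum()))  # True
# k-means on the rows of F then yields the final partition Y
```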