Principles of Spectral Clustering

2023-12-13 05:51:51

Spectral Clustering

Spectral clustering algorithm flow:
$$
\begin{aligned}
&\text{input: } X=\{x_1,x_2,\dots,x_n\}\\
&\text{output: } C=\{c_1,c_2,\dots,c_{k_2}\}
\end{aligned}
$$

(1) Build the sample similarity matrix $S$ according to the chosen similarity-matrix construction method.

(2) Build the adjacency matrix $W$ from $S$, and build the degree matrix $D$.

(3) Compute the Laplacian matrix $L$.

(4) Build the normalized Laplacian $D^{-\frac12}LD^{-\frac12}$.

(5) Compute the eigenvectors $f$ corresponding to the $k_1$ smallest eigenvalues of $D^{-\frac12}LD^{-\frac12}$.

(6) Normalize the matrix formed by these eigenvectors $f$, yielding the $n\times k_1$ feature matrix $F$.

(7) Treat each row of $F$ as a $k_1$-dimensional sample ($n$ samples in all) and cluster them with the chosen clustering method into $k_2$ clusters.

(8) Obtain the cluster partition $(c_1,\dots,c_{k_2})$.
In general, $k_1$ is equal to $k_2$.
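The eight steps above can be sketched end to end with NumPy. This is a minimal illustrative sketch, not the article's own code: it assumes a fully connected Gaussian similarity graph, $k_1=k_2=k$, and a plain Lloyd-style k-means with a farthest-first initialization; `spectral_clustering` is a hypothetical helper name.

```python
import numpy as np

def spectral_clustering(X, k, theta=1.0, n_iter=50):
    """Sketch of the flow above: fully connected Gaussian similarity
    graph, normalized Laplacian, k smallest eigenvectors, then
    k-means on the rows of F (here k1 = k2 = k)."""
    # (1)-(2) similarity / adjacency matrix W and degree vector d
    sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    W = np.exp(-sq / (2.0 * theta ** 2))
    np.fill_diagonal(W, 0.0)
    d = W.sum(axis=1)
    # (3)-(4) normalized Laplacian D^{-1/2} L D^{-1/2}
    L = np.diag(d) - W
    D_inv_sqrt = np.diag(d ** -0.5)
    L_sym = D_inv_sqrt @ L @ D_inv_sqrt
    # (5)-(6) eigenvectors of the k smallest eigenvalues, rows normalized
    _, vecs = np.linalg.eigh(L_sym)          # eigh sorts ascending
    F = vecs[:, :k]
    F = F / np.linalg.norm(F, axis=1, keepdims=True)
    # (7)-(8) plain Lloyd k-means on the rows of F (farthest-first init)
    centers = [F[0]]
    for _ in range(k - 1):
        dist = ((F[:, None, :] - np.array(centers)[None]) ** 2).sum(-1).min(1)
        centers.append(F[np.argmax(dist)])
    centers = np.array(centers)
    for _ in range(n_iter):
        labels = ((F[:, None, :] - centers[None]) ** 2).sum(-1).argmin(1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = F[labels == j].mean(axis=0)
    return labels
```

On two well-separated blobs, the rows of $F$ collapse to (nearly) one point per cluster, so even this simple k-means recovers the partition.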

The Idea of Spectral Clustering

The idea of spectral clustering comes from graph theory. Each sample in the dataset to be clustered is viewed as a vertex of a graph; the vertices are connected by edges, and each edge carries a weight indicating how similar the two samples are. Vertices of the same class are highly similar, which in graph terms means that the edges connecting them have large weights, while the edges between vertices of different classes have small weights. The ultimate goal of spectral clustering is therefore to find a way of cutting the graph such that the edge weights within each resulting subgraph are large, while the weights between subgraphs are small.

Graph-based (weighted undirected graph)
$$
\begin{aligned}
&\text{Sample data: } X=(x_1,\dots,x_N)^\top\\
&\text{Undirected graph: } G=\{V,E\}\\
&\text{Vertex set: } V=\{1,2,\dots,N\}\leftrightarrow X\\
&\text{Edge set: } E\text{ — encoded by the similarity (affinity) matrix}
\end{aligned}
$$

$$
W=\begin{bmatrix}
w_{11} & w_{12} & \dots & w_{1N}\\
w_{21} & w_{22} & \dots & w_{2N}\\
\vdots & \vdots & \ddots & \vdots\\
w_{N1} & w_{N2} & \dots & w_{NN}
\end{bmatrix}=[w_{ij}],\quad 1\le i,j\le N
$$

$$
w_{ij}=\begin{cases}
K(x_i,x_j)=\exp\left\{-\dfrac{\|x_i-x_j\|_2^2}{2\theta^2}\right\} & \text{if }(i,j)\in E\\
0 & \text{if }(i,j)\notin E
\end{cases}
$$

Degree of vertex $i$: $d_i=\sum_{j=1}^N w_{ij}$

Degree matrix:

$$
D=\operatorname{diag}(W\mathbf{1}_N)=\begin{bmatrix}
d_1 & 0 & \dots & 0\\
0 & d_2 & \dots & 0\\
\vdots & \vdots & \ddots & \vdots\\
0 & 0 & \dots & d_N
\end{bmatrix}
=\begin{bmatrix}
\sum_{j=1}^N w_{1j} & 0 & \dots & 0\\
0 & \sum_{j=1}^N w_{2j} & \dots & 0\\
\vdots & \vdots & \ddots & \vdots\\
0 & 0 & \dots & \sum_{j=1}^N w_{Nj}
\end{bmatrix}
$$

Laplacian matrix: $L=D-W$
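For a concrete illustration, the graph quantities above can be computed as follows. This is a sketch assuming the fully connected Gaussian-kernel case; `build_graph` and the bandwidth `theta` are illustrative choices, not names from the article.

```python
import numpy as np

def build_graph(X, theta=1.0):
    """Build W (Gaussian similarity), D = diag(W 1_N) and L = D - W
    for a fully connected weighted graph, as defined above."""
    sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)   # ||x_i - x_j||^2
    W = np.exp(-sq / (2.0 * theta ** 2))                  # w_ij = K(x_i, x_j)
    D = np.diag(W.sum(axis=1))                            # d_i = sum_j w_ij
    L = D - W                                             # Laplacian matrix
    return W, D, L
```

Two standard sanity checks follow from the definitions: $L$ is symmetric positive semidefinite, and every row of $L$ sums to zero (so $L\mathbf{1}_N=0$).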

Definition:
$$
A\subset V,\ B\subset V,\ A\cap B=\emptyset
\;\Rightarrow\; W(A,B)=\sum_{i\in A,\,j\in B} w_{ij}
$$
Suppose there are $K$ classes in total:
$$
\mathrm{Cut}(V)=\mathrm{Cut}(A_1,\dots,A_K)=\sum_{k=1}^K W(A_k,\overline{A}_k)=\sum_{k=1}^K W(A_k,V)-\sum_{k=1}^K W(A_k,A_k)
$$
Objective:
$$
\min_{\{A_k\}_{k=1}^K}\ \mathrm{Ncut}(V)
$$
Definition of Ncut:
$$
\mathrm{cut}(V)=\sum_{k=1}^K W(A_k,\overline{A}_k)
$$

$$
\Rightarrow \mathrm{Ncut}=\sum_{k=1}^K\frac{W(A_k,\overline{A}_k)}{\Delta},\qquad
\Delta=\mathrm{degree}(A_k)=\sum_{i\in A_k}d_i,\quad d_i=\sum_{j=1}^N w_{ij}
$$

$$
\Rightarrow \mathrm{Ncut}=\sum_{k=1}^K\frac{W(A_k,\overline{A}_k)}{\sum_{i\in A_k}d_i}
=\sum_{k=1}^K\frac{W(A_k,V)-W(A_k,A_k)}{\sum_{i\in A_k}d_i}
=\sum_{k=1}^K\frac{W(A_k,V)-W(A_k,A_k)}{\sum_{i\in A_k}\sum_{j=1}^N w_{ij}}
$$
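The definitions of $W(A,B)$ and Ncut translate directly to code. A sketch (the function names and the `labels` array, which gives each vertex's class, are illustrative assumptions):

```python
import numpy as np

def W_AB(W, A, B):
    """W(A, B) = sum of w_ij over i in A, j in B."""
    return W[np.ix_(A, B)].sum()

def ncut(W, labels):
    """Ncut = sum over clusters k of W(A_k, complement of A_k)
    divided by sum_{i in A_k} d_i."""
    d = W.sum(axis=1)                      # degrees d_i = sum_j w_ij
    total = 0.0
    for k in np.unique(labels):
        A = np.where(labels == k)[0]       # indices in cluster k
        Abar = np.where(labels != k)[0]    # indices outside cluster k
        total += W_AB(W, A, Abar) / d[A].sum()
    return total
```

Note how a tightly knit cluster that is weakly connected to the rest of the graph contributes a small term, which is exactly what minimizing Ncut rewards.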
Model
$$
\min_{\{A_k\}_{k=1}^K}\ \mathrm{Ncut}(V)=\min_{\{A_k\}_{k=1}^K}\ \sum_{k=1}^K\frac{W(A_k,\overline{A}_k)}{\sum_{i\in A_k}d_i}
\;\Rightarrow\;
\{A_k\}_{k=1}^K=\arg\min_{\{A_k\}_{k=1}^K}\ \mathrm{Ncut}(V)
$$
indicator vector:
$$
\begin{cases}
y_i\in\{0,1\}^K\\
\sum_{j=1}^K y_{ij}=1
\end{cases}
\qquad
y_i=\begin{bmatrix}y_{i1}\\ y_{i2}\\ \vdots\\ y_{iK}\end{bmatrix}
\qquad
y_{ij}=1 \Leftrightarrow \text{sample } i \text{ belongs to class } j
$$

$$
Y=[y_1,\dots,y_N]^\top_{N\times K}
$$

Recast the problem as: $\hat Y=\arg\min_{\hat Y}\ \mathrm{Ncut}(V)$


Rewrite Ncut in matrix form:
$$
\begin{aligned}
\mathrm{Ncut}&=\sum_{k=1}^K\frac{W(A_k,\overline{A}_k)}{\sum_{i\in A_k}d_i}\\
&=\mathrm{Tr}\begin{bmatrix}
\frac{W(A_1,\overline{A}_1)}{\sum_{i\in A_1}d_i} & 0 & \dots & 0\\
0 & \frac{W(A_2,\overline{A}_2)}{\sum_{i\in A_2}d_i} & \dots & 0\\
\vdots & \vdots & \ddots & \vdots\\
0 & 0 & \dots & \frac{W(A_K,\overline{A}_K)}{\sum_{i\in A_K}d_i}
\end{bmatrix}\\
&=\mathrm{Tr}\left(
\begin{bmatrix}
W(A_1,\overline{A}_1) & 0 & \dots & 0\\
0 & W(A_2,\overline{A}_2) & \dots & 0\\
\vdots & \vdots & \ddots & \vdots\\
0 & 0 & \dots & W(A_K,\overline{A}_K)
\end{bmatrix}
\begin{bmatrix}
\sum_{i\in A_1}d_i & 0 & \dots & 0\\
0 & \sum_{i\in A_2}d_i & \dots & 0\\
\vdots & \vdots & \ddots & \vdots\\
0 & 0 & \dots & \sum_{i\in A_K}d_i
\end{bmatrix}^{-1}\right)
\end{aligned}
$$

Define

$$
O_{K\times K}=\begin{bmatrix}
W(A_1,\overline{A}_1) & 0 & \dots & 0\\
0 & W(A_2,\overline{A}_2) & \dots & 0\\
\vdots & \vdots & \ddots & \vdots\\
0 & 0 & \dots & W(A_K,\overline{A}_K)
\end{bmatrix}
\qquad
P_{K\times K}=\begin{bmatrix}
\sum_{i\in A_1}d_i & 0 & \dots & 0\\
0 & \sum_{i\in A_2}d_i & \dots & 0\\
\vdots & \vdots & \ddots & \vdots\\
0 & 0 & \dots & \sum_{i\in A_K}d_i
\end{bmatrix}
$$

The problem now becomes: $\mathrm{Ncut}(V)=\mathrm{Tr}(OP^{-1})$

Given $W$ and $Y$, solve for $O$ and $P$:

First, solve for $P$:
$$
Y^\top Y=[y_1,\dots,y_N]\begin{bmatrix}y_1^\top\\ y_2^\top\\ \vdots\\ y_N^\top\end{bmatrix}
=\sum_{i=1}^N y_i y_i^\top
=\begin{bmatrix}
N_1 & 0 & \dots & 0\\
0 & N_2 & \dots & 0\\
\vdots & \vdots & \ddots & \vdots\\
0 & 0 & \dots & N_K
\end{bmatrix}_{K\times K}
=\begin{bmatrix}
\sum_{i\in A_1}1 & 0 & \dots & 0\\
0 & \sum_{i\in A_2}1 & \dots & 0\\
\vdots & \vdots & \ddots & \vdots\\
0 & 0 & \dots & \sum_{i\in A_K}1
\end{bmatrix}_{K\times K}
$$
The meaning of $N_k$: among the $N$ samples, the number belonging to class $k$. $\sum_{k=1}^K N_k=N$, and $N_k=|A_k|=\sum_{i\in A_k}1$.
$$
\sum_{i=1}^N y_i d_i y_i^\top=y_1 d_1 y_1^\top+y_2 d_2 y_2^\top+\dots+y_N d_N y_N^\top=Y^\top DY
$$

$$
P_{K\times K}=\begin{bmatrix}
\sum_{i\in A_1}d_i & 0 & \dots & 0\\
0 & \sum_{i\in A_2}d_i & \dots & 0\\
\vdots & \vdots & \ddots & \vdots\\
0 & 0 & \dots & \sum_{i\in A_K}d_i
\end{bmatrix}=Y^\top DY
$$

where

$$
D=\begin{bmatrix}
d_1 & 0 & \dots & 0\\
0 & d_2 & \dots & 0\\
\vdots & \vdots & \ddots & \vdots\\
0 & 0 & \dots & d_N
\end{bmatrix}=\operatorname{diag}(W\mathbf{1}_N)
$$
So the $P$ we are solving for is:
$$
P=Y^\top DY,\qquad D=\operatorname{diag}(W\mathbf{1}_N)
$$
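A quick numeric check of $P=Y^\top DY$. This is a sketch using an assumed random symmetric $W$ and an arbitrary labeling, not data from the article:

```python
import numpy as np

rng = np.random.default_rng(0)
N, K = 6, 2
S = rng.random((N, N))
W = (S + S.T) / 2                        # symmetric similarity matrix
D = np.diag(W.sum(axis=1))               # D = diag(W 1_N)

labels = np.array([0, 0, 1, 1, 0, 1])    # arbitrary class assignment
Y = np.eye(K)[labels]                    # N x K one-hot indicator matrix

P = Y.T @ D @ Y                          # claimed: diag of per-cluster degree sums
d = W.sum(axis=1)
expected = np.diag([d[labels == k].sum() for k in range(K)])
assert np.allclose(P, expected)
```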
Next, solve for $O$:
$$
O_{K\times K}=\begin{bmatrix}
W(A_1,\overline{A}_1) & 0 & \dots & 0\\
0 & W(A_2,\overline{A}_2) & \dots & 0\\
\vdots & \vdots & \ddots & \vdots\\
0 & 0 & \dots & W(A_K,\overline{A}_K)
\end{bmatrix}
$$

$$
W(A_k,\overline{A_k})=\underbrace{W(A_k,V)}_{\sum_{i\in A_k}d_i}-\underbrace{W(A_k,A_k)}_{\sum_{i\in A_k}\sum_{j\in A_k}w_{ij}}
$$

$$
\Rightarrow O=\begin{bmatrix}
\sum_{i\in A_1}d_i & 0 & \dots & 0\\
0 & \sum_{i\in A_2}d_i & \dots & 0\\
\vdots & \vdots & \ddots & \vdots\\
0 & 0 & \dots & \sum_{i\in A_K}d_i
\end{bmatrix}
-\begin{bmatrix}
W(A_1,A_1) & 0 & \dots & 0\\
0 & W(A_2,A_2) & \dots & 0\\
\vdots & \vdots & \ddots & \vdots\\
0 & 0 & \dots & W(A_K,A_K)
\end{bmatrix}
$$
We already know the first matrix: it is $Y^\top DY$. Now consider the second part:
$$
\begin{bmatrix}
W(A_1,A_1) & 0 & \dots & 0\\
0 & W(A_2,A_2) & \dots & 0\\
\vdots & \vdots & \ddots & \vdots\\
0 & 0 & \dots & W(A_K,A_K)
\end{bmatrix}
=\begin{bmatrix}
\sum_{i\in A_1}\sum_{j\in A_1}w_{ij} & 0 & \dots & 0\\
0 & \sum_{i\in A_2}\sum_{j\in A_2}w_{ij} & \dots & 0\\
\vdots & \vdots & \ddots & \vdots\\
0 & 0 & \dots & \sum_{i\in A_K}\sum_{j\in A_K}w_{ij}
\end{bmatrix}
$$

Conjecture: does this second part equal $Y^\top WY$? Let us verify:
$Y^\top WY$ has dimension $K\times K$:

$$
\begin{aligned}
Y^\top WY&=[y_1,\dots,y_N]
\begin{bmatrix}
w_{11} & w_{12} & \dots & w_{1N}\\
w_{21} & w_{22} & \dots & w_{2N}\\
\vdots & \vdots & \ddots & \vdots\\
w_{N1} & w_{N2} & \dots & w_{NN}
\end{bmatrix}
\begin{bmatrix}y_1^\top\\ y_2^\top\\ \vdots\\ y_N^\top\end{bmatrix}\\
&=\left[\sum_{i=1}^N y_i w_{i1},\dots,\sum_{i=1}^N y_i w_{iN}\right]
\begin{bmatrix}y_1^\top\\ y_2^\top\\ \vdots\\ y_N^\top\end{bmatrix}
=\sum_{i=1}^N\sum_{j=1}^N y_i w_{ij} y_j^\top\\
&=\begin{bmatrix}
\sum_{i\in A_1}\sum_{j\in A_1}w_{ij} & \sum_{i\in A_1}\sum_{j\in A_2}w_{ij} & \dots & \sum_{i\in A_1}\sum_{j\in A_K}w_{ij}\\
\sum_{i\in A_2}\sum_{j\in A_1}w_{ij} & \sum_{i\in A_2}\sum_{j\in A_2}w_{ij} & \dots & \sum_{i\in A_2}\sum_{j\in A_K}w_{ij}\\
\vdots & \vdots & \ddots & \vdots\\
\sum_{i\in A_K}\sum_{j\in A_1}w_{ij} & \sum_{i\in A_K}\sum_{j\in A_2}w_{ij} & \dots & \sum_{i\in A_K}\sum_{j\in A_K}w_{ij}
\end{bmatrix}
\end{aligned}
$$
Compare the matrix above with the second part of $O$:
$$
\begin{bmatrix}
\sum_{i\in A_1}\sum_{j\in A_1}w_{ij} & 0 & \dots & 0\\
0 & \sum_{i\in A_2}\sum_{j\in A_2}w_{ij} & \dots & 0\\
\vdots & \vdots & \ddots & \vdots\\
0 & 0 & \dots & \sum_{i\in A_K}\sum_{j\in A_K}w_{ij}
\end{bmatrix}
\quad\text{vs.}\quad
\begin{bmatrix}
\sum_{i\in A_1}\sum_{j\in A_1}w_{ij} & \dots & \sum_{i\in A_1}\sum_{j\in A_K}w_{ij}\\
\vdots & \ddots & \vdots\\
\sum_{i\in A_K}\sum_{j\in A_1}w_{ij} & \dots & \sum_{i\in A_K}\sum_{j\in A_K}w_{ij}
\end{bmatrix}
$$

We find that the two matrices share the same diagonal. Since we are minimizing a trace, only the diagonal entries matter, so replacing the second part of $O$ with $Y^\top WY$ does not affect the result.
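The diagonal-equality argument, and the resulting trace identity, can be verified numerically. Again this is a sketch with an assumed random symmetric $W$ and an arbitrary indicator matrix $Y$:

```python
import numpy as np

rng = np.random.default_rng(1)
N, K = 6, 2
S = rng.random((N, N))
W = (S + S.T) / 2                         # symmetric similarity matrix
D = np.diag(W.sum(axis=1))
labels = np.array([0, 1, 0, 1, 1, 0])     # arbitrary class assignment
Y = np.eye(K)[labels]

# Diagonal of Y^T W Y equals the within-cluster weight sums W(A_k, A_k)
YtWY = Y.T @ W @ Y
within = np.array([W[np.ix_(labels == k, labels == k)].sum() for k in range(K)])
assert np.allclose(np.diag(YtWY), within)

# Hence Tr(O P^{-1}) = Tr(O' P^{-1}) with O' = Y^T D Y - Y^T W Y,
# because P^{-1} is diagonal and only touches the diagonal entries
P = Y.T @ D @ Y
O = np.diag(np.diag(P) - within)          # the true diagonal O
O_prime = P - YtWY                        # O' (off-diagonals differ from O)
assert np.isclose(np.trace(O @ np.linalg.inv(P)),
                  np.trace(O_prime @ np.linalg.inv(P)))
```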

Let $O'=Y^\top DY-Y^\top WY$. Since $P^{-1}$ is diagonal, multiplying by it only rescales the diagonal entries, so $\mathrm{Tr}(OP^{-1})=\mathrm{Tr}(O'P^{-1})$.

We have now solved for $O$, and shown that using $O'$ in place of $O$ achieves the same objective:
$$
O'=Y^\top DY-Y^\top WY
$$

Our final optimization problem becomes:
$$
\hat Y=\arg\min_{\hat Y}\ \mathrm{Tr}\left(Y^\top(D-W)Y\,(Y^\top DY)^{-1}\right)
=\arg\min_{\hat Y}\ \mathrm{Tr}\left(Y^\top LY\,(Y^\top DY)^{-1}\right)
$$

where $L=D-W$ is the Laplacian matrix.

To minimize

$$
\mathrm{Tr}\left(Y^\top LY\,(Y^\top DY)^{-1}\right)
$$

where $Y\in\mathbb{R}^{N\times K}$ and each row is one-hot, indicating which class sample $i$ belongs to. $Y$ looks like:
$$
\begin{bmatrix}
0 & \dots & 0 & 1 & 0 & \dots & 0\\
0 & \dots & 1 & 0 & 0 & \dots & 0\\
\vdots & & \vdots & \vdots & \vdots & & \vdots\\
0 & \dots & 0 & 0 & 0 & \dots & 1\\
1 & \dots & 0 & 0 & 0 & \dots & 0
\end{bmatrix}
$$
Define:
$$
P=Y^\top DY=\operatorname{diag}\left(\sum_{i\in A_1}d_i,\ \sum_{i\in A_2}d_i,\ \dots,\ \sum_{i\in A_K}d_i\right)=\operatorname{diag}(p_1,p_2,\dots,p_K)
$$

so the objective

$$
=\mathrm{Tr}(Y^\top LYP^{-1})=\mathrm{Tr}(Y^\top LYP^{-\frac12}P^{-\frac12})=\mathrm{Tr}(P^{-\frac12}Y^\top LYP^{-\frac12})
$$

using the cyclic property of the trace.
Define:
$$
H=YP^{-\frac12},\qquad H^\top=P^{-\frac12}Y^\top
$$

Note that $H^\top H=P^{-\frac12}Y^\top YP^{-\frac12}=P^{-\frac12}\operatorname{diag}(N_1,\dots,N_K)P^{-\frac12}$, which is diagonal but not the identity in general.

The objective $=\mathrm{Tr}(H^\top LH)$.
Theorem 1:

For a positive semidefinite matrix $L$ with eigenvalues $0\le\lambda_1\le\lambda_2\le\dots\le\lambda_n$

and eigenbasis $\{\bar v_1,\bar v_2,\dots,\bar v_n\}$ → orthonormal, i.e., the eigenvectors after orthonormalization:

when $\mathbf{x}\in\mathbb{R}^N$ and $\mathbf{x}^\top\mathbf{x}=1$, the minimum of $\mathbf{x}^\top L\mathbf{x}$ is attained at $\mathbf{x}=\bar v_1$.

proof:
$$
\begin{aligned}
&\mathbf{x}\text{ can be expanded in the eigenbasis, since the eigenbasis is orthonormal:}\\
&\mathbf{x}=c_1\bar v_1+c_2\bar v_2+\dots+c_n\bar v_n\\
&L\mathbf{x}=c_1\lambda_1\bar v_1+c_2\lambda_2\bar v_2+\dots+c_n\lambda_n\bar v_n\\
&\Rightarrow \mathbf{x}^\top L\mathbf{x}=c_1^2\lambda_1+c_2^2\lambda_2+\dots+c_n^2\lambda_n\\
&\text{Since }\mathbf{x}^\top\mathbf{x}=1\Rightarrow c_1^2+c_2^2+\dots+c_n^2=1\\
&\Rightarrow \mathbf{x}^\top L\mathbf{x}=c_1^2\lambda_1+c_2^2\lambda_2+\dots+c_n^2\lambda_n\ge\lambda_1\\
&\text{Equality holds when } c_1^2=1,\ c_i=0\ (i\ne1),\text{ i.e., } \mathbf{x}=\bar v_1\ \text{or}\ \mathbf{x}=-\bar v_1
\end{aligned}
$$
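Theorem 1 can be sanity-checked numerically on a small graph Laplacian. An illustrative sketch (the specific $W$ is made up):

```python
import numpy as np

# A small symmetric PSD matrix: the Laplacian of a made-up weighted graph
W = np.array([[0.0, 2.0, 1.0],
              [2.0, 0.0, 3.0],
              [1.0, 3.0, 0.0]])
L = np.diag(W.sum(axis=1)) - W

vals, vecs = np.linalg.eigh(L)           # ascending eigenvalues, orthonormal basis
v1 = vecs[:, 0]                          # eigenvector of the smallest eigenvalue

# x^T L x >= lambda_1 for every unit vector x, with equality at x = +/- v1
rng = np.random.default_rng(0)
for _ in range(100):
    x = rng.normal(size=3)
    x /= np.linalg.norm(x)
    assert x @ L @ x >= vals[0] - 1e-10
assert np.isclose(v1 @ L @ v1, vals[0])
```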
Theorem 2:

For a positive semidefinite matrix $L$ with eigenvalues $0\le\lambda_1\le\lambda_2\le\dots\le\lambda_n$

and eigenbasis $\{\bar v_1,\bar v_2,\dots,\bar v_n\}$ → orthonormal, i.e., the eigenvectors after orthonormalization:

when $F\in\mathbb{R}^{N\times K}$ and $F^\top F=I$, the minimum of $\mathrm{Tr}(F^\top LF)$ is attained at $F=[\bar v_1,\bar v_2,\dots,\bar v_K]$.

proof:
$$
\text{Denote } F=[f_1,f_2,\dots,f_K],\qquad \mathrm{Tr}(F^\top LF)=\sum_{i=1}^K f_i^\top Lf_i
$$

By Theorem 1, $\mathrm{Tr}(F^\top LF)$ is minimized when $f_1=\bar v_1,\ f_2=\bar v_2,\dots,f_K=\bar v_K$.
Because $F^\top F=I$, the columns of $F$ are orthonormal, so they cannot all equal $\bar v_1$.

The original objective $\mathrm{Tr}(H^\top LH)$ does not satisfy $H^\top H=I$, so Theorem 2 cannot be applied directly; we therefore transform $H$.

$$
H^\top DH=P^{-\frac12}Y^\top DYP^{-\frac12}=P^{-\frac12}PP^{-\frac12}=I
$$

Let $F=D^{\frac12}H$. Then

$$
F^\top F=(D^{\frac12}H)^\top D^{\frac12}H=H^\top D^{\frac12}D^{\frac12}H=H^\top DH=I,
$$

so $H=D^{-\frac12}F$ and

$$
\mathrm{Tr}(H^\top LH)=\mathrm{Tr}(F^\top D^{-\frac12}LD^{-\frac12}F),\qquad F^\top F=I
$$
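Per Theorem 2, the relaxed problem is then solved by the $K$ eigenvectors of $D^{-\frac12}LD^{-\frac12}$ with the smallest eigenvalues. A numeric check (a sketch with an assumed random $W$):

```python
import numpy as np

rng = np.random.default_rng(2)
N, K = 8, 3
S = rng.random((N, N))
W = (S + S.T) / 2
np.fill_diagonal(W, 0.0)
d = W.sum(axis=1)
L = np.diag(d) - W
L_sym = np.diag(d ** -0.5) @ L @ np.diag(d ** -0.5)   # D^{-1/2} L D^{-1/2}

vals, vecs = np.linalg.eigh(L_sym)    # ascending eigenvalues
F = vecs[:, :K]                       # K smallest eigenvectors, F^T F = I
assert np.allclose(F.T @ F, np.eye(K))

# The minimum of Tr(F^T L_sym F) subject to F^T F = I is the sum of
# the K smallest eigenvalues (Theorem 2)
assert np.isclose(np.trace(F.T @ L_sym @ F), vals[:K].sum())
```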
We thus arrive at the final optimization objective:
$$
\min\ \mathrm{Tr}(F^\top D^{-\frac12}LD^{-\frac12}F),\qquad F^\top F=I
$$
Run k-means once more on the solved $F$ to finally obtain $Y$.

Source: https://blog.csdn.net/Gaowang_1/article/details/134945661