to discard the discreteness condition and instead allow that $f_i$ takes arbitrary values in $\mathbb{R}$. This leads to the relaxed optimization problem

$$\min_{f \in \mathbb{R}^n} f'Lf \quad \text{subject to } f \perp \mathbb{1}, \ \|f\| = \sqrt{n}. \qquad (4)$$

By the Rayleigh-Ritz theorem (e.g., see Section 5.5.2 of Lütkepohl, 1997) it can be seen immediately that the solution of this problem is given by the vector $f$ which is the eigenvector corresponding to the second smallest eigenvalue of $L$ (recall that the smallest eigenvalue of $L$ is 0 with eigenvector $\mathbb{1}$). So we can approximate a minimizer of RatioCut by the second eigenvector of $L$. However, in order to obtain a partition of the graph we need to re-transform the real-valued solution vector $f$ of the relaxed problem into a discrete indicator vector. The simplest way to do this is to use the sign of $f$ as indicator function, that is to choose

$$v_i \in A \ \text{if } f_i \geq 0, \qquad v_i \in \overline{A} \ \text{if } f_i < 0.$$

However, in particular in the case of $k > 2$ treated below, this heuristic is too simple. What most spectral clustering algorithms do instead is to consider the coordinates $f_i$ as points in $\mathbb{R}$ and cluster them into two groups $C, \overline{C}$ by the $k$-means clustering algorithm. Then we carry over the resulting clustering to the underlying data points, that is we choose

$$v_i \in A \ \text{if } f_i \in C, \qquad v_i \in \overline{A} \ \text{if } f_i \in \overline{C}.$$

This is exactly the unnormalized spectral clustering algorithm for the case of $k = 2$.
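As an illustration (not part of the original text), a minimal numerical sketch of this two-cluster procedure could look as follows. It assumes a symmetric weight matrix W given as a NumPy array, uses SciPy and scikit-learn, and the function name is hypothetical.

import numpy as np
from scipy.linalg import eigh
from sklearn.cluster import KMeans

def unnormalized_spectral_bipartition(W):
    """Two-way split via the second eigenvector of the unnormalized Laplacian L = D - W."""
    d = W.sum(axis=1)                    # vertex degrees
    L = np.diag(d) - W                   # unnormalized graph Laplacian
    _, eigvecs = eigh(L)                 # eigenvectors, sorted by ascending eigenvalue
    f = eigvecs[:, 1]                    # relaxed solution: second eigenvector of L
    labels_sign = (f >= 0).astype(int)   # simplest heuristic: threshold by sign
    # what most algorithms do instead: k-means on the coordinates f_i
    labels_kmeans = KMeans(n_clusters=2, n_init=10).fit_predict(f.reshape(-1, 1))
    return labels_sign, labels_kmeans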
5.2 Approximating RatioCut for arbitrary k

The relaxation of the RatioCut minimization problem in the case of a general value $k$ follows a similar principle as the one above. Given a partition of $V$ into $k$ sets $A_1, \ldots, A_k$, we define $k$ indicator vectors $h_j = (h_{1,j}, \ldots, h_{n,j})'$ by

$$h_{i,j} = \begin{cases} 1/\sqrt{|A_j|} & \text{if } v_i \in A_j \\ 0 & \text{otherwise} \end{cases} \qquad (i = 1, \ldots, n; \ j = 1, \ldots, k). \qquad (5)$$

Then we set the matrix $H \in \mathbb{R}^{n \times k}$ as the matrix containing those $k$ indicator vectors as columns. Observe that the columns in $H$ are orthonormal to each other, that is $H'H = I$. Similar to the calculations in the last section we can see that

$$h_i' L h_i = \frac{\operatorname{cut}(A_i, \overline{A_i})}{|A_i|}.$$

Moreover, one can check that

$$h_i' L h_i = (H'LH)_{ii}.$$

Combining those facts we get

$$\operatorname{RatioCut}(A_1, \ldots, A_k) = \sum_{i=1}^{k} h_i' L h_i = \sum_{i=1}^{k} (H'LH)_{ii} = \operatorname{Tr}(H'LH),$$

where Tr denotes the trace of a matrix. So the problem of minimizing $\operatorname{RatioCut}(A_1, \ldots, A_k)$ can be rewritten as

$$\min_{A_1, \ldots, A_k} \operatorname{Tr}(H'LH) \quad \text{subject to } H'H = I, \ H \text{ as defined in Eq. (5)}.$$

Similar to above we now relax the problem by allowing the entries of the matrix $H$ to take arbitrary real values. Then the relaxed problem becomes:

$$\min_{H \in \mathbb{R}^{n \times k}} \operatorname{Tr}(H'LH) \quad \text{subject to } H'H = I.$$

This is the standard form of a trace minimization problem, and again a version of the Rayleigh-Ritz theorem (e.g., see Section 5.2.2.(6) of Lütkepohl, 1997) tells us that the solution is given by choosing $H$ as the matrix which contains the first $k$ eigenvectors of $L$ as columns. We can see that the matrix $H$ is in fact the matrix $U$ used in the unnormalized spectral clustering algorithm as described in Section 4. Again we need to re-convert the real-valued solution matrix to a discrete partition. As above, the standard way is to use the $k$-means algorithm on the rows of $U$. This leads to the general unnormalized spectral clustering algorithm as presented in Section 4.
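As a sketch of the resulting procedure for general k (again illustrative rather than part of the original text; the weight matrix W, the libraries, and the function name are assumptions):

import numpy as np
from scipy.linalg import eigh
from sklearn.cluster import KMeans

def unnormalized_spectral_clustering(W, k):
    """Relaxed RatioCut minimization: k-means on rows of the first k eigenvectors of L."""
    d = W.sum(axis=1)                    # vertex degrees
    L = np.diag(d) - W                   # unnormalized Laplacian L = D - W
    _, eigvecs = eigh(L)                 # eigenvectors, sorted by ascending eigenvalue
    U = eigvecs[:, :k]                   # relaxed solution H: first k eigenvectors as columns
    # re-convert the real-valued solution to a discrete partition
    return KMeans(n_clusters=k, n_init=10).fit_predict(U)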
5.3 Approximating Ncut

Techniques very similar to the ones used for RatioCut can be used to derive normalized spectral clustering as a relaxation of minimizing Ncut. In the case $k = 2$ we define the cluster indicator vector $f$ by

$$f_i = \begin{cases} \sqrt{\operatorname{vol}(\overline{A})/\operatorname{vol}(A)} & \text{if } v_i \in A \\ -\sqrt{\operatorname{vol}(A)/\operatorname{vol}(\overline{A})} & \text{if } v_i \in \overline{A}. \end{cases} \qquad (6)$$

Similar to above one can check that $(Df)'\mathbb{1} = 0$, $f'Df = \operatorname{vol}(V)$, and $f'Lf = \operatorname{vol}(V)\operatorname{Ncut}(A, \overline{A})$. Thus we can rewrite the problem of minimizing Ncut by the equivalent problem

$$\min_{A} f'Lf \quad \text{subject to } f \text{ as in (6)}, \ Df \perp \mathbb{1}, \ f'Df = \operatorname{vol}(V). \qquad (7)$$

Again we relax the problem by allowing $f$ to take arbitrary real values:

$$\min_{f \in \mathbb{R}^n} f'Lf \quad \text{subject to } Df \perp \mathbb{1}, \ f'Df = \operatorname{vol}(V). \qquad (8)$$

Now we substitute $g := D^{1/2} f$. After substitution, the problem is

$$\min_{g \in \mathbb{R}^n} g' D^{-1/2} L D^{-1/2} g \quad \text{subject to } g \perp D^{1/2}\mathbb{1}, \ \|g\|^2 = \operatorname{vol}(V). \qquad (9)$$

Observe that $D^{-1/2} L D^{-1/2} = L_{\mathrm{sym}}$, $D^{1/2}\mathbb{1}$ is the first eigenvector of $L_{\mathrm{sym}}$, and $\operatorname{vol}(V)$ is a constant. Hence, Problem (9) is in the form of the standard Rayleigh-Ritz theorem, and its solution $g$ is given by the second eigenvector of $L_{\mathrm{sym}}$. Re-substituting $f = D^{-1/2} g$ and using Proposition 3 we see that $f$ is the second eigenvector of $L_{\mathrm{rw}}$, or equivalently the generalized eigenvector of $Lu = \lambda D u$.

For the case of finding $k > 2$ clusters, we define the indicator vectors $h_j = (h_{1,j}, \ldots, h_{n,j})'$ by

$$h_{i,j} = \begin{cases} 1/\sqrt{\operatorname{vol}(A_j)} & \text{if } v_i \in A_j \\ 0 & \text{otherwise} \end{cases} \qquad (i = 1, \ldots, n; \ j = 1, \ldots, k). \qquad (10)$$
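A corresponding sketch for the normalized case (illustrative only; W and the function name are assumptions, and the graph is assumed to have no isolated vertices so that D is invertible) solves the generalized eigenproblem $Lu = \lambda D u$ derived above and, as in the normalized spectral clustering algorithm of Section 4, runs k-means on the rows of the first k generalized eigenvectors:

import numpy as np
from scipy.linalg import eigh
from sklearn.cluster import KMeans

def normalized_spectral_clustering(W, k):
    """Relaxed Ncut minimization: generalized eigenproblem L u = lambda * D u."""
    d = W.sum(axis=1)                    # vertex degrees (assumed strictly positive)
    D = np.diag(d)
    L = D - W                            # unnormalized Laplacian
    _, eigvecs = eigh(L, D)              # generalized eigenvectors, ascending eigenvalues
    U = eigvecs[:, :k]                   # first k generalized eigenvectors (eigenvectors of L_rw)
    return KMeans(n_clusters=k, n_init=10).fit_predict(U)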