
Multivariate Statistical Analysis, course teaching resources (reading material): Cluster Validation



3.3 Fowlkes–Mallows Index

Fowlkes and Mallows introduced their index as a measure for comparing hierarchical clusterings² [4]. However, it can also be used for flat clusterings, since it consists of calculating an index $B_i$ for each level $i = 2, \dots, n-1$ of the hierarchies under consideration and plotting $B_i$ against $i$. The measure $B_i$ is easily generalized to a measure for clusterings with different numbers of clusters. The generalized Fowlkes–Mallows Index is defined by

$$
FM(C, C') = \frac{\sum_{i=1}^{k} \sum_{j=1}^{\ell} m_{ij}^2 - n}{\sqrt{\left(\sum_{i} |C_i|^2 - n\right)\left(\sum_{j} |C'_j|^2 - n\right)}} = \frac{n_{11}}{\sqrt{(n_{11} + n_{10})(n_{11} + n_{01})}}
$$

In the context of Information Retrieval, this measure can be interpreted as the geometric mean of precision (the ratio of the number of retrieved relevant documents to the total number of retrieved documents, $\frac{n_{11}}{n_{11} + n_{10}}$) and recall (the ratio of the number of retrieved relevant documents to the total number of relevant documents, $\frac{n_{11}}{n_{11} + n_{01}}$).

As with the adjusted Rand Index, the "amount" of similarity of two clusterings corresponds to the deviation from the expected value under the null hypothesis of independent clusterings with fixed cluster sizes. Again, the strong assumptions on the distribution make the result hard to interpret. Furthermore, this measure has the undesirable property that for small numbers of clusters the value is very high, even for independent clusterings (which even achieve the maximum value for small numbers of clusters). Wallace proposed to attenuate this effect by subtracting the number of pairs whose match is forced by the cluster overlaps from the number of "good" pairs and from the number of all pairs [9].

3.4 Mirkin Metric

The Mirkin Metric, which is also known as Equivalence Mismatch Distance [11], is defined by

$$
M(C, C') = \sum_{i=1}^{k} |C_i|^2 + \sum_{j=1}^{\ell} |C'_j|^2 - 2 \sum_{i=1}^{k} \sum_{j=1}^{\ell} m_{ij}^2 .
$$

It corresponds to the Hamming distance for binary vectors if the set of all pairs of elements is enumerated and a clustering is represented by a

² A hierarchical clustering of a set $X$ is a hierarchy of $|X|$ clusterings, with the two trivial clusterings at the top and bottom, respectively, and each level of the hierarchy is a refinement of all the levels above.
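The two formulas above can be checked numerically with a short Python sketch. It is only an illustration under stated assumptions, not code from the paper. Clusterings are assumed to be given as label lists of equal length. The function names (`squared_sums`, `fowlkes_mallows`, `mirkin`, `pair_hamming`) are hypothetical. $n_{11}$ is taken to be the number of unordered pairs placed together in both clusterings, and $n_{10}$ and $n_{01}$ the pairs placed together in only $C$ or only $C'$, which is the reading implied by the equality in the definition of $FM$. The Hamming check assumes one bit per unordered pair, set when both elements share a cluster. Under that encoding the Mirkin value comes out as twice the Hamming distance, because the squared sums count ordered pairs.

```python
from collections import Counter
from itertools import combinations
from math import sqrt


def squared_sums(labels_a, labels_b):
    """Return n, sum of m_ij^2, sum of |C_i|^2 and sum of |C'_j|^2."""
    if len(labels_a) != len(labels_b):
        raise ValueError("both clusterings must label the same elements")
    table = Counter(zip(labels_a, labels_b))  # contingency table m_ij
    sum_m2 = sum(v * v for v in table.values())
    sum_a2 = sum(v * v for v in Counter(labels_a).values())
    sum_b2 = sum(v * v for v in Counter(labels_b).values())
    return len(labels_a), sum_m2, sum_a2, sum_b2


def fowlkes_mallows(labels_a, labels_b):
    """Generalized Fowlkes-Mallows index from the contingency table."""
    n, sum_m2, sum_a2, sum_b2 = squared_sums(labels_a, labels_b)
    denom = sqrt((sum_a2 - n) * (sum_b2 - n))
    # Assumption: return 0.0 when a clustering has only singletons,
    # a case the formula itself leaves undefined.
    return (sum_m2 - n) / denom if denom else 0.0


def mirkin(labels_a, labels_b):
    """Mirkin metric, computed directly from the three squared sums."""
    _, sum_m2, sum_a2, sum_b2 = squared_sums(labels_a, labels_b)
    return sum_a2 + sum_b2 - 2 * sum_m2


def pair_hamming(labels_a, labels_b):
    """Brute-force Hamming distance over unordered pairs (assumed encoding)."""
    pairs = combinations(range(len(labels_a)), 2)
    return sum(
        (labels_a[p] == labels_a[q]) != (labels_b[p] == labels_b[q])
        for p, q in pairs
    )


if __name__ == "__main__":
    a = [0, 0, 0, 1, 1, 1]
    b = [0, 0, 1, 1, 2, 2]
    print(fowlkes_mallows(a, b))          # n11 / sqrt((n11+n10)(n11+n01))
    print(mirkin(a, b), pair_hamming(a, b))
```

For the toy labels in the example, $n_{11} = 2$, $n_{10} = 4$ and $n_{01} = 1$, so $FM = 2/\sqrt{6 \cdot 3} \approx 0.471$ and $M = 18 + 12 - 2 \cdot 10 = 10 = 2\,(n_{10} + n_{01})$.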
