that takes a new data point and produces a prediction of its population. The three examples above are just a small sample of the uses of classification. Given these preliminaries, this paper will examine various methods of classification, starting with linear discriminant analysis and Fisher's discriminant analysis and proceeding to Support Vector Machines (SVMs). Our work will conclude with an analysis of the linear classifiers and SVMs applied to an actual data set.
Chapter 2

Basic Discriminants

The first classification methods we examine are linear discriminants: in particular, Linear Discriminant Analysis and Fisher's Discriminant Analysis. The two are similar in that both produce linear decision functions that are in fact nearly identical, but they rest on different assumptions and take different approaches. In this chapter the two methods are compared and contrasted.

2.1 Linear Discriminant Analysis for Two Populations

Given a pair of known populations, $\pi_1$ and $\pi_2$, assume that $\pi_1$ has a probability density function (pdf) $f_1(x)$ and, similarly, that $\pi_2$ has a pdf $f_2(x)$, where $f_1(x) \neq f_2(x)$. Intuitively, a decision function for the two populations arises from looking at the probability ratio
\[
D(x) = \frac{f_1(x)}{f_2(x)}.
\]
A new observation $x$ is classified as $\pi_1$ if $D(x) > 1$ and as $\pi_2$ if $D(x) < 1$. (For cases where $D(x) = 1$, the vector $x$ is unclassifiable.) Let $\Omega$ be the space of all possible observations; denote the set of $x \in \Omega$ where $f_1(x)/f_2(x) > 1$ by $R_1$ and, similarly, the set of $x \in \Omega$ where $f_1(x)/f_2(x) < 1$ by $R_2$. (Denote the set $\{\,x \mid f_1(x)/f_2(x) = 1\,\}$ by $R_3$.)

Such a decision function is simple but effective: by determining from which population $x$ is more likely to have come, one can make quick predictions about its origin. Note, however, that the probability ratio decision function is not foolproof: whether or not the densities $f_1(x)$ and $f_2(x)$ are close together, there is always a chance that $x$ could come from $\pi_2$ even when $f_1(x) > f_2(x)$, or vice versa. The conditional probability, $P(2|1)$, of classifying an observation $x$ as $\pi_2$ when in fact $x \in \pi_1$ is
\[
P(2|1) = P(x \in R_2 \mid \pi_1) = \int_{R_2} f_1(x)\,dx. \tag{2.1}
\]
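To make the ratio rule concrete, here is a minimal sketch (not part of the original development) that classifies points by the ratio $f_1(x)/f_2(x)$ and estimates the misclassification probability $P(2|1)$ of (2.1) by Monte Carlo. The two univariate normal densities and their parameters are hypothetical choices made only for illustration, and the sketch assumes numpy and scipy are available.

```python
import numpy as np
from scipy.stats import norm

# Hypothetical one-dimensional populations (illustrative parameters only):
# pi_1 ~ N(0, 1) and pi_2 ~ N(2, 1).
f1 = norm(loc=0.0, scale=1.0).pdf
f2 = norm(loc=2.0, scale=1.0).pdf

def classify(x):
    """Probability-ratio rule: pi_1 if f1/f2 > 1, pi_2 if f1/f2 < 1, else unclassifiable."""
    D = f1(x) / f2(x)
    if D > 1:
        return "pi_1"
    if D < 1:
        return "pi_2"
    return "unclassifiable"

# Monte Carlo estimate of P(2|1) = integral of f1 over R_2, i.e. the chance
# that a draw from pi_1 lands in the region where f1(x)/f2(x) < 1.
rng = np.random.default_rng(seed=0)
draws = rng.normal(0.0, 1.0, size=100_000)   # samples from pi_1
p_2_given_1 = np.mean(f1(draws) / f2(draws) < 1)

print(classify(0.5), classify(1.7), p_2_given_1)
```

For these made-up parameters the boundary $f_1(x) = f_2(x)$ is the single point $x = 1$, so $R_2 = \{x > 1\}$ and the Monte Carlo estimate should be close to $P(Z > 1) \approx 0.16$ for a standard normal $Z$.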
Similarly, the conditional probability of classifying an observation $x$ as $\pi_1$ when in fact $x \in \pi_2$ is
\[
P(1|2) = P(x \in R_1 \mid \pi_2) = \int_{R_1} f_2(x)\,dx. \tag{2.2}
\]

Oftentimes the costs of misclassification for the two populations are different; take, for example, testing for a fatal but easily curable disease. The cost of misclassifying a patient as healthy when he or she is sick would in this case be higher than that of misclassifying a healthy patient as sick. To account for such situations, we would like a classification method that classifies patients as healthy only when there is sufficient evidence to overcome a certain cost threshold. We define $c(1|2)$ as the cost of misclassifying a data point into $\pi_1$ when it is actually a member of $\pi_2$, and similarly, $c(2|1)$ as the cost of misclassification into $\pi_2$ when $x \in \pi_1$.

Another factor that can affect accurate classification is the prior probability of belonging to one or the other of the populations. Again referring to the above example, if the disease has a low prevalence, even a test with high sensitivity will, when administered to enough people, classify many patients as sick when they are not, making it difficult to determine whether a positive test actually indicates a sick patient. Some weight ought to be given to the prior probability that a random observation $x$ is from $\pi_1$ or $\pi_2$; we denote these probabilities by $p_1$ and $p_2$ respectively.

Note that with these prior probabilities we can find the overall probabilities of misclassification by multiplying the priors into the earlier conditional probabilities:
\begin{align*}
&P(\text{observation comes from } \pi_1 \text{ and is misclassified as } \pi_2) \tag{2.3} \\
&\quad = P(x \in R_2 \mid \pi_1)\,P(\pi_1) \tag{2.4} \\
&\quad = P(2|1) \times p_1, \tag{2.5}
\end{align*}
and
\begin{align*}
&P(\text{observation comes from } \pi_2 \text{ and is misclassified as } \pi_1) \tag{2.6} \\
&\quad = P(x \in R_1 \mid \pi_2)\,P(\pi_2) \tag{2.7} \\
&\quad = P(1|2) \times p_2. \tag{2.8}
\end{align*}
Then the expected cost of misclassification (ECM) is obtained by multiplying each population's overall probability of misclassification by its cost of misclassification and summing:
\[
\mathrm{ECM} = c(2|1)\,P(2|1)\,p_1 + c(1|2)\,P(1|2)\,p_2. \tag{2.9}
\]
One criterion for a classification method is to minimize the ECM, which leads us to the following result.

Theorem 2.1.1. (Johnson and Wichern, [4]) The regions $R_1$ and $R_2$ that minimize the ECM are defined by the values of $x$ for which the following inequalities hold:
\[
R_1:\ \frac{f_1(x)}{f_2(x)} > \frac{c(1|2)}{c(2|1)} \times \frac{p_2}{p_1}, \tag{2.10}
\]
\[
R_2:\ \frac{f_1(x)}{f_2(x)} < \frac{c(1|2)}{c(2|1)} \times \frac{p_2}{p_1}. \tag{2.11}
\]
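Before turning to the proof, here is a minimal numerical sketch of the minimum-ECM rule in (2.10) and (2.11). The densities, costs, and priors below are hypothetical values chosen only to show how the cost/prior threshold shifts the decision boundary away from the plain ratio rule above; numpy and scipy are assumed.

```python
import numpy as np
from scipy.stats import norm

# Hypothetical densities for pi_1 and pi_2 (illustrative only).
f1 = norm(loc=0.0, scale=1.0).pdf
f2 = norm(loc=2.0, scale=1.0).pdf

# Hypothetical costs and priors: misclassifying a pi_2 member into pi_1 is
# five times as costly, and pi_2 is much rarer than pi_1.
c12, c21 = 5.0, 1.0   # c(1|2), c(2|1)
p1, p2 = 0.9, 0.1     # prior probabilities

threshold = (c12 / c21) * (p2 / p1)

def allocate(x):
    """Minimum-ECM rule: compare the density ratio with the cost/prior threshold."""
    ratio = f1(x) / f2(x)
    if ratio > threshold:
        return "pi_1"
    if ratio < threshold:
        return "pi_2"
    return "unclassifiable"

# At x = 1.1 the ratio f1/f2 is below 1, so the plain ratio rule would pick pi_2,
# but with these costs and priors the minimum-ECM rule allocates x to pi_1.
print(threshold, allocate(1.1))
```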
Proof. Let us substitute the integral expressions for $P(2|1)$ and $P(1|2)$ given by (2.1) and (2.2) into the ECM:
\[
\mathrm{ECM} = c(2|1)\,p_1 \int_{R_2} f_1(x)\,dx + c(1|2)\,p_2 \int_{R_1} f_2(x)\,dx. \tag{2.12}
\]
Note that $\Omega = R_1 + R_2 + R_3$, so
\[
1 = \int_{\Omega} f_1(x)\,dx = \int_{R_1} f_1(x)\,dx + \int_{R_2} f_1(x)\,dx + \int_{R_3} f_1(x)\,dx. \tag{2.13}
\]
Since we know that $f_1(x) \neq f_2(x)$, $R_3$ must be a union of distinct points, meaning $\int_{R_3} f_1(x)\,dx = 0$ and leaving us with
\[
1 = \int_{R_1} f_1(x)\,dx + \int_{R_2} f_1(x)\,dx. \tag{2.14}
\]
Plugging (2.14) into (2.12), we see that
\[
\mathrm{ECM} = c(2|1)\,p_1 \left[ 1 - \int_{R_1} f_1(x)\,dx \right] + c(1|2)\,p_2 \int_{R_1} f_2(x)\,dx \tag{2.15}
\]
\[
\Rightarrow\ \mathrm{ECM} = \int_{R_1} \Big[ c(1|2)\,p_2\,f_2(x) - c(2|1)\,p_1\,f_1(x) \Big]\,dx + c(2|1)\,p_1. \tag{2.16}
\]
Recall that $p_1$ and $p_2$ are positive because they are probabilities, that the costs $c(1|2)$ and $c(2|1)$ are positive, and that $f_1(x)$ and $f_2(x)$ are nonnegative functions. Then by inspection, the ECM is minimized when $R_1$ is defined by those $x$ such that $c(1|2)\,p_2\,f_2(x) - c(2|1)\,p_1\,f_1(x) \le 0$. However, if in (2.13) we had decided to use $f_2(x)$ instead of $f_1(x)$, (2.16) would have been
\[
\mathrm{ECM} = \int_{R_2} \Big[ c(2|1)\,p_1\,f_1(x) - c(1|2)\,p_2\,f_2(x) \Big]\,dx + c(1|2)\,p_2,
\]
leading to the definition of $R_2$ as the set $\{\,x \mid c(2|1)\,p_1\,f_1(x) - c(1|2)\,p_2\,f_2(x) \le 0\,\}$. Since those $x$ satisfying $c(1|2)\,p_2\,f_2(x) = c(2|1)\,p_1\,f_1(x)$ could therefore be placed in both $R_1$ and $R_2$, we require that the inequalities in Theorem 2.1.1 defining the classification regions be strict, and we declare the case of equality unclassifiable. (Note that unclassifiable points occur with probability zero.)

The decision function given in Theorem 2.1.1 compares the probability ratio to the product of the cost ratio and the ratio of prior probabilities. The use of ratios is important because it is often much easier to estimate the cost ratio than each cost explicitly. For example, if we are considering the cost to a state university of educating an eventual dropout versus the cost of not educating an eventual graduate, the former can be estimated from school taxes and tuition, but the latter is more difficult to gauge. Even so, one could plausibly estimate that such a cost ratio might be 5:1 or so.

Let us examine the case where the $f_i(x)$ are multivariate normal densities with known mean vectors $\mu_i$ and covariance matrices $\Sigma_i$.

2.1.1 Classification of Normal Populations when $\Sigma_1 = \Sigma_2 = \Sigma$

Suppose that the density functions $f_1(x)$ and $f_2(x)$ for populations $\pi_1$ and $\pi_2$ are given by
\[
f_i(x) = \frac{1}{(2\pi)^{m/2}\,|\Sigma|^{1/2}} \exp\!\left[ -\frac{1}{2}\,(x - \mu_i)^T \Sigma^{-1} (x - \mu_i) \right] \quad \text{for } i = 1, 2, \tag{2.17}
\]
for $x \in \mathbb{R}^m$.
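As a quick check on (2.17), the following sketch evaluates the multivariate normal density directly from a mean vector and covariance matrix and compares the result with scipy's built-in implementation. The particular $\mu_1$, $\Sigma$, and $x$ are hypothetical values used only for illustration.

```python
import numpy as np
from scipy.stats import multivariate_normal

def mvn_pdf(x, mu, Sigma):
    """Evaluate the density in (2.17) at x, given mean mu and covariance Sigma."""
    m = len(mu)
    diff = x - mu
    quad = diff @ np.linalg.solve(Sigma, diff)             # (x - mu)^T Sigma^{-1} (x - mu)
    norm_const = (2 * np.pi) ** (m / 2) * np.sqrt(np.linalg.det(Sigma))
    return np.exp(-0.5 * quad) / norm_const

# Hypothetical parameters, for illustration only.
mu1 = np.array([0.0, 0.0])
Sigma = np.array([[1.0, 0.3],
                  [0.3, 2.0]])
x = np.array([0.5, -1.0])

print(mvn_pdf(x, mu1, Sigma))
print(multivariate_normal(mean=mu1, cov=Sigma).pdf(x))     # should agree
```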
Then by substituting (2.17) into Theorem 2.1.1, we get the following minimum-ECM regions:
\[
R_1:\ \exp\!\left[ -\frac{1}{2}\,(x - \mu_1)^T \Sigma^{-1} (x - \mu_1) + \frac{1}{2}\,(x - \mu_2)^T \Sigma^{-1} (x - \mu_2) \right] > \frac{c(1|2)}{c(2|1)} \times \frac{p_2}{p_1}, \tag{2.18}
\]
\[
R_2:\ \exp\!\left[ -\frac{1}{2}\,(x - \mu_1)^T \Sigma^{-1} (x - \mu_1) + \frac{1}{2}\,(x - \mu_2)^T \Sigma^{-1} (x - \mu_2) \right] < \frac{c(1|2)}{c(2|1)} \times \frac{p_2}{p_1}. \tag{2.19}
\]
Given the regions $R_1$ and $R_2$, we can construct the following classification rule:

Theorem 2.1.2. [4] Let the populations $\pi_1$ and $\pi_2$ be described by multivariate normal densities with known parameters $\mu_1$ and $\mu_2$ and $\Sigma_1 = \Sigma_2 = \Sigma$. Then the allocation rule that minimizes the ECM is as follows: allocate $x$ to $\pi_1$ if
\[
(\mu_1 - \mu_2)^T \Sigma^{-1} x - \frac{1}{2}\,(\mu_1 - \mu_2)^T \Sigma^{-1} (\mu_1 + \mu_2) > \ln\!\left[ \frac{c(1|2)}{c(2|1)} \times \frac{p_2}{p_1} \right];
\]
otherwise, allocate $x$ to $\pi_2$. As before, equality implies that the data point is unclassifiable.

Proof. Note that
\[
-\frac{1}{2}\,(x - \mu_1)^T \Sigma^{-1} (x - \mu_1) + \frac{1}{2}\,(x - \mu_2)^T \Sigma^{-1} (x - \mu_2) \tag{2.20}
\]
\[
= \frac{1}{2} \Big[ (x^T \Sigma^{-1} - \mu_2^T \Sigma^{-1})(x - \mu_2) - (x^T \Sigma^{-1} - \mu_1^T \Sigma^{-1})(x - \mu_1) \Big] \tag{2.21}
\]
\[
= \frac{1}{2} \Big[ x^T \Sigma^{-1} x - x^T \Sigma^{-1} \mu_2 - \mu_2^T \Sigma^{-1} x + \mu_2^T \Sigma^{-1} \mu_2 - x^T \Sigma^{-1} x + x^T \Sigma^{-1} \mu_1 + \mu_1^T \Sigma^{-1} x - \mu_1^T \Sigma^{-1} \mu_1 \Big]. \tag{2.22}
\]
Here the $x^T \Sigma^{-1} x$ terms cancel; moreover, since $(x^T \Sigma^{-1} \mu_2)^T = \mu_2^T \Sigma^{-1} x$ is a scalar,
\[
-x^T \Sigma^{-1} \mu_2 - \mu_2^T \Sigma^{-1} x = -2\mu_2^T \Sigma^{-1} x, \tag{2.23}
\]
\[
x^T \Sigma^{-1} \mu_1 + \mu_1^T \Sigma^{-1} x = 2\mu_1^T \Sigma^{-1} x. \tag{2.24}
\]
(Recall that $\Sigma$ is a symmetric matrix.) Substituting equations (2.23) and (2.24) into equation (2.22)