Statistics 580
The EM Algorithm

Introduction

The EM algorithm is a very general iterative algorithm for parameter estimation by maximum likelihood when some of the random variables involved are not observed, i.e., are considered missing or incomplete. The EM algorithm formalizes an intuitive idea for obtaining parameter estimates when some of the data are missing:

i. Replace missing values by estimated values.
ii. Estimate the parameters.
iii. Repeat:
• step (i), using the estimated parameter values as the true values, and
• step (ii), using the estimated values as the "observed" values,
iterating until convergence.

This idea had been in use for many years before Orchard and Woodbury (1972) provided the theoretical foundation for it in their missing information principle. The term EM was introduced in Dempster, Laird, and Rubin (1977), where proofs of general results about the behavior of the algorithm were first given, along with a large number of applications.

For this discussion, let us suppose that we have a random vector $y$ whose joint density $f(y; \theta)$ is indexed by a $p$-dimensional parameter $\theta \in \Theta \subseteq R^p$. If the complete-data vector $y$ were observed, it would be of interest to compute the maximum likelihood estimate of $\theta$ based on the distribution of $y$. The log-likelihood function of $y$,
\[
\log L(\theta; y) = \ell(\theta; y) = \log f(y; \theta),
\]
is then required to be maximized.

In the presence of missing data, however, only a function of the complete-data vector $y$ is observed. We will denote this by expressing $y$ as $(y_{obs}, y_{mis})$, where $y_{obs}$ denotes the observed but "incomplete" data and $y_{mis}$ denotes the unobserved or "missing" data. For simplicity of description, assume that the missing data are missing at random (Rubin, 1976), so that
\[
f(y; \theta) = f(y_{obs}, y_{mis}; \theta) = f_1(y_{obs}; \theta) \cdot f_2(y_{mis} \mid y_{obs}; \theta),
\]
where $f_1$ is the joint density of $y_{obs}$ and $f_2$ is the conditional density of $y_{mis}$ given the observed data $y_{obs}$. Taking logarithms, it follows that
\[
\ell_{obs}(\theta; y_{obs}) = \ell(\theta; y) - \log f_2(y_{mis} \mid y_{obs}; \theta),
\]
where $\ell_{obs}(\theta; y_{obs})$ is the observed-data log-likelihood.
The EM algorithm is useful when maximizing $\ell_{obs}$ can be difficult but maximizing the complete-data log-likelihood $\ell$ is simple. However, since $y$ is not observed, $\ell$ cannot be evaluated and hence cannot be maximized directly. The EM algorithm attempts to maximize $\ell(\theta; y)$ iteratively, by replacing it with its conditional expectation given the observed data $y_{obs}$. This expectation is computed with respect to the distribution of the complete data evaluated at the current estimate of $\theta$.

More specifically, if $\theta^{(0)}$ is an initial value for $\theta$, then on the first iteration it is required to compute
\[
Q(\theta; \theta^{(0)}) = E_{\theta^{(0)}}\left[\, \ell(\theta; y) \mid y_{obs} \,\right].
\]
$Q(\theta; \theta^{(0)})$ is now maximized with respect to $\theta$; that is, $\theta^{(1)}$ is found such that
\[
Q(\theta^{(1)}; \theta^{(0)}) \ge Q(\theta; \theta^{(0)})
\]
for all $\theta \in \Theta$. Thus the EM algorithm consists of an E-step (Expectation step) followed by an M-step (Maximization step), defined as follows:

E-step: Compute $Q(\theta; \theta^{(t)})$, where
\[
Q(\theta; \theta^{(t)}) = E_{\theta^{(t)}}\left[\, \ell(\theta; y) \mid y_{obs} \,\right].
\]

M-step: Find $\theta^{(t+1)}$ in $\Theta$ such that
\[
Q(\theta^{(t+1)}; \theta^{(t)}) \ge Q(\theta; \theta^{(t)})
\]
for all $\theta \in \Theta$.

The E-step and the M-step are repeated alternately until the difference $L(\theta^{(t+1)}) - L(\theta^{(t)})$ is less than $\delta$, where $\delta$ is a prescribed small quantity.

The computation of these two steps simplifies a great deal when it can be shown that the log-likelihood is linear in the sufficient statistic for $\theta$. In particular, this turns out to be the case when the distribution of the complete-data vector $y$ belongs to the exponential family. In this case, the E-step reduces to computing the expectation of the complete-data sufficient statistic given the observed data. When the complete data are from the exponential family, the M-step also simplifies: although the M-step involves maximizing the expected log-likelihood computed in the E-step, in the exponential family case actually carrying out this maximization to obtain the next iterate can be avoided. Instead, the conditional expectations of the sufficient statistics computed in the E-step can be substituted directly for the sufficient statistics that occur in the expressions for the complete-data maximum likelihood estimators of $\theta$, to obtain the next iterate. Several examples are discussed below to illustrate these steps in the exponential family case.

As a general algorithm for complex maximum likelihood computations, the EM algorithm has several appealing properties relative to other iterative algorithms such as Newton-Raphson. First, it is typically easy to implement because it relies on complete-data computations: the E-step of each iteration only involves taking expectations over complete-data conditional distributions, and the M-step only requires complete-data maximum likelihood estimation, for which simple closed-form expressions are already available. Secondly, it is numerically stable: each iteration is required to increase the log-likelihood $\ell(\theta; y_{obs})$, and if $\ell(\theta; y_{obs})$ is bounded, the sequence $\ell(\theta^{(t)}; y_{obs})$ converges to a stationary value. If the sequence $\theta^{(t)}$ converges, it does so to a local maximum or saddle point of $\ell(\theta; y_{obs})$, and to the unique MLE if $\ell(\theta; y_{obs})$ is unimodal. A disadvantage of EM is that its rate of convergence can be extremely slow if a lot of data are missing: Dempster, Laird, and Rubin (1977) show that convergence is linear, with rate proportional to the fraction of the information about $\theta$ in $\ell(\theta; y)$ that is observed.
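To make the structure of the iteration concrete, the following short Python sketch shows the generic E-step/M-step loop with the stopping rule described above. It is only schematic: the functions `e_step`, `m_step`, and `log_lik` are hypothetical placeholders that must be supplied for each particular problem.

```python
def em(y_obs, theta0, e_step, m_step, log_lik, delta=1e-8, max_iter=1000):
    """Generic EM loop (schematic).

    e_step(theta, y_obs)  -> expected complete-data sufficient statistics
                             (or whatever summary the M-step needs)
    m_step(stats)         -> updated parameter value theta
    log_lik(theta, y_obs) -> observed-data log-likelihood, used only in the
                             stopping rule L(theta^(t+1)) - L(theta^(t)) < delta
    """
    theta = theta0
    ll = log_lik(theta, y_obs)
    for _ in range(max_iter):
        stats = e_step(theta, y_obs)      # E-step
        theta_new = m_step(stats)         # M-step
        ll_new = log_lik(theta_new, y_obs)
        if abs(ll_new - ll) < delta:      # stop when the log-likelihood gain is small
            return theta_new
        theta, ll = theta_new, ll_new
    return theta
```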
Example 1: Univariate Normal Sample

Let the complete-data vector $y = (y_1, \ldots, y_n)^T$ be a random sample from $N(\mu, \sigma^2)$. Then
\[
f(y; \mu, \sigma^2) = \left(\frac{1}{2\pi\sigma^2}\right)^{n/2} \exp\left\{ -\frac{1}{2} \sum_{i=1}^{n} \frac{(y_i - \mu)^2}{\sigma^2} \right\}
= \left(\frac{1}{2\pi\sigma^2}\right)^{n/2} \exp\left\{ -\frac{1}{2\sigma^2}\left( \sum y_i^2 - 2\mu \sum y_i + n\mu^2 \right) \right\},
\]
which implies that $\left(\sum y_i, \sum y_i^2\right)$ are sufficient statistics for $\theta = (\mu, \sigma^2)^T$. The complete-data log-likelihood function is
\[
\ell(\mu, \sigma^2; y) = -\frac{n}{2}\log(\sigma^2) - \frac{1}{2}\sum_{i=1}^{n} \frac{(y_i - \mu)^2}{\sigma^2} + \text{constant}
= -\frac{n}{2}\log(\sigma^2) - \frac{1}{2\sigma^2}\sum_{i=1}^{n} y_i^2 + \frac{\mu}{\sigma^2}\sum_{i=1}^{n} y_i - \frac{n\mu^2}{2\sigma^2} + \text{constant}.
\]
It follows that the log-likelihood based on the complete data is linear in the complete-data sufficient statistics. Suppose $y_i$, $i = 1, \ldots, m$, are observed and $y_i$, $i = m+1, \ldots, n$, are missing (at random), where the $y_i$ are assumed to be i.i.d. $N(\mu, \sigma^2)$. Denote the observed data vector by $y_{obs} = (y_1, \ldots, y_m)^T$. Since the complete-data vector $y$ is from the exponential family, the E-step requires the computation of
\[
E_\theta\!\left( \sum_{i=1}^{n} y_i \,\Big|\, y_{obs} \right) \quad \text{and} \quad E_\theta\!\left( \sum_{i=1}^{n} y_i^2 \,\Big|\, y_{obs} \right),
\]
rather than the expectation of the complete-data log-likelihood function shown above. Thus, at the $t$th iteration of the E-step, compute
\[
s_1^{(t)} = E_{\mu^{(t)},\,\sigma^{2(t)}}\!\left( \sum_{i=1}^{n} y_i \,\Big|\, y_{obs} \right) = \sum_{i=1}^{m} y_i + (n - m)\,\mu^{(t)}, \tag{1}
\]
since $E_{\mu^{(t)},\,\sigma^{2(t)}}(y_i) = \mu^{(t)}$, where $\mu^{(t)}$ and $\sigma^{2(t)}$ are the current estimates of $\mu$ and $\sigma^2$, and
\[
s_2^{(t)} = E_{\mu^{(t)},\,\sigma^{2(t)}}\!\left( \sum_{i=1}^{n} y_i^2 \,\Big|\, y_{obs} \right) = \sum_{i=1}^{m} y_i^2 + (n - m)\left[ \sigma^{2(t)} + \left(\mu^{(t)}\right)^2 \right], \tag{2}
\]
since $E_{\mu^{(t)},\,\sigma^{2(t)}}(y_i^2) = \sigma^{2(t)} + \left(\mu^{(t)}\right)^2$.

For the M-step, first note that the complete-data maximum likelihood estimates of $\mu$ and $\sigma^2$ are
\[
\hat{\mu} = \frac{\sum_{i=1}^{n} y_i}{n} \quad \text{and} \quad \hat{\sigma}^2 = \frac{\sum_{i=1}^{n} y_i^2}{n} - \left( \frac{\sum_{i=1}^{n} y_i}{n} \right)^2.
\]
The M-step is defined by substituting the expectations computed in the E-step for the complete-data sufficient statistics on the right-hand side of these expressions, to obtain the new iterates of $\mu$ and $\sigma^2$. (Note that the complete-data sufficient statistics themselves cannot be computed directly, since $y_{m+1}, \ldots, y_n$ have not been observed.) We get
\[
\mu^{(t+1)} = \frac{s_1^{(t)}}{n} \tag{3}
\]
and
\[
\sigma^{2(t+1)} = \frac{s_2^{(t)}}{n} - \left(\mu^{(t+1)}\right)^2. \tag{4}
\]
Thus, the E-step involves evaluating (1) and (2), beginning with the starting values $\mu^{(0)}$ and $\sigma^{2(0)}$; the M-step involves substituting these into (3) and (4) to calculate the new values $\mu^{(1)}$ and $\sigma^{2(1)}$, and so on. The EM algorithm thus iterates successively between (1)-(2) and (3)-(4). Of course, in this example it is not necessary to use the EM algorithm at all, since the maximum likelihood estimates of $(\mu, \sigma^2)$ are given directly by $\hat{\mu} = \sum_{i=1}^{m} y_i / m$ and $\hat{\sigma}^2 = \sum_{i=1}^{m} y_i^2 / m - \hat{\mu}^2$. $\Box$
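As an illustration of how little computation the iteration requires, the following Python sketch implements (1)-(4) for this example. The data values and the choices $n = 8$, $m = 5$ are made up purely for illustration, and the function name `em_normal` is a placeholder, not part of the original development.

```python
import numpy as np

def em_normal(y_obs, n, mu, sig2, delta=1e-10, max_iter=500):
    """EM for a univariate normal sample of size n in which only
    y_obs = (y_1, ..., y_m) is observed; mu and sig2 are starting values."""
    m = len(y_obs)
    sum_y = float(np.sum(y_obs))
    sum_y2 = float(np.sum(y_obs ** 2))
    for _ in range(max_iter):
        # E-step, eqs. (1) and (2): expected complete-data sufficient statistics
        s1 = sum_y + (n - m) * mu
        s2 = sum_y2 + (n - m) * (sig2 + mu ** 2)
        # M-step, eqs. (3) and (4)
        mu_new = s1 / n
        sig2_new = s2 / n - mu_new ** 2
        if abs(mu_new - mu) < delta and abs(sig2_new - sig2) < delta:
            return mu_new, sig2_new
        mu, sig2 = mu_new, sig2_new
    return mu, sig2

# Made-up data: m = 5 observed values out of a sample of n = 8
y_obs = np.array([9.2, 10.1, 11.3, 8.7, 10.6])
print(em_normal(y_obs, n=8, mu=0.0, sig2=1.0))
# The fixed point agrees with the direct estimates based on y_obs alone:
print(y_obs.mean(), (y_obs ** 2).mean() - y_obs.mean() ** 2)
```

At convergence the iterates agree with the direct estimates $\hat{\mu} = \sum_{i=1}^{m} y_i / m$ and $\hat{\sigma}^2 = \sum_{i=1}^{m} y_i^2 / m - \hat{\mu}^2$ noted above.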
Example 2: Sampling from a Multinomial Population

In Example 1, the "incomplete data" were in effect "missing data" in the conventional sense. In general, however, the EM algorithm also applies to situations where the complete data contain variables that are not observable by definition. In that setup, the observed data can be viewed as some function or mapping from the space of the complete data.

The following example is used by Dempster, Laird, and Rubin (1977) as an illustration of the EM algorithm. Let $y_{obs} = (38, 34, 125)^T$ be observed counts from a multinomial population with probabilities $\left(\tfrac{1}{2} - \tfrac{1}{2}\theta,\; \tfrac{1}{4}\theta,\; \tfrac{1}{2} + \tfrac{1}{4}\theta\right)$. The objective is to obtain the maximum likelihood estimate of $\theta$. First, to put this into the framework of an incomplete-data problem, define $y = (y_1, y_2, y_3, y_4)^T$ with multinomial probabilities $\left(\tfrac{1}{2} - \tfrac{1}{2}\theta,\; \tfrac{1}{4}\theta,\; \tfrac{1}{4}\theta,\; \tfrac{1}{2}\right) \equiv (p_1, p_2, p_3, p_4)$.

The vector $y$ is considered the complete data. Then define $y_{obs} = (y_1, y_2, y_3 + y_4)^T$ as the observed data vector, which is a function of the complete-data vector. Since only $y_3 + y_4$ is observed, and $y_3$ and $y_4$ are not, the observed data are considered incomplete; however, this is not simply a missing data problem.

The complete-data log-likelihood is
\[
\ell(\theta; y) = y_1 \log p_1 + y_2 \log p_2 + y_3 \log p_3 + y_4 \log p_4 + \text{const.},
\]
which is linear in $y_1, y_2, y_3$, and $y_4$, which are also the sufficient statistics. The E-step requires that $E_\theta(y \mid y_{obs})$ be computed; that is, compute
\begin{align*}
E_\theta(y_1 \mid y_{obs}) &= y_1 = 38, \\
E_\theta(y_2 \mid y_{obs}) &= y_2 = 34, \\
E_\theta(y_3 \mid y_{obs}) &= E_\theta(y_3 \mid y_3 + y_4) = 125\left(\tfrac{1}{4}\theta\right) \Big/ \left(\tfrac{1}{2} + \tfrac{1}{4}\theta\right),
\end{align*}
since, conditional on $y_3 + y_4$, $y_3$ is distributed as Binomial$(125, p)$, where
\[
p = \frac{\tfrac{1}{4}\theta}{\tfrac{1}{2} + \tfrac{1}{4}\theta}.
\]
Similarly,
\[
E_\theta(y_4 \mid y_{obs}) = E_\theta(y_4 \mid y_3 + y_4) = 125\left(\tfrac{1}{2}\right) \Big/ \left(\tfrac{1}{2} + \tfrac{1}{4}\theta\right),
\]
which is computed in the same way as $E_\theta(y_3 \mid y_{obs})$. But, as seen below, only
\[
y_3^{(t)} = E_{\theta^{(t)}}(y_3 \mid y_{obs}) = \frac{125\left(\tfrac{1}{4}\right)\theta^{(t)}}{\tfrac{1}{2} + \tfrac{1}{4}\theta^{(t)}} \tag{1}
\]
needs to be computed at the $t$th iteration of the E-step.

For the M-step, note that the complete-data maximum likelihood estimate of $\theta$ is
\[
\frac{y_2 + y_3}{y_1 + y_2 + y_3}.
\]
(Note: maximize
\[
\ell(\theta; y) = y_1 \log\left(\tfrac{1}{2} - \tfrac{1}{2}\theta\right) + y_2 \log \tfrac{1}{4}\theta + y_3 \log \tfrac{1}{4}\theta + y_4 \log \tfrac{1}{2}
\]
and show that the above is indeed the maximum likelihood estimate of $\theta$.) Thus, substituting the expectations from the E-step for the sufficient statistics in this expression gives
\[
\theta^{(t+1)} = \frac{34 + y_3^{(t)}}{72 + y_3^{(t)}}. \tag{2}
\]
Iterating between (1) and (2) defines the EM algorithm for this problem.
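The iteration is equally simple to carry out numerically; the short Python sketch below (illustrative only, with the placeholder function name `em_multinomial`) runs (1) and (2) starting from $\theta^{(0)} = 0.50$ and prints the successive iterates.

```python
def em_multinomial(theta=0.50, delta=1e-10, max_iter=100, verbose=True):
    """EM iteration for the multinomial example with y_obs = (38, 34, 125)."""
    for t in range(max_iter):
        # E-step, eq. (1): conditional expectation of y3 given y3 + y4 = 125
        y3 = 125 * (theta / 4) / (0.5 + theta / 4)
        # M-step, eq. (2): complete-data MLE with y3 replaced by its expectation
        theta_new = (34 + y3) / (72 + y3)
        if verbose:
            print(t + 1, theta_new)
        if abs(theta_new - theta) < delta:
            return theta_new
        theta = theta_new
    return theta

theta_hat = em_multinomial()   # starts from theta^(0) = 0.50
```

The following table shows the convergence results of applying EM to this problem with $\theta^{(0)} = 0.50$.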