Section 8: Asymptotic Properties of the MLE

In this part of the course, we will consider the asymptotic properties of the maximum likelihood estimator. In particular, we will study issues of consistency, asymptotic normality, and efficiency. Many of the proofs will be rigorous, in order to display techniques that are more generally useful in later chapters. We suppose that Xn = (X1, ..., Xn), where the Xi's are i.i.d. with common density p(x; θ0) ∈ P = {p(x; θ) : θ ∈ Θ}. We assume that θ0 is identified in the sense that if θ ≠ θ0 and θ ∈ Θ, then p(x; θ) ≠ p(x; θ0) with respect to the dominating measure µ.
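As an added illustration (not part of the original notes), consider the normal location family, where identification holds because the mean pins down the parameter:

    % Added illustration: identifiability of the N(θ, 1) location family.
    % If two densities in the family agree µ-a.e., their means agree, so θ is determined.
    \[
    p(x;\theta) = \tfrac{1}{\sqrt{2\pi}} e^{-(x-\theta)^2/2},
    \qquad
    p(\cdot;\theta) = p(\cdot;\theta_0)\ \mu\text{-a.e.}
    \;\Longrightarrow\;
    \int x\, p(x;\theta)\,dx = \int x\, p(x;\theta_0)\,dx
    \;\Longrightarrow\;
    \theta = \theta_0 .
    \]

By contrast, a family in which the density depends on θ = (θ1, θ2) only through θ1 + θ2 is not identified, since distinct parameter values give the same density.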
For fixed θ ∈ Θ, the joint density of Xn is equal to the product of the individual densities, i.e.,

p(xn; θ) = ∏_{i=1}^n p(xi; θ).

As usual, when we think of p(xn; θ) as a function of θ with xn held fixed, we refer to the resulting function as the likelihood function, L(θ; xn). The maximum likelihood estimate for observed xn is the value θ̂(xn) ∈ Θ which maximizes L(θ; xn). Prior to observation, xn is unknown, so we consider the maximum likelihood estimator, MLE, to be the value θ̂(Xn) ∈ Θ which maximizes L(θ; Xn). Equivalently, the MLE can be taken to be the maximizer of the standardized log-likelihood,

l(θ; Xn)/n = (1/n) log L(θ; Xn) = (1/n) ∑_{i=1}^n log p(Xi; θ) = (1/n) ∑_{i=1}^n l(θ; Xi).
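Since the MLE is defined as a maximizer, it can be computed numerically by minimizing the negative standardized log-likelihood. The following is a minimal sketch (added, not from the notes), assuming a N(θ, 1) location model so that the numerical answer can be checked against the closed-form MLE, the sample mean:

    # Minimal sketch (added; assumes a N(theta, 1) location model):
    # compute the MLE by minimizing -(1/n) * sum_i log p(x_i; theta).
    import numpy as np
    from scipy.optimize import minimize_scalar
    from scipy.stats import norm

    rng = np.random.default_rng(0)
    theta0 = 2.0                              # true parameter
    x = rng.normal(theta0, 1.0, size=500)     # i.i.d. sample X_1, ..., X_n

    def neg_avg_loglik(theta):
        # negative standardized log-likelihood; minimizing it maximizes L(theta; x^n)
        return -np.mean(norm.logpdf(x, loc=theta, scale=1.0))

    res = minimize_scalar(neg_avg_loglik, bounds=(-10.0, 10.0), method="bounded")
    print(res.x, x.mean())   # the two should agree: here the MLE is x-bar in closed form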
We will show that the MLE is often

1. consistent: θ̂(Xn) →P θ0,
2. asymptotically normal: √n(θ̂(Xn) − θ0) →_{D(θ0)} Normal R.V., where →_{D(θ0)} denotes convergence in distribution when the data are generated under θ0,
3. asymptotically efficient, i.e., if we want to estimate θ0 by any other estimator within a "reasonable class," the MLE is the most precise.

A short simulation illustrating points 1 and 2 appears after this list. To show 1-3, we will have to impose some regularity conditions on the probability model and (for 3) on the class of estimators that will be considered.
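The following simulation sketch (added, not from the notes) illustrates points 1 and 2 in the N(θ, 1) model, where the MLE is the sample mean: θ̂(Xn) concentrates around θ0 as n grows, and √n(θ̂(Xn) − θ0) has (here exactly) a standard normal distribution.

    # Simulation sketch (added; assumes the N(theta, 1) model, MLE = sample mean).
    import numpy as np

    rng = np.random.default_rng(1)
    theta0 = 2.0

    # 1. Consistency: theta_hat concentrates around theta0 as n grows.
    for n in (10, 100, 10_000):
        theta_hat = rng.normal(theta0, 1.0, size=n).mean()
        print(n, theta_hat)

    # 2. Asymptotic normality: sqrt(n) * (theta_hat - theta0), computed across
    #    many replications, has mean ~0 and standard deviation ~1.
    n, reps = 200, 5_000
    z = np.sqrt(n) * (rng.normal(theta0, 1.0, size=(reps, n)).mean(axis=1) - theta0)
    print(z.mean(), z.std())   # should be close to 0 and 1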
Section 8.1: Consistency

We first want to show that if we have a sample of i.i.d. data from a common distribution which belongs to a probability model, then under some regularity conditions on the form of the density, the sequence of estimators, {θ̂(Xn)}, will converge in probability to θ0. So far, we have not discussed the issue of whether a maximum likelihood estimator exists or, if one does, whether it is unique. We will get to this, but first we start with a heuristic proof of consistency.
Heuristic Proof

The MLE is the value θ̂(Xn) ∈ Θ that maximizes Q(θ; Xn) := (1/n) ∑_{i=1}^n l(θ; Xi). By the WLLN, we know that, for each fixed θ ∈ Θ,

Q(θ; Xn) = (1/n) ∑_{i=1}^n l(θ; Xi) →P Q0(θ) := Eθ0[l(θ; X)] = Eθ0[log p(X; θ)] = ∫ {log p(x; θ)} p(x; θ0) dµ(x).

We expect that, on average, the log-likelihood will be close to the expected log-likelihood. Therefore, we expect that the maximizer of the log-likelihood will be close to the maximizer of the expected log-likelihood. We will show that the expected log-likelihood, Q0(θ), is maximized at θ0 (i.e., the truth).
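For orientation, here is a sketch (added; the notes give the formal argument later) of the standard Jensen's-inequality reasoning behind the claim that Q0(θ) is maximized at θ0:

    % Added sketch: Q_0 is maximized at θ_0, by Jensen's inequality (log is concave).
    \[
    Q_0(\theta_0) - Q_0(\theta)
      = E_{\theta_0}\!\left[\log \frac{p(X;\theta_0)}{p(X;\theta)}\right]
      = -E_{\theta_0}\!\left[\log \frac{p(X;\theta)}{p(X;\theta_0)}\right]
      \ge -\log E_{\theta_0}\!\left[\frac{p(X;\theta)}{p(X;\theta_0)}\right]
      = -\log \int_{\{p(x;\theta_0)>0\}} p(x;\theta)\,d\mu(x)
      \ge -\log 1 = 0,
    \]
    % with equality iff p(x;θ) = p(x;θ_0) µ-a.e., which by identification forces θ = θ_0.

The quantity Q0(θ0) − Q0(θ) is the Kullback-Leibler divergence between p(·; θ0) and p(·; θ); it is nonnegative and, by the identification assumption, zero only at θ = θ0.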