An Inconsistent Maximum Likelihood Estimate

THOMAS S. FERGUSON*

An example is given of a family of distributions on [-1, 1] with a continuous one-dimensional parameterization that joins the triangular distribution (when θ = 0) to the uniform (when θ = 1), for which the maximum likelihood estimates exist and converge strongly to θ = 1 as the sample size tends to infinity, whatever be the true value of the parameter. A modification that satisfies Cramer's conditions is also given.

KEY WORDS: Maximum likelihood estimates; Inconsistency; Asymptotic efficiency; Mixtures.

1. INTRODUCTION

There are many examples in the literature of estimation problems for which the maximum likelihood principle does not yield a consistent sequence of estimates, notably Neyman and Scott (1948), Basu (1955), Kraft and LeCam (1956), and Bahadur (1958). In this article a very simple example of inconsistency of the maximum likelihood method is presented that shows clearly one danger to be wary of in an otherwise regular-looking situation. A recent article by Berkson (1980), followed by a lively discussion, shows that there is still interest in these problems.

The discussion in this article is centered on a sequence of independent, identically distributed, and, for the sake of convenience, real random variables, X1, X2, ..., distributed according to a distribution, F(x | θ), for some θ in a fixed parameter space Θ. It is assumed that there is a σ-finite measure with respect to which densities, f(x | θ), exist for all θ ∈ Θ. The maximum likelihood estimate of θ based on X1, ..., Xn is a value, θ̂n(x1, ..., xn), of θ ∈ Θ, if any, that maximizes the likelihood function

    L_n(θ) = ∏_{i=1}^{n} f(x_i | θ).

The maximum likelihood method of estimation goes back to Gauss, Edgeworth, and Fisher. For historical points, see LeCam (1953) and Edwards (1972). For a general survey of the area and a large bibliography, see Norton (1972).

The starting point of our discussion is the theorem of Cramer (1946, p. 500), which states that under certain regularity conditions on the densities involved, if θ is real valued and if the true value θ0 is an interior point of Θ, then there exists a sequence of roots, θ̂n, of the likelihood equation,

    (∂/∂θ) log L_n(θ) = 0,

that converges in probability to θ0 as n → ∞. Moreover, any such sequence θ̂n is asymptotically normal and asymptotically efficient. It is known that Cramer's theorem extends to the multiparameter case.

To emphasize the point that this is a local result and may have nothing to do with maximum likelihood estimation, we consider the following well-known example, a special case of some quite practical problems mentioned recently by Quandt and Ramsey (1978). Let the density f(x | θ) be a mixture of two normals, N(0, 1) and N(μ, σ²), with mixing parameter ½,

    f(x | μ, σ) = ½ φ(x) + ½ φ((x − μ)/σ)/σ,

where φ is the density of the standard normal distribution, and the parameter space is Θ = {(μ, σ) : σ > 0}. It is clear that for any given sample, X1, ..., Xn, from this density the likelihood function can be made as large as desired by taking μ = X1, say, and σ sufficiently small. Nevertheless, Cramer's conditions are satisfied, and so there exists a consistent, asymptotically efficient sequence of roots of the likelihood equation even though maximum likelihood estimates do not exist.

A more disturbing example is given by Kraft and LeCam (1956), in which Cramer's conditions are satisfied, the maximum likelihood estimate exists, is unique, and satisfies the likelihood equation, but is not consistent.

* Thomas S. Ferguson is Professor, Department of Mathematics, University of California, Los Angeles, CA 90024. Research was supported in part by the National Science Foundation under Grant MCS77-2121. The author wishes to acknowledge the help of an exceptionally good referee whose very detailed comments benefited this article substantially.
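The unbounded-likelihood phenomenon just described is easy to reproduce numerically. The following sketch is illustrative only (the sample values are arbitrary, not from the paper): it pins μ at the first observation and shrinks σ, so the factor at x = X1 contributes roughly log(1/σ) while every other factor stays bounded below by the N(0, 1) component.

```python
import math

def mixture_density(x, mu, sigma):
    """Equal-weight mixture of N(0, 1) and N(mu, sigma^2):
    f(x | mu, sigma) = (1/2) phi(x) + (1/2) phi((x - mu)/sigma)/sigma."""
    def phi(z):
        return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
    return 0.5 * phi(x) + 0.5 * phi((x - mu) / sigma) / sigma

def log_likelihood(data, mu, sigma):
    return sum(math.log(mixture_density(x, mu, sigma)) for x in data)

# Any fixed sample will do for the illustration.
data = [0.3, -1.2, 0.8, 2.1, -0.5, 1.7, -2.2, 0.1]

# Pin mu at the first observation; shrinking sigma drives the likelihood
# to infinity, since the term at x = X_1 grows like log(1/sigma) while
# the remaining terms tend to the finite values log((1/2) phi(x_i)).
for sigma in (1.0, 1e-6, 1e-12):
    print(sigma, log_likelihood(data, mu=data[0], sigma=sigma))
```

Once σ is small enough that the second component no longer touches the other observations, the log-likelihood increases exactly like log(1/σ), so no maximizer over σ > 0 exists.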
In such examples, it is possible to find the asymptotically efficient sequence of roots of the likelihood equation by first finding a consistent estimate and then finding the closest root, or by improving it by the method of scoring as in Rao (1965). See Lehmann (1980) for a discussion of these problems.

Other, more practical examples of inconsistency in the maximum likelihood method involve an infinite number of parameters. Neyman and Scott (1948) show that the maximum likelihood estimate of the common variance of a sequence of normal populations with unknown means, based on a fixed sample of size k taken from each population, converges to a value lower than the true value as the number of populations tends to infinity. This example led directly to the paper of Kiefer and Wolfowitz (1956) on the consistency and efficiency of the maximum likelihood

Journal of the American Statistical Association, December 1982, Volume 77, Number 380, Theory and Methods Section
estimates with infinitely many nuisance parameters. Another example, mentioned in Barlow et al. (1972), involves estimating a distribution known to be star-shaped (i.e., F(λx) ≤ λF(x) for all 0 ≤ λ ≤ 1 and all x such that F(x) < 1). If the true distribution is uniform on (0, 1), the maximum likelihood estimate converges to F(x) = x² on (0, 1).

The central theorem on the global consistency of maximum likelihood estimates is due to Wald (1949). This theorem gives conditions under which the maximum likelihood estimates and approximate maximum likelihood estimates (values of θ that yield a value of the likelihood function that comes within a fixed fraction c, 0 < c < 1, of the maximum) are strongly consistent. Other formulations of Wald's theorem and its variants may be found in LeCam (1953), Kiefer and Wolfowitz (1956), Bahadur (1967), and Perlman (1972). A particularly informative exposition of the problem may be found in Chapter 9 of Bahadur (1971).

The example contained in Section 2 has the following properties:

1. The parameter space Θ is a compact interval on the real line.
2. The observations are independent identically distributed according to a distribution F(x | θ) for some θ ∈ Θ.
3. Densities f(x | θ) with respect to some σ-finite measure (Lebesgue measure in the example) exist and are continuous in θ for all x.
4. (Identifiability) If θ ≠ θ′, then F(x | θ) is not identical to F(x | θ′).

It is seen that whatever the true value, θ0, of the parameter, the maximum likelihood estimate, which exists because of 1, 2, and 3, converges almost surely to a fixed value (1 in the example) independent of θ0.

Example 2 of Bahadur (1958) (Example 9.2 of Bahadur 1971) also has the properties stated previously, and the example of Section 2 may be regarded as a continuous version of Bahadur's example.
However, the distributions in Bahadur's example seem rather artificial, and the parameter space is countable with a single limit point. The example presented here is more natural; the sample space is [-1, +1], the parameter space is [0, 1], and the distributions are familiar, each being a mixture of the uniform distribution and a triangular one.

In Section 3, it is seen how to modify the example using beta distributions so that Cramer's conditions are satisfied. This gives an example in which asymptotically efficient estimates exist and may be found by improving any convenient √n-consistent estimate by scoring, and yet the maximum likelihood estimate exists and eventually satisfies the likelihood equation but converges to a fixed point with probability 1 no matter what the true value of the parameter happens to be. Such an example was announced by LeCam in the discussion of Berkson's (1980) paper.

2. THE EXAMPLE

The following densities on [-1, 1] provide a continuous parameterization between the triangular distribution (when θ = 0) and the uniform (when θ = 1), with parameter space Θ = [0, 1]:

    f(x | θ) = (1 − θ) [(δ(θ) − |x − θ|)/δ(θ)²] I_A(x) + (θ/2) I_[−1,1](x),

where A represents the interval [θ − δ(θ), θ + δ(θ)], δ(θ) is a continuous decreasing function of θ with δ(0) = 1 and 0 < δ(θ) ≤ 1 − θ for 0 < θ < 1, and I_S(x) represents the indicator function of the set S. For θ = 1, f(x | θ) is taken to be ½ I_[−1,1](x). It is assumed that independent identically distributed observations X1, X2, ... are available from one of these distributions. Then conditions 1 through 4 of the introduction are satisfied. These conditions imply the existence of a maximum likelihood estimate for any sample size, because a continuous function defined on a compact set achieves its maximum on that set.

Theorem. Let θ̂n denote a maximum likelihood estimate of θ based on a sample of size n.
If δ(θ) → 0 sufficiently fast as θ → 1 (how fast is noted in the proof), then θ̂n → 1 with probability 1 as n → ∞, whatever be the true value of θ ∈ [0, 1].

Proof. Continuity of f(x | θ) in θ and compactness of Θ imply that the maximum likelihood estimate, θ̂n, some value of θ that maximizes the log-likelihood function

    l_n(θ) = ∑_{i=1}^{n} log f(x_i | θ),

exists. Since for θ < 1

    f(x | θ) ≤ (1 − θ)/δ(θ) + θ/2 ≤ 1/δ(θ) + ½,

we have that for each fixed positive number α < 1,

    max_{0≤θ≤α} (1/n) l_n(θ) ≤ log(1/δ(α) + ½) < ∞,

since δ(θ) is decreasing. We complete the proof by showing that, whatever be the true value of θ,

    max_{0≤θ≤1} (1/n) l_n(θ) → ∞ with probability one,

provided δ(θ) → 0 sufficiently fast as θ → 1, since then θ̂n will eventually be greater than α for any preassigned α < 1. Let M_n = max{X1, ..., Xn}. Then M_n → 1 with probability one whatever be the true value of θ, and since 0 < M_n < 1 with probability one,

    max_{0≤θ≤1} (1/n) l_n(θ) ≥ (1/n) l_n(M_n)
        ≥ ((n − 1)/n) log(M_n/2) + (1/n) log((1 − M_n)/δ(M_n)).
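The mechanism behind this lower bound can be checked numerically. The sketch below is not from the paper: it samples from the triangular density (θ = 0), uses the choice δ(θ) = (1 − θ)exp(1 − (1 − θ)^(−4)) made later in the proof, and compares the average log-likelihood per observation at the true θ = 0 with its value at θ = M_n. The spike height (1 − θ)/δ(θ) is handled in log space, because δ(θ) underflows to zero in floating point for θ anywhere near 1.

```python
import math
import random

def log_density(x, theta):
    """log f(x | theta) for the triangular/uniform family, with
    delta(theta) = (1 - theta) * exp(1 - (1 - theta)**-4), the choice
    made later in the proof.  The spike is handled analytically because
    delta underflows to 0.0 once theta is moderately close to 1."""
    if not -1.0 <= x <= 1.0:
        return -math.inf
    if theta >= 1.0:
        return math.log(0.5)
    t = 1.0 - theta
    uniform_part = theta / 2.0
    r = abs(x - theta)
    if r == 0.0:
        # exactly at the peak: f >= (1 - theta)/delta(theta), whose log
        # is t**-4 - 1; this lower bound is all the proof needs
        return t ** -4 - 1.0
    d = t * math.exp(1.0 - t ** -4)   # delta(theta); 0.0 once it underflows
    if r >= d:                        # outside the (tiny) triangular spike
        return math.log(uniform_part) if uniform_part > 0.0 else -math.inf
    spike = (1.0 - theta) * (d - r) / (d * d)
    return math.log(uniform_part + spike)

rng = random.Random(1)
n = 1000
# X = U1 + U2 - 1 has the triangular density 1 - |x| on [-1, 1] (theta = 0)
xs = [rng.random() + rng.random() - 1.0 for _ in range(n)]
mn = max(xs)

avg_ll_true = sum(log_density(x, 0.0) for x in xs) / n
avg_ll_at_mn = sum(log_density(x, mn) for x in xs) / n
print("M_n =", mn)
print("average log-likelihood at theta = 0:  ", avg_ll_true)
print("average log-likelihood at theta = M_n:", avg_ll_at_mn)
```

The value at θ = M_n dwarfs the value at the true parameter: it grows roughly like 1/(n(1 − M_n)^4), which is exactly the quantity the remainder of the proof shows tends to infinity.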
Therefore, with probability one,

    lim inf_{n→∞} max_{0≤θ≤1} (1/n) l_n(θ) ≥ log ½ + lim inf_{n→∞} (1/n) log((1 − M_n)/δ(M_n)).

Whatever be the value of θ, M_n converges to 1 at a certain rate, the slowest rate being for the triangular (θ = 0), since this distribution has smaller mass than any of the others in sufficiently small neighborhoods of 1. Thus we can choose δ(θ) → 0 so fast as θ → 1 that (1/n) log((1 − M_n)/δ(M_n)) → ∞ with probability one for the triangular, and hence for all other possible true values of θ, completing the proof.

How fast is fast enough? Take θ = 0 and note that if 0 < ε < 1,

    ∑_n P0(n^{1/4}(1 − M_n) > ε) = ∑_n P0(M_n < 1 − εn^{−1/4})
        = ∑_n P0(X < 1 − εn^{−1/4})^n
        = ∑_n (1 − ½ε²n^{−1/2})^n
        ≤ ∑_n exp(−½ε²√n) < ∞,

so that by the Borel-Cantelli Lemma, n^{1/4}(1 − M_n) → 0 with probability one. Therefore, the choice

    δ(θ) = (1 − θ) exp(−(1 − θ)^{−4} + 1)

gives a δ(θ) that is continuous, decreasing, with δ(0) = 1, 0 < δ(θ) < 1 − θ for 0 < θ < 1, and

    (1/n) log((1 − M_n)/δ(M_n)) = 1/(n(1 − M_n)⁴) − 1/n → ∞

with probability one.

Although the maximum likelihood method fails asymptotically in this example, other methods of estimation can yield consistent estimates. Bayes methods, for example, would be strongly consistent for almost all θ with respect to the prior distribution, as implied by a general argument of Doob (1948). Simpler computationally, but not generally as accurate, are the estimates given by the method of moments or minimum χ² based on a finite number of cells, and such methods can be made to yield consistent estimates. Estimates that are consistent may also be constructed by the minimum distance method of Wolfowitz (1957).

If one simple condition were added to conditions 1 through 4 of the introduction, the argument of Wald (1949) would imply the strong consistency of the maximum likelihood estimates. This is a uniform boundedness condition that may be stated as follows: Let θ0 denote the true value of the parameter.
Then the maximum likelihood estimate θ̂n converges to θ0 with probability one provided conditions 1 through 4 hold and

5. There is a function K(x) ≥ 0 with finite expectation,

    E_{θ0} K(X) = ∫ K(x) f(x | θ0) dx < ∞,

such that

    log [f(x | θ)/f(x | θ0)] ≤ K(x) for all x and all θ.

(To get global consistency this assumption must be made for all θ0 ∈ Θ, but K(x) may depend on θ0.) This condition is therefore not satisfied in the example. It would be satisfied if the parameter space were limited to, say, [0, 1 − ε], since the density would then be bounded.

3. A DIFFERENTIABLE MODIFICATION

Without much difficulty, this example can be modified so that the densities satisfy Cramer's conditions for the existence of an asymptotically efficient sequence of roots of the likelihood equation. This amounts to modifying the distributions so that the resulting density, f(x | θ), (a) has two continuous derivatives that may be passed beneath the integral sign in ∫ f(x | θ) dx = 1, (b) has finite and positive Fisher information at all points θ interior to Θ, and (c) satisfies |∂²/∂θ² f(x | θ)| ≤ K(x) in some neighborhood of the true θ0, where K(x) is θ0-integrable. The simplest modification is to use the family of beta densities on [0, 1] as follows. Let g denote the density of the Be(α, β) distribution,

    g(x | α, β) = [Γ(α + β)/(Γ(α)Γ(β))] x^{α−1} (1 − x)^{β−1} I_[0,1](x),

and let f be the density of the mixture of a Be(1, 1) (uniform) and a Be(α, β),

    f(x | θ) = θ g(x | 1, 1) + (1 − θ) g(x | α(θ), β(θ)),

where α(θ) and β(θ) are chosen to be twice continuously differentiable and to give the density a very sharp peak close to θ, say mean θ and variance tending to 0 sufficiently fast as θ → 1. Thus we take Θ = [½, 1],

    α(θ) = θ δ(θ), and β(θ) = (1 − θ) δ(θ).

The particular form of δ(θ) is not important. What is important is that

1. δ(θ) is twice continuously differentiable,
2. (1 − θ)δ(θ), and hence δ(θ), is increasing on [½, 1),
3. δ(½) > 2 (to obtain identifiability), and
4.
δ(θ) tends to ∞ sufficiently fast as θ → 1.

For θ = 1, f(x | 1) is defined to be g(x | 1, 1). Then f(x | θ) is continuous in θ ∈ [½, 1] for each x, and for the true θ0 ∈ (½, 1), Cramer's conditions are satisfied.

The proof that every maximum likelihood sequence converges to 1 with probability one as n → ∞, no matter what the true value of θ ∈ [½, 1] is, is completely analogous to the corresponding proof in Section 2, except that in
the inequalities, Stirling's formula in the form

    √(2π) α^{α−1/2} e^{−α} ≤ Γ(α) ≤ √(2π) α^{α−1/2} exp(−α + 1/(12α)),

as in Feller (1950, p. 44), is useful. In this example, the slowest rate of convergence of max_{i≤n} X_i to 1 occurs for θ = ½. By the method of Section 2, it may be calculated that the function

    δ(θ) = (1 − θ)^{−1} exp((1 − θ)^{−2})

converges to ∞ sufficiently fast and satisfies conditions 1 to 4 of this section.

[Received October 1980. Revised April 1982.]

REFERENCES

BAHADUR, R.R. (1958), "Examples of Inconsistency of Maximum Likelihood Estimates," Sankhya, 20, 207-210.
BAHADUR, R.R. (1967), "Rates of Convergence of Estimates and Test Statistics," Annals of Mathematical Statistics, 38, 303-324.
BAHADUR, R.R. (1971), Some Limit Theorems in Statistics, Regional Conference Series in Applied Mathematics, 4, Philadelphia: SIAM.
BARLOW, R.E., BARTHOLOMEW, D.J., BREMNER, J.M., and BRUNK, H.D. (1972), Statistical Inference Under Order Restrictions, New York: John Wiley.
BASU, D. (1955), "An Inconsistency of the Method of Maximum Likelihood," Annals of Mathematical Statistics, 26, 144-145.
BERKSON, J. (1980), "Minimum Chi-Square, not Maximum Likelihood!" Annals of Statistics, 8, 457-487.
CRAMER, H. (1946), Mathematical Methods of Statistics, Princeton: Princeton University Press.
DOOB, J. (1948), "Application of the Theory of Martingales," Le Calcul des Probabilités et ses Applications, Colloques Internationaux du Centre National de la Recherche Scientifique, Paris, 23-28.
EDWARDS, A.W.F. (1972), Likelihood, Cambridge: Cambridge University Press.
FELLER, W. (1950), An Introduction to Probability Theory and Its Applications (Vol. 1, 1st ed.), New York: John Wiley.
KIEFER, J., and WOLFOWITZ, J. (1956), "Consistency of the Maximum Likelihood Estimator in the Presence of Infinitely Many Incidental Parameters," Annals of Mathematical Statistics, 27, 887-906.
KRAFT, C.H., and LECAM, L.M.
(1956), "A Remark on the Roots of the Maximum Likelihood Equation," Annals of Mathematical Statistics, 27, 1174-1177.
LECAM, L.M. (1953), "On Some Asymptotic Properties of Maximum Likelihood Estimates and Related Bayes Estimates," University of California Publications in Statistics, 1, 277-328.
LEHMANN, E.L. (1980), "Efficient Likelihood Estimators," The American Statistician, 34, 233-235.
NEYMAN, J., and SCOTT, E. (1948), "Consistent Estimators Based on Partially Consistent Observations," Econometrica, 16, 1-32.
NORTON, R.H. (1972), "A Survey of Maximum Likelihood Estimation," Review of the International Statistical Institute, 40, 329-354; Part II (1973), 41, 39-58.
PERLMAN, M.D. (1972), "On the Strong Consistency of Approximate Maximum Likelihood Estimates," Proceedings of the Sixth Berkeley Symposium on Mathematical Statistics and Probability, 1, 263-281.
QUANDT, R.E., and RAMSEY, J.L. (1978), "Estimating Mixtures of Normal Distributions and Switching Regressions," Journal of the American Statistical Association, 73, 730-738.
RAO, C.R. (1965), Linear Statistical Inference and Its Applications, New York: John Wiley.
WALD, A. (1949), "Note on the Consistency of the Maximum Likelihood Estimate," Annals of Mathematical Statistics, 20, 595-601.
WOLFOWITZ, J. (1957), "The Minimum Distance Method," Annals of Mathematical Statistics, 28, 75-88.