L. LE CAM

If $x_1, x_2, \ldots, x_n$ are $n$ observations taken on $[0, 1]$, the corresponding logarithm of likelihood ratio is given by the expression

$$\Lambda_n^{(k,j)} = \log \prod_{i=1}^{n} \frac{dp_k(x_i)}{dp_j(x_i)} = \sum_i^{(k)} \log \frac{h(x_i)}{c} - \sum_i^{(j)} \log \frac{h(x_i)}{c},$$

where the first sum $\sum^{(k)}$ is for $x_i \in (a_k, a_{k-1}]$ and the second is for $x_i \in (a_j, a_{j-1}]$.

Now assume that the $X_1, \ldots, X_n$ are actually i.i.d. from some distribution $p_{j_0}$. They have a minimum $Z_n = \min_i X_i$. With probability unity this will fall in some interval $(a_{k_n}, a_{k_n - 1}]$ with $k_n = k_n(Z_n)$. Fix a value $j$ and consider $n^{-1}\Lambda_n^{(k_n, j)}$. This is at least equal to

$$\frac{1}{n} \log \frac{h(Z_n)}{c} - \frac{1}{n}\, \nu_{j,n} \log \frac{h(a_j)}{c},$$

where $\nu_{j,n}$ is the number of $X_i$'s which fall in $(a_j, a_{j-1}]$.

According to the strong law of large numbers $n^{-1}\nu_{j,n}$ converges to some constant $p_{j_0,j} \le 1$. Also, $j_0$ being fixed, $Z_n$ tends almost surely to zero. In fact if $y < a_{j_0}$ one can write

$$p_{j_0}\{Z_n > y\} = (1 - cy)^n \le e^{-ncy}.$$

Thus, as long as $\sum_n e^{-ncy_n} < \infty$ it will be almost certain that eventually $Z_n < y_n$. In particular $nZ_n$ may have a limiting distribution but $nZ_n^2$ almost certainly tends to zero.

This being the case take $c = 9/10$ and $h(x) = \exp\{1/x^2\}$. Then

$$\frac{1}{n} \log h(Z_n) = (nZ_n^2)^{-1}$$

tends to infinity almost surely. Thus if we take any finite set $J = \{1, 2, \ldots, j_1\}$, for any fixed $j_0$ there will almost surely be an integer $N$ such that $\hat\theta_n$ ceases to be in $J$ from $N$ on.

It might be thought that such disgraceful behavior is due to the vagaries of measure theory. Indeed the variables $X_j$ used here are continuous variables and everybody knows that such things do not really exist. However, replace the measures $p_k$ used above by measures $q_k$ whose density on $(a_k, a_{k-1})$ is constant and equal to

$$[a_{k-1} - a_k]^{-1} \int_{a_k}^{a_{k-1}} h(x)\, dx.$$

Then, there is no need to record the exact values of the observations $X_j$. It is quite enough to record in which interval $(a_k, a_{k-1}]$ they fall. The parameter $\theta$ is itself integer valued. However the same misbehavior of m.l.e. will still occur. (This is essentially equivalent to Bahadur's first construction.)

In the present construction the parameter $\theta$ is integer valued.
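The almost-sure argument for the minimum $Z_n$ given earlier can be checked numerically. The sketch below is only an illustration. It reads the constant $c$ as 9/10, and it picks the particular sequence $y_n = 2\log(n)/(cn)$, which is an assumption not taken from the text. For that choice $e^{-ncy_n} = n^{-2}$ is summable, while $n y_n^2$ still tends to zero. The requirement $y_n < a_{j_0}$ is taken for granted and is not verified by the code.

```python
import math

C = 0.9  # the constant c of the text, read here as 9/10 (assumption)


def survival_of_minimum(n, y, c=C):
    """P{Z_n > y} = (1 - c*y)**n, valid for 0 <= y < a_{j0}."""
    return (1.0 - c * y) ** n


def exponential_bound(n, y, c=C):
    """Upper bound exp(-n*c*y) for the survival probability."""
    return math.exp(-n * c * y)


def threshold(n, c=C):
    """Hypothetical choice y_n = 2*log(n)/(c*n), so exp(-n*c*y_n) = n**-2."""
    return 2.0 * math.log(n) / (c * n)


def report(sizes=(10**2, 10**4, 10**6)):
    """Return rows (n, P{Z_n > y_n}, exponential bound, n * y_n**2)."""
    rows = []
    for n in sizes:
        y = threshold(n)
        rows.append((n, survival_of_minimum(n, y),
                     exponential_bound(n, y), n * y * y))
    return rows


for n, surv, bound, nysq in report():
    print(f"n={n:>8d}  P(Z_n>y_n)={surv:.3e}  "
          f"bound={bound:.3e}  n*y_n^2={nysq:.3e}")
```

The last column shrinks as $n$ grows, which is the numerical counterpart of the statement that $nZ_n^2$ tends to zero and hence that $(nZ_n^2)^{-1}$ diverges.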
It is easy to modify the example to obtain one in which $\theta$ takes values, say, in $(1, \infty)$ and in which the observable variables have densities $f(x, \theta)$ which are infinitely differentiable functions of $\theta$. For this purpose define $p_k$ as above. Let $u$ be a function defined on $(-\infty, +\infty)$ constructed so that $u(x) = 0$ for $x \le 0$, and $u(x) = 1$ for $x \ge 1$. One can find functions $u$ of that kind which are strictly increasing on $(0, 1)$ and are infinitely differentiable on $(-\infty, +\infty)$. Now let $p_\theta = p_k$ if $\theta$ is equal to the integer $k$. If $\theta \in (k, k + 1)$ let

$$p_\theta = [1 - u(\theta - k)]\, p_k + u(\theta - k)\, p_{k+1}.$$
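A minimal sketch of one such function $u$ follows, built by normalizing the integral of the bump function $\exp\{-[1/t + 1/(1-t)]\}$ on $(0, 1)$. The integral is approximated with a composite Simpson rule, so the values are numerical approximations rather than exact. The helper `mix_weights` is a hypothetical name introduced here to show the interpolation weights attached to $p_k$ and $p_{k+1}$.

```python
import math


def bump(t):
    """exp(-[1/t + 1/(1-t)]) on (0, 1), zero elsewhere."""
    if t <= 0.0 or t >= 1.0:
        return 0.0
    return math.exp(-(1.0 / t + 1.0 / (1.0 - t)))


def integral_of_bump(a, b, steps=2000):
    """Composite Simpson approximation of the integral of bump over [a, b]."""
    if b <= a:
        return 0.0
    h = (b - a) / steps  # steps must be even for Simpson's rule
    total = bump(a) + bump(b)
    for i in range(1, steps):
        total += (4.0 if i % 2 else 2.0) * bump(a + i * h)
    return total * h / 3.0


TOTAL_MASS = integral_of_bump(0.0, 1.0)


def u(x):
    """Smooth step: 0 for x <= 0, 1 for x >= 1, increasing in between."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    return integral_of_bump(0.0, x) / TOTAL_MASS


def mix_weights(theta):
    """Return (k, weight on p_k, weight on p_{k+1}) for a real theta."""
    k = math.floor(theta)
    w = u(theta - k)
    return k, 1.0 - w, w


print(mix_weights(3.0))   # integer theta puts all weight on p_3
print(mix_weights(3.25))  # intermediate theta mixes p_3 and p_4
```

Because the bump is symmetric about $t = 1/2$, this particular $u$ satisfies $u(1/2) = 1/2$, which gives a quick sanity check on the numerical integration.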
Maximum Likelihood

Taking for each $p_k$ the densities $f_k$ used previously, we obtain similarly densities

$$f(x, \theta) = [1 - u(\theta - k)]\, f_k(x) + u(\theta - k)\, f_{k+1}(x).$$

The function $u$ can be constructed, for instance, by taking a multiple of the indefinite integral of the function

$$\exp\left\{-\left[\frac{1}{t} + \frac{1}{1 - t}\right]\right\}$$

for $t \in [0, 1)$ and zero otherwise. If so $f(x, \theta)$ is certainly infinitely differentiable in $\theta$. Also the integral $\int f(x, \theta)\, dx$ can be differentiated infinitely under the integral sign. There is a slight annoyance that at all integer values of $\theta$ all the derivatives vanish. To cure this take $\alpha = 10^{-10^{137}}$ and let

$$g(x, \theta) = \tfrac{1}{2}\left[f(x, \theta) + f(x, \theta + \alpha e^{-\theta^4})\right].$$

Then, certainly, everything is under control and the famous conditions in Cramér's text are all duly satisfied. Furthermore, $\theta \ne \theta'$ implies

$$\int |g(x, \theta) - g(x, \theta')|\, dx > 0.$$

In spite of all this, whatever may be the true value $\theta_0$, the maximum likelihood estimate still tends almost surely to infinity.

Let us return to the initial example with measures $p_k$, $k = 1, 2, \ldots$, and let us waste some information. Having observed $X_1, \ldots, X_n$ according to one of the $p_k$, take independent identically distributed $N(0, 10^6)$ variables $Y_1, \ldots, Y_n$ and consider $V_j = X_j + Y_j$ for $j = 1, 2, \ldots, n$.

Certainly one who observes $V_j$, $j = 1, \ldots, n$, instead of $X_i$, $i = 1, \ldots, n$, must be at a gross disadvantage! Maximum likelihood estimates do not really think so.

The densities of the new variables $V_j$ are functions, say $\psi_k$, defined, positive, analytic, etc. on the whole line $\mathbb{R} = (-\infty, +\infty)$. They still are all different. In other words

$$\int |\psi_k(x) - \psi_j(x)|\, dx > 0 \quad (k \ne j).$$

Compute the maximum likelihood estimate $\hat\theta_n = \hat\theta_n(v_1, \ldots, v_n)$ for these new observations. We claim that

$$p_j[\hat\theta_n(V_1, \ldots, V_n) = j] \to 1 \quad \text{as } n \to \infty.$$
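To prove this let $\sigma = 10^3$ and note that $\psi_j(v)$ is a moderately small distortion of the function

$$\tilde\psi_j(v) = c \int_0^1 \frac{1}{\sigma\sqrt{2\pi}}\, e^{-(v - \xi)^2/(2\sigma^2)}\, d\xi + (1 - c)\, \frac{1}{\sigma\sqrt{2\pi}}\, e^{-(v - a_j)^2/(2\sigma^2)}.$$

Furthermore, as $m \to \infty$ the function $\psi_m(v)$ converges pointwise to

$$\psi_\infty(v) = c \int_0^1 \frac{1}{\sigma\sqrt{2\pi}}\, e^{-(v - \xi)^2/(2\sigma^2)}\, d\xi + (1 - c)\, \frac{1}{\sigma\sqrt{2\pi}}\, e^{-v^2/(2\sigma^2)}.$$

Thus, we can compactify the set $\Theta = \{1, 2, \ldots\}$ by addition of a point at infinity with $\psi_\infty(v)$ as described above.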
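The two smoothed densities above can be evaluated in closed form, since the integral of a normal density over $[0, 1]$ is a difference of normal distribution functions. The sketch below is only an illustration. It takes $c = 9/10$ as an assumed reading of the constant, uses $\sigma = 10^3$ as in the text, and plugs in a few hypothetical values of $a_j$ rather than the actual sequence, which is not computed here.

```python
import math

SIGMA = 1.0e3  # standard deviation of the added noise, sigma = 10^3
C = 0.9        # the constant c, read here as 9/10 (assumption)


def normal_pdf(v, mean, sigma=SIGMA):
    """Density at v of the N(mean, sigma^2) distribution."""
    z = (v - mean) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def normal_cdf(z):
    """Standard normal distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def smoothed_uniform(v, sigma=SIGMA):
    """Integral over xi in [0, 1] of the N(xi, sigma^2) density at v."""
    return normal_cdf(v / sigma) - normal_cdf((v - 1.0) / sigma)


def psi_tilde(v, a_j, c=C):
    """Approximation to psi_j: mass c spread on [0, 1], mass 1 - c at a_j."""
    return c * smoothed_uniform(v) + (1.0 - c) * normal_pdf(v, a_j)


def psi_inf(v, c=C):
    """Limit density: the point mass 1 - c has moved to zero."""
    return psi_tilde(v, 0.0, c)


v = 500.0
for a in (0.5, 0.05, 0.005):  # hypothetical values standing in for a_j
    gap = abs(psi_tilde(v, a) - psi_inf(v))
    print(f"a_j={a:<6}  |psi_tilde - psi_inf| at v={v}: {gap:.3e}")
```

The gap shrinks roughly in proportion to $a_j$, which illustrates the pointwise convergence as the index grows and $a_j$ decreases to zero.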