1. Introduction

1.1 Why empirical likelihood

• nonparametric method: no need to assume the form of the underlying distribution
• likelihood-based inference: retains the advantages of likelihood methods
• an alternative when other (more conventional) methods are not applicable

Example 1. Somites of earthworms.

Earthworms have segmented bodies. The segments are known as somites. As a worm grows, both the number and the length of its somites increase. The dataset contains the number of somites on each of 487 worms gathered near Ann Arbor in 1902. The histogram shows that the distribution is skewed to the left and has a heavier left tail.

Skewness: $\gamma = E\{(X - EX)^3\}/\{\mathrm{Var}(X)\}^{3/2}$, a measure of asymmetry.
Kurtosis: $\kappa = E\{(X - EX)^4\}/\{\mathrm{Var}(X)\}^{2} - 3$, a measure of tail-heaviness.

Remark. (i) For $N(\mu, \sigma^2)$, $\gamma = 0$ and $\kappa = 0$.
(ii) For symmetric distributions, $\gamma = 0$.
(iii) When $\kappa > 0$, the tails are heavier than those of $N(\mu, \sigma^2)$.

Estimation for γ and κ

Let $\bar X = n^{-1} \sum_{1 \le i \le n} X_i$ and $\hat\sigma^2 = (n-1)^{-1} \sum_{1 \le i \le n} (X_i - \bar X)^2$. Then
$$\hat\gamma = \frac{1}{n\hat\sigma^3} \sum_{i=1}^n (X_i - \bar X)^3, \qquad \hat\kappa = \frac{1}{n\hat\sigma^4} \sum_{i=1}^n (X_i - \bar X)^4 - 3.$$

How do we find confidence sets for $(\gamma, \kappa)$? Answer: empirical likelihood contours.

Let $l(\gamma, \kappa)$ be the (log-)empirical likelihood function of $(\gamma, \kappa)$. The confidence region for $(\gamma, \kappa)$ is defined as
$$\{(\gamma, \kappa) : l(\gamma, \kappa) > C\},$$
where $C > 0$ is a constant determined by the confidence level, i.e.
$$P\{l(\gamma, \kappa) > C\} = 1 - \alpha.$$
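As a small illustration of the estimators $\hat\gamma$ and $\hat\kappa$ defined above, the sketch below computes them directly from a sample. The function name skew_kurt and the sample values are made up for illustration, since the somite data themselves are not reproduced in these notes.

```python
import numpy as np

def skew_kurt(x):
    """Moment estimates of skewness and (excess) kurtosis, as defined above."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    xbar = x.mean()
    sigma = np.sqrt(((x - xbar) ** 2).sum() / (n - 1))   # \hat\sigma with divisor n-1
    gamma_hat = ((x - xbar) ** 3).sum() / (n * sigma ** 3)
    kappa_hat = ((x - xbar) ** 4).sum() / (n * sigma ** 4) - 3
    return gamma_hat, kappa_hat

# Hypothetical somite counts, for illustration only.
counts = [128, 131, 119, 135, 140, 122, 137, 133, 110, 138]
print(skew_kurt(counts))
```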
[Figure: left panel "Raw data", a histogram of the number of somites; right panel "E.L. Confidence Regions", empirical likelihood contours, with somite skewness on the horizontal axis.]
In the second panel, the empirical likelihood confidence regions (i.e. contours) correspond to confidence levels of 50%, 90%, 95%, 99%, 99.9% and 99.99%.

Note. $(\gamma, \kappa) = (0, 0)$ is not contained in the confidence regions.

Why do conventional methods not apply?

Parametric likelihood. The data do not follow a normal distribution, and likelihood inference for higher moments is typically not robust with respect to a misspecified distribution.

Bootstrap. It is difficult to pick out a confidence region from a point cloud consisting of a large number of bootstrap estimates for $(\gamma, \kappa)$. For example, given 1000 bootstrap estimates for $(\gamma, \kappa)$, ideally the 95% confidence region should contain the 950 most central points. In practice, we restrict attention to rectangular or elliptical regions in order to facilitate the estimation.

1.2 Introducing empirical likelihood

Let $X = (X_1, \cdots, X_n)^\tau$ be a random sample from an unknown distribution $F(\cdot)$. We know nothing about $F(\cdot)$. In practice we observe $X_i = x_i$ ($i = 1, \cdots, n$), where $x_1, \cdots, x_n$ are $n$ known numbers.

Basic idea. Assume $F$ is a discrete distribution on $\{x_1, \cdots, x_n\}$ with
$$p_i = F(x_i), \qquad i = 1, \cdots, n,$$
where $p_i \ge 0$ and $\sum_{i=1}^n p_i = 1$.

What is the likelihood function of $\{p_i\}$? What is the MLE?

Since $P\{X_1 = x_1, \cdots, X_n = x_n\} = p_1 \cdots p_n$, the likelihood is
$$L(p_1, \cdots, p_n) \equiv L(p_1, \cdots, p_n; X) = \prod_{i=1}^n p_i,$$
which is called an empirical likelihood.

Remark. The number of parameters is the same as the number of observations.

Note that
$$\Big(\prod_{i=1}^n p_i\Big)^{1/n} \le \frac{1}{n}\sum_{i=1}^n p_i = \frac{1}{n},$$
with equality if and only if $p_1 = \cdots = p_n = 1/n$.
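The last display is simply the AM-GM inequality applied to the weights $p_1, \cdots, p_n$. As a quick numerical sanity check (my own illustration, not part of the notes), random probability vectors drawn from the simplex never attain a larger empirical likelihood than the uniform weights:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 6
log_L_uniform = n * np.log(1.0 / n)          # log empirical likelihood at p_i = 1/n

# Draw random probability vectors from the simplex; by the AM-GM inequality,
# their log empirical likelihood never exceeds that of the uniform weights.
gaps = [log_L_uniform - np.log(rng.dirichlet(np.ones(n))).sum() for _ in range(10_000)]
print(min(gaps) >= 0)                        # True
```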
Putting $\hat p_i = 1/n$, we have
$$L(p_1, \cdots, p_n; X) \le L(\hat p_1, \cdots, \hat p_n; X)$$
for any $p_i \ge 0$ with $\sum_i p_i = 1$. Hence the MLE based on the empirical likelihood, called the maximum empirical likelihood estimator (MELE), puts equal probability mass $1/n$ on the $n$ observed values $x_1, \cdots, x_n$.

Namely, the MELE for $F$ is the uniform distribution on the observed data points. The corresponding distribution function
$$F_n(x) = \frac{1}{n}\sum_{i=1}^n I(X_i \le x)$$
is called the empirical distribution of the sample $X = (X_1, \cdots, X_n)^\tau$.

Example 2. Find the MELE for $\mu \equiv EX_1$.

Corresponding to the EL,
$$\mu = \sum_{i=1}^n p_i x_i = \mu(p_1, \cdots, p_n).$$
Therefore the MELE for $\mu$ is
$$\hat\mu = \mu(\hat p_1, \cdots, \hat p_n) = \frac{1}{n}\sum_{i=1}^n x_i = \frac{1}{n}\sum_{i=1}^n X_i = \bar X.$$
Similarly, the MELE for $\mu_k \equiv E(X_1^k)$ is simply the sample $k$-th moment:
$$\hat\mu_k = \frac{1}{n}\sum_{i=1}^n X_i^k.$$

Remarks. (i) MELEs, without further constraints, are simply the method-of-moments estimators, which is nothing new.
(ii) Empirical likelihood is a powerful tool for hypothesis testing and interval estimation in a nonparametric manner within the likelihood tradition; it involves evaluating MELEs under some further constraints.

2. Empirical likelihood for means

Let $X_1, \cdots, X_n$ be a random sample from an unknown distribution.

Goal: test hypotheses on $\mu \equiv EX_1$, or find confidence intervals for $\mu$.
Tool: empirical likelihood ratios (ELR).

2.1 Tests

Consider the hypotheses
$$H_0: \mu = \mu_0 \quad \text{vs} \quad H_1: \mu \ne \mu_0.$$
Let $L(p_1, \cdots, p_n) = \prod_i p_i$. We reject $H_0$ for large values of the ELR
$$T = \frac{\max L(p_1, \cdots, p_n)}{\max_{H_0} L(p_1, \cdots, p_n)} = \frac{L(n^{-1}, \cdots, n^{-1})}{L(\tilde p_1, \cdots, \tilde p_n)},$$
where $\{\tilde p_i\}$ are the constrained MELEs for $\{p_i\}$ under $H_0$.
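Before turning to the constrained problem, here is a minimal sketch of the two unconstrained MELEs derived above: the empirical distribution $F_n$, which puts mass $1/n$ on each observation, and the sample mean as the MELE of $\mu$. The helper name ecdf and the sample values are mine, for illustration only.

```python
import numpy as np

def ecdf(sample):
    """Return the empirical distribution function F_n of the sample,
    i.e. the MELE of F: it puts mass 1/n on each observed value."""
    x = np.sort(np.asarray(sample, dtype=float))
    n = len(x)
    def F_n(t):
        # Proportion of observations <= t.
        return np.searchsorted(x, t, side="right") / n
    return F_n

sample = [2.1, 0.4, 3.3, 1.7, 2.8]
F_n = ecdf(sample)
print(F_n(2.0))           # 0.4: two of the five observations are <= 2.0
print(np.mean(sample))    # MELE of the mean mu = E(X_1): the sample mean
```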
Two problems:
(i) $\tilde p_i = ?$
(ii) What is the distribution of $T$ under $H_0$?

(i) The constrained MELEs are $\tilde p_i = p_i(\mu_0)$, where $\{p_i(\mu)\}$ is the solution of the maximisation problem
$$\max_{\{p_i\}} \sum_{i=1}^n \log p_i$$
subject to the conditions
$$p_i \ge 0, \qquad \sum_{i=1}^n p_i = 1, \qquad \sum_{i=1}^n p_i x_i = \mu.$$
The solution of this problem is given in the theorem below. Note that
$$x_{(1)} \equiv \min_i x_i \le \sum_{i=1}^n p_i x_i \le \max_i x_i \equiv x_{(n)},$$
so it is natural to require $x_{(1)} \le \mu \le x_{(n)}$.

Theorem 1. For $\mu \in (x_{(1)}, x_{(n)})$,
$$p_i(\mu) = \frac{1}{n - \lambda(x_i - \mu)} > 0, \qquad 1 \le i \le n, \tag{1}$$
where $\lambda$ is the unique solution of the equation
$$\sum_{j=1}^n \frac{x_j - \mu}{n - \lambda(x_j - \mu)} = 0 \tag{2}$$
in the interval $\big(\frac{n}{x_{(1)} - \mu}, \frac{n}{x_{(n)} - \mu}\big)$.

Proof. We use the Lagrange multiplier technique to solve this optimisation problem. Put
$$Q = \sum_i \log p_i + \psi\Big(\sum_i p_i - 1\Big) + \lambda\Big(\sum_i p_i x_i - \mu\Big).$$
Setting the partial derivatives of $Q$ with respect to $p_i$, $\psi$ and $\lambda$ equal to 0, we obtain
$$p_i^{-1} + \psi + \lambda x_i = 0, \tag{3}$$
$$\sum_i p_i = 1, \tag{4}$$
$$\sum_i p_i x_i = \mu. \tag{5}$$
By (3),
$$p_i = -1/(\psi + \lambda x_i). \tag{6}$$
Hence $1 + \psi p_i + \lambda x_i p_i = 0$; summing over $i$ and using (4) and (5) gives $\psi = -(n + \lambda\mu)$. This together with (6) implies (1). By (1) and (5),
$$\sum_i \frac{x_i}{n - \lambda(x_i - \mu)} = \mu. \tag{7}$$
It follows from (4) that
$$\mu = \mu \sum_i p_i = \sum_i \frac{\mu}{n - \lambda(x_i - \mu)}.$$
This together with (7) implies (2).

Now let $g(\lambda)$ denote the function on the left-hand side of (2). Then
$$\dot g(\lambda) = \sum_i \frac{(x_i - \mu)^2}{\{n - \lambda(x_i - \mu)\}^2} > 0.$$
Hence $g(\lambda)$ is strictly increasing. Note that
$$\lim_{\lambda \uparrow n/(x_{(n)} - \mu)} g(\lambda) = \infty, \qquad \lim_{\lambda \downarrow n/(x_{(1)} - \mu)} g(\lambda) = -\infty.$$
Hence $g(\lambda) = 0$ has a unique solution in the interval
$$\Big(\frac{n}{x_{(1)} - \mu}, \; \frac{n}{x_{(n)} - \mu}\Big).$$
Note that for any $\lambda$ in this interval,
$$\frac{1}{n - \lambda(x_{(1)} - \mu)} > 0, \qquad \frac{1}{n - \lambda(x_{(n)} - \mu)} > 0,$$
and $1/\{n - \lambda(x - \mu)\}$ is a monotonic function of $x$. It follows that $p_i(\mu) > 0$ for all $1 \le i \le n$.
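Theorem 1 reduces the constrained maximisation to one-dimensional root finding: $g(\lambda)$ is strictly increasing on the stated interval and changes sign, so the root can be bracketed numerically. The following is a minimal sketch of that observation; the function name el_weights and the data are mine, and the bracket is shrunk slightly to avoid the poles at the endpoints. It is an illustration, not the notes' own implementation.

```python
import numpy as np
from scipy.optimize import brentq

def el_weights(x, mu):
    """Constrained MELE weights p_i(mu) from Theorem 1, for x_(1) < mu < x_(n)."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    if not (x.min() < mu < x.max()):
        raise ValueError("mu must lie strictly between min(x) and max(x)")

    def g(lam):                       # left-hand side of equation (2)
        return np.sum((x - mu) / (n - lam * (x - mu)))

    # g is strictly increasing on (n/(x_(1)-mu), n/(x_(n)-mu)) and diverges to
    # -inf / +inf at the endpoints; shrink the bracket slightly to stay inside.
    eps = 1e-10
    lo = n / (x.min() - mu) * (1 - eps)
    hi = n / (x.max() - mu) * (1 - eps)
    lam = brentq(g, lo, hi)

    p = 1.0 / (n - lam * (x - mu))    # equation (1)
    return p, lam

x = np.array([1.2, 0.7, 3.1, 2.4, 1.9, 0.3, 2.8])
p, lam = el_weights(x, mu=1.5)
print(p.sum(), (p * x).sum())         # ~1.0 and ~1.5: constraints are satisfied
```

The printed sums check that the computed weights satisfy $\sum_i p_i = 1$ and $\sum_i p_i x_i = \mu$ at the root $\lambda$.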
Remarks. (a) When $\mu = \bar x = \bar X$, $\lambda = 0$ and $p_i(\mu) = 1/n$, $i = 1, \cdots, n$. It may be shown that, for $\mu$ close to $E(X_i)$ and $n$ large,
$$p_i(\mu) \approx \frac{1}{n} \cdot \frac{1}{1 + \frac{\bar x - \mu}{S(\mu)}(x_i - \mu)},$$
where $S(\mu) = \frac{1}{n}\sum_i (x_i - \mu)^2$.

(b) We may view
$$L(\mu) = L\{p_1(\mu), \cdots, p_n(\mu)\}$$
as a profile empirical likelihood for $\mu$. Hypothetically, consider a one-to-one parameter transformation from $\{p_1, \cdots, p_n\}$ to $\{\mu, \theta_1, \cdots, \theta_{n-1}\}$. Then
$$L(\mu) = \max_{\{\theta_i\}} L(\mu, \theta_1, \cdots, \theta_{n-1}) = L\{\mu, \hat\theta_1(\mu), \cdots, \hat\theta_{n-1}(\mu)\}.$$

(c) The likelihood function $L(\mu)$ may be calculated using R-code and Splus-code, downloadable at
http://www-stat.stanford.edu/~owen/empirical/

(ii) We now turn to the second problem, the distribution of $T$ under $H_0$: the asymptotic theorem for the classical likelihood ratio tests (i.e. Wilks' Theorem) still holds for ELR tests.

Let $X_1, \cdots, X_n$ be i.i.d. and $\mu = E(X_1)$. To test
$$H_0: \mu = \mu_0 \quad \text{vs} \quad H_1: \mu \ne \mu_0,$$
the ELR statistic is
$$T = \frac{\max L(p_1, \cdots, p_n)}{\max_{H_0} L(p_1, \cdots, p_n)} = \frac{(1/n)^n}{L(\mu_0)} = \prod_{i=1}^n \frac{1}{n\, p_i(\mu_0)} = \prod_{i=1}^n \Big\{1 - \frac{\lambda}{n}(X_i - \mu_0)\Big\},$$
where $\lambda$ is the unique solution of
$$\sum_{j=1}^n \frac{X_j - \mu_0}{n - \lambda(X_j - \mu_0)} = 0.$$

Theorem 2. Let $E(X_1^2) < \infty$. Then under $H_0$,
$$2\log T = 2\sum_{i=1}^n \log\Big\{1 - \frac{\lambda}{n}(X_i - \mu_0)\Big\} \to \chi_1^2$$
in distribution as $n \to \infty$.

A sketch proof. Under $H_0$, $EX_i = \mu_0$. Therefore $\mu_0$ is close to $\bar X$ for large $n$. Hence $\lambda$, or more precisely $\lambda_n \equiv \lambda/n$, is small; it is the solution of $f(\lambda_n) = 0$, where
$$f(\lambda_n) = \frac{1}{n}\sum_{j=1}^n \frac{X_j - \mu_0}{1 - \lambda_n(X_j - \mu_0)}.$$
By a simple Taylor expansion,
$$0 = f(\lambda_n) \approx f(0) + \dot f(0)\lambda_n, \qquad \text{so} \qquad \lambda_n \approx -\frac{f(0)}{\dot f(0)} = -(\bar X - \mu_0)\Big/\frac{1}{n}\sum_j (X_j - \mu_0)^2.$$
Now
$$2\log T \approx 2\sum_i\Big\{-\lambda_n(X_i - \mu_0) - \frac{\lambda_n^2}{2}(X_i - \mu_0)^2\Big\} = -2\lambda_n n(\bar X - \mu_0) - \lambda_n^2 \sum_i (X_i - \mu_0)^2 \approx \frac{n(\bar X - \mu_0)^2}{n^{-1}\sum_i (X_i - \mu_0)^2}.$$
By the LLN, $n^{-1}\sum_i (X_i - \mu_0)^2 \to \mathrm{Var}(X_1)$. By the CLT, $\sqrt{n}(\bar X - \mu_0) \to N(0, \mathrm{Var}(X_1))$ in distribution. Hence $2\log T \to \chi_1^2$ in distribution.
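As a sketch of how Theorem 2 is used in practice (my own illustration, not taken from the notes or from Owen's code referenced above): compute the weights $p_i(\mu_0)$, form $2\log T = -2\sum_i \log\{n\,p_i(\mu_0)\}$, and reject $H_0$ at level $\alpha$ when it exceeds the $\chi^2_1$ quantile; inverting the test over a grid of candidate values $\mu_0$ gives a confidence interval for $\mu$. The function name el_log_ratio and the simulated data are hypothetical.

```python
import numpy as np
from scipy.optimize import brentq
from scipy.stats import chi2

def el_log_ratio(x, mu0):
    """2 log T for H0: mu = mu0, via the weights p_i(mu0) from Theorem 1."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    g = lambda lam: np.sum((x - mu0) / (n - lam * (x - mu0)))   # LHS of equation (2)
    eps = 1e-10
    lam = brentq(g, n / (x.min() - mu0) * (1 - eps), n / (x.max() - mu0) * (1 - eps))
    p = 1.0 / (n - lam * (x - mu0))                             # p_i(mu0), equation (1)
    return -2.0 * np.sum(np.log(n * p))                         # equals 2 log T

rng = np.random.default_rng(1)
x = rng.exponential(scale=2.0, size=200)        # simulated data with true mean 2.0

stat = el_log_ratio(x, mu0=2.0)
print(stat, stat > chi2.ppf(0.95, df=1))        # H0 is true here: rejection should be rare

# Approximate 95% confidence interval for mu: all mu0 not rejected at the 5% level.
grid = np.linspace(x.mean() - 1.0, x.mean() + 1.0, 401)
kept = [m for m in grid if x.min() < m < x.max()
        and el_log_ratio(x, m) <= chi2.ppf(0.95, df=1)]
print(kept[0], kept[-1])
```

Because the data need not be normal, the interval obtained this way is calibrated by the $\chi^2_1$ limit of Theorem 2 rather than by any parametric model.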