Chapter 8
Bootstrap and Jackknife Estimation of Sampling Distributions

1. A General View of the Bootstrap
2. Bootstrap Methods
3. The Jackknife
4. Some Limit Theory for Bootstrap Methods
5. The Bootstrap and the Delta Method
6. Bootstrap Tests and Bootstrap Confidence Intervals
7. M-Estimators and the Bootstrap
Chapter 8
Bootstrap and Jackknife Estimation of Sampling Distributions

1 A General View of the Bootstrap

We begin with a general approach to bootstrap methods. The goal is to formulate the ideas in a context which is free of particular model assumptions.

Suppose that the data $X \sim P_\theta \in \mathcal{P} = \{P_\theta : \theta \in \Theta\}$. The parameter space $\Theta$ is allowed to be very general; it could be a subset of $\mathbb{R}^k$ (in which case the model $\mathcal{P}$ is a parametric model), or it could be the set of all distributions of i.i.d. sequences on some measurable space $(\mathcal{X}, \mathcal{A})$ (in which case the model $\mathcal{P}$ is the "nonparametric i.i.d." model).

Suppose that we have an estimator $\hat{\theta}$ of $\theta \in \Theta$, and thereby an estimator $P_{\hat{\theta}}$ of $P_\theta$. Consider estimation of:

A. The distribution of $\hat{\theta}$: e.g. $P_\theta(\hat{\theta} \in A) = P_\theta(\hat{\theta}(X) \in A)$ for a measurable subset $A$ of $\Theta$;

B. If $\Theta \subset \mathbb{R}^k$, $\mathrm{Var}_\theta(a^T \hat{\theta}(X))$ for a fixed vector $a \in \mathbb{R}^k$.

Natural (ideal) bootstrap estimators of these parameters are provided by:

A'. $P_{\hat{\theta}}(\hat{\theta}(X^*) \in A)$;

B'. $\mathrm{Var}_{\hat{\theta}}(a^T \hat{\theta}(X^*))$.

While these ideal bootstrap estimators are often difficult to compute exactly, we can often obtain Monte Carlo estimates of them by sampling from $P_{\hat{\theta}}$: let $X_1^*, \ldots, X_B^*$ be i.i.d. with common distribution $P_{\hat{\theta}}$, and calculate $\hat{\theta}(X_j^*)$ for $j = 1, \ldots, B$. Then Monte Carlo approximations (or implementations) of the bootstrap estimators in A' and B' are given by

A''. $B^{-1} \sum_{j=1}^{B} 1\{\hat{\theta}(X_j^*) \in A\}$;

B''. $B^{-1} \sum_{j=1}^{B} \bigl( a^T \hat{\theta}(X_j^*) - B^{-1} \sum_{j=1}^{B} a^T \hat{\theta}(X_j^*) \bigr)^2$.

If $\mathcal{P}$ is a parametric model, the above approach yields a parametric bootstrap. If $\mathcal{P}$ is a nonparametric model, then this yields a nonparametric bootstrap. In the following section we try to make these ideas more concrete, first in the context of $X = (X_1, \ldots, X_n)$ i.i.d. $F$ or $P$ with $\mathcal{P}$ nonparametric, so that $P_\theta = F \times \cdots \times F$ and $P_{\hat{\theta}} = F_n \times \cdots \times F_n$; or, if the basic underlying sample space for each $X_i$ is not $\mathbb{R}$, $P_\theta = P \times \cdots \times P$ and $P_{\hat{\theta}} = P_n \times \cdots \times P_n$.
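To make the Monte Carlo approximations A'' and B'' concrete, here is a minimal Python sketch for the nonparametric i.i.d. case, where sampling from $P_{\hat{\theta}} = P_n \times \cdots \times P_n$ amounts to resampling the observed data with replacement. This is an illustration added to the notes: the function name `bootstrap_mc` and the arguments `theta_hat`, `a`, and `in_A` (an indicator function for the set $A$) are placeholders to be supplied by the user.

```python
import numpy as np

def bootstrap_mc(x, theta_hat, a, in_A, B=1000, seed=None):
    """Monte Carlo versions of A'' and B'' for the nonparametric i.i.d. case:
    each bootstrap sample X*_j is the observed data resampled with replacement."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x)
    n = len(x)
    theta_stars = np.array([theta_hat(x[rng.integers(0, n, size=n)])
                            for _ in range(B)])        # theta_hat(X*_j), j = 1, ..., B
    lin = theta_stars @ np.asarray(a, dtype=float)     # a^T theta_hat(X*_j)
    prob_A = np.mean([in_A(t) for t in theta_stars])   # A'': B^{-1} sum_j 1{theta_hat(X*_j) in A}
    var_a = np.mean((lin - lin.mean()) ** 2)           # B'': bootstrap variance of a^T theta_hat(X*)
    return prob_A, var_a

# Example: theta_hat = (sample mean, sample sd), a picks out the mean,
# and A = {theta : theta_1 > 0}.
x = np.random.default_rng(0).normal(loc=0.3, size=50)
theta_hat = lambda y: np.array([y.mean(), y.std(ddof=1)])
print(bootstrap_mc(x, theta_hat, a=[1.0, 0.0], in_A=lambda t: t[0] > 0, B=2000, seed=1))
```

For a parametric bootstrap the only change is inside the loop: draw each $X_j^*$ from the fitted parametric distribution $P_{\hat{\theta}}$ instead of resampling the data.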
2 Bootstrap Methods

We begin with a discussion of Efron's nonparametric bootstrap; we will then discuss some of the many alternatives.

Efron's nonparametric bootstrap

Suppose that $T(F)$ is some (real-valued) functional of $F$. If $X_1, \ldots, X_n$ are i.i.d. with distribution function $F$, then we estimate $T(F)$ by $T(F_n) \equiv T_n$, where $F_n$ is the empirical d.f. $F_n(x) \equiv n^{-1} \sum_{i=1}^{n} 1\{X_i \le x\}$. More generally, if $T(P)$ is some functional of $P$ and $X_1, \ldots, X_n$ are i.i.d. $P$, then a natural estimator of $T(P)$ is just $T(P_n)$, where $P_n$ is the empirical measure $P_n = n^{-1} \sum_{i=1}^{n} \delta_{X_i}$.

Consider estimation of:

A. $b_n(F) \equiv n\{E_F(T_n) - T(F)\}$.

B. $n\sigma_n^2(F) \equiv n \, \mathrm{Var}_F(T_n)$.

C. $\kappa_{3,n}(F) \equiv E_F[T_n - E_F(T_n)]^3 / \sigma_n^3(F)$.

D. $H_n(x, F) \equiv P_F(\sqrt{n}(T_n - T(F)) \le x)$.

E. $K_n(x, F) \equiv P_F(\sqrt{n}\|F_n - F\|_\infty \le x)$.

F. $L_n(x, P) \equiv \Pr_P(\sqrt{n}\|P_n - P\|_{\mathcal{F}} \le x)$, where $\mathcal{F}$ is a class of functions for which the central limit theorem holds uniformly over $\mathcal{F}$ (i.e. a Donsker class).

The (ideal) nonparametric bootstrap estimates of these quantities are obtained simply via the substitution principle: if $F$ (or $P$) is unknown, estimate it by the empirical distribution function $F_n$ (or the empirical measure $P_n$). This yields the following nonparametric bootstrap estimates in examples A-F:

A'. $b_n(F_n) \equiv n\{E_{F_n}(T_n) - T(F_n)\}$.

B'. $n\sigma_n^2(F_n) \equiv n \, \mathrm{Var}_{F_n}(T_n)$.

C'. $\kappa_{3,n}(F_n) \equiv E_{F_n}[T_n - E_{F_n}(T_n)]^3 / \sigma_n^3(F_n)$.

D'. $H_n(x, F_n) \equiv P_{F_n}(\sqrt{n}(T_n - T(F_n)) \le x)$.

E'. $K_n(x, F_n) \equiv P_{F_n}(\sqrt{n}\|F_n^* - F_n\|_\infty \le x)$.

F'. $L_n(x, P_n) \equiv \Pr_{P_n}(\sqrt{n}\|P_n^* - P_n\|_{\mathcal{F}} \le x)$, where $\mathcal{F}$ is a class of functions for which the central limit theorem holds uniformly over $\mathcal{F}$ (i.e. a Donsker class).

Because we usually lack closed-form expressions for the ideal bootstrap estimators in A'-F', evaluation of A'-F' is usually indirect. Since the empirical d.f. $F_n$ is discrete (with all its mass at the data), we could, in principle, enumerate all possible samples of size $n$ drawn with replacement from $F_n$ (or $P_n$). If $n$ is large, however, the number of such samples is enormous: $n^n$. [Problem: show that the number of distinct bootstrap samples is $\binom{2n-1}{n}$.]
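As a quick numerical check of the count in the bracketed problem (an illustration added here, not part of the original notes), a distinct bootstrap sample is determined by the multiset of data points drawn, i.e. by how many times each of the $n$ observations appears; these multisets can be enumerated directly for small $n$:

```python
from itertools import combinations_with_replacement
from math import comb

# Enumerate multisets of size n drawn from n labelled observations and
# compare the count with binom(2n-1, n).
for n in range(1, 8):
    n_distinct = sum(1 for _ in combinations_with_replacement(range(n), n))
    print(n, n_distinct, comb(2 * n - 1, n))   # the two counts agree for each n
```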
On the other hand, Monte Carlo approximations to A'-F' are easy: let
$$(X_{j1}^*, \ldots, X_{jn}^*), \qquad j = 1, \ldots, B,$$
be $B$ independent samples of size $n$ drawn with replacement from $F_n$ (or $P_n$); let
$$F_{j,n}^*(x) \equiv n^{-1} \sum_{i=1}^{n} 1\{X_{j,i}^* \le x\}$$
be the empirical d.f. of the $j$-th sample, and let $T_{j,n}^* \equiv T(F_{j,n}^*)$, $j = 1, \ldots, B$. Then approximations of A'-F' are given by:

A''. $b_{n,B}^* \equiv n\bigl\{\frac{1}{B}\sum_{j=1}^{B} T_{j,n}^* - T_n\bigr\}$.

B''. $n\sigma_{n,B}^{*2} \equiv n \frac{1}{B}\sum_{j=1}^{B} (T_{j,n}^* - \overline{T}_n^*)^2$.

C''. $\kappa_{3,n,B}^* \equiv \frac{1}{B}\sum_{j=1}^{B} (T_{j,n}^* - \overline{T}_n^*)^3 / \sigma_{n,B}^{*3}$.

D''. $H_{n,B}^*(x) \equiv \frac{1}{B}\sum_{j=1}^{B} 1\{\sqrt{n}(T_{j,n}^* - T_n) \le x\}$.

E''. $K_{n,B}^*(x) \equiv \frac{1}{B}\sum_{j=1}^{B} 1\{\sqrt{n}\|F_{j,n}^* - F_n\|_\infty \le x\}$.

F''. $L_{n,B}^*(x) \equiv \frac{1}{B}\sum_{j=1}^{B} 1\{\sqrt{n}\|P_{j,n}^* - P_n\|_{\mathcal{F}} \le x\}$.

For fixed sample size $n$ and data $F_n$, it follows from the Glivenko-Cantelli theorem (applied to the bootstrap sampling) that
$$\sup_x |H_{n,B}^*(x) - H_n(x, F_n)| \rightarrow_{a.s.} 0 \quad \text{as } B \to \infty,$$
and, by Donsker's theorem,
$$\sqrt{B}\bigl(H_{n,B}^*(x) - H_n(x, F_n)\bigr) \Rightarrow \mathbb{U}^{**}(H_n(x, F_n)) \quad \text{as } B \to \infty.$$
Moreover, by the Dvoretzky, Kiefer, and Wolfowitz (1956) inequality ($P(\|\mathbb{U}_n\|_\infty \ge \lambda) \le 2\exp(-2\lambda^2)$ for all $n$ and $\lambda > 0$, where the constant 2 in front of the exponential comes via Massart (1990)),
$$P\bigl(\sup_x |H_{n,B}^*(x) - H_n(x, F_n)| \ge \epsilon\bigr) \le 2\exp(-2B\epsilon^2).$$
For a given $\epsilon > 0$ we can make this probability as small as we please by choosing $B$ (over which we have complete control, given sufficient computing power) sufficiently large. Since the deviations of $H_{n,B}^*$ from $H_n(\cdot, F_n)$ are so well understood and controlled, much of our discussion below will focus on the differences between $H_n(x, F_n)$ and $H_n(x, F)$.

Sometimes it is possible to compute the distribution of the bootstrap estimator explicitly without resort to Monte Carlo; here is an example of this kind.

Example 2.1 (The distribution of the bootstrap estimator of the median). Suppose that $T(F) = F^{-1}(1/2)$. Then $T(F_n) = F_n^{-1}(1/2) = X_{([n+1]/2)}$ and $T(F_n^*) = F_n^{*-1}(1/2) = X_{([n+1]/2)}^*$.
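Example 2.1 lends itself to a small computational illustration (added here, not part of the original notes). Assuming $n$ is odd, so that $m = (n+1)/2$, and that the data have no ties, the event $\{X_{(m)}^* \le X_{(k)}\}$ occurs iff at least $m$ of the $n$ bootstrap draws land at or below $X_{(k)}$, each independently with probability $k/n$; hence $P^*(X_{(m)}^* \le X_{(k)}) = P(\mathrm{Bin}(n, k/n) \ge m)$, and differencing over $k$ gives the exact bootstrap distribution of the median over the order statistics. The sketch below computes this and checks it against the Monte Carlo frequencies of the kind used in D''; the function name `exact_bootstrap_median_pmf` is illustrative.

```python
import numpy as np
from math import comb

def exact_bootstrap_median_pmf(n):
    """P*(X*_(m) = X_(k)) for k = 1, ..., n, with m = (n+1)/2, n odd, no ties:
    obtained by differencing P*(X*_(m) <= X_(k)) = P(Binomial(n, k/n) >= m)."""
    m = (n + 1) // 2
    cdf = [sum(comb(n, j) * (k / n) ** j * (1 - k / n) ** (n - j)
               for j in range(m, n + 1))
           for k in range(n + 1)]        # cdf[k] = P*(X*_(m) <= X_(k)); cdf[0] = 0, cdf[n] = 1
    return np.diff(cdf)

# Monte Carlo check (the frequency approximation specialized to the median), n = 7.
# With B = 100000 and eps = 0.005, the DKW bound above gives
# 2 exp(-2 B eps^2) = 2 exp(-5) ~ 0.013, so the Monte Carlo d.f. is uniformly
# within 0.005 of the ideal bootstrap d.f. except with probability about 0.013.
n, B = 7, 100_000
rng = np.random.default_rng(0)
med_idx = np.sort(rng.integers(0, n, size=(B, n)), axis=1)[:, (n - 1) // 2]  # index of X*_(m)
mc_pmf = np.bincount(med_idx, minlength=n) / B
print(np.round(exact_bootstrap_median_pmf(n), 4))
print(np.round(mc_pmf, 4))
```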