CHAPTER 8. BOOTSTRAP AND JACKKNIFE ESTIMATION OF SAMPLING DISTRIBUTIONS

Let $m = [n+1]/2$, and let $M_j \equiv \#\{X_i^* = X_j(\omega) : i = 1, \ldots, n\}$, $j = 1, \ldots, n$, so that $M \equiv (M_1, \ldots, M_n) \sim \mathrm{Mult}_n(n, (1/n, \ldots, 1/n))$. Now $[X^*_{(m)} > X_{(k)}(\omega)] = [nF_n^*(X_{(k)}(\omega)) \le m-1]$, and hence

$$P\big(T(F_n^*) = X^*_{(m)} > X_{(k)}(\omega) \,\big|\, F_n\big) = P\big(nF_n^*(X_{(k)}(\omega)) \le m-1 \,\big|\, F_n\big) = P\big(\mathrm{Binomial}(n, k/n) \le m-1\big) = \sum_{j=0}^{m-1} \binom{n}{j} (k/n)^j (1-k/n)^{n-j},$$

while

$$P(T_n > x) = P(X_{(m)} > x) = P(nF_n(x) < m) = \sum_{j=0}^{m-1} \binom{n}{j} F(x)^j (1-F(x))^{n-j}.$$

This implies that

$$P\big(T(F_n^*) = X_{(k)}(\omega) \,\big|\, F_n\big) = \sum_{j=0}^{m-1} \left\{ \binom{n}{j} \left(\frac{k-1}{n}\right)^j \left(1-\frac{k-1}{n}\right)^{n-j} - \binom{n}{j} \left(\frac{k}{n}\right)^j \left(1-\frac{k}{n}\right)^{n-j} \right\}$$

for $k = 1, \ldots, n$.

Example 2.2 (Standard deviation of a correlation coefficient estimator). Let $T(F) = \rho(F)$ where $F$ is the bivariate distribution of a pair of random variables $(X, Y)$ with finite fourth moments. We know from chapter 2 that the sample correlation coefficient $\hat\rho_n \equiv T(F_n)$ satisfies

$$\sqrt{n}(\hat\rho_n - \rho) \equiv \sqrt{n}(\rho(F_n) - \rho(F)) \to_d N(0, V^2)$$

where $V^2 = \mathrm{Var}[Z_1 - (\rho/2)[Z_2 + Z_3]]$, $Z \equiv (Z_1, Z_2, Z_3) \sim N_3(0, \Sigma)$, and $\Sigma$ is given by

$$\Sigma = E\big(X_sY_s - \rho,\; X_s^2 - 1,\; Y_s^2 - 1\big)^{\otimes 2};$$

here $X_s \equiv (X - \mu_X)/\sigma_X$ and $Y_s \equiv (Y - \mu_Y)/\sigma_Y$ are the standardized variables. If $F$ is bivariate normal, then $V^2 = (1-\rho^2)^2$.

Consider estimation of the standard deviation of $\hat\rho_n$:

$$\sigma_n(F) \equiv \{\mathrm{Var}_F(\hat\rho_n)\}^{1/2}.$$

The normal theory estimator of $\sigma_n(F)$ is

$$(1 - \hat\rho_n^2)/\sqrt{n-3}.$$

The delta-method estimate of $\sigma_n(F)$ is

$$\frac{\hat V_n}{\sqrt{n}} = \big\{\widehat{\mathrm{Var}}[Z_1 - (\rho/2)[Z_2 + Z_3]]\big\}^{1/2} / \sqrt{n}.$$
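The estimates of $\sigma_n(F)$ in Example 2.2 (the normal-theory and delta-method estimates, together with the bootstrap and jackknife estimates that complete the example) can be compared numerically. The following is a minimal sketch under stated assumptions, not part of the original notes: the function names, the simulated bivariate normal data, the seed, and the choice $B = 1000$ are all illustrative. The delta-method function plugs empirical standardized moments into $\mathrm{Var}[Z_1 - (\rho/2)[Z_2 + Z_3]]$.

```python
import numpy as np

def corr(x, y):
    """Sample correlation coefficient rho_hat_n."""
    return float(np.corrcoef(x, y)[0, 1])

def normal_theory_se(x, y):
    """(1 - rho_hat^2) / sqrt(n - 3)."""
    r = corr(x, y)
    return float((1.0 - r**2) / np.sqrt(len(x) - 3))

def delta_method_se(x, y):
    """Plug-in estimate of {Var[Z1 - (rho/2)(Z2 + Z3)]}^{1/2} / sqrt(n)."""
    n = len(x)
    r = corr(x, y)
    xs = (x - x.mean()) / x.std()   # empirically standardized X
    ys = (y - y.mean()) / y.std()   # empirically standardized Y
    # Empirical analogue of Z1 - (rho/2)(Z2 + Z3); it has sample mean zero.
    infl = (xs * ys - r) - 0.5 * r * ((xs**2 - 1.0) + (ys**2 - 1.0))
    return float(np.sqrt(np.mean(infl**2) / n))

def bootstrap_se(x, y, B=1000, rng=None):
    """Monte-Carlo approximation to the nonparametric bootstrap estimate."""
    rng = np.random.default_rng(rng)
    n = len(x)
    reps = np.empty(B)
    for b in range(B):
        idx = rng.integers(0, n, size=n)   # resample pairs with replacement
        reps[b] = corr(x[idx], y[idx])
    return float(np.sqrt(np.mean((reps - reps.mean()) ** 2)))

def jackknife_se(x, y):
    """Delete-one jackknife estimate."""
    n = len(x)
    loo = np.array([corr(np.delete(x, i), np.delete(y, i)) for i in range(n)])
    return float(np.sqrt((n - 1) / n * np.sum((loo - loo.mean()) ** 2)))

# Illustrative data: bivariate normal with correlation 0.5 (an assumption).
rng = np.random.default_rng(0)
n, rho = 200, 0.5
x = rng.standard_normal(n)
y = rho * x + np.sqrt(1.0 - rho**2) * rng.standard_normal(n)

estimates = {
    "normal theory": normal_theory_se(x, y),
    "delta method": delta_method_se(x, y),
    "bootstrap": bootstrap_se(x, y, rng=1),
    "jackknife": jackknife_se(x, y),
}
for name, se in estimates.items():
    print(f"{name:14s} {se:.4f}")
```

For bivariate normal data all four numbers should be of comparable size, since $V^2 = (1-\rho^2)^2$ in that case; for non-normal data the normal-theory estimate need not agree with the other three.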
2. BOOTSTRAP METHODS

The (Monte-Carlo approximation to the) bootstrap estimate of $\sigma_n(F)$ is

$$\sqrt{B^{-1} \sum_{j=1}^{B} [\hat\rho_j^* - \bar\rho^*]^2}.$$

Finally the jackknife estimate of $\sigma_n(F)$ is

$$\sqrt{\frac{n-1}{n} \sum_{i=1}^{n} [\hat\rho_{(i)} - \hat\rho_{(\cdot)}]^2};$$

see the beginning of section 2 for the notation used here. We will discuss the jackknife further in sections 2 and 4.

Parametric Bootstrap Methods

Once the idea of nonparametric bootstrapping (sampling from the empirical measure $P_n$) becomes clear, it seems natural to consider sampling from other estimators of the unknown $P$. For example, if we are quite confident that some parametric model holds, then it seems that we should consider bootstrapping by sampling from an estimator of $P$ based on the parametric model. Here is a formal description of this type of model-based bootstrap procedure.

Let $(\mathcal{X}, \mathcal{A})$ be a measurable space, and let $\mathcal{P} = \{P_\theta : \theta \in \Theta\}$ be a model, parametric, semiparametric or nonparametric. We do not insist that $\Theta$ be finite-dimensional. For example, in a parametric extreme case $\mathcal{P}$ could be the family of all normal (Gaussian) distributions on $(\mathcal{X}, \mathcal{A}) = (\mathbb{R}^d, \mathcal{B}^d)$. Or, to give a nonparametric example with only a smoothness restriction, $\mathcal{P}$ could be the family of all distributions on $(\mathcal{X}, \mathcal{A}) = (\mathbb{R}^d, \mathcal{B}^d)$ with a density with respect to Lebesgue measure which is uniformly continuous.

Let $X_1, \ldots, X_n, \ldots$ be i.i.d. with distribution $P_\theta \in \mathcal{P}$. We assume that there exists an estimator $\hat\theta_n = \hat\theta_n(X_1, \ldots, X_n)$ of $\theta$. Then Efron's parametric (or model-based) bootstrap proceeds by sampling from the estimated or fitted model $P_{\hat\theta_n(\omega)} \equiv \hat P_n^\omega$: suppose that $X^*_{n,1}, \ldots, X^*_{n,n}$ are independent and identically distributed with distribution $\hat P_n^\omega$ on $(\mathcal{X}, \mathcal{A})$, and let

$$P_n^* \equiv n^{-1} \sum_{i=1}^{n} \delta_{X^*_{n,i}} \equiv \text{the parametric bootstrap empirical measure}. \qquad (1)$$
The key difference between this parametric bootstrap procedure and the nonparametric bootstrap discussed earlier in this section is that we are now sampling from the model-based estimator $\hat P_n = P_{\hat\theta_n}$ of $P$ rather than from the nonparametric estimator $P_n$.

Example 2.3 Suppose that $X_1, \ldots, X_n$ are i.i.d. $P_\theta = N(\mu, \sigma^2)$ where $\theta = (\mu, \sigma^2)$. Let $\hat\theta_n = (\hat\mu_n, \hat\sigma_n^2) = (\bar X_n, S_n^2)$ where $S_n^2$ is the usual unbiased estimator of $\sigma^2$, and hence

$$\frac{\sqrt{n}(\hat\mu_n - \mu)}{\hat\sigma_n} \sim t_{n-1}, \qquad \frac{(n-1)\hat\sigma_n^2}{\sigma^2} \sim \chi^2_{n-1}.$$

Now $P_{\hat\theta_n} = N(\hat\mu_n, \hat\sigma_n^2)$, and if $X_1^*, \ldots, X_n^*$ are i.i.d. $P_{\hat\theta_n}$, then the bootstrap estimators $\hat\theta_n^* = (\hat\mu_n^*, \hat\sigma_n^{*2})$ satisfy, conditionally on $F_n$,

$$\frac{\sqrt{n}(\hat\mu_n^* - \hat\mu_n)}{\hat\sigma_n^*} \sim t_{n-1}, \qquad \frac{(n-1)\hat\sigma_n^{*2}}{\hat\sigma_n^2} \sim \chi^2_{n-1}.$$

Thus the bootstrap estimators have exactly the same distributions as the original estimators in this case.
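The parametric bootstrap in Example 2.3 can be checked by simulation. The sketch below is illustrative only: the function name `parametric_bootstrap_normal`, the sample size, the data-generating parameters, and the number of replicates are assumptions. It fits $N(\hat\mu_n, \hat\sigma_n^2)$, draws bootstrap samples from the fitted model, and returns the two pivots whose conditional distributions are stated in the example.

```python
import numpy as np

def parametric_bootstrap_normal(x, B=20000, rng=None):
    """Parametric bootstrap for the N(mu, sigma^2) model.

    Returns B replicates of
      sqrt(n) (mu*_n - mu_hat_n) / sigma*_n   and   (n - 1) sigma*_n^2 / sigma_hat_n^2.
    """
    rng = np.random.default_rng(rng)
    n = len(x)
    mu_hat = x.mean()
    s2_hat = x.var(ddof=1)            # usual unbiased estimator S_n^2
    # Each row is one bootstrap sample X*_1, ..., X*_n from the fitted model.
    xs = rng.normal(mu_hat, np.sqrt(s2_hat), size=(B, n))
    mu_star = xs.mean(axis=1)
    s2_star = xs.var(axis=1, ddof=1)
    t_star = np.sqrt(n) * (mu_star - mu_hat) / np.sqrt(s2_star)
    chi_star = (n - 1) * s2_star / s2_hat
    return t_star, chi_star

# Illustrative data (an assumption): n = 10 draws from N(3, 4).
rng = np.random.default_rng(0)
n = 10
x = rng.normal(3.0, 2.0, size=n)
t_star, chi_star = parametric_bootstrap_normal(x, rng=1)

# Compare simulated moments with those of t_{n-1} and chi^2_{n-1}.
print("var of t pivot:", t_star.var(), " t_{n-1} variance:", (n - 1) / (n - 3))
print("mean of chi2 pivot:", chi_star.mean(), " chi2_{n-1} mean:", n - 1)
```

Because the pivots are exactly $t_{n-1}$ and $\chi^2_{n-1}$ conditionally on the data, any discrepancy in the printed moments is Monte Carlo error only.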
Example 2.4 Suppose that $X_1, \ldots, X_n$ are i.i.d. $P_\theta = \mathrm{exponential}(1/\theta)$: $P_\theta(X_1 > t) = \exp(-t/\theta)$ for $t \ge 0$. Then $\hat\theta_n = \bar X_n$ and $n\hat\theta_n/\theta \sim \mathrm{Gamma}(n, 1)$. Now $P_{\hat\theta_n} = \mathrm{exponential}(1/\hat\theta_n)$, and if $X_1^*, \ldots, X_n^*$ are i.i.d. $P_{\hat\theta_n}$, then $\hat\theta_n^* = \bar X_n^*$ has $(n\hat\theta_n^*/\hat\theta_n \,|\, F_n) \sim \mathrm{Gamma}(n, 1)$, so the bootstrap distribution replicates the distribution of the original estimator exactly.

Example 2.5 (Bootstrapping from a "smoothed empirical measure"; or the "smoothed bootstrap"). Suppose that

$$\mathcal{P} = \Big\{P \text{ on } (\mathbb{R}^d, \mathcal{B}^d) : p = \frac{dP}{d\lambda} \text{ exists and is uniformly continuous}\Big\}.$$

Then one way to estimate $P$ so that our estimator $\hat P_n \in \mathcal{P}$ is via a kernel estimator of the density $p$:

$$\hat p_n(x) = \frac{1}{b_n^d} \int k\Big(\frac{y - x}{b_n}\Big)\, dP_n(y)$$

where $k : \mathbb{R}^d \to \mathbb{R}$ is a uniformly continuous density function. Then $\hat P_n$ is defined for $C \in \mathcal{A}$ by

$$\hat P_n(C) = \int_C \hat p_n(x)\, dx,$$

and the model-based bootstrap proceeds by sampling from $\hat P_n$.

There are many other examples of this type involving nonparametric or semiparametric models $\mathcal{P}$. For some work on "smoothed bootstrap" methods see e.g. Silverman and Young (1987) and Hall, DiCiccio, and Romano (1989).

Exchangeably-weighted and "Bayesian" bootstrap methods

In the course of example 5.1 we introduced the vector $M$ of counts of how many times the bootstrap variables $X_i^*$ equal the observations $X_j(\omega)$ in the underlying sample. Thinking about the process of sampling at random (with replacement) from the population described by the empirical measure $P_n$, it becomes clear that we can think of the bootstrap empirical measure $P_n^*$ as the empirical measure with multinomial random weights:

$$P_n^* = \frac{1}{n} \sum_{i=1}^{n} \delta_{X_i^*} = \frac{1}{n} \sum_{i=1}^{n} M_i \delta_{X_i(\omega)}.$$
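The identity between resampling and multinomial weighting can be illustrated directly. In this minimal sketch (the data, seed, and variable names are assumptions made for illustration), the mean under $P_n^*$ is computed once from the resampled points and once from the counts $M_i$; the counts can equally be drawn from the multinomial distribution without resampling at all.

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.standard_normal(8)          # illustrative observed sample X_1(w), ..., X_n(w)
n = len(x)

# Efron's nonparametric bootstrap: draw n indices with replacement.
idx = rng.integers(0, n, size=n)
x_star = x[idx]

# M_j = number of bootstrap draws equal to X_j; M ~ Mult_n(n, (1/n, ..., 1/n)).
M = np.bincount(idx, minlength=n)

# The mean under P*_n, computed two equivalent ways.
boot_mean_resample = x_star.mean()
boot_mean_weighted = np.sum(M * x) / n

# The weights can also be generated directly, with no resampling step.
M_direct = rng.multinomial(n, np.full(n, 1.0 / n))

print(M, boot_mean_resample, boot_mean_weighted)
```

Working with the weight vector rather than the resampled points is what makes it possible to swap in other weight distributions.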
This view of Efron's nonparametric bootstrap as the empirical measure with random weights suggests that we could obtain other random measures which would behave much the same way as Efron's nonparametric bootstrap, but without the same random sampling interpretation, by replacing the vector of multinomial weights by some other random vector $W$. One of the possible deficiencies of the nonparametric bootstrap involves its "discreteness" via missing observations in the original sample: note that the number of points of the original sample which are missed (or not given any bootstrap weight) is

$$N_n \equiv \#\{j \le n : M_j = 0\} = \sum_{j=1}^{n} 1\{M_j = 0\}.$$

Hence the proportion of observations missed by the bootstrap is $n^{-1}N_n$, and the expected proportion of missed observations is

$$E(n^{-1}N_n) = P(M_1 = 0) = (1 - 1/n)^n \to e^{-1} \doteq .36787\ldots.$$
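The expected proportion of missed observations is easy to check by simulation. This is a sketch only; the function name `missed_proportion`, the sample size $n = 100$, and the number of replicates are illustrative choices.

```python
import numpy as np

def missed_proportion(n, B=5000, rng=None):
    """Simulate n^{-1} N_n, the fraction of sample points with M_j = 0,
    for B independent multinomial weight vectors."""
    rng = np.random.default_rng(rng)
    M = rng.multinomial(n, np.full(n, 1.0 / n), size=B)   # shape (B, n)
    return np.mean(M == 0, axis=1)

n = 100
props = missed_proportion(n, rng=0)
exact = (1.0 - 1.0 / n) ** n        # E(n^{-1} N_n) = P(M_1 = 0)

print("simulated mean:", props.mean())
print("exact (1 - 1/n)^n:", exact)
print("limit e^{-1}:", np.exp(-1.0))
```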
[Moreover, from occupancy theory for urn models

$$\sqrt{n}\big(n^{-1}N_n - (1 - 1/n)^n\big) \to_d N\big(0, e^{-1}(1 - 2e^{-1})\big) = N(0, .09720887\ldots);$$

see e.g. Johnson and Kotz (1977), page 317, 3. with $r = 0$.] By using some other vector of exchangeable weights $W$ rather than $M_n \sim \mathrm{Mult}_n(n, (1/n, \ldots, 1/n))$, we might be able to avoid some of this discreteness caused by multinomial weights.

Since the resulting measure should be a probability measure, it seems reasonable to require that the components of $W$ should sum to $n$. Since the multinomial random vector with cell probabilities all equal to $1/n$ is exchangeable, it seems reasonable to require that the vector $W$ have an exchangeable distribution: i.e. $\pi W \equiv (W_{\pi(1)}, \ldots, W_{\pi(n)}) \stackrel{d}{=} W$ for all permutations $\pi$ of $\{1, \ldots, n\}$. Then

$$P_n^W \equiv \frac{1}{n} \sum_{i=1}^{n} W_{ni} \delta_{X_i(\omega)}$$

is called the exchangeably weighted bootstrap empirical measure corresponding to the weight vector $W$. Here are several examples.

Example 2.6 (Dirichlet weights). Suppose that $Y_1, Y_2, \ldots$ are i.i.d. exponential(1) random variables, and set

$$W_{ni} \equiv \frac{nY_i}{Y_1 + \cdots + Y_n}, \qquad i = 1, \ldots, n.$$

The resulting random vector $W/n$ has a Dirichlet$(1, \ldots, 1)$ distribution; i.e. $n^{-1}W \stackrel{d}{=} D$ where the $D_i$'s are the spacings of a random sample of $n-1$ Uniform$(0,1)$ random variables.

Example 2.7 (More general continuous weights). Other weights $W$ of the same form as in example 2.6 are obtained by replacing the exponential distribution of the $Y$'s by some other distribution on $\mathbb{R}^+$. It will turn out that the limit theory can be established for any of these weights as long as the $Y_i$'s satisfy $Y_i \in L_{2,1}$; i.e. $\int_0^\infty \sqrt{P(|Y| > t)}\, dt < \infty$.

Example 2.8 (Jackknife weights). Suppose that $w = (w_{n,1}, \ldots, w_{n,n})$ is a vector of constants which sum to $n$: $\sum_{i=1}^{n} w_{n,i} = n$. Let $W$ be a random permutation of the coordinates of $w$: if $R$ is uniformly distributed over $\Pi \equiv \{\text{all permutations of } \{1, \ldots, n\}\}$, then $W \equiv Rw \equiv (w_{n,R_1}, \ldots, w_{n,R_n})$. If we take

$$w = \frac{n}{n-d}\, 1_{n-d} = \frac{n}{n-d}\,(1, \ldots, 1, 0, \ldots, 0)$$

where $1_{n-d}$ is the vector with all 1's in the first $n-d$ coordinates and 0's in the remaining $d$ coordinates, then these weights $W_{n,i}$ correspond to the delete-$d$ jackknife. It turns out that these weights yield behavior like that of Efron's nonparametric bootstrap (with multinomial weights) only if $d = d_n$ satisfies $n^{-1}d_n \to \alpha > 0$.

Other weights $W$ based on various urn schemes are also possible; see Praestgaard and Wellner (1993) for some of these.
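The weight schemes of Examples 2.6 and 2.8 can be generated in a few lines. The sketch below is illustrative, not part of the original notes: the function names, the simulated data, the seeds, and the choice $d = n/2$ are assumptions. Each function returns a weight vector summing to $n$, and `weighted_bootstrap_means` evaluates the mean under $P_n^W$ for repeated draws of $W$.

```python
import numpy as np

def dirichlet_weights(n, rng):
    """W_ni = n Y_i / (Y_1 + ... + Y_n) with Y_i i.i.d. exponential(1)."""
    y = rng.exponential(1.0, size=n)
    return n * y / y.sum()

def delete_d_weights(n, d, rng):
    """Random permutation of (n/(n-d)) * (1, ..., 1, 0, ..., 0) with d zeros."""
    w = np.zeros(n)
    w[: n - d] = n / (n - d)
    return rng.permutation(w)

def weighted_bootstrap_means(x, weight_fn, B=2000, rng=None):
    """Mean of x under P_n^W = n^{-1} sum_i W_ni delta_{X_i}, for B weight draws."""
    rng = np.random.default_rng(rng)
    n = len(x)
    return np.array([np.sum(weight_fn(n, rng) * x) / n for _ in range(B)])

# Illustrative sample (an assumption): 50 standard normal observations.
x = np.random.default_rng(2).standard_normal(50)

dir_means = weighted_bootstrap_means(x, dirichlet_weights, rng=3)
jack_means = weighted_bootstrap_means(
    x, lambda n, r: delete_d_weights(n, n // 2, r), rng=4
)

print("sample mean:", x.mean())
print("Dirichlet weights:  mean %.4f, sd %.4f" % (dir_means.mean(), dir_means.std()))
print("delete-d jackknife: mean %.4f, sd %.4f" % (jack_means.mean(), jack_means.std()))
```

Unlike the multinomial weights, the Dirichlet weights are strictly positive with probability one, so no observation is missed; the delete-$d$ weights miss exactly $d$ observations in every replicate.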