Chapter 10 Generalized Least Squares Estimation

10.1 Model

    y = Xβ + ε,   E[ε|X] = 0,   E[εε′|X] = σ²Ω = Σ   (Ω > 0).

Two leading cases:

1. Heteroskedasticity:

    σ²Ω = σ² diag(w₁₁, w₂₂, …, wₙₙ) = diag(σ₁², σ₂², …, σₙ²).

2. Autocorrelation:

    σ²Ω = σ² [ 1      ρ₁     ⋯   ρₙ₋₁ ]
             [ ρ₁     1      ⋯   ρₙ₋₂ ]
             [ ⋮      ⋮      ⋱   ⋮    ]
             [ ρₙ₋₁   ρₙ₋₂   ⋯   1    ]

10.2 OLS and IV estimation

• OLS estimation

The OLS estimator can be written as

    b = β + (X′X)⁻¹X′ε.

1. Unbiasedness:

    E[b] = E_X[E[b|X]] = β.

2. Variance-covariance matrix:

    Var[b|X] = E[(b − β)(b − β)′ | X]
             = E[(X′X)⁻¹X′εε′X(X′X)⁻¹ | X]
             = (X′X)⁻¹X′(σ²Ω)X(X′X)⁻¹.

The unconditional variance is E_X[Var[b|X]]. If ε is normally distributed,

    b|X ∼ N(β, σ²(X′X)⁻¹X′ΩX(X′X)⁻¹).
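The sandwich form of Var[b|X] can be checked numerically. The sketch below is illustrative only (the design, the skedastic function 0.5 + x², and all variable names are assumptions, not from the notes); it compares the exact conditional variance under a diagonal Σ with the naive homoskedastic formula:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200
X = np.column_stack([np.ones(n), rng.normal(size=n)])

# Hypothetical heteroskedastic Sigma = diag(sigma_1^2, ..., sigma_n^2)
sigma2_i = 0.5 + X[:, 1] ** 2            # illustrative skedastic function
Sigma = np.diag(sigma2_i)

XtX_inv = np.linalg.inv(X.T @ X)
# Exact conditional variance: (X'X)^{-1} X' Sigma X (X'X)^{-1}
V_sandwich = XtX_inv @ X.T @ Sigma @ X @ XtX_inv
# Homoskedastic formula using the average error variance
V_naive = sigma2_i.mean() * XtX_inv
print(V_sandwich[1, 1], V_naive[1, 1])   # the slope variances disagree
```

Whenever σᵢ² varies with Xᵢ, the two formulas diverge, which is why the homoskedastic variance cannot be used for inference here.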
3. Consistency

Suppose that

    X′X/n →p Q > 0   and   X′ΣX/n →p P > 0.

Then

    Var[b|X] = (1/n) (X′X/n)⁻¹ (X′ΣX/n) (X′X/n)⁻¹ →p 0,

and hence Var[b] → 0. Using this and Chebyshev's inequality, we have for any α ∈ ℝᵏ − {0} and ε > 0

    P[|α′(b − β)| > ε] ≤ α′E[(b − β)(b − β)′]α / ε² = α′Var(b)α / ε² → 0   as n → ∞,

which implies b →p β.

4. Asymptotic distribution of b

Assume (Xᵢ′, εᵢ) is a sequence of independent observations with

    E(εε′) = diag(σ₁², …, σₙ²) = Σ.

In addition, assume that for any λ ∈ ℝᵏ there exist B < ∞ and δ > 0 such that

    E|λ′Xᵢεᵢ|^(2+δ) ≤ B   for all i.

Then we can apply the CLT for a sequence of independent random variables, which gives

    (1/√n) Σᵢ₌₁ⁿ λ′Xᵢεᵢ →d N(0, lim_{n→∞} (1/n) Σᵢ₌₁ⁿ E(λ′Xᵢ εᵢ² Xᵢ′λ)).
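A quick simulation illustrates the consistency result (the design and the choice of σᵢ are hypothetical, not from the notes): the estimation error of b shrinks as n grows even though the errors are heteroskedastic.

```python
import numpy as np

rng = np.random.default_rng(1)
beta = np.array([1.0, 2.0])

def ols_draw(n, rng):
    """One draw of the OLS estimator b under independent heteroskedastic errors."""
    X = np.column_stack([np.ones(n), rng.normal(size=n)])
    sigma_i = np.sqrt(0.5 + X[:, 1] ** 2)    # hypothetical sigma_i varying with X_i
    eps = sigma_i * rng.normal(size=n)
    y = X @ beta + eps
    return np.linalg.lstsq(X, y, rcond=None)[0]

b_small = ols_draw(100, rng)
b_large = ols_draw(100_000, rng)
# The estimation error |b - beta| is small for large n
print(np.abs(b_small - beta), np.abs(b_large - beta))
```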
But

    (1/n) Σᵢ₌₁ⁿ E(λ′Xᵢ εᵢ² Xᵢ′λ) = (1/n) Σᵢ₌₁ⁿ E[E(λ′Xᵢ εᵢ² Xᵢ′λ | Xᵢ)]
                                 = (1/n) Σᵢ₌₁ⁿ σᵢ² λ′E(XᵢXᵢ′)λ
                                 = plim (1/n) Σᵢ₌₁ⁿ σᵢ² λ′XᵢXᵢ′λ
                                 = plim (1/n) λ′(X′ΣX)λ.

Thus

    (1/√n) Σᵢ₌₁ⁿ Xᵢεᵢ →d N(0, P),   where P = plim (1/n) X′ΣX,

and we obtain

    √n(b − β) →d N(0, Q⁻¹PQ⁻¹).

When the εᵢ are serially correlated, we need a different set of conditions and a different CLT to derive the asymptotic normality result. See White's "Asymptotic Theory for Econometricians" for this.

• IV estimation

Assume

    Z′Z/n →p Q_ZZ (> 0),   Z′X/n →p Q_ZX (≠ 0),   X′X/n →p Q_XX (> 0),   Z′ΣZ/n →p Q_ZΣZ,

that (Zᵢ′, εᵢ)′ is a sequence of independent random vectors with E(εε′|Z) = diag(σ₁², …, σₙ²) = Σ, and that for any λ ∈ ℝᵏ there exist B < ∞ and δ > 0 such that

    E|λ′Zᵢεᵢ|^(2+δ) ≤ B   for all i.

Then, letting

    Q_XXZ = (Q_XZ Q_ZZ⁻¹ Q_ZX)⁻¹ Q_XZ Q_ZZ⁻¹,
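The sample analogue of Q_XXZ gives the IV (2SLS) estimator b_IV = (X′Z(Z′Z)⁻¹Z′X)⁻¹X′Z(Z′Z)⁻¹Z′y. A sketch under a hypothetical endogenous design (all parameter values and names are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(2)
n = 50_000
beta = np.array([1.0, 2.0])

# Hypothetical design: x is endogenous through u; z is a valid instrument
z = rng.normal(size=n)
u = rng.normal(size=n)
eps = u + rng.normal(size=n)          # error correlated with x via u
x = 0.8 * z + u
X = np.column_stack([np.ones(n), x])
Z = np.column_stack([np.ones(n), z])
y = X @ beta + eps

# b_IV = (X'Z (Z'Z)^{-1} Z'X)^{-1} X'Z (Z'Z)^{-1} Z'y
A = X.T @ Z @ np.linalg.inv(Z.T @ Z)
b_iv = np.linalg.solve(A @ Z.T @ X, A @ Z.T @ y)
b_ols = np.linalg.lstsq(X, y, rcond=None)[0]
print(b_iv, b_ols)   # IV slope is near 2; OLS slope is biased upward
```

OLS is inconsistent here because cov(xᵢ, εᵢ) ≠ 0, while the instrument z is correlated with x but uncorrelated with ε.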
    √n(b_IV − β) →d N(0, Q_XXZ Q_ZΣZ Q_XXZ′).

When the εᵢ are serially correlated, as before, we need more assumptions and a different CLT.

10.3 Robust estimation of asymptotic covariance matrices

We can still use OLS for inference if its variance-covariance matrix

    (X′X)⁻¹X′ΣX(X′X)⁻¹

can be estimated. Suppose that Σ = diag(σ₁², …, σₙ²). Obviously, σ₁², …, σₙ² cannot be estimated (there are n of them and only n observations). But what we need to estimate is X′ΣX, not Σ. We may write

    (1/n) X′ΣX = (1/n) Σᵢ₌₁ⁿ σᵢ² XᵢXᵢ′.

This and (1/n) Σᵢ₌₁ⁿ εᵢ² XᵢXᵢ′ have the same probability limit by the LLN. We replace εᵢ² with the squared OLS residual eᵢ² and then have

    (1/n) Σᵢ₌₁ⁿ eᵢ² XᵢXᵢ′ = (1/n) Σᵢ₌₁ⁿ (εᵢ − Xᵢ′(b − β))² XᵢXᵢ′
                          = (1/n) Σᵢ₌₁ⁿ εᵢ² XᵢXᵢ′ + o_p(1).

(See White (1980, Econometrica) for details.) Thus (1/n) Σᵢ₌₁ⁿ eᵢ² XᵢXᵢ′ consistently estimates (1/n) X′ΣX, and the estimated asymptotic variance-covariance matrix of b is

    (X′X/n)⁻¹ ((1/n) Σᵢ₌₁ⁿ eᵢ² XᵢXᵢ′) (X′X/n)⁻¹ →p Q⁻¹PQ⁻¹.
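The robust "sandwich" estimator is straightforward to compute directly. A sketch (the simulated design is an illustrative assumption; only the formula (X′X)⁻¹(Σᵢ eᵢ² XᵢXᵢ′)(X′X)⁻¹ comes from the text):

```python
import numpy as np

rng = np.random.default_rng(3)
n = 2_000
X = np.column_stack([np.ones(n), rng.normal(size=n)])
eps = np.sqrt(0.5 + X[:, 1] ** 2) * rng.normal(size=n)   # heteroskedastic errors
y = X @ np.array([1.0, 2.0]) + eps

XtX_inv = np.linalg.inv(X.T @ X)
b = XtX_inv @ X.T @ y
e = y - X @ b                                # OLS residuals

# White estimator: (X'X)^{-1} (sum_i e_i^2 X_i X_i') (X'X)^{-1}
meat = (X * (e ** 2)[:, None]).T @ X         # sum_i e_i^2 X_i X_i'
V = XtX_inv @ meat @ XtX_inv
se_robust = np.sqrt(np.diag(V))

# Conventional homoskedastic standard errors for comparison
s2 = e @ e / (n - X.shape[1])
se_naive = np.sqrt(np.diag(s2 * XtX_inv))
print(se_robust, se_naive)   # typically unequal under this skedastic function
```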
We can use this result for hypothesis testing. Suppose that the null hypothesis is H₀: Rβ = r. Then the Wald test is defined by

    W = (Rb − r)′ [R(X′X)⁻¹(Σᵢ₌₁ⁿ eᵢ² XᵢXᵢ′)(X′X)⁻¹R′]⁻¹ (Rb − r)

(the heteroskedasticity-robust Wald test), and as n → ∞

    W →d χ²(J),   J = rank(R).

This follows because

    W = √n(Rb − r)′ [R(X′X/n)⁻¹((1/n)Σᵢ₌₁ⁿ eᵢ² XᵢXᵢ′)(X′X/n)⁻¹R′]⁻¹ √n(Rb − r)
      →d N(0, I_J)′ N(0, I_J) = χ²(J).

If the null hypothesis is H₀: βₖ = βₖ⁰, use the t-ratio

    t = (bₖ − βₖ⁰) / √Vₖₖ,   where V = (X′X)⁻¹(Σᵢ₌₁ⁿ eᵢ² XᵢXᵢ′)(X′X)⁻¹.

As n → ∞, t →d N(0, 1). This is White's heteroskedasticity-robust t-ratio.

10.4 GLS

Since Ω > 0, it can be factored as Ω = CΛC′, where the columns of C are the characteristic vectors of Ω and the characteristic roots of Ω are placed in the diagonal matrix Λ.
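The factorization Ω = CΛC′ yields the whitening transform P = Λ^(−1/2)C′ with PΩP′ = I, which is the basis of the GLS transformation. A numerical sketch (the AR(1)-type Ω below is an illustrative choice, not from the notes):

```python
import numpy as np

n = 5
rho = 0.6
# A hypothetical positive definite Omega: AR(1)-type correlation matrix
Omega = rho ** np.abs(np.subtract.outer(np.arange(n), np.arange(n)))

# Spectral factorization Omega = C Lambda C'
lam, C = np.linalg.eigh(Omega)        # columns of C: characteristic vectors of Omega
Lambda = np.diag(lam)                 # characteristic roots on the diagonal

# Whitening transform P = Lambda^{-1/2} C' satisfies P Omega P' = I
P = np.diag(lam ** -0.5) @ C.T
print(np.round(P @ Omega @ P.T, 10))  # identity matrix (up to rounding)
```

Premultiplying the model y = Xβ + ε by such a P produces errors with a scalar covariance matrix, so OLS on the transformed data is efficient.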