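1.2 The Special Case 2x2

Consider the matrix $\mathbf{A}$

$$
\mathbf{A} = \begin{bmatrix} A_{11} & A_{12} \\ A_{21} & A_{22} \end{bmatrix}
$$

Determinant and trace

$$
\det(\mathbf{A}) = A_{11}A_{22} - A_{12}A_{21}
$$

$$
\mathrm{Tr}(\mathbf{A}) = A_{11} + A_{22}
$$

Eigenvalues

$$
\lambda^2 - \lambda \cdot \mathrm{Tr}(\mathbf{A}) + \det(\mathbf{A}) = 0
$$

$$
\lambda_1 = \frac{\mathrm{Tr}(\mathbf{A}) + \sqrt{\mathrm{Tr}(\mathbf{A})^2 - 4\det(\mathbf{A})}}{2}
\qquad
\lambda_2 = \frac{\mathrm{Tr}(\mathbf{A}) - \sqrt{\mathrm{Tr}(\mathbf{A})^2 - 4\det(\mathbf{A})}}{2}
$$

$$
\lambda_1 + \lambda_2 = \mathrm{Tr}(\mathbf{A})
\qquad
\lambda_1 \lambda_2 = \det(\mathbf{A})
$$

Eigenvectors

$$
\mathbf{v}_1 \propto \begin{bmatrix} A_{12} \\ \lambda_1 - A_{11} \end{bmatrix}
\qquad
\mathbf{v}_2 \propto \begin{bmatrix} A_{12} \\ \lambda_2 - A_{11} \end{bmatrix}
$$

Inverse

$$
\mathbf{A}^{-1} = \frac{1}{\det(\mathbf{A})} \begin{bmatrix} A_{22} & -A_{12} \\ -A_{21} & A_{11} \end{bmatrix}
$$

Petersen & Pedersen, The Matrix Cookbook (Version: January 5, 2005), Page 6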
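The closed forms of Section 1.2 translate almost line by line into code. The following is a minimal sketch in plain Python, not part of the original text; the function names `eig_2x2`, `inv_2x2` and `matvec` are illustrative, and the eigenvector formula is only useful when $A_{12} \neq 0$ (otherwise it returns the zero vector for the eigenvalue $A_{11}$).

```python
import cmath

def eig_2x2(A):
    """Eigenvalues and unnormalised eigenvectors of A = [[A11, A12], [A21, A22]].

    Uses lambda = (Tr(A) +/- sqrt(Tr(A)^2 - 4 det(A))) / 2 and
    v proportional to [A12, lambda - A11] (assumes A12 != 0).
    """
    (a11, a12), (a21, a22) = A
    tr = a11 + a22
    det = a11 * a22 - a12 * a21
    # cmath.sqrt keeps the formula valid when the discriminant is negative
    root = cmath.sqrt(tr * tr - 4 * det)
    lam1 = (tr + root) / 2
    lam2 = (tr - root) / 2
    return (lam1, (a12, lam1 - a11)), (lam2, (a12, lam2 - a11))

def inv_2x2(A):
    """Inverse via the adjugate formula; raises if det(A) is zero."""
    (a11, a12), (a21, a22) = A
    det = a11 * a22 - a12 * a21
    if det == 0:
        raise ValueError("matrix is singular")
    return [[a22 / det, -a12 / det], [-a21 / det, a11 / det]]

def matvec(A, v):
    """Product of a 2x2 matrix with a length-2 vector."""
    return tuple(A[i][0] * v[0] + A[i][1] * v[1] for i in range(2))

A = [[2.0, 1.0], [1.0, 2.0]]
(lam1, v1), (lam2, v2) = eig_2x2(A)
A_inv = inv_2x2(A)
```

For this symmetric example the trace is 4 and the determinant is 3, so the two eigenvalues sum to 4 and multiply to 3, as the identities for $\lambda_1 + \lambda_2$ and $\lambda_1\lambda_2$ require.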
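2 Derivatives

This section covers differentiation of a number of expressions with respect to a matrix $\mathbf{X}$. Note that it is always assumed that $\mathbf{X}$ has no special structure, i.e. that the elements of $\mathbf{X}$ are independent (e.g. not symmetric, Toeplitz, positive definite). See section 2.5 for differentiation of structured matrices. The basic assumptions can be written in a formula as

$$
\frac{\partial X_{kl}}{\partial X_{ij}} = \delta_{ik}\delta_{lj}
$$

that is, for e.g. vector forms,

$$
\left[\frac{\partial \mathbf{x}}{\partial y}\right]_i = \frac{\partial x_i}{\partial y}
\qquad
\left[\frac{\partial x}{\partial \mathbf{y}}\right]_i = \frac{\partial x}{\partial y_i}
\qquad
\left[\frac{\partial \mathbf{x}}{\partial \mathbf{y}}\right]_{ij} = \frac{\partial x_i}{\partial y_j}
$$

The following rules are general and very useful when deriving the differential of an expression ([10]):

\begin{align}
\partial \mathbf{A} &= 0 \quad (\mathbf{A}\text{ is a constant}) \tag{1} \\
\partial(\alpha \mathbf{X}) &= \alpha\, \partial \mathbf{X} \tag{2} \\
\partial(\mathbf{X} + \mathbf{Y}) &= \partial \mathbf{X} + \partial \mathbf{Y} \tag{3} \\
\partial(\mathrm{Tr}(\mathbf{X})) &= \mathrm{Tr}(\partial \mathbf{X}) \tag{4} \\
\partial(\mathbf{X}\mathbf{Y}) &= (\partial \mathbf{X})\mathbf{Y} + \mathbf{X}(\partial \mathbf{Y}) \tag{5} \\
\partial(\mathbf{X} \circ \mathbf{Y}) &= (\partial \mathbf{X}) \circ \mathbf{Y} + \mathbf{X} \circ (\partial \mathbf{Y}) \tag{6} \\
\partial(\mathbf{X} \otimes \mathbf{Y}) &= (\partial \mathbf{X}) \otimes \mathbf{Y} + \mathbf{X} \otimes (\partial \mathbf{Y}) \tag{7} \\
\partial(\mathbf{X}^{-1}) &= -\mathbf{X}^{-1}(\partial \mathbf{X})\mathbf{X}^{-1} \tag{8} \\
\partial(\det(\mathbf{X})) &= \det(\mathbf{X})\,\mathrm{Tr}(\mathbf{X}^{-1}\partial \mathbf{X}) \tag{9} \\
\partial(\ln(\det(\mathbf{X}))) &= \mathrm{Tr}(\mathbf{X}^{-1}\partial \mathbf{X}) \tag{10} \\
\partial \mathbf{X}^T &= (\partial \mathbf{X})^T \tag{11} \\
\partial \mathbf{X}^H &= (\partial \mathbf{X})^H \tag{12}
\end{align}

2.1 Derivatives of a Determinant

2.1.1 General form

$$
\frac{\partial \det(\mathbf{Y})}{\partial x} = \det(\mathbf{Y})\,\mathrm{Tr}\!\left[\mathbf{Y}^{-1}\frac{\partial \mathbf{Y}}{\partial x}\right]
$$

2.1.2 Linear forms

$$
\frac{\partial \det(\mathbf{X})}{\partial \mathbf{X}} = \det(\mathbf{X})(\mathbf{X}^{-1})^T
$$

$$
\frac{\partial \det(\mathbf{A}\mathbf{X}\mathbf{B})}{\partial \mathbf{X}} = \det(\mathbf{A}\mathbf{X}\mathbf{B})(\mathbf{X}^{-1})^T = \det(\mathbf{A}\mathbf{X}\mathbf{B})(\mathbf{X}^T)^{-1}
$$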
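A quick way to gain confidence in a matrix-derivative identity is to compare it against finite differences. The sketch below, which is not from the original text, checks the first linear form of 2.1.2 for a 2x2 matrix in plain Python; the helper names (`det2`, `inv2`, `analytic_grad_det`, `numeric_grad`), the step size `h` and the test matrix are all illustrative choices.

```python
def det2(X):
    """Determinant of a 2x2 nested-list matrix."""
    return X[0][0] * X[1][1] - X[0][1] * X[1][0]

def inv2(X):
    """Inverse of a 2x2 matrix (assumes det2(X) != 0)."""
    d = det2(X)
    return [[X[1][1] / d, -X[0][1] / d], [-X[1][0] / d, X[0][0] / d]]

def analytic_grad_det(X):
    """Closed form det(X) * (X^{-1})^T; entry (i, j) uses Xinv[j][i]."""
    d = det2(X)
    Xinv = inv2(X)
    return [[d * Xinv[j][i] for j in range(2)] for i in range(2)]

def numeric_grad(f, X, h=1e-6):
    """Central differences of a scalar function f w.r.t. each entry X[i][j]."""
    G = [[0.0, 0.0], [0.0, 0.0]]
    for i in range(2):
        for j in range(2):
            Xp = [row[:] for row in X]
            Xm = [row[:] for row in X]
            Xp[i][j] += h
            Xm[i][j] -= h
            G[i][j] = (f(Xp) - f(Xm)) / (2 * h)
    return G

X = [[3.0, 1.0], [2.0, 4.0]]
G_exact = analytic_grad_det(X)
G_num = numeric_grad(det2, X)
max_err = max(abs(G_exact[i][j] - G_num[i][j])
              for i in range(2) for j in range(2))
```

Each entry is perturbed independently, which mirrors the standing assumption that the elements of $\mathbf{X}$ are independent; the same check would not be valid as written for a matrix constrained to be symmetric.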
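2.1.3 Square forms

If $\mathbf{X}$ is square and invertible, then

$$
\frac{\partial \det(\mathbf{X}^T\mathbf{A}\mathbf{X})}{\partial \mathbf{X}} = 2\det(\mathbf{X}^T\mathbf{A}\mathbf{X})\,\mathbf{X}^{-T}
$$

If $\mathbf{X}$ is not square but $\mathbf{A}$ is symmetric, then

$$
\frac{\partial \det(\mathbf{X}^T\mathbf{A}\mathbf{X})}{\partial \mathbf{X}} = 2\det(\mathbf{X}^T\mathbf{A}\mathbf{X})\,\mathbf{A}\mathbf{X}(\mathbf{X}^T\mathbf{A}\mathbf{X})^{-1}
$$

If $\mathbf{X}$ is not square and $\mathbf{A}$ is not symmetric, then

$$
\frac{\partial \det(\mathbf{X}^T\mathbf{A}\mathbf{X})}{\partial \mathbf{X}} = \det(\mathbf{X}^T\mathbf{A}\mathbf{X})\left(\mathbf{A}\mathbf{X}(\mathbf{X}^T\mathbf{A}\mathbf{X})^{-1} + \mathbf{A}^T\mathbf{X}(\mathbf{X}^T\mathbf{A}^T\mathbf{X})^{-1}\right) \tag{13}
$$

2.1.4 Other nonlinear forms

Some special cases are (see [8])

$$
\frac{\partial \ln\left|\det(\mathbf{X}^T\mathbf{X})\right|}{\partial \mathbf{X}} = 2(\mathbf{X}^{+})^T
$$

$$
\frac{\partial \ln \det(\mathbf{X}^T\mathbf{X})}{\partial \mathbf{X}^{+}} = -2\mathbf{X}^T
$$

$$
\frac{\partial \ln\left|\det(\mathbf{X})\right|}{\partial \mathbf{X}} = (\mathbf{X}^{-1})^T = (\mathbf{X}^T)^{-1}
$$

$$
\frac{\partial \det(\mathbf{X}^k)}{\partial \mathbf{X}} = k\det(\mathbf{X}^k)\,\mathbf{X}^{-T}
$$

See [7].

2.2 Derivatives of an Inverse

From [15] we have the basic identity

$$
\frac{\partial \mathbf{Y}^{-1}}{\partial x} = -\mathbf{Y}^{-1}\frac{\partial \mathbf{Y}}{\partial x}\mathbf{Y}^{-1}
$$

from which it follows

$$
\frac{\partial (\mathbf{X}^{-1})_{kl}}{\partial X_{ij}} = -(\mathbf{X}^{-1})_{ki}(\mathbf{X}^{-1})_{jl}
$$

$$
\frac{\partial\, \mathbf{a}^T\mathbf{X}^{-1}\mathbf{b}}{\partial \mathbf{X}} = -\mathbf{X}^{-T}\mathbf{a}\mathbf{b}^T\mathbf{X}^{-T}
$$

$$
\frac{\partial \det(\mathbf{X}^{-1})}{\partial \mathbf{X}} = -\det(\mathbf{X}^{-1})(\mathbf{X}^{-1})^T
$$

$$
\frac{\partial\, \mathrm{Tr}(\mathbf{A}\mathbf{X}^{-1}\mathbf{B})}{\partial \mathbf{X}} = -(\mathbf{X}^{-1}\mathbf{B}\mathbf{A}\mathbf{X}^{-1})^T
$$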
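The step from the basic identity to the first two consequences can be written out explicitly. The following is a sketch of one possible derivation, not taken from the original text. It sets $\mathbf{Y} = \mathbf{X}$ and $x = X_{ij}$, and uses $\partial \mathbf{X} / \partial X_{ij} = \mathbf{J}^{ij}$, where $\mathbf{J}^{ij}$ is the single-entry matrix with $(\mathbf{J}^{ij})_{kl} = \delta_{ik}\delta_{jl}$ (the notation of Section 2.3); the summation indices $p, q$ are introduced here for illustration.

```latex
\begin{align*}
\frac{\partial (\mathbf{X}^{-1})_{kl}}{\partial X_{ij}}
  &= -\left(\mathbf{X}^{-1}\,\mathbf{J}^{ij}\,\mathbf{X}^{-1}\right)_{kl} \\
  &= -\sum_{p,q} (\mathbf{X}^{-1})_{kp}\,\delta_{pi}\,\delta_{qj}\,(\mathbf{X}^{-1})_{ql} \\
  &= -(\mathbf{X}^{-1})_{ki}\,(\mathbf{X}^{-1})_{jl}, \\[6pt]
\frac{\partial\, \mathbf{a}^T\mathbf{X}^{-1}\mathbf{b}}{\partial X_{ij}}
  &= -\sum_{k,l} a_k\,(\mathbf{X}^{-1})_{ki}\,(\mathbf{X}^{-1})_{jl}\,b_l \\
  &= -\left(\mathbf{X}^{-T}\mathbf{a}\right)_i \left(\mathbf{X}^{-1}\mathbf{b}\right)_j \\
  &= -\left(\mathbf{X}^{-T}\mathbf{a}\,\mathbf{b}^T\mathbf{X}^{-T}\right)_{ij}.
\end{align*}
```

The last line uses $(\mathbf{X}^{-1}\mathbf{b})_j = (\mathbf{b}^T\mathbf{X}^{-T})_j$, which is how the outer product of two vectors turns into the matrix expression quoted for $\partial\, \mathbf{a}^T\mathbf{X}^{-1}\mathbf{b} / \partial \mathbf{X}$.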
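2.3 Derivatives of Matrices, Vectors and Scalar Forms

2.3.1 First Order

$$
\frac{\partial \mathbf{x}^T\mathbf{b}}{\partial \mathbf{x}} = \frac{\partial \mathbf{b}^T\mathbf{x}}{\partial \mathbf{x}} = \mathbf{b}
$$

$$
\frac{\partial \mathbf{a}^T\mathbf{X}\mathbf{b}}{\partial \mathbf{X}} = \mathbf{a}\mathbf{b}^T
$$

$$
\frac{\partial \mathbf{a}^T\mathbf{X}^T\mathbf{b}}{\partial \mathbf{X}} = \mathbf{b}\mathbf{a}^T
$$

$$
\frac{\partial \mathbf{a}^T\mathbf{X}\mathbf{a}}{\partial \mathbf{X}} = \frac{\partial \mathbf{a}^T\mathbf{X}^T\mathbf{a}}{\partial \mathbf{X}} = \mathbf{a}\mathbf{a}^T
$$

$$
\frac{\partial \mathbf{X}}{\partial X_{ij}} = \mathbf{J}^{ij}
$$

$$
\frac{\partial (\mathbf{X}\mathbf{A})_{ij}}{\partial X_{mn}} = \delta_{im}(\mathbf{A})_{nj} = (\mathbf{J}^{mn}\mathbf{A})_{ij}
$$

$$
\frac{\partial (\mathbf{X}^T\mathbf{A})_{ij}}{\partial X_{mn}} = \delta_{in}(\mathbf{A})_{mj} = (\mathbf{J}^{nm}\mathbf{A})_{ij}
$$

2.3.2 Second Order

$$
\frac{\partial}{\partial X_{ij}} \sum_{klmn} X_{kl}X_{mn} = 2\sum_{kl} X_{kl}
$$

$$
\frac{\partial \mathbf{b}^T\mathbf{X}^T\mathbf{X}\mathbf{c}}{\partial \mathbf{X}} = \mathbf{X}(\mathbf{b}\mathbf{c}^T + \mathbf{c}\mathbf{b}^T)
$$

$$
\frac{\partial (\mathbf{B}\mathbf{x} + \mathbf{b})^T\mathbf{C}(\mathbf{D}\mathbf{x} + \mathbf{d})}{\partial \mathbf{x}} = \mathbf{B}^T\mathbf{C}(\mathbf{D}\mathbf{x} + \mathbf{d}) + \mathbf{D}^T\mathbf{C}^T(\mathbf{B}\mathbf{x} + \mathbf{b})
$$

$$
\frac{\partial (\mathbf{X}^T\mathbf{B}\mathbf{X})_{kl}}{\partial X_{ij}} = \delta_{lj}(\mathbf{X}^T\mathbf{B})_{ki} + \delta_{kj}(\mathbf{B}\mathbf{X})_{il}
$$

$$
\frac{\partial (\mathbf{X}^T\mathbf{B}\mathbf{X})}{\partial X_{ij}} = \mathbf{X}^T\mathbf{B}\mathbf{J}^{ij} + \mathbf{J}^{ji}\mathbf{B}\mathbf{X}
\qquad
(\mathbf{J}^{ij})_{kl} = \delta_{ik}\delta_{jl}
$$

See Sec 8.2 for useful properties of the single-entry matrix $\mathbf{J}^{ij}$.

$$
\frac{\partial \mathbf{x}^T\mathbf{B}\mathbf{x}}{\partial \mathbf{x}} = (\mathbf{B} + \mathbf{B}^T)\mathbf{x}
$$

$$
\frac{\partial \mathbf{b}^T\mathbf{X}^T\mathbf{D}\mathbf{X}\mathbf{c}}{\partial \mathbf{X}} = \mathbf{D}^T\mathbf{X}\mathbf{b}\mathbf{c}^T + \mathbf{D}\mathbf{X}\mathbf{c}\mathbf{b}^T
$$

$$
\frac{\partial}{\partial \mathbf{X}} (\mathbf{X}\mathbf{b} + \mathbf{c})^T\mathbf{D}(\mathbf{X}\mathbf{b} + \mathbf{c}) = (\mathbf{D} + \mathbf{D}^T)(\mathbf{X}\mathbf{b} + \mathbf{c})\mathbf{b}^T
$$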
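As an illustration of how one of the second-order forms can be sanity-checked, the sketch below compares $\mathbf{X}(\mathbf{b}\mathbf{c}^T + \mathbf{c}\mathbf{b}^T)$ with central finite differences of $\mathbf{b}^T\mathbf{X}^T\mathbf{X}\mathbf{c}$ for a non-square $\mathbf{X}$. It is not part of the original text; the helper names, the 3x2 test matrix and the step size `h` are illustrative assumptions.

```python
def matmul(A, B):
    """Product of nested-list matrices A (m x k) and B (k x n)."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def outer(u, v):
    """Outer product u v^T of two list vectors."""
    return [[ui * vj for vj in v] for ui in u]

def f(X, b, c):
    """Scalar b^T X^T X c, computed as the dot product (Xb) . (Xc)."""
    Xb = [sum(row[k] * b[k] for k in range(len(b))) for row in X]
    Xc = [sum(row[k] * c[k] for k in range(len(c))) for row in X]
    return sum(p * q for p, q in zip(Xb, Xc))

def analytic_grad(X, b, c):
    """Closed form X (b c^T + c b^T)."""
    bc, cb = outer(b, c), outer(c, b)
    n = len(b)
    S = [[bc[i][j] + cb[i][j] for j in range(n)] for i in range(n)]
    return matmul(X, S)

def numeric_grad(X, b, c, h=1e-6):
    """Central differences of f w.r.t. each entry X[i][j]."""
    G = [[0.0] * len(X[0]) for _ in X]
    for i in range(len(X)):
        for j in range(len(X[0])):
            Xp = [row[:] for row in X]
            Xm = [row[:] for row in X]
            Xp[i][j] += h
            Xm[i][j] -= h
            G[i][j] = (f(Xp, b, c) - f(Xm, b, c)) / (2 * h)
    return G

X = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]  # 3x2, deliberately non-square
b = [1.0, -1.0]
c = [2.0, 0.5]
G_exact = analytic_grad(X, b, c)
G_num = numeric_grad(X, b, c)
max_err = max(abs(G_exact[i][j] - G_num[i][j])
              for i in range(len(X)) for j in range(len(X[0])))
```

Because the scalar is quadratic in the entries of $\mathbf{X}$, central differences carry no truncation error here, so any remaining discrepancy is floating-point rounding.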
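Assume $\mathbf{W}$ is symmetric, then

$$
\frac{\partial}{\partial \mathbf{s}} (\mathbf{x} - \mathbf{A}\mathbf{s})^T\mathbf{W}(\mathbf{x} - \mathbf{A}\mathbf{s}) = -2\mathbf{A}^T\mathbf{W}(\mathbf{x} - \mathbf{A}\mathbf{s})
$$

$$
\frac{\partial}{\partial \mathbf{x}} (\mathbf{x} - \mathbf{A}\mathbf{s})^T\mathbf{W}(\mathbf{x} - \mathbf{A}\mathbf{s}) = 2\mathbf{W}(\mathbf{x} - \mathbf{A}\mathbf{s})
$$

$$
\frac{\partial}{\partial \mathbf{A}} (\mathbf{x} - \mathbf{A}\mathbf{s})^T\mathbf{W}(\mathbf{x} - \mathbf{A}\mathbf{s}) = -2\mathbf{W}(\mathbf{x} - \mathbf{A}\mathbf{s})\mathbf{s}^T
$$

2.3.3 Higher order and non-linear

$$
\frac{\partial}{\partial \mathbf{X}} \mathbf{a}^T\mathbf{X}^n\mathbf{b} = \sum_{r=0}^{n-1} (\mathbf{X}^r)^T\mathbf{a}\mathbf{b}^T(\mathbf{X}^{n-1-r})^T \tag{14}
$$

$$
\frac{\partial}{\partial \mathbf{X}} \mathbf{a}^T(\mathbf{X}^n)^T\mathbf{X}^n\mathbf{b} = \sum_{r=0}^{n-1} \left[ (\mathbf{X}^r)^T\mathbf{X}^n\mathbf{b}\mathbf{a}^T(\mathbf{X}^{n-1-r})^T + (\mathbf{X}^r)^T\mathbf{X}^n\mathbf{a}\mathbf{b}^T(\mathbf{X}^{n-1-r})^T \right] \tag{15}
$$

For $n = 1$, (15) reduces to $\mathbf{X}(\mathbf{b}\mathbf{a}^T + \mathbf{a}\mathbf{b}^T)$, in agreement with the second-order form for $\mathbf{b}^T\mathbf{X}^T\mathbf{X}\mathbf{c}$ in 2.3.2. See A.0.1 for a proof.

Assume $\mathbf{s}$ and $\mathbf{r}$ are functions of $\mathbf{x}$, i.e. $\mathbf{s} = \mathbf{s}(\mathbf{x})$, $\mathbf{r} = \mathbf{r}(\mathbf{x})$, and that $\mathbf{A}$ is a constant, then

$$
\frac{\partial}{\partial \mathbf{x}} \mathbf{s}^T\mathbf{A}\mathbf{s} = \left[\frac{\partial \mathbf{s}}{\partial \mathbf{x}}\right]^T (\mathbf{A} + \mathbf{A}^T)\mathbf{s}
$$

$$
\frac{\partial}{\partial \mathbf{x}} \mathbf{s}^T\mathbf{A}\mathbf{r} = \left[\frac{\partial \mathbf{s}}{\partial \mathbf{x}}\right]^T \mathbf{A}\mathbf{r} + \left[\frac{\partial \mathbf{r}}{\partial \mathbf{x}}\right]^T \mathbf{A}^T\mathbf{s}
$$

2.3.4 Gradient and Hessian

Using the above we have for the gradient and the Hessian

$$
f = \mathbf{x}^T\mathbf{A}\mathbf{x} + \mathbf{b}^T\mathbf{x}
$$

$$
\nabla_{\mathbf{x}} f = \frac{\partial f}{\partial \mathbf{x}} = (\mathbf{A} + \mathbf{A}^T)\mathbf{x} + \mathbf{b}
$$

$$
\frac{\partial^2 f}{\partial \mathbf{x}\,\partial \mathbf{x}^T} = \mathbf{A} + \mathbf{A}^T
$$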
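The gradient and Hessian of the quadratic form in 2.3.4 are easy to evaluate and to verify numerically. The following plain-Python sketch is not from the original text; the function names, the non-symmetric test matrix and the evaluation point `x0` are illustrative. A non-symmetric $\mathbf{A}$ is chosen on purpose, since for symmetric $\mathbf{A}$ the expression $\mathbf{A} + \mathbf{A}^T$ is indistinguishable from $2\mathbf{A}$.

```python
def quad_form(A, b, x):
    """f(x) = x^T A x + b^T x for a nested-list A and list vectors b, x."""
    n = len(x)
    xAx = sum(x[i] * A[i][j] * x[j] for i in range(n) for j in range(n))
    return xAx + sum(b[i] * x[i] for i in range(n))

def gradient(A, b, x):
    """Closed-form gradient (A + A^T) x + b."""
    n = len(x)
    return [sum((A[i][j] + A[j][i]) * x[j] for j in range(n)) + b[i]
            for i in range(n)]

def hessian(A):
    """Closed-form Hessian A + A^T (does not depend on x)."""
    n = len(A)
    return [[A[i][j] + A[j][i] for j in range(n)] for i in range(n)]

def numeric_gradient(f, x, h=1e-6):
    """Central-difference approximation of the gradient of scalar f at x."""
    g = []
    for i in range(len(x)):
        xp, xm = x[:], x[:]
        xp[i] += h
        xm[i] -= h
        g.append((f(xp) - f(xm)) / (2 * h))
    return g

A = [[1.0, 2.0], [0.0, 3.0]]  # deliberately non-symmetric
b = [-1.0, 0.5]
x0 = [0.3, -0.7]

g_exact = gradient(A, b, x0)
g_num = numeric_gradient(lambda x: quad_form(A, b, x), x0)
grad_err = max(abs(p - q) for p, q in zip(g_exact, g_num))
H = hessian(A)
```

The Hessian is symmetric even though $\mathbf{A}$ is not, which is what one expects of second derivatives of a smooth scalar function.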