Figure 1.2: Cubic B-splines on [0, 1] corresponding to knots at .3, .6 and .9.

From this figure, it can be seen that the $i$th cubic B-spline is nonzero only on the interval $[t_i, t_{i+4}]$. In general, the $i$th degree-$p$ B-spline is nonzero only on the interval $[t_i, t_{i+p+1}]$. This property ensures that the $i$th and $(i+j+1)$st B-splines are orthogonal for $j \geq p$. B-splines whose supports overlap are linearly independent.

1.1.2 Least-Squares Splines

Fitting a cubic spline to bivariate data can be done using least-squares. Using the truncated power basis, the model to be fit is of the form
$$y_j = \beta_0 + \beta_1 x_j + \cdots + \beta_p x_j^p + \beta_{p+1}(x_j - t_1)_+^p + \cdots + \beta_{p+k}(x_j - t_k)_+^p + \varepsilon_j, \qquad j = 1, 2, \ldots, n,$$
where $\varepsilon_j$ satisfies the usual conditions. In vector-matrix form, we may write
$$y = T\beta + \varepsilon \qquad (1.5)$$
where $T$ is an $n \times (p + k + 1)$ matrix whose first $p + 1$ columns correspond to the model matrix for $p$th degree polynomial regression, and whose $(j, p + 1 + i)$ element is $(x_j - t_i)_+^p$. Applying least-squares to (1.5), we see that
$$\hat{\beta} = (T^T T)^{-1} T^T y.$$
Thus, all of the usual linear regression technology is at our disposal here, including standard error estimates for coefficients and confidence and prediction intervals. Even regression diagnostics are applicable in the usual manner.
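To make the computation concrete, here is a minimal sketch in R (not taken from the text) that builds the matrix $T$ for a cubic spline ($p = 3$) with the knots of Figure 1.2 and fits the model with lm(). The simulated data and the names x, y, p, knots, and Tmat are illustrative assumptions only.

# Sketch: least-squares cubic spline via the truncated power basis (illustrative data)
p <- 3
knots <- c(0.3, 0.6, 0.9)
set.seed(1)
x <- sort(runif(100))
y <- sin(2 * pi * x) + rnorm(100, sd = 0.1)
# Columns 1, x, ..., x^p, followed by (x - t)_+^p for each knot t
Tmat <- outer(x, 0:p, `^`)
for (t in knots) Tmat <- cbind(Tmat, pmax(x - t, 0)^p)
# lm() computes beta-hat = (T'T)^{-1} T'y; the "-1" avoids a duplicate intercept column
fit <- lm(y ~ Tmat - 1)
plot(x, y)
lines(x, fitted(fit))

In practice, as discussed next, one would work with the B-spline form of the model, which is numerically better behaved.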
The only difficulty is the poor conditioning of the truncated power basis, which will result in inaccuracies in the calculation of $\hat{\beta}$. It is for this reason that the B-spline basis was introduced. Using this basis, we re-formulate the regression model as
$$y_j = \sum_{i=0}^{p+k} \beta_i B_{i,p}(x_j) + \varepsilon_j \qquad (1.6)$$
or, in vector-matrix form,
$$y = B\beta + \varepsilon$$
where the $(j, i)$ element of $B$ is $B_{i,p}(x_j)$. The least-squares estimate of $\beta$ is then
$$\hat{\beta} = (B^T B)^{-1} B^T y.$$
The orthogonality of B-splines that are far enough apart results in a banded matrix $B^T B$, which has better conditioning properties than the matrix $T^T T$. The bandedness property actually allows for the use of more efficient numerical techniques in computing $\hat{\beta}$. Again, all of the usual regression techniques are available. The only drawback of this model is that the coefficients are uninterpretable, and the B-splines are a little less intuitive than the truncated power functions.

We have been assuming that the knots are known. In general, they are unknown and must be chosen. Badly chosen knots can result in bad approximations. Because the spline regression problem can be formulated as an ordinary regression problem with a transformed predictor, it is possible to apply variable selection techniques such as backward selection to choose a set of knots. The usual approach is to start with a set of knots located at a subset of the order statistics of the predictor. Then backward selection is applied, using the truncated power basis form of the model. Each time a basis function is eliminated, the corresponding knot is eliminated. The method has drawbacks, notably the ill-conditioning of the basis mentioned earlier.

Figure 1.3 exhibits an example of a least-squares spline with automatically generated knots, applied to a data set consisting of titanium measurements.³ A version of backward selection was used to generate these knots; the stopping rule used was similar to the Akaike Information Criterion (AIC) discussed in Chapter 6. Although this least-squares spline fit to these data is better than what could be obtained using polynomial regression, it is unsatisfactory in many ways. The flat regions are not modelled smoothly enough, and the peak is cut off.

³To obtain Figure 1.3, type

library(splines)   # bs() is provided by the splines package
attach(titanium)
y.lm <- lm(g ~ bs(temperature, knots=c(755, 835, 905, 975),
                  Boundary.knots=c(550, 1100)))
plot(titanium)
lines(temperature, predict(y.lm))
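The banded structure of $B^T B$ noted above can be checked directly. The following short sketch (an illustration under assumed inputs, not code from the text) uses splines::bs() to build the cubic B-spline design matrix on an evenly spaced grid over [0, 1] with the knots of Figure 1.2 and prints its cross-product matrix; entries corresponding to B-splines whose supports do not overlap are exactly zero.

library(splines)                                   # provides bs()
x <- seq(0, 1, length.out = 200)                   # assumed evaluation grid
B <- bs(x, knots = c(0.3, 0.6, 0.9), degree = 3)   # cubic B-spline basis matrix
round(crossprod(B), 3)                             # B'B is banded: zeros away from the band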
Figure 1.3: A least-squares spline fit to the titanium heat data using automatically generated knots. The knots used were 755, 835, 905, and 975.

Figure 1.4: A least-squares spline fit to the titanium heat data using manually-selected knots.

A substantial improvement can be obtained by manually selecting additional knots, and removing some of the automatically generated knots. In particular, we can render the peak more effectively by adding an additional knot in its vicinity. Adjusting the knot
that was already there improves the fit as well.⁴

1.1.3 Smoothing Splines

One way around the problem of choosing knots is to use lots of them. A result analogous to the Weierstrass approximation theorem says that any sufficiently smooth function can be approximated arbitrarily well by spline functions with enough knots.

The use of a large number of knots alone is not sufficient to avoid trouble, since we will over-fit the data if the number of knots $k$ is taken so large that $p + k + 1 > n$. In that case, we would have no degrees of freedom left for estimating the residual variance. A standard way of coping with this over-fitting problem is to add a penalty term to the least-squares criterion. One requires that the resulting spline regression estimate have low curvature, as measured by the square of the second derivative.

More precisely, one may try to minimize (for a given constant $\lambda$)
$$\sum_{j=1}^{n} (y_j - S(x_j))^2 + \lambda \int_a^b (S''(x))^2 \, dx$$
over the set of all functions $S(x)$ which are twice continuously differentiable. The solution to this minimization problem has been shown to be a cubic spline which is surprisingly easy to calculate.⁵ Thus, the problem of choosing a set of knots is replaced by selecting a value for the smoothing parameter $\lambda$. Note that if $\lambda$ is small, the solution will be a cubic spline which almost interpolates the data; increasing values of $\lambda$ render increasingly smooth approximations.

The usual way of choosing $\lambda$ is by cross-validation. The ordinary cross-validation choice of $\lambda$ minimizes
$$\mathrm{CV}(\lambda) = \sum_{j=1}^{n} (y_j - \hat{S}_{\lambda,(j)}(x_j))^2$$
where $\hat{S}_{\lambda,(j)}(x)$ is the smoothing spline obtained using parameter $\lambda$ and all of the data except the $j$th observation. Note that the CV function is similar in spirit to the PRESS statistic, but

⁴The plot in Figure 1.4 can be generated using

y.lm <- lm(g ~ bs(temperature, knots=c(755, 835, 885, 895, 915, 975),
                  Boundary.knots=c(550, 1100)))
plot(titanium)
lines(spline(temperature, predict(y.lm)))

⁵The B-spline coefficients for this spline can be obtained from an expression of the form
$$\hat{\beta} = (B^T B + \lambda D^T D)^{-1} B^T y$$
where $B$ is the matrix used for least-squares regression splines and $D$ is a matrix that arises in the calculation involving the squared second derivatives of the spline. Details can be found in de Boor (1978). It is sufficient to note here that this approach has similarities with ridge regression, and that the estimated regression is a linear function of the responses.
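As a concrete illustration (not code from the text), R's built-in smooth.spline() fits a penalized smoothing spline of exactly this kind; setting cv = TRUE chooses the smoothing parameter by the ordinary leave-one-out cross-validation criterion above, rather than by generalized cross-validation. The sketch assumes the same titanium data frame, with columns temperature and g, used in footnote 3.

# Sketch: smoothing spline for the titanium data, lambda chosen by ordinary CV
fit <- smooth.spline(titanium$temperature, titanium$g, cv = TRUE)
plot(titanium)
lines(fit)          # fitted smoothing spline
fit$lambda          # the value of the smoothing parameter selected by CV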