1 Model Selection in Linear Regression

Basic Ideas

“Model selection” in linear regression attempts to suggest the best model for a given purpose. Recall that the two main purposes of linear regression models are:

• Estimation of the effect of one or more covariates while adjusting for the possible confounding effects of other variables.

• Prediction of the outcome for the next set of similar subjects.

Also, we must again recall the famous quote from George Box:

“All models are wrong, but some are useful.”

The above quote is important here for two reasons:

• If we are looking to estimate effects, it reminds us that we do not need to find a single “final model” and draw all conclusions from that model alone. Since no single model is correct, we are free to estimate a large number of different models, and draw overall conclusions about the effect of a variable, confounding, and so on, from that collection of models. So, in some sense, model selection is not relevant to estimating effects, where we will almost always want to look at more than one model; sometimes as many as 20 or 30 models will be helpful in drawing overall conclusions. We have already seen examples of this when we investigated confounding. Nevertheless, it turns out that a Bayesian program for model selection will, because of the way its output is formatted, be very useful for investigating effects. Thus, in learning a program for model selection, we will simultaneously see a way to easily draw overall conclusions for parameter estimation in the face of confounding.
• If we are looking at prediction, then it may make sense to ask “what is the best model that gives ‘optimal’ predictions for future subjects?” Here again, we will see that finding a single model is often sub-optimal, and in fact, taking a Bayesian average over many models produces better predictions than any single model.

There is no widely accepted model building strategy. We will look at some of the most common methods.

Model Selection for Future Predictions

Problem: We wish to predict Y using potential predictor variables X1, X2, ..., Xp.

Challenge: Which subset of the X1, X2, ..., Xp potential predictors should be included in the model for “best” predictions?

Frequentist Approaches:

• Backwards selection
• Forwards selection
• Backwards and Forwards selection
• All subsets selection
• AIC criterion
• Mallows Cp criterion
• R2 criterion
• Adjusted R2 criterion
• PRESSp criterion

Bayesian Approaches:

• Bayes Factors
• BIC criterion
• DIC criterion

We will now take each of these and define them. Following that, we will compare how they each actually work in practice, including programming them in R.

Backwards Selection: First run a model with all covariates, X1, X2, ..., Xp, included in the model. Then check which of the covariates has the largest p-value, and eliminate it from the model, leaving p − 1 independent variables in the model. Repeat this procedure with the variables that are left, continually dropping variables until some stopping criterion is met. A typical criterion is to stop once no remaining p-value is above some threshold, like p > 0.1.
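To make the procedure concrete, here is a minimal sketch of p-value-based backward elimination in R. The data frame dat, the outcome name "y", and the 0.1 threshold are hypothetical placeholders, not part of the notes; the sketch also assumes numeric predictors, so that coefficient names match column names.

# Backward elimination driven by p-values (illustrative sketch).
# 'dat' is assumed to contain the outcome 'y' plus the candidate predictors.
backward_select <- function(dat, outcome = "y", threshold = 0.1) {
  predictors <- setdiff(names(dat), outcome)   # start with all candidate covariates
  repeat {
    terms <- if (length(predictors)) predictors else "1"
    fit   <- lm(reformulate(terms, response = outcome), data = dat)
    coefs <- summary(fit)$coefficients
    pvals <- coefs[rownames(coefs) != "(Intercept)", "Pr(>|t|)", drop = FALSE]
    # Stop once no remaining p-value exceeds the threshold
    if (nrow(pvals) == 0 || max(pvals) <= threshold) return(fit)
    # Otherwise drop the covariate with the largest p-value and refit
    predictors <- setdiff(predictors, rownames(pvals)[which.max(pvals)])
  }
}

# Usage (hypothetical): final_model <- backward_select(dat, outcome = "y", threshold = 0.1)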
Forwards Selection: First run a model with no covariates included, i.e., an intercept-only model. Then run p separate models, one for each of the possible independent variables, keeping track of the p-values each time. The first variable entered is the one with the lowest p-value at this step. Repeat this procedure: at the second step, consider all models containing two variables, the one selected at the first step plus each of the others in turn, and add the variable with the smallest p-value, and so on. Continue to add variables until some stopping criterion is met. A typical criterion is that the p-values of all candidate variables not yet in the model are above some threshold, like p > 0.15, so no new variables are added.

Backwards and Forwards Selection: At each stage, consider both dropping and/or adding variables, checking some criterion (e.g., based again on some p-value thresholds). A combination of the above two strategies.
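R's built-in step() function automates this kind of combined backwards-and-forwards search, although it compares models by AIC rather than by individual p-values. A minimal sketch follows; the data frame dat and outcome y are hypothetical placeholders.

# Stepwise selection in both directions with step(), which uses AIC as its criterion.
null_fit <- lm(y ~ 1, data = dat)   # intercept-only starting model
full_fit <- lm(y ~ ., data = dat)   # model containing all candidate predictors

stepwise_fit <- step(null_fit,
                     scope     = list(lower = null_fit, upper = full_fit),
                     direction = "both",    # "forward" or "backward" also allowed
                     trace     = FALSE)
summary(stepwise_fit)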
All subsets regression: An alternative to backwards/forwards procedures; a generic term describing the idea of calculating some “fit” criterion over all possible models. We will see some of these criteria below. In general, if there are p potential predictor variables, there will be 2^p possible models. For example, if there are five possible X variables, there will be 2^5 = 32 possible models, and so on.

AIC criterion: Stands for Akaike's Information Criterion. For each model, calculate:

AIC = n ln(SSE) − n ln(n) + 2p

where SSE is the usual residual sum of squares from that model, p is the number of parameters in the current model, and n is the sample size. After doing this for all possible models, the “best” model is the one with the smallest AIC.

Note that the AIC is formed from three terms. The first, n ln(SSE), is a measure of fit, since it increases with the residual sum of squares. The second term, n ln(n), is a constant, and really plays no role in selecting the model. The third term, 2p, is a “penalty” for adding more terms to the model; it is needed for “balance”, because the first term always decreases as more terms are added into the model.
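To illustrate the all-subsets idea together with the AIC formula above, here is a brief sketch that enumerates every subset of a small set of candidate predictors and computes AIC = n ln(SSE) − n ln(n) + 2p for each. The data frame dat, outcome y, and predictor names x1, x2, x3 are hypothetical placeholders.

# Score every non-empty subset of candidate predictors with the AIC formula above.
predictors <- c("x1", "x2", "x3")
n <- nrow(dat)

aic_notes <- function(fit) {
  sse <- sum(residuals(fit)^2)
  p   <- length(coef(fit))              # parameters in the current model
  n * log(sse) - n * log(n) + 2 * p
}

# 2^3 - 1 = 7 models with at least one predictor (add y ~ 1 by hand if wanted)
subsets <- unlist(lapply(seq_along(predictors),
                         function(k) combn(predictors, k, simplify = FALSE)),
                  recursive = FALSE)

aics <- sapply(subsets, function(s)
  aic_notes(lm(reformulate(s, response = "y"), data = dat)))

data.frame(model = sapply(subsets, paste, collapse = " + "),
           AIC   = aics)[order(aics), ]         # smallest AIC first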
Mallows Cp criterion: Attempts to measure the bias from a regression model. See the textbook for a full description; we omit the details here.

R2 criterion: Choose the model with the largest R2. In general, this will simply be the largest model, so it is not a very useful criterion. It can, however, be helpful in choosing among models with the same number of included parameters. Not further discussed here.

Adjusted R2 criterion: As above, but Adjusted R2 penalizes for the number of parameters, so the largest model is not necessarily always best. It generally selects models that are too large, because the “penalty” is too small. Not further discussed here.

PRESSp criterion: PRESS stands for Prediction Sum of Squares. Similar to the SSE, but the residuals are now calculated from a model that leaves out each observation in turn, so that each observation is predicted using a model fit without it. There is no explicit penalty for having more parameters, but the largest model is not always chosen. Still, other criteria tend to have better properties, so it is not discussed further here.
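Although PRESS is not pursued further in these notes, it is cheap to compute for a linear model: the leave-one-out residuals can be obtained from the ordinary residuals and the leverage (hat) values without refitting the model n times. A small sketch, where dat and y are again hypothetical placeholders:

# PRESS for a fitted linear model: sum of squared leave-one-out prediction errors.
press <- function(fit) {
  loo_resid <- residuals(fit) / (1 - hatvalues(fit))   # e_i / (1 - h_ii)
  sum(loo_resid^2)
}

press(lm(y ~ ., data = dat))   # smaller values indicate better predictive performance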
Bayes Factors: Consider the following picture:

[Figure not reproduced: the data under H0 versus HA, with the observed Data X marked; here the Bayes Factor = 1, and one may ask what the p-value is.]

The Bayes Factor = 1 in the figure because it is defined as

BF = Pr{data | model1} / Pr{data | model2}

If model i is not fully determined because of unknown parameters, then the Bayes Factor is still defined as

BF = Pr{data | model1} / Pr{data | model2}

but now where Pr{data | model i} is defined as

Pr{data | model i} = ∫ (likelihood × prior) dθi

where θi represents the vector of all unknown parameters for model i. What the above integral really means is that when there are unknown parameters in the terms in the definition of the Bayes Factor, we integrate them out (i.e., like an average) over the prior distribution for these parameters.

Problem: These can be hard to calculate, as they can involve high-dimensional integrals. So, we can approximate Bayes Factors by the BIC; see below.
BIC criterion: Stands for Bayesian Information Criterion, sometimes called the SBC for Schwarz's Bayesian Criterion. For each model, calculate:

BIC = n ln(SSE) − n ln(n) + ln(n) p

where SSE is the usual residual sum of squares from that model, p is the number of parameters in the current model, and n is the sample size. After doing this for all possible models, the “best” model is the one with the smallest BIC.

Note the similarity between the AIC and the BIC: only the last (penalty) term changes, from 2p to ln(n) p. We will compare the properties of these two criteria in detail later.

Details are omitted here (see the article by Raftery), but it can be shown that the BIC is related to an approximate Bayes Factor, from a model with low-information prior distributions (equal to one prior observation centered at the null value of zero for each coefficient).

The BIC is in some ways the best criterion to use for predictions, in large part because it leads to model averaging; see Raftery for details. We will extensively use Raftery's program called bic.glm for the rest of the year; it is extremely useful both for model selection for prediction and for estimating effects.
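As a rough sketch of how these quantities can be obtained in R: the BIC formula above can be computed directly from any fitted model, and Raftery's model-averaging routines are available in the BMA package (assumed installed here), where bicreg handles linear regression and bic.glm handles generalized linear models. The data frame dat and outcome y are hypothetical placeholders.

# BIC = n*ln(SSE) - n*ln(n) + ln(n)*p, computed directly from a fitted linear model.
bic_notes <- function(fit) {
  n   <- length(residuals(fit))
  sse <- sum(residuals(fit)^2)
  p   <- length(coef(fit))
  n * log(sse) - n * log(n) + log(n) * p
}
bic_notes(lm(y ~ ., data = dat))

# Bayesian model averaging over all subsets with the BMA package.
# bicreg() uses the BIC approximation to the Bayes Factor and reports, for each
# covariate, the posterior probability that its coefficient is non-zero.
library(BMA)
X       <- dat[, setdiff(names(dat), "y"), drop = FALSE]
fit_bma <- bicreg(X, dat$y)
summary(fit_bma)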
DIC criterion: Stands for “Deviance Information Criterion”. Similar to the BIC, but designed for hierarchical models; it estimates an “effective number of parameters” for such models. Beyond the scope of this course, but be aware of its existence if you need to select a hierarchical model.