Introduction to Probability and Statistics Using R G. Jay Kerns First Edition
Contents

Preface vii
List of Figures xiii
List of Tables xv

1 An Introduction to Probability and Statistics 1
  1.1 Probability 1
  1.2 Statistics 1
  Chapter Exercises 3

2 An Introduction to R 5
  2.1 Downloading and Installing R 5
  2.2 Communicating with R 6
  2.3 Basic R Operations and Concepts 8
  2.4 Getting Help 14
  2.5 External Resources 15
  2.6 Other Tips 16
  Chapter Exercises 17

3 Data Description 19
  3.1 Types of Data 19
  3.2 Features of Data Distributions 33
  3.3 Descriptive Statistics 35
  3.4 Exploratory Data Analysis 40
  3.5 Multivariate Data and Data Frames 45
  3.6 Comparing Populations 47
  Chapter Exercises 53

4 Probability 65
  4.1 Sample Spaces 65
  4.2 Events 70
  4.3 Model Assignment 75
  4.4 Properties of Probability 80
  4.5 Counting Methods 84
  4.6 Conditional Probability 89
  4.7 Independent Events 95
  4.8 Bayes’ Rule 98
  4.9 Random Variables 102
  Chapter Exercises 105

5 Discrete Distributions 107
  5.1 Discrete Random Variables 107
  5.2 The Discrete Uniform Distribution 110
  5.3 The Binomial Distribution 111
  5.4 Expectation and Moment Generating Functions 116
  5.5 The Empirical Distribution 120
  5.6 Other Discrete Distributions 123
  5.7 Functions of Discrete Random Variables 130
  Chapter Exercises 132

6 Continuous Distributions 137
  6.1 Continuous Random Variables 137
  6.2 The Continuous Uniform Distribution 142
  6.3 The Normal Distribution 143
  6.4 Functions of Continuous Random Variables 146
  6.5 Other Continuous Distributions 150
  Chapter Exercises 155

7 Multivariate Distributions 157
  7.1 Joint and Marginal Probability Distributions 157
  7.2 Joint and Marginal Expectation 163
  7.3 Conditional Distributions 165
  7.4 Independent Random Variables 167
  7.5 Exchangeable Random Variables 170
  7.6 The Bivariate Normal Distribution 170
  7.7 Bivariate Transformations of Random Variables 172
  7.8 Remarks for the Multivariate Case 175
  7.9 The Multinomial Distribution 178
  Chapter Exercises 180

8 Sampling Distributions 181
  8.1 Simple Random Samples 182
  8.2 Sampling from a Normal Distribution 182
  8.3 The Central Limit Theorem 185
  8.4 Sampling Distributions of Two-Sample Statistics 187
  8.5 Simulated Sampling Distributions 189
  Chapter Exercises 191

9 Estimation 193
  9.1 Point Estimation 193
  9.2 Confidence Intervals for Means 202
  9.3 Confidence Intervals for Differences of Means 208
  9.4 Confidence Intervals for Proportions 210
  9.5 Confidence Intervals for Variances 212
  9.6 Fitting Distributions 212
  9.7 Sample Size and Margin of Error 212
  9.8 Other Topics 214
  Chapter Exercises 215

10 Hypothesis Testing 217
  10.1 Introduction 217
  10.2 Tests for Proportions 218
  10.3 One Sample Tests for Means and Variances 224
  10.4 Two-Sample Tests for Means and Variances 227
  10.5 Other Hypothesis Tests 228
  10.6 Analysis of Variance 229
  10.7 Sample Size and Power 230
  Chapter Exercises 232

11 Simple Linear Regression 235
  11.1 Basic Philosophy 235
  11.2 Estimation 239
  11.3 Model Utility and Inference 248
  11.4 Residual Analysis 252
  11.5 Other Diagnostic Tools 259
  Chapter Exercises 266

12 Multiple Linear Regression 267
  12.1 The Multiple Linear Regression Model 267
  12.2 Estimation and Prediction 270
  12.3 Model Utility and Inference 277
  12.4 Polynomial Regression 280
  12.5 Interaction 283
  12.6 Qualitative Explanatory Variables 286
  12.7 Partial F Statistic 289
  12.8 Residual Analysis and Diagnostic Tools 291
  12.9 Additional Topics 292
  Chapter Exercises 296

13 Resampling Methods 297
  13.1 Introduction 297
  13.2 Bootstrap Standard Errors 299
  13.3 Bootstrap Confidence Intervals 303
  13.4 Resampling in Hypothesis Tests 305
  Chapter Exercises 309

14 Categorical Data Analysis 311

15 Nonparametric Statistics 313

16 Time Series 315

A R Session Information 317

B GNU Free Documentation License 319

C History 327

D Data 329
  D.1 Data Structures 329
  D.2 Importing Data 334
  D.3 Creating New Data Sets 335
  D.4 Editing Data 335
  D.5 Exporting Data 336
  D.6 Reshaping Data 337

E Mathematical Machinery 339
  E.1 Set Algebra 339
  E.2 Differential and Integral Calculus 340
  E.3 Sequences and Series 343
  E.4 The Gamma Function 345
  E.5 Linear Algebra 345
  E.6 Multivariable Calculus 347

F Writing Reports with R 349
  F.1 What to Write 349
  F.2 How to Write It with R 350
  F.3 Formatting Tables 353
  F.4 Other Formats 353

G Instructions for Instructors 355
  G.1 Generating This Document 356
  G.2 How to Use This Document 356
  G.3 Ancillary Materials 357
  G.4 Modifying This Document 357

H RcmdrTestDrive Story 359

Bibliography 363

Index 369