Econometrics Course Teaching Resources (Classic Foreign Textbooks): Introductory Econometrics.pdf_P11-P15

The Nature of Econometrics and Economic Data

observation. When econometric methods are used to analyze time series data, the data should be stored in chronological order. The variable avgmin refers to the average minimum wage for the year, avgcov is the average coverage rate (the percentage of workers covered by the minimum wage law), unemp is the unemployment rate, and gnp is the gross national product. We will use these data later in a time series analysis of the effect of the minimum wage on employment.

Pooled Cross Sections

Some data sets have both cross-sectional and time series features. For example, suppose that two cross-sectional household surveys are taken in the United States, one in 1985 and one in 1990. In 1985, a random sample of households is surveyed for variables such as income, savings, family size, and so on. In 1990, a new random sample of households is taken using the same survey questions. In order to increase our sample size, we can form a pooled cross section by combining the two years. Because random samples are taken in each year, it would be a fluke if the same household appeared in the sample during both years. (The size of the sample is usually very small compared with the number of households in the United States.) This important factor distinguishes a pooled cross section from a panel data set.

Pooling cross sections from different years is often an effective way of analyzing the effects of a new government policy. The idea is to collect data from the years before and after a key policy change. As an example, consider the following data set on housing prices taken in 1993 and 1995, when there was a reduction in property taxes in 1994. Suppose we have data on 250 houses for 1993 and on 270 houses for 1995. One way to store such a data set is given in Table 1.4. Observations 1 through 250 correspond to the houses sold in 1993, and observations 251 through 520 correspond to the 270 houses sold in 1995.
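Below is a minimal Python sketch of the storage layout just described. Two independently drawn samples are stacked into one list, and year is kept as a separate variable so that each observation's survey year is never lost. The sample sizes (250 and 270) come from the text. The field names `obsno`, `year`, and `hprice` are hypothetical, and the prices are NaN placeholders rather than real data.

The helper `expected_overlap` makes the "fluke" remark concrete. Suppose each year's sample is drawn at random, without replacement, from a population of N units. Then the expected number of units that show up in both samples is n1·n2/N.

```python
from typing import Dict, List

Record = Dict[str, float]


def pool_cross_sections(samples: Dict[int, List[float]]) -> List[Record]:
    """Stack independent cross sections (year -> list of prices) into one data set.

    Observations are numbered consecutively, earliest year first, and each
    record carries its year as a separate variable. Field names are hypothetical.
    """
    pooled: List[Record] = []
    obsno = 1
    for year in sorted(samples):
        for price in samples[year]:
            pooled.append({"obsno": obsno, "year": year, "hprice": price})
            obsno += 1
    return pooled


def expected_overlap(n1: int, n2: int, population: int) -> float:
    """Expected number of units drawn in both of two independent random samples.

    Each unit in the second sample is in the first with probability n1 / population.
    """
    return n1 * n2 / population


# Placeholder (NaN) prices: 250 houses for 1993 and 270 houses for 1995.
nan = float("nan")
data = pool_cross_sections({1993: [nan] * 250, 1995: [nan] * 270})
print(len(data))                              # 520
print(data[249]["year"], data[250]["year"])   # 1993 1995 (obsno 250 vs. 251)

# Hypothetical sizes: two samples of 1,000 drawn from 10 million households.
print(expected_overlap(1000, 1000, 10_000_000))  # 0.1
```

Sorting by year reproduces the order described for Table 1.4. As the text notes, though, the order matters less than keeping `year` attached to every record.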
While the order in which we store the data turns out not to be crucial, keeping track of the year for each observation is usually very important. This is why we enter year as a separate variable. A pooled cross section is analyzed much like a standard cross section, except that we often need to account for secular differences in the variables across time. In fact, in addition to increasing the sample size, the point of a pooled cross-sectional analysis is often to see how a key relationship has changed over time.

Panel or Longitudinal Data

A panel data (or longitudinal data) set consists of a time series for each cross-sectional member in the data set. As an example, suppose we have wage, education, and employment history for a set of individuals followed over a ten-year period. Or we might collect information, such as investment and financial data, about the same set of firms over a five-year time period. Panel data can also be collected on geographical units. For example, we can collect data for the same set of counties in the United States on immigration flows, tax rates, wage rates, government expenditures, etc., for the years 1980, 1985, and 1990.

The key feature of panel data that distinguishes it from a pooled cross section is the fact that the same cross-sectional units (individuals, firms, or counties in the above

Chapter 1 The Nature of Econometrics and Economic Data 10 14/99 4:34 PM Page 10

eral advantages over cross-sectional data or even pooled cross-sectional data. The benefit that we will focus on in this text is that having multiple observations on the same units allows us to control for certain unobserved characteristics of individuals, firms, and so on. As we will see, the use of more than one observation can facilitate causal inference in situations where inferring causality would be very difficult if only a single cross section were available. A second advantage of panel data is that it often allows us to study the importance of lags in behavior or the result of decision making. This information can be significant since many economic policies can be expected to have an impact only after some time has passed.

Most books at the undergraduate level do not contain a discussion of econometric methods for panel data. However, economists now recognize that some questions are difficult, if not impossible, to answer satisfactorily without panel data. As you will see, we can make considerable progress with simple panel data analysis, a method that is not much more difficult than dealing with a standard cross-sectional data set.

A Comment on Data Structures

Part 1 of this text is concerned with the analysis of cross-sectional data, as this poses the fewest conceptual and technical difficulties. At the same time, it illustrates most of the key themes of econometric analysis. We will use the methods and insights from cross-sectional analysis in the remainder of the text.

While the econometric analysis of time series uses many of the same tools as cross-sectional analysis, it is more complicated due to the trending, highly persistent nature of many economic time series. Examples that have traditionally been used to illustrate the manner in which econometric methods can be applied to time series data are now widely believed to be flawed. It makes little sense to use such examples initially, since this practice will only reinforce poor econometric practice.
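A small simulation shows one reason why trending, highly persistent series complicate analysis. It is an illustration offered here, not an example from the text. Two random walks generated independently of each other often show a sizable sample correlation. Two independent white-noise series of the same length almost never do. The function names, the series length (200), the number of replications (200), and the seed are all arbitrary assumptions.

```python
import random
import statistics


def correlation(x, y):
    """Sample (Pearson) correlation of two equal-length sequences."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def random_walk(n, rng):
    """Highly persistent series: each value is the previous one plus a new shock."""
    level, path = 0.0, []
    for _ in range(n):
        level += rng.gauss(0.0, 1.0)
        path.append(level)
    return path


def white_noise(n, rng):
    """Non-persistent series: an independent draw every period."""
    return [rng.gauss(0.0, 1.0) for _ in range(n)]


def mean_abs_correlation(make_series, n=200, reps=200, seed=1):
    """Average |correlation| between pairs of independently generated series."""
    rng = random.Random(seed)
    return statistics.fmean(
        abs(correlation(make_series(n, rng), make_series(n, rng)))
        for _ in range(reps)
    )


print(mean_abs_correlation(random_walk))  # unrelated persistent series: large
print(mean_abs_correlation(white_noise))  # unrelated independent draws: near zero
```

In both cases the two series are unrelated by construction. Only persistence makes the random walks look associated, which is why naive time series examples can mislead.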
Therefore, we will postpone the treatment of time series econometrics until Part 2, when the important issues concerning trends, persistence, dynamics, and seasonality will be introduced.

In Part 3, we treat pooled cross sections and panel data explicitly. The analysis of independently pooled cross sections and simple panel data analysis are fairly straightforward extensions of pure cross-sectional analysis. Nevertheless, we will wait until Chapter 13 to deal with these topics.

1.4 CAUSALITY AND THE NOTION OF CETERIS PARIBUS IN ECONOMETRIC ANALYSIS

In most tests of economic theory, and certainly for evaluating public policy, the economist’s goal is to infer that one variable has a causal effect on another variable (such as crime rate or worker productivity). Simply finding an association between two or more variables might be suggestive, but unless causality can be established, it is rarely compelling.

The notion of ceteris paribus, which means “other (relevant) factors being equal,” plays an important role in causal analysis. This idea has been implicit in some of our earlier discussion, particularly Examples 1.1 and 1.2, but thus far we have not explicitly mentioned it.
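Holding other factors fixed connects to the panel-data advantage mentioned earlier. With two observations on each unit, a time-constant unobserved factor can be differenced away.

The sketch below compares a single cross-sectional slope with the slope estimated from first differences. It assumes a simple two-period model, y = beta*x + a + u, where a is a hypothetical unobserved trait that is correlated with x. First differencing is one standard panel technique, and the text defers panel methods to Chapter 13. The model, the parameter values, and the function names here are illustrative assumptions.

```python
import random


def ols_slope(x, y):
    """Slope from a simple regression of y on x (with an intercept)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    return sxy / sxx


def simulate_two_period_panel(n_units=2000, beta=0.5, seed=0):
    """Return (cross-sectional slope, first-difference slope) from simulated data.

    Assumed model for unit i in period t: y_it = beta * x_it + a_i + u_it,
    where a_i is unobserved, constant over time, and also raises x_it.
    """
    rng = random.Random(seed)
    x1, x2, y1, y2 = [], [], [], []
    for _ in range(n_units):
        a = rng.gauss(0.0, 1.0)          # unobserved trait, fixed over time
        xa = a + rng.gauss(0.0, 1.0)     # x is correlated with a in both periods
        xb = a + rng.gauss(0.0, 1.0)
        x1.append(xa)
        x2.append(xb)
        y1.append(beta * xa + a + rng.gauss(0.0, 0.1))
        y2.append(beta * xb + a + rng.gauss(0.0, 0.1))

    # One cross section cannot separate the effect of x from that of a.
    cross_section = ols_slope(x1, y1)

    # Differencing within each unit removes a_i, i.e., holds it fixed.
    dx = [later - earlier for earlier, later in zip(x1, x2)]
    dy = [later - earlier for earlier, later in zip(y1, y2)]
    first_difference = ols_slope(dx, dy)
    return cross_section, first_difference


cs, fd = simulate_two_period_panel()
print(round(cs, 2), round(fd, 2))  # cross section well above beta = 0.5; differenced slope close to it
```

In this setup, the cross-sectional slope also picks up the effect of a. In large samples it settles near beta + cov(x, a)/var(x) = 0.5 + 0.5. Differencing compares each unit with itself, so a drops out.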

You probably remember from introductory economics that most economic questions are ceteris paribus by nature. For example, in analyzing consumer demand, we are interested in knowing the effect of changing the price of a good on its quantity demanded, while holding all other factors (such as income, prices of other goods, and individual tastes) fixed. If other factors are not held fixed, then we cannot know the causal effect of a price change on quantity demanded.

Holding other factors fixed is critical for policy analysis as well. In the job training example (Example 1.2), we might be interested in the effect of another week of job training on wages, with all other components being equal (in particular, education and experience). If we succeed in holding all other relevant factors fixed and then find a link between job training and wages, we can conclude that job training has a causal effect on worker productivity.

While this may seem pretty simple, even at this early stage it should be clear that, except in very special cases, it will not be possible to literally hold all else equal. The key question in most empirical studies is: Have enough other factors been held fixed to make a case for causality? Rarely is an econometric study evaluated without raising this issue. In most serious applications, the number of factors that can affect the variable of interest, such as criminal activity or wages, is immense, and the isolation of any particular variable may seem like a hopeless effort. However, we will eventually see that, when carefully applied, econometric methods can simulate a ceteris paribus experiment.

At this point, we cannot yet explain how econometric methods can be used to estimate ceteris paribus effects, so we will consider some problems that can arise in trying to infer causality in economics. We do not use any equations in this discussion. For each example, the problem of inferring causality disappears if an appropriate experiment can be carried out.
Thus, it is useful to describe how such an experiment might be structured, and to observe that, in most cases, obtaining experimental data is impractical. It is also helpful to think about why the available data fails to have the important features of an experimental data set.

We rely for now on your intuitive understanding of terms such as random, independence, and correlation, all of which should be familiar from an introductory probability and statistics course. (These concepts are reviewed in Appendix B.) We begin with an example that illustrates some of these important issues.

EXAMPLE 1.3 (Effects of Fertilizer on Crop Yield)

Some early econometric studies [for example, Griliches (1957)] considered the effects of new fertilizers on crop yields. Suppose the crop under consideration is soybeans. Since fertilizer amount is only one factor affecting yields (some others include rainfall, quality of land, and presence of parasites), this issue must be posed as a ceteris paribus question. One way to determine the causal effect of fertilizer amount on soybean yield is to conduct an experiment, which might include the following steps. Choose several one-acre plots of land. Apply different amounts of fertilizer to each plot and subsequently measure the yields; this gives us a cross-sectional data set. Then, use statistical methods (to be introduced in Chapter 2) to measure the association between yields and fertilizer amounts.
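The experiment just described can be sketched as a small simulation. Every number here is an assumption made for illustration: the linear yield equation, the intercept of 20, the fertilizer effect of 0.05 per unit, and the five fertilizer levels. An unobserved "quality" term stands in for land quality, rainfall, parasites, and similar factors.

When fertilizer amounts are assigned to plots at random, the simple-regression slope of yield on fertilizer recovers the assumed effect. When better plots tend to receive more fertilizer, the same slope also absorbs the effect of quality. That second case is one reason nonexperimental data can lack the features of an experimental data set.

```python
import random


def ols_slope(x, y):
    """Slope from a simple regression of y on x (with an intercept)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    return sxy / sxx


def fertilizer_slope(randomized, n_plots=1000, effect=0.05, seed=0):
    """Simulate one-acre plots and return the estimated slope of yield on fertilizer.

    Hypothetical model: yield = 20 + effect * fertilizer + quality + noise,
    where quality (land, rainfall, parasites, ...) is not observed.
    """
    rng = random.Random(seed)
    amounts = [0, 50, 100, 150, 200]            # hypothetical fertilizer levels
    fert, yields = [], []
    for _ in range(n_plots):
        quality = rng.gauss(0.0, 1.0)
        if randomized:
            f = rng.choice(amounts)             # assignment ignores plot quality
        else:
            f = max(0.0, 100 + 20 * quality)    # better plots get more fertilizer
        fert.append(f)
        yields.append(20 + effect * f + quality + rng.gauss(0.0, 1.0))
    return ols_slope(fert, yields)


print(fertilizer_slope(randomized=True))   # close to the assumed effect, 0.05
print(fertilizer_slope(randomized=False))  # noticeably larger: quality is confounded
```

Under these assumptions, the non-randomized slope settles near 0.10. This happens because quality rises by 1/20 with each extra unit of fertilizer, on top of the true effect of 0.05.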