Contents

11 Association Between Variables . . . . . . . . . . . . . . . . . 767
   11.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . 767
        11.1.1 Measure of Association . . . . . . . . . . . . . .  768
        11.1.2 Chapter Summary . . . . . . . . . . . . . . . . . . 769
   11.2 Chi Square Based Measures . . . . . . . . . . . . . . . .  769
        11.2.1 Phi . . . . . . . . . . . . . . . . . . . . . . . . 774
        11.2.2 Contingency coefficient . . . . . . . . . . . . . . 778
        11.2.3 Cramer's V . . . . . . . . . . . . . . . . . . . .  782
        11.2.4 Summary of Chi Square Based Measures . . . . . . .  784
   11.3 Reduction in Error Measures . . . . . . . . . . . . . . .  786

766
Chapter 11

Association Between Variables

11.1 Introduction

In previous chapters, much of the discussion concerned a single variable: describing a distribution, calculating summary statistics, obtaining interval estimates for parameters and testing hypotheses concerning these parameters. Statistics that describe or make inferences about a single distribution are referred to as univariate statistics. While univariate statistics form the basis for many other types of statistics, none of the issues concerning relationships among variables can be answered by examining only a single variable. In order to examine relationships among variables, it is necessary to move to at least the level of bivariate statistics, examining two variables. Frequently the researcher wishes to move beyond this to multivariate statistics, where the relationships among several variables are simultaneously examined.

Cross classification tables, used to determine independence and dependence for events and for variables, are one type of bivariate statistics. A test for a difference between two proportions can also be considered a type of bivariate statistics. The only other example of bivariate methods used so far in this textbook is the test for the difference between two means, using either the normal or the t distribution. The latter is the only bivariate method which has been used to examine variables that have interval or ratio level scales.

An example of a relationship that a researcher might investigate is the
relationship between political party supported and opinion concerning socioeconomic issues. In Chapters 9 and 10, the relationship between political party supported and opinion concerning various explanations for unemployment, among a sample of Edmonton adults, was examined. This type of relationship was examined using a cross classification table and the chi square statistic. Differences of proportions, or differences of mean opinion, could have been used as a method of examining this relationship as well. In this chapter, various summary measures are used to describe these relationships. The chi square statistic from the cross classification table is modified to obtain a measure of association. Correlation coefficients and regression models are also used to examine the relationship among variables which have ordinal, interval or ratio level scales.

Bivariate and multivariate statistics are useful not only for statistical reasons; they also form a large part of social science research. The social sciences are concerned with explaining social phenomena, and this necessarily involves searching for, and testing for, relationships among variables. Social phenomena do not just happen, but have causes. In looking for causal factors, attempting to determine which variables cause or influence other variables, the researcher examines the nature of relationships among variables. Variables that appear to have little relationship with the variable that the researcher is attempting to explain may be ignored. Variables which appear to be related to the variable being explained must be closely examined. The researcher is concerned with whether a relationship among variables exists or not. If the relationship appears to exist, then the researcher wishes to know more concerning the nature of this relationship. The size and strength of the relationship are of concern, and there are various tests concerning these.
In this chapter, there is no examination of multivariate relationships, where several variables are involved. This chapter looks only at bivariate relationships, testing for the existence of such relationships, and attempting to describe the strength and nature of such relationships. The two variable methods of this chapter can be extended to the examination of multivariate relationships. But the latter methods are beyond the scope of an introductory textbook, and are left to more advanced courses in statistics.

11.1.1 Measure of Association

Measures of association provide a means of summarizing the size of the association between two variables. Most measures of association are scaled
so that they reach a maximum numerical value of 1 when the two variables have a perfect relationship with each other. They are also scaled so that they have a value of 0 when there is no relationship between two variables. While there are exceptions to these rules, most measures of association are of this sort. Some measures of association are constructed to have a range of only 0 to 1; other measures have a range from -1 to +1. The latter provide a means of determining whether the two variables have a positive or negative association with each other.

Tests of significance are also provided for many of the measures of association. These tests begin by hypothesizing that there is no relationship between the two variables, and that the measure of association equals 0. The researcher calculates the observed value of the measure of association, and if the measure is different enough from 0, the test shows that there is a significant relationship between the two variables.

11.1.2 Chapter Summary

This chapter begins with measures of association based on the chi square statistic. It will be seen in Section 11.2 that the χ² statistic is a function not only of the size of the relationship between the two variables, but also of the sample size and the number of rows and columns in the table. This statistic can be adjusted in various ways, in order to produce a measure of association. Following this, in Section 11.3, a different approach to obtaining a measure of association is outlined. This is to consider how much the error of prediction for a variable can be reduced when the researcher has knowledge of a second variable. Section ?? examines various correlation coefficients, measures which summarize the relationship between two variables that have an ordinal or higher level of measurement. Finally, Section ?? presents the regression model for interval or ratio variables.
The regression model allows the researcher to estimate the size of the relationship between two variables, where one variable is considered the independent variable, and the other variable depends on the first variable.

11.2 Chi Square Based Measures

One way to determine whether there is a statistical relationship between two variables is to use the chi square test for independence of Chapter 10. A cross classification table is used to obtain the expected number of cases under the assumption of no relationship between the two variables. Then
the value of the chi square statistic provides a test of whether or not there is a statistical relationship between the variables in the cross classification table.

While the chi square test is a very useful means of testing for a relationship, it suffers from several weaknesses. One difficulty with the test is that it does not indicate the nature of the relationship. From the chi square statistic itself, it is not possible to determine the extent to which one variable changes as values of the other variable change. About the only way to do this is to closely examine the table in order to determine the pattern of the relationship between the two variables.

A second problem with the chi square test for independence is that the size of the chi square statistic may not provide a reliable guide to the strength of the statistical relationship between the two variables. When two different cross classification tables have the same sample size, the two variables in the table with the larger chi square value are more strongly related than are the two variables in the table with the smaller chi square value. But when the sample sizes for two tables differ, the size of the chi square statistic is a misleading indicator of the extent of the relationship between two variables. This will be seen in Example 11.2.1.

A further difficulty is that the value of the chi square statistic may change depending on the number of cells in the table. For example, a table with 2 columns and 3 rows may give a different chi square value than does a cross classification table with 4 columns and 5 rows, even when the relationship between the two variables and the sample sizes are the same. The number of rows and columns in a table are referred to as the dimensions of the table. Tables of different dimensions give different degrees of freedom, partly correcting for this problem.
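The sample size effect described above can be sketched numerically. The following is a minimal illustration with a hypothetical 2 × 2 table (the counts are invented for demonstration): the chi square statistic is computed from expected counts under independence, and multiplying every cell count by 10 multiplies the statistic by 10 even though the pattern of association in the table is unchanged.

```python
import math

def chi_square(table):
    """Chi square statistic for a cross classification table,
    given as a list of rows of observed counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    total = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence: (row total x column total) / n
            expected = row_totals[i] * col_totals[j] / n
            total += (observed - expected) ** 2 / expected
    return total

small = [[30, 20], [20, 30]]                      # hypothetical table, n = 100
large = [[10 * x for x in row] for row in small]  # same proportions, n = 1000

print(chi_square(small))   # 4.0
print(chi_square(large))   # 40.0 -- ten times larger, same pattern

# Dividing out the sample size removes the effect: sqrt(chi2 / n)
# is 0.2 for both tables. This is the idea behind the adjusted
# measures, such as phi, examined in the sections that follow.
print(math.sqrt(chi_square(small) / 100))    # 0.2
print(math.sqrt(chi_square(large) / 1000))   # 0.2
```

Note that only the raw statistic changes with sample size; the adjusted quantity is identical for both tables, which is why chi square based measures of association divide by n in some form.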
But it may still be misleading to compare the chi square statistic for two tables of quite different dimensions.

In order to solve some of these problems, the chi square statistic can be adjusted to take account of differences in sample size and dimension of the table. Some of the measures which can be calculated are phi, the contingency coefficient, and Cramer's V. Before examining these measures, the following example shows how sample size affects the value of the chi square statistic.

Example 11.2.1 Effect of Sample Size on the Chi Square Statistic

The hypothetical examples of Section 6.2 of Chapter 6 will be used to illustrate the effect of sample size on the value of the chi square statistic. The data from Tables 6.9 and 6.10 will first be used to illustrate how a larger