Are Emily and Greg More Employable than Lakisha and Jamal? A Field Experiment on Labor Market Discrimination
Marianne Bertrand; Sendhil Mullainathan
The American Economic Review, Vol. 94, No. 4. (Sep., 2004), pp. 991-1013.
Stable URL: http://links.jstor.org/sici?sici=0002-8282%28200409%2994%3A4%3C991%3AAEAGME%3E2.0.CO%3B2-H
The American Economic Review is currently published by American Economic Association.
Are Emily and Greg More Employable Than Lakisha and Jamal? A Field Experiment on Labor Market Discrimination

By MARIANNE BERTRAND AND SENDHIL MULLAINATHAN*

We study race in the labor market by sending fictitious resumes to help-wanted ads in Boston and Chicago newspapers. To manipulate perceived race, resumes are randomly assigned African-American- or White-sounding names. White names receive 50 percent more callbacks for interviews. Callbacks are also more responsive to resume quality for White names than for African-American ones. The racial gap is uniform across occupation, industry, and employer size. We also find little evidence that employers are inferring social class from the names. Differential treatment by race still appears to be prominent in the U.S. labor market. (JEL J71, J64)

* Bertrand: Graduate School of Business, University of Chicago, 1101 E. 58th Street, RO 229D, Chicago, IL 60637, NBER, and CEPR (e-mail: marianne.bertrand@gsb.uchicago.edu); Mullainathan: Department of Economics, Massachusetts Institute of Technology, 50 Memorial Drive, E52-380a, Cambridge, MA 02142, and NBER (e-mail: mullain@mit.edu). David Abrams, Victoria Bede, Simone Berkowitz, Hong Chung, Almudena Fernandez, Mary Anne Guediguian, Christine Jaw, Richa Maheswari, Beverley Martis, Alison Tisza, Grant Whitehorn, and Christine Yee provided excellent research assistance. We are also grateful to numerous colleagues and seminar participants for very helpful comments.

Every measure of economic success reveals significant racial inequality in the U.S. labor market. Compared to Whites, African-Americans are twice as likely to be unemployed and earn nearly 25 percent less when they are employed (Council of Economic Advisers, 1998). This inequality has sparked a debate as to whether employers treat members of different races differentially. When faced with observably similar African-American and White applicants, do they favor the White one? Some argue yes, citing either employer prejudice or employer perception that race signals lower productivity. Others argue that differential treatment by race is a relic of the past, eliminated by some combination of employer enlightenment, affirmative action programs and the profit-maximization motive. In fact, many in this latter camp even feel that stringent enforcement of affirmative action programs has produced an environment of reverse discrimination. They would argue that faced with identical candidates, employers might favor the African-American one.1

1 This camp often explains the poor performance of African-Americans in terms of supply factors. If African-Americans lack many basic skills entering the labor market, then they will perform worse, even with parity or favoritism in hiring.

Data limitations make it difficult to empirically test these views. Since researchers possess far less data than employers do, White and African-American workers that appear similar to researchers may look very different to employers. So any racial difference in labor market outcomes could just as easily be attributed to differences that are observable to employers but unobservable to researchers.

To circumvent this difficulty, we conduct a field experiment that builds on the correspondence testing methodology that has been primarily used in the past to study minority outcomes in the United Kingdom.2

2 See Roger Jowell and Patricia Prescott-Clarke (1970), Jim Hubbuck and Simon Carter (1980), Colin Brown and Pat Gay (1985), and Peter A. Riach and Judith Rich (1991). One caveat is that some of these studies fail to fully match skills between minority and nonminority resumes. For example, some impose differential education background by racial origin. Doris Weichselbaumer (2003, 2004) studies the impact of sex-stereotypes and sexual orientation. Richard E. Nisbett and Dov Cohen (1996) perform a related field experiment to study how employers' response to a criminal past varies between the North and the South in the United States.

We send resumes in response to help-wanted ads in Chicago and Boston newspapers and measure callback for interview for each sent resume. We
experimentally manipulate perception of race via the name of the fictitious job applicant. We randomly assign very White-sounding names (such as Emily Walsh or Greg Baker) to half the resumes and very African-American-sounding names (such as Lakisha Washington or Jamal Jones) to the other half. Because we are also interested in how credentials affect the racial gap in callback, we experimentally vary the quality of the resumes used in response to a given ad. Higher-quality applicants have on average a little more labor market experience and fewer holes in their employment history; they are also more likely to have an e-mail address, have completed some certification degree, possess foreign language skills, or have been awarded some honors.3 In practice, we typically send four resumes in response to each ad: two higher-quality and two lower-quality ones. We randomly assign to one of the higher- and one of the lower-quality resumes an African-American-sounding name. In total, we respond to over 1,300 employment ads in the sales, administrative support, clerical, and customer services job categories and send nearly 5,000 resumes. The ads we respond to cover a large spectrum of job quality, from cashier work at retail establishments and clerical work in a mail room, to office and sales management positions.

3 In creating the higher-quality resumes, we deliberately make small changes in credentials so as to minimize the risk of overqualification.

We find large racial differences in callback rates.4 Applicants with White names need to send about 10 resumes to get one callback whereas applicants with African-American names need to send about 15 resumes. This 50-percent gap in callback is statistically significant. A White name yields as many more callbacks as an additional eight years of experience on a resume. Since applicants' names are randomly assigned, this gap can only be attributed to the name manipulation.

4 For ease of exposition, we refer to the effects uncovered in this experiment as racial differences. Technically, however, these effects are about the racial soundingness of names. We briefly discuss below the potential confounds between name and race. A more extensive discussion is offered in Section IV, subsection B.

Race also affects the reward to having a better resume. Whites with higher-quality resumes receive nearly 30-percent more callbacks than Whites with lower-quality resumes. On the other hand, having a higher-quality resume has a smaller effect for African-Americans. In other words, the gap between Whites and African-Americans widens with resume quality. While one may have expected improved credentials to alleviate employers' fear that African-American applicants are deficient in some unobservable skills, this is not the case in our data.5

5 These results contrast with the view, mostly based on nonexperimental evidence, that African-Americans receive higher returns to skills. For example, estimating earnings regressions on several decades of Census data, James J. Heckman et al. (2001) show that African-Americans experience higher returns to a high school degree than Whites do.

The experiment also reveals several other aspects of the differential treatment by race. First, since we randomly assign applicants' postal addresses to the resumes, we can study the effect of neighborhood of residence on the likelihood of callback. We find that living in a wealthier (or more educated or Whiter) neighborhood increases callback rates. But, interestingly, African-Americans are not helped more than Whites by living in a "better" neighborhood. Second, the racial gap we measure in different industries does not appear correlated to Census-based measures of the racial gap in wages. The same is true for the racial gap we measure in different occupations. In fact, we find that the racial gaps in callback are statistically indistinguishable across all the occupation and industry categories covered in the experiment. Federal contractors, who are thought to be more severely constrained by affirmative action laws, do not treat the African-American resumes more preferentially; neither do larger employers or employers who explicitly state that they are "Equal Opportunity Employers." In Chicago, we find a slightly smaller racial gap when employers are located in more African-American neighborhoods.

The rest of the paper is organized as follows. Section I compares this experiment to earlier work on racial discrimination, and most notably to the labor market audit studies. We describe the experimental design in Section II and present the results in Section III, subsection A. In Section IV, we discuss possible interpretations of our results, focusing especially on two issues. First, we examine whether the
race-specific names we have chosen might also proxy for social class above and beyond the race of the applicant. Using birth certificate data on mother's education for the different first names used in our sample, we find little relationship between social background and the name-specific callback rates.6 Second, we discuss how our results map back to the different models of discrimination proposed in the economics literature. In doing so, we focus on two important results: the lower returns to credentials for African-Americans and the relative homogeneity of the racial gap across occupations and industries. We conclude that existing models do a poor job of explaining the full set of findings. Section V concludes.

6 We also argue that a social class interpretation would find it hard to explain some of our findings, such as why living in a better neighborhood does not increase callback rates more for African-American names than for White names.

I. Previous Research

With conventional labor force and household surveys, it is difficult to study whether differential treatment occurs in the labor market.7 Armed only with survey data, researchers usually measure differential treatment by comparing the labor market performance of Whites and African-Americans (or men and women) for which they observe similar sets of skills. But such comparisons can be quite misleading. Standard labor force surveys do not contain all the characteristics that employers observe when hiring, promoting, or setting wages. So one can never be sure that the minority and nonminority workers being compared are truly similar from the employers' perspective. As a consequence, any measured differences in outcomes could be attributed to these unobserved (to the researcher) factors.

7 See Joseph G. Altonji and Rebecca M. Blank (1999) for a detailed review of the existing literature on racial discrimination in the labor market.

This difficulty with conventional data has led some authors to instead rely on pseudo-experiments.8 Claudia Goldin and Cecilia Rouse (2000), for example, examine the effect of blind auditioning on the hiring process of orchestras. By observing the treatment of female candidates before and after the introduction of blind auditions, they try to measure the amount of sex discrimination. When such pseudo-experiments can be found, the resulting study can be very informative; but finding such experiments has proven to be extremely challenging.

8 William A. Darity, Jr. and Patrick L. Mason (1998) describe an interesting nonexperimental study. Prior to the Civil Rights Act of 1964, employment ads would explicitly state racial biases, providing a direct measure of differential treatment. Of course, as Arrow (1998) mentions, discrimination was at that time "a fact too evident for detection."

A different set of studies, known as audit studies, attempts to place comparable minority and White actors into actual social and economic settings and measure how each group fares in these settings.9 Labor market audit studies send comparable minority (African-American or Hispanic) and White auditors in for interviews and measure whether one is more likely to get the job than the other.10 While the results vary somewhat across studies, minority auditors tend to perform worse on average: they are less likely to get called back for a second interview and, conditional on getting called back, less likely to get hired.

9 Michael Fix and Marjery A. Turner (1998) provide a survey of many such audit studies.

10 Earlier hiring audit studies include Jerry M. Newman (1978) and Shelby J. McIntyre et al. (1980). Three more recent studies are Harry Cross et al. (1990), Franklin James and Steve W. DelCastillo (1991), and Turner et al. (1991). Heckman and Peter Siegelman (1992), Heckman (1998), and Altonji and Blank (1999) summarize these studies. See also David Neumark (1996) for a labor market audit study on gender discrimination.

These audit studies provide some of the cleanest nonlaboratory evidence of differential treatment by race. But they also have weaknesses, most of which have been highlighted in Heckman and Siegelman (1992) and Heckman (1998). First, these studies require that both members of the auditor pair are identical in all dimensions that might affect productivity in employers' eyes, except for race. To accomplish this, researchers typically match auditors on several characteristics (height, weight, age, dialect, dressing style, hairdo) and train them for several days to coordinate interviewing styles. Yet, critics note that this is unlikely to erase the numerous differences that exist between the auditors in a pair.

Another weakness of the audit studies is that they are not double-blind. Auditors know the purpose of the study. As Turner et al. (1991)
note: "The first day of training also included an introduction to employment discrimination, equal employment opportunity, and a review of project design and methodology." This may generate conscious or subconscious motives among auditors to generate data consistent or inconsistent with their beliefs about race issues in America. As psychologists know very well, these demand effects can be quite strong. It is very difficult to insure that auditors will not want to do "a good job." Since they know the goal of the experiment, they can alter their behavior in front of employers to express (indirectly) their own views. Even a small belief by auditors that employers treat minorities differently can result in measured differences in treatment. This effect is further magnified by the fact that auditors are not in fact seeking jobs and are therefore more free to let their beliefs affect the interview process.

Finally, audit studies are extremely expensive, making it difficult to generate large enough samples to understand nuances and possible mitigating factors. Also, these budgetary constraints worsen the problem of mismatched auditor pairs. Cost considerations force the use of a limited number of pairs of auditors, meaning that any one mismatched pair can easily drive the results. In fact, these studies generally tend to find significant differences in outcomes across pairs.

Our study circumvents these problems. First, because we only rely on resumes and not people, we can be sure to generate comparability across race. In fact, since race is randomly assigned to each resume, the same resume will sometimes be associated with an African-American name and sometimes with a White name. This guarantees that any differences we find are caused solely by the race manipulation. Second, the use of paper resumes insulates us from demand effects.
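
The random assignment of race to resumes described above can be illustrated with a minimal sketch. This is not the authors' actual procedure: the name pools (taken only from the paper's title), the `assign_race` and `race_counts_by_resume` helpers, the resume labels, and the number of ads are all assumptions made for illustration. The sketch draws a race and a first name at random for each resume sent, then counts how often each resume template was paired with each race, which is one simple way to check that the randomization behaved as expected.

```python
import random
from collections import Counter

# Illustrative name pools only; the four names come from the paper's title.
NAMES = {
    "white": ["Emily", "Greg"],
    "african_american": ["Lakisha", "Jamal"],
}


def assign_race(sendings, rng):
    """Attach a randomly drawn race and first name to each (ad_id, resume_id) sending."""
    assigned = []
    for ad_id, resume_id in sendings:
        race = rng.choice(sorted(NAMES))   # race is drawn independently of the resume
        name = rng.choice(NAMES[race])     # then a name is drawn from that race's pool
        assigned.append({"ad": ad_id, "resume": resume_id, "race": race, "name": name})
    return assigned


def race_counts_by_resume(assigned):
    """Count how often each resume template was paired with each race."""
    counts = {}
    for row in assigned:
        counts.setdefault(row["resume"], Counter())[row["race"]] += 1
    return counts


# Hypothetical setup: two resume templates, each sent in response to 200 ads.
rng = random.Random(0)  # fixed seed so the sketch is reproducible
sendings = [(ad, resume) for ad in range(200) for resume in ("resume_A", "resume_B")]
assigned = assign_race(sendings, rng)
counts = race_counts_by_resume(assigned)
print(counts)
```

Because race is drawn independently of the resume content, each template ends up carrying both kinds of names across ads, so resume characteristics cannot differ systematically by race.
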
While the research assistants know the purpose of the study, our protocol allows little room for conscious or subconscious deviations from the set procedures. Moreover, we can objectively measure whether the randomization occurred as expected. This kind of objective measurement is impossible in the case of the previous audit studies. Finally, because of relatively low marginal cost, we can send out a large number of resumes. Besides giving us more precise estimates, this larger sample size also allows us to examine the nature of the differential treatment from many more angles.

II. Experimental Design

A. Creating a Bank of Resumes

The first step of the experimental design is to generate templates for the resumes to be sent. The challenge is to produce a set of realistic and representative resumes without using resumes that belong to actual job seekers. To achieve this goal, we start with resumes of actual job searchers but alter them sufficiently to create distinct resumes. The alterations maintain the structure and realism of the initial resumes without compromising their owners.

We begin with resumes posted on two job search Web sites as the basis for our artificial resumes.11 While the resumes posted on these Web sites may not be completely representative of the average job seeker, they provide a practical approximation.12 We restrict ourselves to people seeking employment in our experimental cities (Boston and Chicago). We also restrict ourselves to four occupational categories: sales, administrative support, clerical services, and customer services. Finally, we further restrict ourselves to resumes posted more than six months prior to the start of the experiment. We purge the selected resumes of the person's name and contact information.

During this process, we classify the resumes within each detailed occupational category into two groups: high and low quality.
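
The selection rules above (two cities, four occupational categories, posting age, and removal of identifying details) can be summarized in a short sketch. It is only an illustration under stated assumptions: the `PostedResume` record, the `select_templates` helper, the 183-day approximation of "six months," the example records, and the experiment start date are all hypothetical and do not come from the paper.

```python
from dataclasses import dataclass, replace
from datetime import date, timedelta

CITIES = {"Boston", "Chicago"}
OCCUPATIONS = {"sales", "administrative support", "clerical services", "customer services"}


@dataclass(frozen=True)
class PostedResume:
    """Hypothetical record for a resume found on a job search Web site."""
    name: str
    contact: str
    city: str
    occupation: str
    posted_on: date
    body: str


def select_templates(resumes, experiment_start, min_age_days=183):
    """Keep resumes meeting the city, occupation, and age rules; blank out name and contact."""
    cutoff = experiment_start - timedelta(days=min_age_days)  # assumed stand-in for six months
    templates = []
    for r in resumes:
        if r.city in CITIES and r.occupation in OCCUPATIONS and r.posted_on < cutoff:
            templates.append(replace(r, name="", contact=""))  # purge identifying details
    return templates


# Hypothetical postings and start date, for illustration only.
posted = [
    PostedResume("Person A", "a@example.com", "Boston", "sales", date(2000, 1, 1), "..."),
    PostedResume("Person B", "b@example.com", "Chicago", "clerical services", date(2001, 5, 1), "..."),
    PostedResume("Person C", "c@example.com", "New York", "sales", date(2000, 1, 1), "..."),
    PostedResume("Person D", "d@example.com", "Boston", "engineering", date(2000, 1, 1), "..."),
]
templates = select_templates(posted, experiment_start=date(2001, 7, 1))
print(templates)
```

Only the first posting survives in this example: the second is too recent, the third is outside the experimental cities, and the fourth is outside the four occupational categories. The subjective high/low quality classification is deliberately left out, since it relies on judgment rather than a mechanical rule.
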
In judging resume quality, we use criteria such as labor market experience, career profile, existence of gaps in employment, and skills listed. Such a classification is admittedly subjective but it is made independently of any race assignment on the resumes (which occurs later in the experimental design). To further reinforce the quality gap between the two sets of resumes, we add to each high-quality resume a subset of the following features: summer or while-at-school employment experience, volunteering experience, extra computer skills, certification degrees, foreign language skills, honors, or some military

11 The sites are www.careerbuilder.com and www.americasjobbank.com.
12 In practice, we found large variation in skill levels among people posting their resumes on these sites.