Field experiment

Field experiments are experiments carried out outside of laboratory settings.

They randomly assign subjects (or other sampling units) to treatment or control groups to test claims of causal relationships. Random assignment helps make the treatment and control groups comparable, so that any differences between them that emerge after the treatment has been administered plausibly reflect the influence of the treatment rather than pre-existing differences between the groups. As defined by researchers such as John A. List, the distinguishing characteristics of field experiments are that they are conducted in real-world settings, often unobtrusively, and that they control not only the subject pool but also selection and overtness. This contrasts with laboratory experiments, which enforce scientific control by testing a hypothesis in the artificial and highly controlled setting of a laboratory. Field experiments also differ in context from naturally occurring experiments and quasi-experiments.^[1] Whereas naturally occurring experiments rely on an external actor (e.g. a government or nonprofit) to control randomized treatment assignment and implementation, field experiments require researchers to retain control over randomization and implementation. Quasi-experiments occur when treatments are administered as if at random (e.g. U.S. Congressional districts where candidates win by slim margins,^[2] weather patterns, natural disasters, etc.).

Field experiments encompass a broad array of experimental designs, each with a varying degree of generality. Some criteria of generality (e.g. authenticity of treatments, participants, contexts, and outcome measures) refer to the contextual similarities between the subjects in the experimental sample and the rest of the population. Field experiments are increasingly used in the social sciences to study the effects of policy-related interventions in domains such as health, education, crime, social welfare, and politics.

Characteristics

Under random assignment, subjects are assigned to groups based on known, non-deterministic probabilities, which makes the treatment and control groups comparable in expectation.^[3] Two other core assumptions underlie the researcher's ability to obtain unbiased estimates of causal effects from potential outcomes: excludability and non-interference.^[4]^[5] The excludability assumption holds that assignment affects outcomes only through receipt of the treatment. Asymmetries in the assignment, administration, or measurement of treatment and control groups violate this assumption. The non-interference assumption, a component of the Stable Unit Treatment Value Assumption (SUTVA), holds that a subject's outcome depends only on whether that subject is assigned to treatment, not on whether other subjects are assigned to treatment. When these three core assumptions (random assignment, excludability, and non-interference) are met, field experiments can provide unbiased estimates of causal effects.
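A minimal simulation can make the role of random assignment concrete. The sketch below assumes a hypothetical data-generating process (untreated outcomes drawn from a normal distribution and a constant treatment effect), assigns exactly half of the subjects to treatment completely at random, and estimates the average treatment effect as the difference in observed group means. Because each simulated outcome depends only on the subject's own assignment, excludability and non-interference hold by construction; the function name and parameters are illustrative.

```python
import random
import statistics


def simulate_experiment(n=1000, true_effect=2.0, seed=0):
    """Estimate an average treatment effect by difference in means.

    Hypothetical data-generating process (an assumption, not from any study):
    untreated potential outcome Y(0) ~ Normal(10, 3), and treated potential
    outcome Y(1) = Y(0) + true_effect for every subject.
    """
    rng = random.Random(seed)
    y0 = [rng.gauss(10.0, 3.0) for _ in range(n)]
    y1 = [y + true_effect for y in y0]

    # Complete random assignment: exactly n // 2 subjects are treated.
    treated = set(rng.sample(range(n), n // 2))

    # Only one potential outcome is observed per subject, and it depends
    # solely on that subject's own assignment (no interference).
    obs_treated = [y1[i] for i in range(n) if i in treated]
    obs_control = [y0[i] for i in range(n) if i not in treated]
    return statistics.mean(obs_treated) - statistics.mean(obs_control)


# Averaging over many hypothetical re-randomizations illustrates unbiasedness.
estimates = [simulate_experiment(seed=s) for s in range(200)]
print(round(statistics.mean(estimates), 2))
```

Any single estimate varies with the random assignment, but the average across re-randomizations should sit close to the assumed effect of 2.0.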

After designing the field experiment and gathering the data, researchers can use statistical inference methods to estimate the size of the intervention's effect on the subjects and to assess how precisely that effect is estimated. Field experiments allow researchers to collect many kinds and amounts of data. For example, a researcher could design an experiment that collects pre- and post-treatment measurements and analyzes them with an appropriate statistical inference method to test whether an intervention affects subject-level changes in outcomes.
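As one illustration of the pre/post approach, the sketch below estimates the effect on each subject's change score (post minus pre) as a difference in means and attaches a randomization-inference p-value by repeatedly re-shuffling the treatment labels. The data are simulated and hypothetical, the function names are illustrative, and regression adjustment on the pre-treatment measure would be a common alternative analysis.

```python
import random


def diff_in_means(values, assignment):
    """Mean of values for treated units (1) minus mean for control units (0)."""
    treated = [v for v, z in zip(values, assignment) if z == 1]
    control = [v for v, z in zip(values, assignment) if z == 0]
    return sum(treated) / len(treated) - sum(control) / len(control)


def change_score_test(pre, post, assignment, n_perm=2000, seed=0):
    """Effect on subject-level change (post - pre) with a randomization p-value.

    The p-value is two-sided and tests the sharp null hypothesis that the
    intervention had no effect on any subject's change score.
    """
    change = [after - before for before, after in zip(pre, post)]
    observed = diff_in_means(change, assignment)

    rng = random.Random(seed)
    permuted = list(assignment)
    as_extreme = 0
    for _ in range(n_perm):
        # Re-randomize, keeping the number of treated subjects fixed.
        rng.shuffle(permuted)
        if abs(diff_in_means(change, permuted)) >= abs(observed):
            as_extreme += 1
    # Add-one correction so the p-value is never exactly zero.
    return observed, (as_extreme + 1) / (n_perm + 1)


# Hypothetical data: 200 subjects, half treated, true effect on change = 5.
rng = random.Random(1)
pre = [rng.gauss(50.0, 10.0) for _ in range(200)]
assignment = [1] * 100 + [0] * 100
rng.shuffle(assignment)
post = [p + rng.gauss(0.0, 5.0) + 5.0 * z for p, z in zip(pre, assignment)]
estimate, p_value = change_score_test(pre, post, assignment)
print(round(estimate, 2), round(p_value, 4))
```

Using change scores removes stable between-subject differences in the baseline measure, which is why the simulated effect is detected easily despite the large spread in `pre`.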

Practical uses

Field experiments offer researchers a way to test theories and answer questions with higher external validity because they are conducted in real-world settings.^[6] Some researchers argue that field experiments guard better against bias than observational designs do. Field experiments can also serve as benchmarks for comparing observational estimates with experimental results. Using field experiments as benchmarks can help determine the level of bias in observational studies, and, since researchers often develop a hypothesis from an a priori judgment, benchmarks can help add credibility to a study.^[7] While some argue that covariate adjustment or matching designs might work just as well in eliminating bias, field experiments can increase certainty^[8] by eliminating omitted variable bias, because random assignment balances both observed and unobserved factors in expectation.^[9]
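The benchmarking idea can be illustrated with a hypothetical simulation in which an unobserved factor `u` (for instance, motivation) raises outcomes and, in the observational setting, also makes units more likely to opt into treatment. The same difference-in-means comparison is computed twice: once under self-selection and once under coin-flip assignment. The model, its coefficients, and the opt-in probabilities are assumptions chosen for illustration; under them the randomized estimate should land near the true effect of 1.0, while the self-selected comparison is biased upward by roughly 1.9.

```python
import random
import statistics


def naive_estimate(assign, n=20000, true_effect=1.0, seed=0):
    """Difference in mean outcomes between treated and untreated units.

    Hypothetical model: an unobserved factor u raises the outcome, and
    `assign` decides treatment (possibly depending on u).
    """
    rng = random.Random(seed)
    treated, untreated = [], []
    for _ in range(n):
        u = rng.gauss(0.0, 1.0)  # unobserved confounder
        z = assign(u, rng)
        y = 2.0 * u + true_effect * z + rng.gauss(0.0, 1.0)
        (treated if z else untreated).append(y)
    return statistics.mean(treated) - statistics.mean(untreated)


def self_selected(u, rng):
    # Observational setting: units with high u opt in more often.
    return rng.random() < (0.8 if u > 0 else 0.2)


def randomized(u, rng):
    # Field experiment: a coin flip that ignores u entirely.
    return rng.random() < 0.5


observational = naive_estimate(self_selected)
experimental = naive_estimate(randomized)
print(round(observational, 2), round(experimental, 2))
```

The gap between the two printed numbers is an estimate of the bias in the observational comparison, which is the quantity a benchmarking exercise tries to measure.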

Researchers can use machine learning methods to simulate, reweight, and generalize experimental data, for example by estimating how treatment effects vary across subgroups.^[10] This can increase the speed and efficiency of gathering experimental results and reduce the cost of implementing the experiment. Another recent technique in field experiments is the multi-armed bandit design,^[11] along with related adaptive designs for experiments in which outcomes and treatments vary over time.^[12]
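A multi-armed bandit design assigns each new subject adaptively, shifting allocation toward arms that appear to work better. The sketch below uses Beta-Bernoulli Thompson sampling, one common Bayesian bandit algorithm, for a binary outcome. The arm success rates, the uniform prior, and the number of rounds are hypothetical choices for illustration, and this stationary version does not handle the time-varying settings mentioned above.

```python
import random


def thompson_sampling(true_rates, n_rounds=5000, seed=0):
    """Adaptively assign subjects to arms with Beta-Bernoulli Thompson sampling.

    true_rates are hypothetical success probabilities for each treatment arm;
    the algorithm never sees them directly, only the simulated outcomes.
    Returns how many subjects were assigned to each arm.
    """
    rng = random.Random(seed)
    k = len(true_rates)
    successes = [0] * k
    failures = [0] * k
    assigned = [0] * k
    for _ in range(n_rounds):
        # Sample a plausible success rate for each arm from its Beta posterior
        # (uniform Beta(1, 1) prior) and assign the next subject to the best draw.
        draws = [rng.betavariate(1 + successes[a], 1 + failures[a]) for a in range(k)]
        arm = max(range(k), key=lambda a: draws[a])
        assigned[arm] += 1
        # Simulate the subject's binary outcome and update that arm's posterior.
        if rng.random() < true_rates[arm]:
            successes[arm] += 1
        else:
            failures[arm] += 1
    return assigned


allocation = thompson_sampling([0.10, 0.20, 0.40])
print(allocation)
```

Compared with a fixed equal split, most subjects end up in the best-performing arm, which is the main appeal of adaptive designs; the trade-off is that estimates for the rarely assigned arms are less precise.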

Limitations

There are limitations of, and arguments against, using field experiments in place of other research designs (e.g. lab experiments, survey experiments, observational studies). Because field experiments necessarily take place in a specific geographic and political setting, there is concern about extrapolating their outcomes to formulate a general theory about the population of interest. However, researchers have begun to develop strategies for generalizing causal effects beyond the sample: comparing the environments of the treated population and the external population, drawing on information from larger samples, and accounting for and modeling treatment effect heterogeneity within the sample.^[13] Others have used covariate blocking techniques to generalize from field experiment populations to external populations.^[14]
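One simple way to carry effects from a sample to an external population is post-stratification reweighting: estimate the effect within each stratum of a covariate, then average those effects using the target population's stratum shares. The sketch below is that generic technique, not the specific method of any study cited here; it assumes that effects vary only with the stratifying covariate and that every target stratum contains both treated and control subjects in the sample. The strata, outcomes, and shares are hypothetical.

```python
import statistics
from collections import defaultdict


def stratum_effects(rows):
    """Difference in means within each stratum.

    rows: iterable of (stratum, treated, outcome) with treated in {0, 1}.
    """
    groups = defaultdict(lambda: {0: [], 1: []})
    for stratum, treated, outcome in rows:
        groups[stratum][treated].append(outcome)
    return {s: statistics.mean(g[1]) - statistics.mean(g[0]) for s, g in groups.items()}


def reweighted_effect(effects, target_shares):
    """Weight stratum effects by each stratum's share of the target population."""
    total = sum(target_shares.values())
    return sum(effects[s] * share / total for s, share in target_shares.items())


# Hypothetical sample that over-represents urban subjects (6 of 8).
rows = [
    ("urban", 1, 14), ("urban", 1, 16), ("urban", 1, 15),
    ("urban", 0, 10), ("urban", 0, 12), ("urban", 0, 11),
    ("rural", 1, 7), ("rural", 0, 6),
]
effects = stratum_effects(rows)
print(effects)
print(reweighted_effect(effects, {"urban": 0.75, "rural": 0.25}))  # sample mix
print(reweighted_effect(effects, {"urban": 0.3, "rural": 0.7}))    # target mix
```

Here the stratum effects are 4 (urban) and 1 (rural), so the average effect falls from 3.25 under the sample's mix to about 1.9 under a mostly rural target population, showing how effect heterogeneity drives the external-validity concern.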

Noncompliance in field experiments (both one-sided and two-sided)^[15]^[16] occurs when subjects do not receive the intervention assigned to their group, for example when subjects assigned to treatment never receive it or, under two-sided noncompliance, when subjects assigned to control receive it anyway. Another data-collection problem is attrition, in which subjects fail to provide outcome data; under certain conditions, such as when attrition is related to both treatment status and potential outcomes, it biases the collected data. These problems can lead to imprecise or biased analyses; however, researchers who use field experiments can still apply statistical methods to recover useful estimates when these difficulties occur.^[16]
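Under one-sided noncompliance, a standard analysis reports the intention-to-treat (ITT) effect of assignment and then divides it by the share of compliers to obtain the complier average causal effect (CACE), an instrumental-variable (Wald) estimator that uses assignment as the instrument. The sketch below shows that textbook ratio on hypothetical data; it assumes excludability, no defiers, and a nonzero share of compliers, it does not address attrition, and it is not the extrapolation method of the cited Aronow and Carnegie paper.

```python
import statistics


def itt_and_cace(assigned, received, outcomes):
    """Intention-to-treat effect and complier average causal effect (CACE).

    assigned: random assignment (0/1); received: treatment actually taken (0/1);
    outcomes: observed outcomes. The CACE ratio assumes excludability,
    monotonicity (no defiers), and a nonzero share of compliers.
    """
    def group_mean(values, z):
        return statistics.mean(v for v, a in zip(values, assigned) if a == z)

    itt = group_mean(outcomes, 1) - group_mean(outcomes, 0)
    itt_d = group_mean(received, 1) - group_mean(received, 0)  # complier share
    return itt, itt_d, itt / itt_d


# Hypothetical one-sided noncompliance: half of the assigned group is never treated.
assigned = [1, 1, 1, 1, 0, 0, 0, 0]
received = [1, 1, 0, 0, 0, 0, 0, 0]
outcomes = [5, 5, 2, 2, 2, 2, 2, 2]
print(itt_and_cace(assigned, received, outcomes))  # (1.5, 0.5, 3.0)
```

The ITT of 1.5 understates the effect on those actually treated because half of the assigned group never received the intervention; scaling by the complier share of 0.5 gives a CACE of 3.0.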

Field experiments can also raise concerns about interference^[17] between subjects. When a treated subject or group affects the outcomes of untreated subjects (through mechanisms such as displacement, communication, or contagion), the outcomes observed in the untreated group may differ from their true untreated outcomes. One form of interference is the spillover effect, which occurs when treating one group affects neighboring untreated groups.
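A noise-free toy example shows how spillover distorts a naive comparison. It assumes a hypothetical population of two-person households in which treating one member also benefits the untreated housemate. Treating everyone who is untreated as a control understates the direct effect, whereas keeping some households entirely untreated (pure controls) lets the direct and spillover effects be separated. The household structure, effect sizes, and helper names are assumptions for illustration, not the estimator of the cited Aronow and Samii paper.

```python
import statistics

DIRECT_EFFECT = 2.0  # hypothetical effect of a subject's own treatment
SPILLOVER = 1.0      # hypothetical effect of having a treated housemate


def outcome(own_treated, housemate_treated):
    """Noise-free outcome that violates non-interference via SPILLOVER."""
    return DIRECT_EFFECT * own_treated + SPILLOVER * housemate_treated


def simulate(n_households=20):
    """Alternate treated and pure-control two-person households."""
    rows = []  # (exposure status, outcome)
    for h in range(n_households):
        if h % 2 == 0:
            # One member treated, the housemate untreated but exposed.
            rows.append(("treated", outcome(1, 0)))
            rows.append(("spillover", outcome(0, 1)))
        else:
            # Nobody in the household is treated.
            rows.append(("pure_control", outcome(0, 0)))
            rows.append(("pure_control", outcome(0, 0)))
    return rows


def mean_of(rows, *statuses):
    """Mean outcome over rows whose exposure status is in `statuses`."""
    return statistics.mean(y for s, y in rows if s in statuses)


rows = simulate()
naive = mean_of(rows, "treated") - mean_of(rows, "spillover", "pure_control")
direct = mean_of(rows, "treated") - mean_of(rows, "pure_control")
spill = mean_of(rows, "spillover") - mean_of(rows, "pure_control")
print(round(naive, 3), direct, spill)
```

The naive contrast comes out at about 1.67 instead of the assumed direct effect of 2, because exposed housemates inflate the "control" mean; comparing against pure controls recovers both the direct effect and the spillover.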

Field experiments can be expensive and time-consuming to conduct, difficult to replicate, and fraught with ethical pitfalls. Subjects or populations might undermine implementation if they perceive treatment selection as unfair (e.g. in 'negative income tax' experiments, communities may lobby to receive a cash transfer, so that assignment is no longer purely random). Collecting consent forms from all subjects may not be feasible. Collaborators administering interventions or collecting data could contaminate the randomization scheme. As a result, the data can be noisier (larger standard deviations, less precision and accuracy), so field experiments often require larger samples. However, others argue that, even though replication is difficult, important results are more likely to be replicated. Field experiments can also adopt a "stepped-wedge" design, which eventually gives the entire sample access to the intervention on staggered schedules.^[18] Researchers can also blind a field experiment to reduce the possibility of manipulation.
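In a stepped-wedge design, every cluster starts in the control condition and crosses over to treatment at a randomly assigned step, so all clusters are treated by the end. The sketch below generates such a schedule under simple assumptions: hypothetical cluster labels, one all-control baseline period, and clusters spread as evenly as possible across steps. Real designs also have to plan measurement timing and the analysis of period effects, which this sketch does not cover.

```python
import random


def stepped_wedge_schedule(clusters, n_steps, seed=0):
    """Assign each cluster a randomly ordered crossover step.

    Returns {cluster: [0/1 for each period]}. Period 0 is a baseline in
    which no cluster is treated; after the last step every cluster is treated.
    """
    rng = random.Random(seed)
    order = list(clusters)
    rng.shuffle(order)  # the random element: which clusters cross over first
    schedule = {}
    for idx, cluster in enumerate(order):
        # Spread clusters as evenly as possible across steps 1..n_steps.
        step = 1 + idx * n_steps // len(order)
        schedule[cluster] = [1 if period >= step else 0 for period in range(n_steps + 1)]
    return schedule


schedule = stepped_wedge_schedule(["A", "B", "C", "D", "E", "F"], n_steps=3)
for cluster in sorted(schedule):
    print(cluster, schedule[cluster])
```

With six clusters and three steps, two clusters cross over at each step; randomizing the order, rather than letting communities choose when they are treated, is what preserves the comparison between treated and not-yet-treated clusters.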

Examples

The history of experiments in the lab and in the field has had lasting impacts on the physical, natural, and life sciences. The modern use of field experiments has roots in the 1700s, when James Lind used a controlled field experiment to identify a treatment for scurvy.^[19]

Examples of disciplines that use field experiments include:

Economists have used field experiments to analyze discrimination (e.g., in the labor market,^[20]^[21] in housing,^[22] in the sharing economy,^[23] in the credit market,^[24] or in integration^[25]), health care programs,^[26] charitable fundraising,^[27] education,^[28] information aggregation in markets, and microfinance programs.^[29]
Engineers often conduct field tests of prototype products to validate earlier laboratory tests and to obtain broader feedback.
Pioneering social psychologists who used field experiments include Kurt Lewin and Stanley Milgram.
Agricultural scientist R. A. Fisher analyzed data from randomized "field" experiments on crops.^[30]
Political scientist Harold Gosnell conducted an early field experiment on voter participation in 1924 and 1925.^[31]
Ecologist Joseph H. Connell conducted field experiments on rocky seashores.^[32]

References

[1] Meyer, B. D. (1995). "Natural and quasi-experiments in economics" (PDF). Journal of Business & Economic Statistics. 13 (2): 151–161. doi:10.2307/1392369. JSTOR 1392369.

[2] Lee, D. S.; Moretti, E.; Butler, M. J. (2004). "Do voters affect or elect policies? Evidence from the US House". The Quarterly Journal of Economics. 119 (3): 807–859. doi:10.1162/0033553041502153. JSTOR 25098703.

[3] Rubin, Donald B. (2005). "Causal Inference Using Potential Outcomes". Journal of the American Statistical Association. 100 (469): 322–331. doi:10.1198/016214504000001880. S2CID 842793.

[4] Nyman, Pär (2017). "Door-to-door canvassing in the European elections: Evidence from a Swedish field experiment". Electoral Studies. 45: 110–118. doi:10.1016/j.electstud.2016.12.002.

[5] Broockman, David E.; Kalla, Joshua L.; Sekhon, Jasjeet S. (2017). "The Design of Field Experiments with Survey Outcomes: A Framework for Selecting More Efficient, Robust, and Ethical Designs". Political Analysis. 25 (4): 435–464. doi:10.1017/pan.2017.27. S2CID 233321039.

[6] Duflo, Esther (2006). Field Experiments in Development Economics (Report). Massachusetts Institute of Technology.

[7] Harrison, G. W.; List, J. A. (2004). "Field experiments". Journal of Economic Literature. 42 (4): 1009–1055. doi:10.1257/0022051043004577. JSTOR 3594915.

[8] LaLonde, R. J. (1986). "Evaluating the econometric evaluations of training programs with experimental data". The American Economic Review. 76 (4): 604–620. JSTOR 1806062.

[9] Gordon, Brett R.; Zettelmeyer, Florian; Bhargava, Neha; Chapsky, Dan (2017). "A Comparison of Approaches to Advertising Measurement: Evidence from Big Field Experiments at Facebook". Marketing Science. doi:10.2139/ssrn.3033144. S2CID 197733986.

[10] Athey, Susan; Imbens, Guido (2016). "Recursive partitioning for heterogeneous causal effects". Proceedings of the National Academy of Sciences. 113 (27): 7353–7360. doi:10.1073/pnas.1510489113. PMC 4941430. PMID 27382149.

[11] Scott, Steven L. (2010). "A modern Bayesian look at the multi-armed bandit". Applied Stochastic Models in Business and Industry. 26 (6): 639–658. doi:10.1002/asmb.874.

[12] Raj, V.; Kalyani, S. (2017). "Taming non-stationary bandits: A Bayesian approach". arXiv:1707.09727 [stat.ML].

[13] Dehejia, R.; Pop-Eleches, C.; Samii, C. (2015). From local to global: External validity in a fertility natural experiment (PDF) (Report). National Bureau of Economic Research. w21459.

[14] Egami, Naoki; Hartman, Erin (19 July 2018). "Covariate Selection for Generalizing Experimental Results" (PDF). Princeton.edu. Archived from the original (PDF) on 10 July 2020. Retrieved 31 December 2018.

[15] Blackwell, Matthew (2017). "Instrumental Variable Methods for Conditional Effects and Causal Interaction in Voter Mobilization Experiments". Journal of the American Statistical Association. 112 (518): 590–599. doi:10.1080/01621459.2016.1246363. S2CID 55878137.

[16] Aronow, Peter M.; Carnegie, Allison (2013). "Beyond LATE: Estimation of the Average Treatment Effect with an Instrumental Variable". Political Analysis. 21 (4): 492–506. doi:10.1093/pan/mpt013.

[17] Aronow, P. M.; Samii, C. (2017). "Estimating average causal effects under general interference, with application to a social network experiment". The Annals of Applied Statistics. 11 (4): 1912–1947. arXiv:1305.6156. doi:10.1214/16-AOAS1005. S2CID 26963450.

[18] Woertman, W.; de Hoop, E.; Moerbeek, M.; Zuidema, S. U.; Gerritsen, D. L.; Teerenstra, S. (2013). "Stepped wedge designs could reduce the required sample size in cluster randomized trials". Journal of Clinical Epidemiology. 66 (7): 752–758. doi:10.1016/j.jclinepi.2013.01.009. hdl:2066/117688. PMID 23523551.

[19] Tröhler, U. (2005). "Lind and scurvy: 1747 to 1795". Journal of the Royal Society of Medicine. 98 (11): 519–522. doi:10.1177/014107680509801120. PMC 1276007. PMID 16260808.

[20] Bertrand, Marianne; Mullainathan, Sendhil (2004). "Are Emily and Greg more employable than Lakisha and Jamal? A field experiment on labor market discrimination" (PDF). American Economic Review. 94 (4): 991–1013. doi:10.1257/0002828042002561.

[21] Gneezy, Uri; List, John A (2006). "Putting behavioral economics to work: Testing for gift exchange in labor markets using field experiments" (PDF). Econometrica. 74 (5): 1365–1384. doi:10.1111/j.1468-0262.2006.00707.x.

[22] Ahmed, Ali M; Hammarstedt, Mats (2008). "Discrimination in the rental housing market: A field experiment on the Internet". Journal of Urban Economics. 64 (2): 362–372. doi:10.1016/j.jue.2008.02.004.

[23] Edelman, Benjamin; Luca, Michael; Svirsky, Dan (2017). "Racial discrimination in the sharing economy: Evidence from a field experiment". American Economic Journal: Applied Economics. 9 (2): 1–22. doi:10.1257/app.20160213.

[24] Pager, Devah; Shepherd, Hana (2008). "The sociology of discrimination: Racial discrimination in employment, housing, credit, and consumer markets". Annual Review of Sociology. 34: 181–209. doi:10.1146/annurev.soc.33.040406.131740. PMC 2915460. PMID 20689680.

[25] Nesseler, Cornel; Gomez-Gonzalez, Carlos; Dietl, Helmut (2019). "What's in a name? Measuring access to social activities with a field experiment". Palgrave Communications. 5: 1–7. doi:10.1057/s41599-019-0372-0. hdl:11250/2635691.

[26] Ashraf, Nava; Berry, James; Shapiro, Jesse M (2010). "Can higher prices stimulate product use? Evidence from a field experiment in Zambia" (PDF). American Economic Review. 100 (5): 2383–2413. doi:10.1257/aer.100.5.2383. S2CID 6392533.

[27] Karlan, Dean; List, John A (2007). "Does price matter in charitable giving? Evidence from a large-scale natural field experiment" (PDF). American Economic Review. 97 (5): 1774–1793. doi:10.1257/aer.97.5.1774. S2CID 10041821.

[28] Fryer Jr, Roland G (2014). "Injecting charter school best practices into traditional public schools: Evidence from field experiments". The Quarterly Journal of Economics. 129 (3): 1355–1407. doi:10.1093/qje/qju011.

[29] Field, Erica; Pande, Rohini (2008). "Repayment frequency and default in microfinance: evidence from India". Journal of the European Economic Association. 6 (2–3): 501–509. doi:10.1162/JEEA.2008.6.2-3.501.

[30] Fisher, R.A. (1937). The Design of Experiments (PDF). Oliver and Boyd Ltd.

[31] Gosnell, Harold F. (1926). "An Experiment in the Stimulation of Voting". American Political Science Review. 20 (4): 869–874. doi:10.1017/S0003055400110524.

[32] Grodwohl, Jean-Baptiste; Porto, Franco; El-Hani, Charbel N. (2018-07-31). "The instability of field experiments: building an experimental research tradition on the rocky seashores (1950–1985)". History and Philosophy of the Life Sciences. 40 (3): 45. doi:10.1007/s40656-018-0209-y. ISSN 1742-6316. PMID 30066110. S2CID 51889466.
