Field experiments, like lab experiments, randomly assign subjects (or other sampling units) to either treatment or control groups in order to test claims of causal relationships. Random assignment helps establish the comparability of the treatment and control groups, so that any differences between them that emerge after the treatment has been administered plausibly reflect the influence of the treatment rather than pre-existing differences between the groups. The distinguishing characteristics of field experiments are that they are conducted in real-world settings and often unobtrusively. This is in contrast to laboratory experiments, which enforce scientific control by testing a hypothesis in the artificial and highly controlled setting of a laboratory. Field experiments also differ in context from naturally occurring experiments and quasi-experiments. Naturally occurring experiments rely on an external actor (e.g. a government or a nonprofit) to control treatment assignment and implementation, whereas field experiments require researchers to retain control over randomization and implementation. Quasi-experiments occur when treatments are administered as if randomly (e.g. U.S. Congressional districts where candidates win by slim margins, weather patterns, natural disasters, etc.).
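The random assignment step described above can be sketched in a few lines. This is a minimal illustration of complete random assignment, assuming a fixed list of subjects and a researcher-chosen number to treat. The function name `complete_random_assignment` and the seed are hypothetical, not part of any standard package.

```python
import random

def complete_random_assignment(subject_ids, n_treated, seed=None):
    """Assign exactly n_treated subjects to treatment (1) and the rest to control (0).

    Every subset of size n_treated is equally likely to be chosen, so assignment
    is independent of any pre-existing characteristic of the subjects.
    (Hypothetical helper for illustration.)
    """
    subject_ids = list(subject_ids)
    if not 0 <= n_treated <= len(subject_ids):
        raise ValueError("n_treated must be between 0 and the number of subjects")
    rng = random.Random(seed)  # seeded generator so the assignment can be audited
    treated = set(rng.sample(subject_ids, n_treated))
    return {sid: int(sid in treated) for sid in subject_ids}

assignment = complete_random_assignment(range(10), n_treated=5, seed=42)
print(sum(assignment.values()))  # 5
```

Fixing the number treated, rather than flipping an independent coin for each subject, guarantees balanced group sizes. Recording the seed lets others verify the assignment afterward.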
Field experiments encompass a broad array of experimental designs, each with varying degrees of generality. Some criteria of generality (e.g. authenticity of treatments, participants, contexts, and outcome measures) refer to the contextual similarities between the subjects in the experimental sample and the rest of the population. Field experiments are increasingly used in the social sciences to study the effects of policy-related interventions in domains such as health, education, crime, social welfare, and politics.
Under random assignment, subjects are assigned to groups by known, non-deterministic probabilities, so assignment is independent of the subjects' potential outcomes. Two other core assumptions underlie the researcher's ability to obtain unbiased estimates from those potential outcomes: excludability and non-interference. The excludability assumption holds that assignment affects outcomes only through receipt of the treatment itself. Asymmetries in how the treatment and control groups are assigned, administered, or measured violate this assumption. The non-interference assumption, or Stable Unit Treatment Value Assumption (SUTVA), holds that a subject's outcome depends only on whether that subject is assigned the treatment, not on whether other subjects are assigned to the treatment. When these three core assumptions are met, field experiments can provide unbiased estimates of average treatment effects.
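The unbiasedness claim can be checked directly on a toy example. This sketch uses six hypothetical subjects with made-up potential outcomes. Non-interference means each subject has just these two values. Excludability means we observe only Y(1) for treated subjects and Y(0) for controls. The sketch enumerates every possible complete randomization with three treated subjects and averages the difference-in-means estimates. That average equals the true average treatment effect exactly.

```python
from itertools import combinations
from fractions import Fraction

# Hypothetical potential outcomes (Y_i(0), Y_i(1)) for six subjects.
# Under non-interference each subject has exactly these two values,
# regardless of how anyone else is assigned.
potential_outcomes = [(10, 12), (8, 8), (5, 9), (7, 10), (6, 6), (9, 14)]
n = len(potential_outcomes)
m = 3  # number of subjects assigned to treatment

true_ate = Fraction(sum(y1 - y0 for y0, y1 in potential_outcomes), n)

def difference_in_means(treated):
    """Observed treated mean minus observed control mean for one assignment."""
    # Excludability: we see Y_i(1) if treated and Y_i(0) otherwise, nothing else.
    t = [potential_outcomes[i][1] for i in treated]
    c = [potential_outcomes[i][0] for i in range(n) if i not in treated]
    return Fraction(sum(t), len(t)) - Fraction(sum(c), len(c))

# Enumerate every equally likely assignment of m treated subjects.
estimates = [difference_in_means(set(s)) for s in combinations(range(n), m)]
expected_estimate = sum(estimates) / len(estimates)

print(true_ate, expected_estimate)  # 7/3 7/3
```

Any single assignment can miss the true effect. The estimator is unbiased because its average over all possible randomizations hits it exactly.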
After designing the field experiment and gathering the data, researchers can use statistical inference to estimate the size of the intervention's effect and assess its statistical significance. Field experiments allow researchers to collect many kinds of data in varying amounts. For example, a researcher could design an experiment that collects both pre- and post-trial measurements and then uses an appropriate statistical method to test whether an intervention affects subject-level changes in outcomes.
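One simple way to use pre- and post-trial information is a change-score estimator. It compares the average within-subject change (post minus pre) between the treated and control groups. The records below are hypothetical. Regression adjustment on the pre-trial measure (ANCOVA) is a common alternative, but it is not shown here.

```python
from statistics import mean

# Hypothetical records: (assigned_to_treatment, pre_trial_outcome, post_trial_outcome)
records = [
    (1, 50, 58), (1, 47, 55), (1, 52, 59), (1, 49, 56),
    (0, 51, 53), (0, 48, 49), (0, 50, 52), (0, 53, 54),
]

def change_score_estimate(rows):
    """Difference in mean within-subject change (post - pre), treated minus control."""
    treated_changes = [post - pre for z, pre, post in rows if z == 1]
    control_changes = [post - pre for z, pre, post in rows if z == 0]
    return mean(treated_changes) - mean(control_changes)

def post_only_estimate(rows):
    """Naive difference in post-trial means that ignores pre-trial information."""
    treated_post = [post for z, _, post in rows if z == 1]
    control_post = [post for z, _, post in rows if z == 0]
    return mean(treated_post) - mean(control_post)

print(change_score_estimate(records), post_only_estimate(records))
```

Both estimators are valid under random assignment. The change score removes stable differences between subjects, which often makes it more precise when pre- and post-trial outcomes are strongly correlated.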
Field experiments offer researchers a way to test theories and answer questions with higher external validity because they take place in real-world settings. Some researchers argue that field experiments guard better against bias, and against biased estimators, than observational designs do. Field experiments can also act as benchmarks for observational data. Comparing observational estimates with experimental results on the same question helps gauge the level of bias in observational studies. Because researchers often develop a hypothesis from an a priori judgment, such benchmarks can also add credibility to a study. Some argue that covariate adjustment or matching designs might work just as well in eliminating bias. However, those methods can only adjust for factors that are measured. Random assignment balances both observed and unobserved factors across groups in expectation, so field experiments can increase certainty by guarding against omitted variable bias.
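The benchmarking idea can be illustrated with a simulation in which an unobserved trait drives both program participation and the outcome. Everything here is hypothetical: the constant effect of 2, the outcome model, and the self-selection rule. Under these assumptions the self-selected comparison overstates the effect by about 1 (it comes out near 3). The randomized comparison recovers roughly 2, so the gap between the two serves as a benchmark for the observational bias.

```python
import random
from statistics import mean

TRUE_EFFECT = 2.0  # hypothetical constant treatment effect

def outcome(motivation, treated, rng):
    # Outcome depends on an unobserved trait, the treatment, and noise.
    return 10 + 3 * motivation + TRUE_EFFECT * treated + rng.gauss(0, 1)

def diff_in_means(data):
    """Mean outcome of treated minus mean outcome of untreated."""
    return mean(y for d, y in data if d) - mean(y for d, y in data if not d)

def simulate(n=20_000, seed=1):
    rng = random.Random(seed)
    observational, experimental = [], []
    for _ in range(n):
        motivation = rng.random()  # unobserved confounder in [0, 1)
        # Observational: more motivated people opt in more often.
        self_selected = rng.random() < motivation
        observational.append((self_selected, outcome(motivation, self_selected, rng)))
        # Experimental: a coin flip decides treatment, independent of motivation.
        assigned = rng.random() < 0.5
        experimental.append((assigned, outcome(motivation, assigned, rng)))
    return diff_in_means(observational), diff_in_means(experimental)

obs_est, exp_est = simulate()
print(f"observational: {obs_est:.2f}, experimental: {exp_est:.2f}, "
      f"estimated bias: {obs_est - exp_est:.2f}")
```

Adjusting for measured covariates could not fix the observational estimate here, because `motivation` is never recorded in the data an analyst would see.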
Researchers can use machine learning methods to simulate, reweight, and generalize experimental data. This can increase the speed and efficiency of gathering experimental results and reduce the costs of implementing the experiment. Another recent technique in field experiments is the multi-armed bandit design, along with related adaptive designs for experiments whose outcomes and treatments vary over time. In a bandit design, the share of subjects assigned to each treatment arm is updated as results arrive, so more subjects receive the arms that appear to perform better.
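A common bandit algorithm is Thompson sampling, also called randomized probability matching. The sketch below applies it to binary outcomes. Each arm keeps a Beta posterior over its success rate. Each new subject goes to the arm whose posterior draw is highest. The arm success rates are hypothetical and the outcomes are simulated. A real field deployment would observe outcomes, often with delays, and this sketch does not handle that.

```python
import random

def thompson_sampling(true_rates, n_rounds=5_000, seed=0):
    """Allocate subjects to arms with Thompson sampling on Bernoulli outcomes.

    Each arm has a Beta(successes + 1, failures + 1) posterior (uniform prior).
    Each round draws one sample per arm and assigns the next subject to the
    arm with the largest draw, so arms that look better receive more subjects
    as evidence accumulates.
    """
    rng = random.Random(seed)
    successes = [0] * len(true_rates)
    failures = [0] * len(true_rates)
    for _ in range(n_rounds):
        draws = [rng.betavariate(s + 1, f + 1) for s, f in zip(successes, failures)]
        arm = max(range(len(draws)), key=draws.__getitem__)
        # Simulated outcome; in a real experiment this would be observed.
        if rng.random() < true_rates[arm]:
            successes[arm] += 1
        else:
            failures[arm] += 1
    pulls = [s + f for s, f in zip(successes, failures)]
    return pulls, successes

# Hypothetical success rates for three treatment arms.
pulls, successes = thompson_sampling([0.1, 0.2, 0.4])
print(pulls)
```

Compared with a fixed equal split, this design spends fewer subjects on clearly inferior arms. The cost is less precise estimates for those arms.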
There are limitations to, and arguments against, using field experiments in place of other research designs (e.g. lab experiments, survey experiments, observational studies). Because field experiments necessarily take place in a specific geographic and political setting, there is concern about extrapolating their outcomes to formulate a general theory about the population of interest. However, researchers have begun to develop strategies for generalizing causal effects beyond the sample: comparing the environments of the treated population and the external population, drawing on information from larger samples, and accounting for and modeling treatment effect heterogeneity within the sample. Others have used covariate blocking techniques to generalize from field experiment populations to external populations.
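One simple form of this generalization is post-stratification. It estimates effects within covariate strata and reweights them by each stratum's share of the external population. The sketch assumes two hypothetical strata with made-up shares and effects. It also makes the key, untestable assumption that the effect within each stratum is the same in the sample and in the target population.

```python
# Hypothetical stratum-level results from a field experiment:
# stratum -> (share of the experimental sample, estimated effect in that stratum)
sample_strata = {
    "urban": (0.7, 4.0),
    "rural": (0.3, 1.0),
}
# Hypothetical covariate distribution in the external (target) population.
target_shares = {"urban": 0.4, "rural": 0.6}

def weighted_effect(effects, shares):
    """Average stratum-specific effects using the given stratum shares."""
    if abs(sum(shares.values()) - 1.0) > 1e-9:
        raise ValueError("shares must sum to 1")
    return sum(shares[s] * effects[s] for s in shares)

effects = {s: eff for s, (_, eff) in sample_strata.items()}
sample_shares = {s: share for s, (share, _) in sample_strata.items()}

sample_ate = weighted_effect(effects, sample_shares)  # about 0.7*4 + 0.3*1 = 3.1
target_ate = weighted_effect(effects, target_shares)  # about 0.4*4 + 0.6*1 = 2.2
print(round(sample_ate, 2), round(target_ate, 2))
```

The gap between the two numbers comes entirely from the different stratum composition. Blocking on the same covariates at the design stage keeps the stratum-level estimates precise enough to reweight.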
Noncompliance affects field experiments when subjects do not receive the intervention their assignment calls for. Under one-sided noncompliance, some subjects assigned to treatment are never treated. Under two-sided noncompliance, some subjects assigned to control also obtain the treatment. Another problem for data collection is attrition, in which some subjects do not provide outcome data. Under certain conditions, for example when dropping out is related to a subject's potential outcomes, attrition biases the collected data. These problems can lead to imprecise data analysis. However, researchers can still use statistical methods to recover useful estimates when these difficulties occur, for example instrumental variable methods that estimate the average treatment effect among compliers.
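Under one-sided noncompliance, the complier average causal effect (CACE) can be estimated by dividing two intent-to-treat effects: the effect of assignment on the outcome and the effect of assignment on treatment receipt. This is the Wald or instrumental variable estimator. The records below are hypothetical. The estimate is valid only if excludability and non-interference hold and assignment actually changes treatment receipt for some subjects. With two-sided noncompliance it additionally requires monotonicity (no subjects who do the opposite of their assignment).

```python
from statistics import mean

# Hypothetical records: (assigned Z, actually treated D, outcome Y)
records = [
    (1, 1, 9), (1, 1, 8), (1, 0, 4), (1, 0, 5),  # two of four assigned subjects complied
    (0, 0, 5), (0, 0, 4), (0, 0, 6), (0, 0, 5),  # one-sided: controls cannot get treatment
]

def itt(rows, column):
    """Intent-to-treat effect of assignment Z on a column (1 = D, 2 = Y)."""
    assigned = [r[column] for r in rows if r[0] == 1]
    not_assigned = [r[column] for r in rows if r[0] == 0]
    return mean(assigned) - mean(not_assigned)

itt_y = itt(records, 2)  # effect of assignment on the outcome
itt_d = itt(records, 1)  # share of subjects whose treatment status assignment changed
cace = itt_y / itt_d     # complier average causal effect (Wald / IV estimator)
print(itt_y, itt_d, cace)  # 1.5 0.5 3.0
```

The intent-to-treat effect (1.5) is diluted because half of the assigned subjects were never treated. Rescaling by the compliance rate recovers the effect among those who were.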
Field experiments can also raise concerns about interference between subjects. A treated subject or group may affect the outcomes of the untreated group through mechanisms such as displacement, communication, or contagion. When that happens, the observed outcomes of the untreated group may no longer equal their true untreated outcomes, and comparisons between the groups are biased. One form of interference is the spillover effect, which occurs when the treatment of treated groups affects neighboring untreated groups.
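A small enumeration shows how spillovers distort the usual estimate. In this hypothetical setup, six subjects sit in a ring and three are treated. The treatment raises a treated subject's outcome by 4. Each untreated subject gains 2 for every treated neighbor, a contagion-style violation of non-interference. Averaged over all possible assignments, the difference in means is 8/5 rather than 4, because the control group's outcomes are inflated by spillover.

```python
from itertools import combinations
from fractions import Fraction

N, M = 6, 3                                    # six subjects in a ring, three treated
BASELINE, DIRECT_EFFECT, SPILLOVER = 10, 4, 2  # hypothetical values

def outcomes(treated):
    """Observed outcomes when untreated subjects gain SPILLOVER per treated neighbor."""
    ys = []
    for i in range(N):
        if i in treated:
            ys.append(BASELINE + DIRECT_EFFECT)
        else:
            neighbors = ((i - 1) % N, (i + 1) % N)
            ys.append(BASELINE + SPILLOVER * sum(j in treated for j in neighbors))
    return ys

def difference_in_means(treated):
    ys = outcomes(treated)
    t = [ys[i] for i in range(N) if i in treated]
    c = [ys[i] for i in range(N) if i not in treated]
    return Fraction(sum(t), len(t)) - Fraction(sum(c), len(c))

assignments = [set(s) for s in combinations(range(N), M)]
average_estimate = sum(difference_in_means(a) for a in assignments) / len(assignments)
print(DIRECT_EFFECT, average_estimate)  # 4 8/5
```

Under these assumptions the bias persists even though assignment is perfectly random. Randomizing at a level where spillovers are contained (e.g. whole villages), or explicitly modeling exposure to treated neighbors, is a typical response.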
Field experiments can be expensive and time-consuming to conduct, difficult to replicate, and fraught with ethical pitfalls. Subjects or populations might undermine implementation if they perceive the selection of treated units as unfair (e.g. in 'negative income tax' experiments, communities may lobby to receive a cash transfer, so the assignment is no longer purely random). It may not be feasible to collect consent forms from all subjects. Staff or partners who administer interventions or collect data could contaminate the randomization scheme. The resulting data could therefore be noisier, with larger standard deviations and less precise estimates, which is one reason field experiments often require larger sample sizes. However, others argue that even though replication is difficult, important results are more likely to be replicated. Field experiments can also adopt a "stepped-wedge" design, in which the entire sample eventually receives the intervention on staggered schedules. Researchers can also design a blinded field experiment to reduce the possibility of manipulation.
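The link between noisier data and larger samples can be made concrete with the standard normal-approximation sample size formula for comparing two means: n per arm = 2(z₁₋α/₂ + z_power)²σ²/δ². The effect size and standard deviations below are hypothetical. Required sample size grows with the square of the standard deviation, so doubling the noise roughly quadruples the number of subjects needed.

```python
import math
from statistics import NormalDist

def n_per_arm(sigma, delta, alpha=0.05, power=0.80):
    """Approximate subjects per arm for a two-sided, two-sample comparison of means.

    Normal approximation: n = 2 * (z_{1-alpha/2} + z_{power})^2 * sigma^2 / delta^2,
    rounded up to a whole subject.
    """
    z = NormalDist()  # standard normal distribution
    z_alpha = z.inv_cdf(1 - alpha / 2)
    z_power = z.inv_cdf(power)
    return math.ceil(2 * (z_alpha + z_power) ** 2 * sigma ** 2 / delta ** 2)

# Same hypothetical effect size, but noisier outcomes in the field than in the lab.
print(n_per_arm(sigma=1.0, delta=0.5))  # 63 per arm with lab-like noise
print(n_per_arm(sigma=2.0, delta=0.5))  # 252 per arm with twice the standard deviation
```

This is an approximation for simple two-arm designs. Clustered or stepped-wedge designs need formulas that account for correlation within clusters.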
The history of experiments in the lab and the field has had lasting impacts on the physical, natural, and life sciences. The modern use of field experiments has roots in the 1700s, when James Lind used a controlled field experiment to identify a treatment for scurvy.
Other examples of disciplines that use field experiments include:
- Economists have used field experiments to analyze discrimination, health care programs, charitable fundraising, education, information aggregation in markets, and microfinance programs.
- Engineers often conduct field tests of prototype products to validate earlier laboratory tests and to obtain broader feedback.
- Geology has a long history of field experiments, dating back to the time of Avicenna.
- Anthropology field experiments date back to Biruni's study of India.
- Social psychology's pioneering figures who used field experiments include Kurt Lewin and Stanley Milgram.
- In agricultural science, R. A. Fisher analyzed data from randomized experiments on crops grown in actual fields.
- In political science, Harold Gosnell conducted an early field experiment on voter participation in 1924 and 1925.
References:
- Meyer, B. D. (1995). "Natural and quasi-experiments in economics". Journal of Business & Economic Statistics. 13 (2): 151–161. JSTOR 1392369.
- Lee, D. S.; Moretti, E.; Butler, M. J. (2004). "Do voters affect or elect policies? Evidence from the US House". The Quarterly Journal of Economics. 119 (3): 807–859. doi:10.1162/0033553041502153. JSTOR 25098703.
- Rubin, Donald B. (2005). "Causal Inference Using Potential Outcomes". Journal of the American Statistical Association. 100 (469): 322–331. doi:10.1198/016214504000001880.
- Nyman, Pär (2017). "Door-to-door canvassing in the European elections: Evidence from a Swedish field experiment". Electoral Studies. 45: 110–118. doi:10.1016/j.electstud.2016.12.002.
- Broockman, David E.; Kalla, Joshua L.; Sekhon, Jasjeet S. (2017). "The Design of Field Experiments with Survey Outcomes: A Framework for Selecting More Efficient, Robust, and Ethical Designs". Political Analysis. 25 (4): 435–464. doi:10.1017/pan.2017.27.
- Duflo, Esther (2006). Field Experiments in Development Economics (Report). Massachusetts Institute of Technology.
- Harrison, G. W.; List, J. A. (2004). "Field experiments". Journal of Economic Literature. 42 (4): 1009–1055. doi:10.1257/0022051043004577. JSTOR 3594915.
- LaLonde, R. J. (1986). "Evaluating the econometric evaluations of training programs with experimental data". The American Economic Review. 76 (4): 604–620. JSTOR 1806062.
- Gordon, Brett R.; Zettelmeyer, Florian; Bhargava, Neha; Chapsky, Dan (2017). "A Comparison of Approaches to Advertising Measurement: Evidence from Big Field Experiments at Facebook". doi:10.2139/ssrn.3033144.
- Athey, Susan; Imbens, Guido (2016). "Recursive partitioning for heterogeneous causal effects". Proceedings of the National Academy of Sciences. 113 (27): 7353–7360. doi:10.1073/pnas.1510489113. PMC 4941430. PMID 27382149.
- Scott, Steven L. (2010). "A modern Bayesian look at the multi-armed bandit". Applied Stochastic Models in Business and Industry. 26 (6): 639–658. doi:10.1002/asmb.874.
- Raj, V.; Kalyani, S. (2017). "Taming non-stationary bandits: A Bayesian approach". arXiv:1707.09727 [stat.ML].
- Dehejia, R.; Pop-Eleches, C.; Samii, C. (2015). From local to global: External validity in a fertility natural experiment (Report). National Bureau of Economic Research. w21459.
- Egami, Naoki; Hartman, Erin (19 July 2018). "Covariate Selection for Generalizing Experimental Results". Princeton.edu.
- Blackwell, Matthew (2017). "Instrumental Variable Methods for Conditional Effects and Causal Interaction in Voter Mobilization Experiments". Journal of the American Statistical Association. 112 (518): 590–599. doi:10.1080/01621459.2016.1246363.
- Aronow, Peter M.; Carnegie, Allison (2013). "Beyond LATE: Estimation of the Average Treatment Effect with an Instrumental Variable". Political Analysis. 21 (4): 492–506. doi:10.1093/pan/mpt013.
- Aronow, P. M.; Samii, C. (2017). "Estimating average causal effects under general interference, with application to a social network experiment". The Annals of Applied Statistics. 11 (4): 1912–1947. doi:10.1214/16-AOAS1005.
- Woertman, W.; de Hoop, E.; Moerbeek, M.; Zuidema, S. U.; Gerritsen, D. L.; Teerenstra, S. (2013). "Stepped wedge designs could reduce the required sample size in cluster randomized trials". Journal of Clinical Epidemiology. 66 (7): 752–758. doi:10.1016/j.jclinepi.2013.01.009. PMID 23523551.
- Tröhler, U. (2005). "Lind and scurvy: 1747 to 1795". Journal of the Royal Society of Medicine. 98 (11): 519–522. doi:10.1177/014107680509801120.
- Ahmed, A. S. (1984). "Al-Beruni: The First Anthropologist". Rain (60): 9–10. doi:10.2307/3033407. JSTOR 3033407.
- Fisher, R. A. (1937). The Design of Experiments. Oliver and Boyd Ltd.
- Gosnell, Harold F. (1926). "An Experiment in the Stimulation of Voting". American Political Science Review. 20 (4): 869–874. doi:10.1017/S0003055400110524.