Talk:Replication crisis


Open science collaboration recent science publication

In August 2015 the Open Science Collaboration (based in the Center for Open Science) published a paper in Science (journal) [1] (the paper appears to be open access), in which they report the outcomes of 100 replications of different experiments from top cognitive and social psychology journals. Depending on how they assessed replicability (e.g. independent p values, aggregate (meta-analytic) data, or subjective assessment), they report replicability of social psychology studies between 23% (JPSP, p values) and 58% (PsychSci, meta-analytic), and of cognitive studies between 48% (JEP, p values) and 92% (PsychSci, meta-analytic). The paper is (in my judgement) very carefully constructed and very thorough. These percentages are not easy to interpret, by the way, as there is hardly any data from other fields about replication success rates. The only indications come from cell biology (see the Science paper), where they are talking about percentages as low as 11% to 25% (probably based on p value alone). If this is indicative for all sciences (though I would not hazard to say so), it appears that psychology is neither much worse nor much better than most. But that would be my own original interpretation and hence not useful for Wikipedia.

I think we should construct a brief section on the outcomes of this programme / paper for this article. I will think about it, but it may take some time (busy) and it should be done with due attention to nuance. Anyone else is welcome to start it. Arnoutf (talk) 14:27, 30 August 2015 (UTC)

See reproducibility project. Andries (talk) 21:59, 1 September 2015 (UTC)

Working on Updating Replication crisis page

I'm working on updating this page; working edits can be found in my sandbox. Pucla (talk) 17:41, 13 November 2015 (UTC)

QRPs

We define QRPs as "while not intentionally fraudulent, involve capitalizing on the gray area of acceptable scientific practices or exploiting flexibility in data collection, analysis, and reporting" and then we list a bunch of gray areas...and "falsifying data".

I would argue that many researchers don't/didn't even realize that, e.g., selective stopping brought in significant bias, and I'm assuming it hasn't been traditionally prohibited. Falsifying data seems like a different category altogether. Unlike the others, it's not a gray area. This is noted in the discussion section of the cited article[2] (currently ref 5 in the main article) from which we draw our conclusion: "Although falsifying data (Item 10 in our study) is never justified, the same cannot be said for all of the items on our survey"

It's also worth noting that the statement that "A survey of over 2,000 psychologists indicated that nearly all respondents admitted to using at least one QRP" does not appear to be in the study cited. The highest figure for any individual category is 66.5% ("In a paper, failing to report all of a study’s dependent measures"). The claim certainly seems likely, but I don't see any statement of it.

We can safely say "a majority" while leaving off the "falsifying data" category, as both the "In a paper, failing to report all of a study’s dependent measures" and "Deciding whether to collect more data after looking to see whether the results were significant" categories exceed 50%. (Table/fig 1 in the article).

As such, I'm going to be bold and remove the falsifying data category and change the claim to "a majority". The above is documentation of why and provides a starting point for refutation if someone disagrees with my assessment.

General Wesc (talk) 16:10, 13 February 2016 (UTC)

"Quotes" subsection removed

I've removed the "quotes" subsection from the page, as it doesn't really fit within the article as-is; see below for the text I removed. -- Markshale (talk) 15:15, 6 March 2016 (UTC)

Begin removed text:

  • By Diederik Stapel, from the authorized English translation by Nicholas J.L. Brown, available as a free download in PDF format
    • Clearly, there was something in the recipe for the X effect that I was missing. But what? I decided to ask the experts, the people who’d found the X effect and published lots of articles about it [..] My colleagues from around the world sent me piles of instructions, questionnaires, papers, and software [..] In most of the packages there was a letter, or sometimes a yellow Post-It note stuck to the bundle of documents, with extra instructions: “Don’t do this test on a computer. We tried that and it doesn’t work. It only works if you use pencil-and-paper forms.” “This experiment only works if you use ‘friendly’ or ‘nice’. It doesn’t work with ‘cool’ or ‘pleasant’ or ‘fine’. I don’t know why.” “After they’ve read the newspaper article, give the participants something else to do for three minutes. No more, no less. Three minutes, otherwise it doesn’t work.” “This questionnaire only works if you administer it to groups of three to five people. No more than that.” I certainly hadn’t encountered these kinds of instructions and warnings in the articles and research reports that I’d been reading. This advice was informal, almost under-the-counter, but it seemed to be a necessary part of developing a successful experiment. Had all the effect X researchers deliberately omitted this sort of detail when they wrote up their work for publication? I don’t know.
      • From his memoirs: "Ontsporing" (English, "Derailment") Nov. 2012

End of removed text.

General Section Makes No Sense

We've got a statement about the prevalence of the problem, followed by a bulleted list of disciplines, each followed by a percentage and then another percentage in brackets, all of which appear to have been either rounded to the nearest 10% or drawn from a conveniently sized sample. How am I supposed to interpret this information? Am I being told that 90% of chemistry papers are not reproducible or that 90% of them are? Is 60% some sort of confidence interval? The whole thing is just nonsensical. --81.151.18.242 (talk) 08:21, 1 August 2016 (UTC)

The mentions I've seen say that a high percentage of studies are not reproducible when replication is tried, and that replication is hard or seldom even tried. It's also said that peer review, major publication, or major study status doesn't matter -- the complexity of the topics and methods simply leads to different results. Markbassett (talk) 13:47, 4 August 2016 (UTC)
p.s. There seem to be the usual kinds of debate and diversity of viewpoints about the reality, definition, and significance of this item, about its causes, and so forth, e.g.:
Cheers, Markbassett (talk) 14:27, 4 August 2016 (UTC)
The Overall section may or may not make sense, but it doesn't shed any light on the scope of any replication crisis; the numbers that matter are the proportion of experiments that are not replicated or suffer replication failures, not the proportion of researchers that are involved. If a researcher performs 100 experiments over his career, and 1 fails to replicate, that's not really a crisis. The numbers quoted aren't informative to people outside the field, and I'd wonder how informative they are to persons within the fields. I'd be tempted to remove those numbers altogether. (The question is whether the problems of weak effects, small samples, publication bias, and p-hacking are confined to psychology and clinical medicine, or are more widespread.) Ob:XKCD Lavateraguy (talk) 21:39, 31 December 2017 (UTC)

Public policy

Please explain how the inability to reproduce the same result in a study comparing subjects wearing body cameras to subjects not wearing body cameras doesn't relate to the research replication crisis. The article cited even explains why they may have been unable to get a supporting result when they did the same experiment (police officers wearing body cameras compared to officers in the same department not wearing cameras). Did you read the articles? Natureium (talk) 18:04, 21 October 2017 (UTC)

(1) The sources don't tie this new study to the Replication Crisis. (2) There is no study that was replicated. The earlier studies may very well be right for the localities and individuals that they studied. (3) That two small-N studies come to a different conclusion on a new phenomenon than a large-N study is not what the Replication Crisis is about. Snooganssnoogans (talk) 18:25, 21 October 2017 (UTC)
I agree with Snoogans - Natureium, do the articles you propose to cite mention "replication crisis" or "failure to replicate" directly? Unless they do, this is WP:SYNTH. Neutrality (talk) 21:11, 21 October 2017 (UTC)
I understand the thinking that led this content to be added, but yes, given the sources, it is SYN. I wonder if there is a source out there that actually mentions this as an example... Jytdog (talk) 00:43, 22 October 2017 (UTC)
Thanks for explaining. I understand. I'll look for a source that mentions it with those specific words. Natureium (talk) 14:45, 23 October 2017 (UTC)

Failure to reproduce figures in 'Outline' section misleading

These figures are not suggestive of a reproducibility crisis and appear to have been misunderstood so should not be highlighted in this page.

The usual significance cutoffs (p values), e.g. 0.001 or 0.05, mean that it is completely normal not to be able to reproduce an experiment, which is what the figures refer to.

The p value is usually set at 0.05, which means that a researcher would only need to have repeated 20 studies in their career to expect an irreproducible result.

The percentage of scientists who fail to reproduce an experiment has no meaning without knowing how many experiments they have tried to reproduce. The figures may just indicate a tendency to re-run studies in different fields.
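
As a rough illustration of the arithmetic in the preceding paragraphs, here is a minimal Python sketch. It assumes, as the comment does, that each replication attempt fails independently with probability 0.05. Treating the significance level as a per-attempt failure probability is a simplification, since real failure rates also depend on statistical power. The function names are made up for this example.

```python
def expected_failures(attempts: int, failure_prob: float) -> float:
    """Expected number of failed replications among independent attempts."""
    return attempts * failure_prob


def prob_at_least_one_failure(attempts: int, failure_prob: float) -> float:
    """Chance that a researcher sees one or more failures across their attempts."""
    return 1.0 - (1.0 - failure_prob) ** attempts


if __name__ == "__main__":
    # Assumed per-attempt failure probability of 0.05 (see the caveat above).
    for attempts in (1, 5, 20, 50):
        p = prob_at_least_one_failure(attempts, 0.05)
        print(f"{attempts:>3} attempts: P(at least one failure) = {p:.2f}")
```

Under these assumptions, 20 attempts give an expected count of one failure, and the share of researchers who report at least one failure rises with the number of attempts. That is why the share alone says little about the per-experiment failure rate.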

Since the figures are meaningless without context, they give a misleading impression of the failure to reproduce, e.g. that 87% of chemistry experiments are irreproducible.

The source article is fine, and elaborates on the above figures to highlight other issues, for instance how often these failures to reproduce are published. If I had time I would add the entirety of the findings to this page, but the data currently chosen should be removed as they are not indicative of the page topic.

reference

Here's an additional reference from behavioral neuroscience: Kafkafi, N; Agassi, J; Chesler, EJ; Crabbe, JC; Crusio, WE; Eilam, D; Gerlai, R; Golani, I; Gomez-Marin, A; Heller, R; Iraqi, F; Jaljuli, I; Karp, NA; Morgan, H; Nicholson, G; Pfaff, DW; Richter, SH; Stark, PB; Stiedl, O; Stodden, V; Tarantino, LM; Tucci, V; Valdar, W; Williams, RW; Würbel, H; Benjamini, Y (April 2018). "Reproducibility and replicability of rodent phenotyping in preclinical studies". Neuroscience and biobehavioral reviews. 87: 218–232. doi:10.1016/j.neubiorev.2018.01.003. PMID 29357292. --Randykitty (talk) 15:38, 23 January 2019 (UTC)Reply

PS: the talks presented at the meeting that produced this report are available online on the Tel Aviv University website. --Randykitty (talk) 16:42, 10 January 2020 (UTC)

Reproducibility crisis in machine learning

The article lacks a section about the reproducibility crisis in machine learning. -- JakobVoss (talk) 17:24, 13 September 2019 (UTC)

Paragraph on "Causes"

The paragraph, "Glenn Begley and John Ioannidis proposed these causes for the increase in the chase for significance:

   Generation of new data/publications at an unprecedented rate.
   Majority of these discoveries will not stand the test of time.
   Failure to adhere to good scientific practice and the desperation to publish or perish.
   Multiple varied stakeholders.

They conclude that no party is solely responsible, and no single solution will suffice."

has these issues:

It lacks a precise reference. I have found it: Reproducibility in Science. Improving the Standard for Basic and Preclinical Research C. Glenn Begley, John P.A. Ioannidis, Circulation Research. But I have only found the third point in the abstract.

The second point is not a cause but an effect. It remains unclear why the fourth point should be a cause.

Therefore, I suggest dropping this paragraph, and I will do so unless somebody asks to keep it. Werner A. Stahel en (talk) 13:04, 14 October 2020 (UTC)

Shift to a Complex Systems Paradigm

This section is difficult to understand and may need further clarification. It is not clear why experimental studies on more complex, nonlinear systems should be less reproducible than studies on simple, linear systems. Appropriately large sample sizes and correct statistical analysis should yield reproducible findings when these actually exist. Biologic systems are typically complex and nonlinear, yet these are successfully studied. The two references cited (107 and 108) discuss the complexities and instabilities in psychological research but do not convincingly make the case that such research should not be reproducible. -- Preceding unsigned comment added by TailHook (talk • contribs) 06:56, 16 December 2020 (UTC)

Sources for most impacted fields

The article currently states that the "replication crisis most severely affects the social sciences and medicine". To make this claim, it is not sufficient to show that replicability is low in the mentioned fields. Instead, a source would have to state that this problem is more prevalent in social sciences and medicine than in others. While two sources are provided, in my opinion neither supports this claim. I would therefore argue that this claim should be removed unless more substantial sources can be given.

Jochen Harung (talk) 18:24, 25 April 2021 (UTC)

The citations state:
In disciplines such as medicine, psychology, genetics and biology, researchers have been confronted with results that are not as robust as they originally seemed.[1]
Some have blamed the reliance on p-values for the replication crises now afflicting many scientific fields. In psychology, in medicine, and in some fields of economics, large and systematic searches are discovering that many findings in the literature are spurious.[2]

References

  1. ^ Schooler, J. W. (2014). "Metascience could rescue the 'replication crisis'". Nature. 515 (7525): 9. Bibcode:2014Natur.515....9S. doi:10.1038/515009a. PMID 25373639.
  2. ^ Smith, Noah. "Why 'Statistical Significance' Is Often Insignificant". Bloomberg. Retrieved 7 November 2017.
Medicine is supported and we could mention psychology, but social sciences, the broader field that includes psychology, is not specifically supported. However, many of the other social sciences are mentioned in the body of the article, and although WP:LEADCITE mentions the need for information in the lead to be cited, we should avoid duplication of citations in both the lead and the body. It seems a fair generalization to say "social sciences" rather than listing out the sciences mentioned in the article. Richard-of-Earth (talk) 00:55, 29 April 2021 (UTC)
Thank you for your comment. Maybe I am missing something, but it is still unclear to me where any of the sources explicitly state that the social sciences are more severely affected than other disciplines. They only state these fields are affected without an explicit comparison in severity to other fields. Furthermore, your first citation mentions genetics and biology, which belong to neither the social sciences nor medicine. Therefore, even if we ignore the point that no explicit comparison of different fields is apparent in the cited sources, biology would also have to be included in the above statement. Jochen Harung (talk) 17:52, 29 April 2021 (UTC)
I felt it was implied since other subjects are not mentioned, but perhaps that is a sort of WP:OR. Sure, just remove it. The readers can decide for themselves. Richard-of-Earth (talk) 19:36, 1 May 2021 (UTC)

Regarding the hat note

...that I recently added. Please compare the hat note to that of Reproducibility. Thanks CapnZapp (talk) 11:27, 26 October 2021 (UTC)

GA Review

This review is transcluded from Talk:Replication crisis/GA1.

Reviewer: Tommyren (talk · contribs) 00:19, 31 December 2021 (UTC)


I am excited to review this article!

Before evaluating the article based on the good article standards, I'll just be listing a few thoughts that struck me as I read through the article. Some, or perhaps all of them, may go beyond the good article standards.

Lead

1. "The replication crisis most severely affects the social and medical sciences." This statement is not supported by the inline citation that follows it, as the source only says that replication is a problem in social and medical sciences, but does not necessarily say that it is most severe in social and medical sciences. A similar issue has been discussed in the section "Sources for most impacted fields." However, from citation #10, the Fanelli article, it seems that we just might argue that the medical sciences are most severely affected. From citation #79, the Duvendack et al. article, it seems that we could argue that the economic sciences are more severely affected than others.

 Y I've modified the statement. --Xurizuri (talk) 10:31, 31 January 2022 (UTC)Reply
I agree with your modification. However, as mentioned above, there is existing literature on what fields seem to have more of a replication crisis, and I feel that this information should appear somewhere in the article. Tommyren (talk) 16:43, 31 January 2022 (UTC)Reply
I haven't really seen reviews or large scale research comparing fields (other than that Nature survey, which I wouldn't describe as a RS for this particular claim), so I'm not sure. By "above", do you mean your statement here? The Fanelli study is specifically about fabrication/falsification, and I would say it's not an RS for a statement about the replication crisis generally. I would summarise Duvendack as being more focussed on the slow pace of change within economics - it talks mainly about how little is being done to do more replications and make more effort to have reproducible results, rather than describing an actual lack of reproducible results. You'll see I changed the phrasing of the statement it's cited for because of that. --Xurizuri (talk) 10:27, 1 February 2022 (UTC)Reply
I need to go through the Begley and Ioannidis paper to make sure that nothing along these lines was mentioned there.--Xurizuri (talk) 14:31, 2 February 2022 (UTC)Reply

2. Why does Ioannidis's (2005) paper deserve to be screenshotted but not other people's papers? I'm not saying that we should delete the image. In fact, to me the paper's title is quite effective at piquing my interest for the whole Wikipedia article. But I just want to make this comment here in case others come up with a better image. -- Preceding unsigned comment added by Tommyren (talk • contribs) 14:34, 31 January 2022 (UTC)

I had a look through Wikimedia; I wasn't able to find anything better, unfortunately. --Xurizuri (talk) 10:27, 1 February 2022 (UTC)

Background

I've done about a third of the background section now. The plan is to have an explanation of replication/reproducibility and its importance (done), an explanation of what the replicability crisis is and how it fits into the scientific process (not done), and potentially some explanation of significance and effect size testing (not done, still figuring out how necessary it would be). --Xurizuri (talk) 10:27, 1 February 2022 (UTC)Reply

The section of the talk page referring to a “General” section had sources that may be useful here, as would the three reviews I mentioned under "Xurizuri's concerns". I would also definitely need to add information on the statistics. A lot of the causes and remedies sections are nonsense without it. --Xurizuri (talk) 14:31, 2 February 2022 (UTC)
"This should result in 5% of hypotheses that are supported being false positives (an incorrect hypothesis being erroneously found correct), assuming the studies meet all of the statistical assumptions." I don't think this is a correct interpretation of the p-value. If the true effect sizes are overwhelmingly large, the percentage will be higher than 5%. A 0.05 p value means that if there were no effect, the statistical procedure would still find a statistically significant result 5% of the time. Tommyren (talk) 14:34, 3 February 2022 (UTC)

1. Last paragraph needs citations. -- Preceding unsigned comment added by Tommyren (talk • contribs) 03:04, 18 February 2022 (UTC)

2. The difference between systematic and conceptual replication could be made a bit clearer. -- Preceding unsigned comment added by Tommyren (talk • contribs) 00:32, 6 March 2022 (UTC)
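
On the p-value point raised earlier in this section: the share of significant results that are false positives is not fixed by the 0.05 threshold alone. Below is a minimal sketch, assuming a simple model in which some fraction of tested hypotheses are true and tests of true hypotheses have a given power. The function name and the input values are illustrative assumptions, not figures from any source discussed here.

```python
def false_positive_share(alpha: float, power: float, true_fraction: float) -> float:
    """Share of significant results that are false positives.

    alpha: chance of a significant result when there is no effect
    power: chance of a significant result when the effect is real
    true_fraction: assumed fraction of tested hypotheses that are true
    """
    false_pos = alpha * (1.0 - true_fraction)  # significant, but no real effect
    true_pos = power * true_fraction           # significant, and the effect is real
    return false_pos / (false_pos + true_pos)


if __name__ == "__main__":
    # Hypothetical inputs chosen only to show how the share moves.
    for true_fraction in (0.1, 0.5, 0.9):
        share = false_positive_share(alpha=0.05, power=0.8, true_fraction=true_fraction)
        print(f"true fraction {true_fraction:.1f}: false-positive share = {share:.3f}")
```

With these assumed inputs the share ranges from well above 5% to well below it, so 0.05 describes how the test behaves when there is no effect rather than the share of supported hypotheses that are wrong.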

In Psychology

1. This section contains a lot of information on causes of the replication crisis, including QRPs and the disciplinary social dilemma. Information in a similar vein is given later in the "Causes" section. I wonder if it would be a good idea to take information about causes of the crisis from this section and move it to the "Causes" section. I think this may make the article feel less repetitive. Similarly, this section also contains a lot of information on potential remedies for the replication crisis, such as the discussion of whether to invite the original authors into replication efforts and the discussion of result-blind peer review. This should go into the "Remedies" section. Also, perhaps the "methodological terrorism" controversy can go under the "Consequences" section.

Tommyren, Marisauna, I think that with the current L2 heading of "Scope", it would make sense to remain where it is, because the sources are directly addressing psychology. However, from checking the actual content of the "Scope" section under the other academic fields, I think that it should actually be renamed to "Prevalence" or something similar - "Scope" could theoretically have that meaning, but the lack of clarity is still a problem. Regardless, if the content is moved, it needs to be attributed to being specific to psychology, per the sources. (Also I hope it's correct for me to add comments to points like this.) --Xurizuri (talk) 01:43, 27 January 2022 (UTC)Reply
I went ahead and did this. --Xurizuri (talk) 03:19, 31 January 2022 (UTC)Reply
  Done I didn't initially move the methodological terrorism stuff, but I have now. --Xurizuri (talk) 10:31, 31 January 2022 (UTC)Reply

2. "Several factors have combined to put psychology at the center of the controversy." This sentence is misleading to me because it seems to say that the replication crisis is most severe in the field of psychology. However, if you go into the source for this sentence, it actually says largely the opposite--other fields could have replication crises just as severe. So far, it seems to me that psychology is at the center of the controversy largely because it has received the most scholarly and media attention, not necessarily because the crisis is particularly bad in this field.

  Done I added a statement clarifying this --Xurizuri (talk) 10:31, 31 January 2022 (UTC)Reply

3. "Social priming." Maybe we should explain what social priming is. Also, I am not 100% sure that The Chronicle of Higher Education is a reliable source.

4. I don't think we should mention the "Psycho-babble" report by the Independent. "Psycho-babble" is such a colloquial word and can mean different things to different people. Is the Independent invalidating all aspects of non-replicable research? Would that be fair?

I agree. It's not a fair conclusion to draw, and while WP:RSP#The Independent has it as "generally reliable", that rating only applies to its non-specialist articles. This is absolutely a specialist topic. I've removed that part of the sentence.  Y --Xurizuri (talk) 10:31, 31 January 2022 (UTC)Reply

5. "Early analysis of result-blind peer review, which is less affected by publication bias, has estimated that 61 percent of result-blind studies have led to null results, in contrast to an estimated 5 to 20 percent in earlier research." This sentence appears twice in the article. The same information is also presented under the Remedies section. I feel that we can just keep the latter.

6. "First open empirical study." I saw nothing in the source suggesting that this is the first such study.

7. "Replications appear particularly difficult when research trials are pre-registered and conducted by research groups not highly invested in the theory under questioning." I do not see how this sentence fits in the paragraph. The first sentence of the paragraph seems to indicate that replication is an issue in psychology partly because some of the theories tested may not be tenable, whereas pre-registration and researcher investment seem more related to the issues of QRPs.

Yeah I don't know what to think about this paragraph. It seems like a weird attack on a specific theory, but I'll try to hunt down whatever that paragraph may have come from.
Note: I said this at some point on 1 Feb. --Xurizuri (talk) 03:28, 2 February 2022 (UTC)Reply
I've now removed most of that paragraph because I cannot figure out where it could've come from and as I said, it seems like a weird attack. --Xurizuri (talk) 03:28, 2 February 2022 (UTC)Reply

8. "p-hacking." It may be helpful to give a brief definition of what p-hacking is.

 Y This isn't my nomination, but I've made an attempt to address this. -- Xurizuri (talk) 09:18, 9 January 2022 (UTC) // edited to add a tick 03:28, 2 February 2022 (UTC)Reply
Given that I rearranged a bunch of content, I moved the definition to what is now the first time the term is used. --Xurizuri (talk) 10:31, 31 January 2022 (UTC)Reply

9. What exactly is BWAS? Tommyren (talk) 14:27, 15 May 2022 (UTC)

In Medicine

1. The part on commonalities of unreplicable papers could go under the "Causes" section.

  Done --Xurizuri (talk) 10:31, 31 January 2022 (UTC)Reply

2. "A survey on cancer researchers found that half of them had been unable to reproduce a published result." For reasons described above, this is to be expected and does not necessarily show that a replication crisis exists.

I've responded to the above point. --Xurizuri (talk) 10:27, 1 February 2022 (UTC)Reply

3. "Flaws." What flaws is the word referring to?

 Y I've attempted to address it, but I don't have any way to get access to the full article so I may be missing details. --Xurizuri (talk) 10:27, 1 February 2022 (UTC) // edited to add a tick 03:28, 2 February 2022 (UTC)Reply

4. What is the purpose of the block quote? — Preceding unsigned comment added by Tommyren (talkcontribs) 19:51, 31 January 2022 (UTC)Reply

 Y I truly can't remember why I did that, it does seem bizarre. I've integrated it back into the text. --Xurizuri (talk) 10:27, 1 February 2022 (UTC) // edited to add a tick 03:28, 2 February 2022 (UTC)Reply

5. Does cancer research merit its own tiny little section?

 Y I've removed the subheading. The main "In medicine" section also discussed cancer research already anyway. --Xurizuri (talk) 10:31, 31 January 2022 (UTC)Reply

In Marketing

1. The part on globalization seems to me to be a "Cause" of the crisis and does not belong in the "Scope" section.

I'm not sold on it being a cause. It seems more to me like a reason why it's really important to attempt to replicate findings in marketing, rather than a reason why results don't replicate. --Xurizuri (talk) 10:31, 31 January 2022 (UTC)Reply

In Economics

1. The part about the fragility of econometrics seems to me to be a "Cause" of the crisis and does not belong in the "Scope" section.

  Done --Xurizuri (talk) 10:31, 31 January 2022 (UTC)Reply

Across Fields

1. As explained by others in this talk page (see comments by Lavateraguy, also see section on "Failure to reproduce figures in 'Outline' section misleading"), the fact that many scholars have encountered unreplicable studies is in fact expected and not necessarily problematic. I personally do not see it as necessary to include the first sentence of this section.

I may try a few different configurations of the "prevalence" section. I think talking about encountering unreplicable studies is important - all of the replicability crisis, other than the QRPs and fraud, can be and has been characterised as expected. Also, if reliable sources think it's important to talk about it, then so do we. And sources do talk about that survey. --Xurizuri (talk) 10:27, 1 February 2022 (UTC)

2. "Only a minority." Do we have a concrete percentage for this?

The article doesn't give one. --Xurizuri (talk) 13:36, 31 January 2022 (UTC)Reply

3. "The authors also put forward possible explanations for this state of affairs." Would it be possible to elaborate on what these explanations are, especially in terms of why unreplicable studies are cited more?

  Done --Xurizuri (talk) 13:36, 31 January 2022 (UTC)Reply
Thanks for the edit! However, I'm not quite sure I understand this sentence: "the trend is not affected by publication of failed reproductions, and only 12% of citations following this will mention the failed replication." What does "this" mean? Tommyren (talk) 14:39, 31 January 2022 (UTC)Reply
 Y Good point, hopefully fixed now. --Xurizuri (talk) 10:27, 1 February 2022 (UTC) // edited to add a tick 03:28, 2 February 2022 (UTC)Reply

Causes

1. "Generation of new data/publications at an unprecedented rate." According to the original source, it does not seem to be a "trigger" of a crisis but seems more like something that makes things worse.

2. "A success and a failure." I understand it to mean "a successful and a failed attempt at finding evidence in support of the alternative hypothesis." Is this correct? Maybe we can clarify this.

 Y I've made an attempt at clarifying - I settled on making a "Notes" section because I prefer to avoid too much jargon in the main text, especially if there's any possible way to explain the implications of something without having to go into the details of it. --Xurizuri (talk) 10:27, 1 February 2022 (UTC) // edited to add a tick 03:28, 2 February 2022 (UTC)Reply

Historical and sociological roots

1. What is scientometrics?

  Done --Xurizuri (talk) 14:31, 2 February 2022 (UTC)Reply

2. I would suggest changing the title of this section to just "Historical Roots," and we should move arguments by Mirowski, "a group of STS scholars," and Smith to before this section together with other recently published sources because these works were published rather recently.

Someone made it part of the "Causes" section, which I agree with. I changed the heading to "Historical and sociological roots" instead, because the section is discussing the sociology of how the current situation arose. Under this name, none of the content should need to move. --Xurizuri (talk) 10:31, 31 January 2022 (UTC)
It was you that did it! Cheers. --Xurizuri (talk) 10:27, 1 February 2022 (UTC)

3. "Attention." What exactly is this word referring to?

I won't lie, I am avoiding this section a little, because sociology is a bit out of my wheelhouse. I'll just have to sit down and power through it at some point. --Xurizuri (talk) 03:28, 2 February 2022 (UTC)Reply

4. Should the five numbered points go under "Historical and sociological roots?"

I'm genuinely not sure if it fits under history/sociology. Regardless, I'd actually like to summarise the points, I doubt that it's due weight to be talking that much about their argument. --Xurizuri (talk) 10:27, 1 February 2022 (UTC)Reply
I have summarised the points and created a "publish or perish" subsection. --Xurizuri (talk) 13:40, 2 February 2022 (UTC)Reply

Publish or perish culture in academia

1. It would be nice if there can be proper citation for Ravetz's book.

2. I have a general feeling that this section could be more clearly structured and written. While the opening paragraphs connect nicely to the more theoretical discussions in the last section, they prevent readers from getting straight away what the publish or perish culture really is.

Questionable research practices and fraud

1. "They consist of applying different methods of data screening, outlier rejection, subgroup selection, data transformations, models, concomittant variables, and alternative estimation and testing methods, and finally reporting the variety that produces the most significant result." This sentence is becoming really confusing. It also seems a little repetitive considering that the following sentence comes in the next paragraph: "Examples of QRPs include selective reporting or partial publication of data (reporting only some of the study conditions or collected dependent measures in a publication), optional stopping (choosing when to stop data collection, often based on statistical significance of tests), post-hoc storytelling (framing exploratory analyses as confirmatory analyses), and manipulation of outliers (either removing outliers or leaving outliers in a dataset to cause a statistical test to be significant)"

Absolutely agree. I'll need to go through and figure out firstly which terms are actually in the sources and secondly which are being used for the same concepts. I may need to use more footnotes, actually. --Xurizuri (talk) 10:27, 1 February 2022 (UTC)
 Y I've attempted to address this. --Xurizuri (talk) 03:28, 2 February 2022 (UTC)

2. "However, most scholars acknowledge that fraud is, perhaps, the lesser contribution to replication crises." There is a "who" tag that needs addressing.

 Y I reworded it to suit an actual source. --Xurizuri (talk) 10:27, 1 February 2022 (UTC) // edited to add a tick 03:28, 2 February 2022 (UTC)

3. "Serious." I know the original author also used "serious," but a reader might wonder in what sense is fraud "serious." Is it the most morally reprehensible? Does it lead to the most wrong study results?

Good point. The author isn't actually explicit about what they mean. On reflection, that part of the sentence isn't really necessary, so I changed it.  Y --Xurizuri (talk) 03:28, 2 February 2022 (UTC)

4. "Positive and negative controls" Would it be clearer to just say control here? How important is it for readers to know the difference between positive and negative controls? If it is important, should we explain what the two terms mean?

5. What is confirmation bias?

  Done --Xurizuri (talk) 13:40, 2 February 2022 (UTC)
It would be nice if the term could be explained in-text. Tommyren (talk) 18:35, 5 June 2022 (UTC)

6. "Some examples of QRPs..." This sentence may suffer from overciting. Also, the jargon would preferably have in-text explanations.

7. "A 2012 survey of over 2,000 psychologists..." Given the critique of survey methods, I am leaning towards not including this source in this article at all. Tommyren (talk) 03:19, 7 June 2022 (UTC)

Statistical issues

1. "According to a 2018 survey of 200 meta-analyses, 'psychological research is, on average, afflicted with low statistical power'." Are we using British or American English in this article? Sometimes I see periods/commas within quotation marks, and sometimes I see them outside quotation marks.

I've seen a few uses of center and behavior outside of proper nouns, so I believe we're using US English. Regardless, MOS says to use "logical quotation" which essentially ends up with a mix of punctuation inside and outside the quotes. I'll change any that aren't currently logical, but I haven't noticed any so far. --Xurizuri (talk) 03:19, 31 January 2022 (UTC)
Partially as a result of my own writing (I use Australian English, which is mostly the same as British English), the article needs to be checked for spelling variants. Mostly around using s instead of z. --Xurizuri (talk) 14:31, 2 February 2022 (UTC)

Response in academia

1. I suspect the methodological terrorism incident is not very representative of the entire scholarly community. Can we add more information to this section?

The best sources for this would probably be the same ones as I listed for the Background section. --Xurizuri (talk) 14:31, 2 February 2022 (UTC)

Pharmaceutical Industry

1. "Amgen Oncology's cancer researchers were only able to replicate 11 percent of 53 innovative studies they selected to pursue over a 10-year period; a 2011 analysis by researchers with pharmaceutical company Bayer found that the company's in-house findings agreed with the original results only a quarter of the time, at the most. The analysis also revealed that, when Bayer scientists were able to reproduce a result in a direct replication experiment, it tended to translate well into clinical applications; meaning that reproducibility is a useful marker of clinical potential." Maybe this information should go under the "In Medicine" section?

Will do. I also want to shorten the amount of pharm industry content, I think that much info on it is undue weight. --Xurizuri (talk) 03:28, 2 February 2022 (UTC)
  Done as said. --Xurizuri (talk) 14:31, 2 February 2022 (UTC)

Metascience

1. I wonder if the second paragraph is needed. There seem to be no sources for it, and most of the information seems to occur elsewhere. What are the CONSORT and EQUATOR guidelines anyway?

Pre-registration of studies

1. From what is currently in the article, it's a little hard to tell the difference between result-blind peer review and pre-registration.

Addressing misinterpretation of p-values

1. What do the Bayesian methods refer to? Bayes is mentioned three times in this section. Are they referring to the same methods?

2. What logical problems is "The Problem with p-values" referring to?

Open Science

1. "Unless software used in research is open source, reproducing results with different software and hardware configurations is impossible." This doesn't sound right.

2. Do we need the CERN example?

Notes

1. "The null hypothesis (the hypothesis that the results are not reflecting a true pattern) is rejected when the probability of the null hypothesis being true is less than 5%" This is not true. The null hypothesis is either 100% true or 100% false.

Tommyren (talk) 00:19, 31 December 2021 (UTC)
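
To illustrate the Notes point about the 5% threshold: a significance cutoff of 0.05 limits how often a true null hypothesis is rejected; it does not by itself say how likely the null hypothesis is to be true once a result is significant. The following is only a minimal sketch in Python, assuming one is willing to treat a fixed share of tested hypotheses as true. The function name and the example values for prior_true, power, and alpha are illustrative assumptions, not figures from the article or its sources.

```python
def prob_null_given_significant(prior_true: float, power: float, alpha: float) -> float:
    """Share of significant results for which the null hypothesis is actually true.

    prior_true: assumed share of tested hypotheses where the effect is real
    power:      assumed probability of a significant result when the effect is real
    alpha:      significance threshold (probability of a significant result under a true null)
    """
    true_positives = prior_true * power
    false_positives = (1.0 - prior_true) * alpha
    return false_positives / (true_positives + false_positives)


if __name__ == "__main__":
    # Same threshold and power, two hypothetical base rates of true effects.
    for prior in (0.5, 0.1):
        share = prob_null_given_significant(prior, power=0.8, alpha=0.05)
        print(f"prior_true={prior}: share of significant results with a true null = {share:.3f}")
```

With these assumed inputs, the share of significant results that come from true nulls is about 6% at a base rate of 0.5 and 36% at a base rate of 0.1, so it is not fixed at 5% by the threshold alone.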

Status query

Tommyren, Marisauna, where does this nomination stand? As far as I can tell, Marisauna has never edited the article at all, much less to address any of the issues raised in this nomination. Xurizuri has made a few edits to address one or more issues above, and Tommyren has made various edits over the past four weeks since the article was nominated. If there isn't anyone who is able to address the issues raised last month in a timely manner, then the nomination should be closed. Thank you. BlueMoonset (talk) 01:35, 29 January 2022 (UTC)

If Marisauna has moved on, I'm happy for the nomination responsibility to be transferred to me - assuming that's feasible with the technical components of how GANs are monitored. --Xurizuri (talk) 02:11, 29 January 2022 (UTC)
Go ahead, I've moved on. Marisauna (talk) 05:04, 29 January 2022 (UTC)
Thank you for checking in. As things stand, it seems very likely that we will have to fail this article in this round of nominations. Two issues are particularly outstanding so far. 1) There's a lot of content saying that a large percentage of researchers have encountered unreplicable research. As discussed above and elsewhere on the Talk Page, this information is actually not necessarily indicative of a problem. I'm not saying that this information should not appear in this article, but it should be accompanied by adequate explanation so that readers will have a clear idea of what the replication crisis really is. 2) There are a few points of information without proper citation. Xurizuri has put up some citation needed tags, and I believe I also referred to some other points of uncited information above.
I apologize for taking so long to review this article. However, WP:GAN/I#R1 does require me to not only read through the whole article but also understand all sources. Maybe it's just because I'm a new reviewer, but it seems that following WP:GAN/I#R1 to the letter would take quite a bit of extra time. Tommyren (talk) 18:35, 29 January 2022 (UTC)
For your point about the large percentage of researchers encountering unreplicable research, I'm currently planning to add some explanation of the scientific process and the role of replication to the "Background" section I created. I also don't believe it will be hugely difficult to find the citations on the statements marked/mentioned. --Xurizuri (talk) 10:31, 31 January 2022 (UTC)

Xurizuri's concerns

I'm noticing that a fair amount of the article is based on blogs and opinion pieces. Those aren't inherently not-RS, because they're written generally by experts and the statements are being attributed (now). But I am concerned about the amount of the article that is based on them, especially when the article is also under-using reviews and meta-analyses published in reliable journals (Begley & Ioannidis, Shrout & Rodgers, and Stanley, Carter & Doucouliagos spring to mind). I also have assorted other concerns, some of which I am planning to address, including:

  • the lack of non-technical explanation of significance
  • a robust explanation of why people think the replication crisis is a crisis
  • the lead doesn't have a proper summary of the causes, consequences or remedies sections
  • some of the history and sociology content is nigh-incomprehensible to an untrained audience (i.e. me).
  • the de Solla-Price and Ravetz paragraphs aren't actually citing their sources (which I believe I have found, [3] and [4])

And then also the ones you've mentioned. I could address all of these given a few days, but this is starting to give me the vibes of doing a uni assignment at the last minute. And I'm not currently a uni student for a reason. Some of this may be above the needs of a good article, I'm really not sure, but that's why I'm not reviewing :) I'm going to add comments under yours above explaining what my plans are for it. That way, you can make a proper decision about how much work there is left, and this can be a complete record of recommendations. --Xurizuri (talk) 14:15, 2 February 2022 (UTC)

I forgot to mention another one: the "Emphasize triangulation, not just replication" section really should be prose, not a quote. --Xurizuri (talk) 14:31, 2 February 2022 (UTC)
First, THANK YOU SO MUCH FOR YOUR HARD WORK ON THIS ARTICLE! I think the article has made a lot of improvements in the last few days.
I do think that for the article to reach good article status, one needs to explain what statistical significance is, make the lead section a proper summary of the article, render the article readable to an average undergraduate student, sort out why the crisis is a crisis, and make sure all information is properly sourced. Personally I do not expect anyone to do all this in just a few days.
What's more, I'm not even sure if my current review is even half way done, as I have yet to start a deep review of all of the sources used in this article. I expect more issues to pop up.
Therefore, you by no means have to feel that you are obliged to single-handedly turn this article into a GA. I, however, want to at least follow through with WP:GAN/I#R1, reading through the articles and checking all sources. This could take some more time... Tommyren (talk) 16:59, 2 February 2022 (UTC)
Hello! I was thinking to address some of the concerns you have listed. I would like to start by writing a couple of paragraphs on the causes, remedies and consequences of the crisis. I will do that by referencing the corresponding subsections below in order to ensure adequate coherence. But I have a couple of questions before starting.
First off, it is not perfectly clear whether the "causes" section refers to the causes of the replication crisis, or to the causes of low rates of replicability in scientific research. The first is a historical event; the second is what I would define as a "metascientific" fact. To show this, notice how the first subsection talks about sociological factors that might generally be responsible for the occurrence of a crisis in science, while other sections talk about factors that are responsible for low replication rates (e.g. low power, low base rate of true hypotheses). Therefore, I'm not a hundred percent sure how to proceed. I was thinking of mentioning primarily the causes of the seemingly low replication rates in scientific research.
Secondly, I believe that some of the explanations given for why certain factors are relevant causes of the replication crisis are not clear. I'm referring in particular to the content of the section "Historical and sociological roots". From what I can see, the works reported in this section provide explanations of why the quality check system of scientific research might have deteriorated over the last decades. Although I can see how this can be connected with the idea of scientific practice undergoing a period of "crisis", there is no explicit mention of how this is connected to the replication crisis. This puts the burden on the reader to try to imagine why the things discussed in this section are relevant to the replication crisis. One possible solution to this lack of explicit connection might be to re-frame the subsection as concerning "scientific quality" in general. The connection to why this is relevant for the replication crisis might be explained in the first paragraph, and then the various works could be discussed at length.
Further, I was wondering if the "publish-or-perish culture" section should be listed as a sociological factor. I understand that in light of its relevance it might deserve a section on its own. Yet, I think it is reasonable to consider this particular cause as sociological. The publish-or-perish culture in academia is concerned with systemic norms on what are the incentives to do scientific work. Does that not count as a sociological aspect of scientific practice? Moreover, I noticed how the section talks about publication practices in general, not just the issue of the publish-or-perish academic culture. Would it make sense to rename it?
Lastly, I was hoping to get some feedback on whether the way I'm planning to structure this little paragraph in the lead would be okay. I was thinking to list all possible causes of the replication crisis by mentioning the general explanatory dimensions (e.g. methodology, statistics, sociology, theory) and for each include one or two examples in brackets.
A provisional draft of what the causes section might look like would be:
"The possible causes of the low rates of replicability in scientific research are multifaceted. Generally, the absence of replicable results in scientific fields can be attributed to methodological issues in scientific research (e.g. questionable research practices), deficient statistical standards (e.g. underpowered studies, misuse of null-hypothesis significance testing), sociological factors (e.g. publication bias, publish-or-perish culture), and theoretical shortcomings (e.g. base rate of hypothesis accuracy)."
This is just an example, it is not necessarily complete and coherent with the content of the entry. I was still wondering if the general structure is okay.
To whoever might answer this long message, thanks in advance for the help. ProgressiveProblemshift (talk) 11:44, 3 July 2023 (UTC)

Suggestion to close

Tommyren, since you established nearly two months ago that the nomination will not pass without significant additional work, it's time to fail the nomination. You are certainly welcome to continue working on the article and posting here as to issues that will need to be addressed prior to any subsequent nomination, but the end result is clear. Thank you for your detailed work. BlueMoonset (talk) 17:35, 27 March 2022 (UTC)

Small typo?

In the section: Historical and philosophical roots; sentence in the last paragraph: "This theory holds that each "system" such as economy, science, religion or media on communicates using its own code: true/false for science, profit/loss for the economy, news/no-news for the media, and so on.". (new->news?)--S.POROY (talk) 12:53, 30 January 2022 (UTC)Reply

Misuse of Prevalence section

I've noticed that some people are adding content to the Prevalence section that is not about quantitative measures of replicability and QRPs, likely because other sections are not subdivided by field. –LaundryPizza03 (d) 02:17, 24 March 2022 (UTC)

Theory-Crisis

In the field of metascience, a growing number of recent publications are concerned with how theoretical as opposed to methodological or statistical shortcomings might be the cause of low replication rates, at least in psychological science. Some of these even talk about a "Theory-crisis" in psychology. I was wondering if it would make sense to create a separate subsection under "Causes" to report on the considerations that have been made in this area of study concerning the replication crisis. Examples of these publications are:

Fiedler, K. (2017). What constitutes strong psychological science? The (neglected) role of diagnosticity and a priori theorizing. Perspectives on Psychological Science, 12(1), 46-61. https://doi.org/10.1177/1745691616654458

Oberauer, K., & Lewandowsky, S. (2019). Addressing the theory crisis in psychology. Psychonomic Bulletin & Review, 26(5), 1596-1618. https://doi.org/10.3758/s13423-019-01645-2

Oude Maatman, F. (preprint). Psychology's Theory Crisis, and Why Formal Modelling Cannot Solve It. PsyArXiv. https://doi.org/10.31234/osf.io/puqvs

Szollosi, A., & Donkin, C. (2021). Arrested theory development: The misguided distinction between exploratory and confirmatory research. Perspectives on Psychological Science, 16(4), 717-724. https://doi.org/10.1177/1745691620966796

ProgressiveProblemshift (talk) 16:43, 15 May 2023 (UTC)

Missing reference + unclear sentence

In the section "Background", the explanation of how NHST works ends by saying "Although p-value testing is the most commonly used method, it is not the only method.". This sentence is missing a reference, but on top of that I would argue it is not very clear. It raises the question: "The only method for what?". Given the content of that paragraph, one could say it most likely means "not the only method to test significance", but since the page is on replication, I'd say that it would make more sense if it referred to methods for establishing whether findings were successfully replicated in general. In that case, I have a good reference in Nosek et al. (2022), where a small section at the beginning is dedicated to defining when we can say that original findings were replicated (i.e. "How do we decide whether the same occurred again?", p. 722), and where the authors describe different methodologies and criteria by which replications are defined as successful. I'd love to do it myself, but I would need confirmation that this edit makes sense!


Nosek, B. A., Hardwicke, T. E., Moshontz, H., Allard, A., Corker, K. S., Dreber, A., Fidler, F., Hilgard, J., Struhl, M. K., Nuijten, M. B., Rohrer, J. M., Romero, F., Scheel, A. M., Scherer, L. D., Schönbrodt, F. D., & Vazire, S. (2022). Replicability, robustness, and reproducibility in psychological science. Annual Review of Psychology, 73, 719-748. https://doi.org/10.1146/annurev-psych-020821-114157 ProgressiveProblemshift (talk) 13:04, 22 May 2023 (UTC)

Wrong statement

In the subsection "Result-blind peer review", it is reported that "more than 140 psychology journals have adopted result-blind peer review". I believe this statement is wrong. If one checks the website article that's cited, the author says that 140 journals at the time were using registered reports (which implies result-blind peer review). She says "journals" without mentioning specific disciplines. If one checks the source she cites, it's the COS web page on registered reports, so I assume that to come up with the 140 number she probably checked the COS's page on TOP scores for journals. According to that page, presently only 46 journals adopt some form of registered reports. The statement should be changed or just deleted since, I believe, it's misleading and incorrect. Here one can see the stats of psychology journals when it comes to adopting registered reports: https://topfactor.org/journals?factor=Registered+Reports+%26+Publication+Bias&disciplines=Psychology&page=3 ProgressiveProblemshift (talk) 14:14, 14 July 2023 (UTC)