No, the replication crisis has not invalidated most landmark psychology findings as a whole, though it has substantially weakened many effects, exposed questionable practices, and prompted lasting reforms. Larg...
Why this question matters
The replication crisis has raised serious concerns about the reliability, effect sizes, and research practices behind some prominent psychology findings. However, the available evidence is mixed: many effects have weakened or failed in replication attempts, while others remain supported, have been refined, or depend on context and methodology.
The claim being judged
The claim asks whether the replication crisis has invalidated most landmark psychology findings. This is a broad statement, because “landmark findings” can refer to highly cited social psychology experiments, widely taught textbook examples, influential clinical findings, cognitive psychology results, developmental studies, or entire theoretical traditions.
The replication crisis refers to a period of heightened scrutiny beginning especially in the 2010s, when large-scale replication projects found that many published psychology studies did not produce similar results when repeated with new samples. Concerns included small sample sizes, flexible statistical choices, publication bias, selective reporting, and incentives favoring surprising findings.
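How small samples and publication bias combine to inflate published effects can be illustrated with a short simulation. This is a minimal sketch under stated assumptions, not an analysis from any of the replication projects: the true effect size (`true_d = 0.2`), the sample size (20 per group), the number of simulated studies, and the rule that only significant positive results get "published" are all hypothetical choices made for illustration.

```python
import math
import random
import statistics


def simulate_study(true_d, n_per_group, rng):
    """Simulate one two-group study; return (observed effect size, significant?)."""
    control = [rng.gauss(0.0, 1.0) for _ in range(n_per_group)]
    treated = [rng.gauss(true_d, 1.0) for _ in range(n_per_group)]
    diff = statistics.mean(treated) - statistics.mean(control)
    pooled_sd = math.sqrt(
        (statistics.variance(treated) + statistics.variance(control)) / 2
    )
    observed_d = diff / pooled_sd  # standardized mean difference (Cohen's d)
    # Normal approximation to the two-sample test; adequate for a sketch.
    standard_error = pooled_sd * math.sqrt(2 / n_per_group)
    significant = abs(diff / standard_error) > 1.96
    return observed_d, significant


def run(true_d=0.2, n_per_group=20, n_studies=5000, seed=0):
    """Assumed setup: every study tests the same small true effect."""
    rng = random.Random(seed)
    results = [simulate_study(true_d, n_per_group, rng) for _ in range(n_studies)]
    all_effects = [d for d, _ in results]
    # Hypothetical publication filter: only significant, positive results appear.
    published = [d for d, sig in results if sig and d > 0]
    return (
        statistics.mean(all_effects),
        statistics.mean(published),
        len(published) / n_studies,
    )


mean_all, mean_published, share_published = run()
print(f"mean effect, all studies:       {mean_all:.2f}")
print(f"mean effect, 'published' only:  {mean_published:.2f}")
print(f"share of studies 'published':   {share_published:.2%}")
```

Under these assumptions, the average across all simulated studies stays close to the true effect, while the "published" subset reports a much larger one, because a small study can only reach significance when it happens to overestimate the effect. A well-powered replication of such a finding would then be expected to show a smaller effect even if the original effect is real.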
The central question is not only whether individual findings replicated exactly, but whether the original claims remain useful after considering newer evidence. Some findings may have smaller effects than first reported, may apply only in narrower circumstances, or may require revised explanations. Others may remain comparatively robust across settings.
What the evidence shows
Large replication initiatives have reported mixed but concerning results. The Open Science Collaboration’s 2015 project attempted to replicate 100 psychology studies and found that only about 36% of replications produced statistically significant results in the same direction as the original, compared with 97% of the original studies, and that replication effect sizes averaged roughly half the original magnitude. Other multi-lab projects have also found substantial variation in which findings repeat consistently.
These results have especially affected parts of social psychology where dramatic, counterintuitive, or context-sensitive effects became well known. Examples often discussed in replication debates include social priming, ego depletion, stereotype threat, power posing, and some judgment-and-decision-making effects. The status of these areas varies: some have seen major revisions, some remain disputed, and some have shifted toward more precise boundary conditions.
At the same time, it would be an overreach to treat the replication crisis as applying uniformly to all of psychology. Cognitive psychology, psychometrics, behavioral genetics, perception research, and parts of clinical and developmental psychology include many findings supported by converging methods, large datasets, or repeated observation. Even within social psychology, some effects have replicated more reliably than others.
The most defensible summary is that the replication crisis has substantially reduced confidence in a notable share of famous psychology findings, particularly where the original evidence came from small, flexible, or isolated studies. It has not, by itself, shown that most landmark findings across the entire field should be discarded.
Where uncertainty remains
A major uncertainty is the denominator: there is no single agreed list of “landmark psychology findings.” If the list is limited to highly publicized social psychology claims from the late twentieth and early twenty-first centuries, the affected share may look large. If it includes broader areas such as memory, attention, perception, learning, personality measurement, and clinical assessment, the picture is more mixed.
Another uncertainty is how to classify a finding that changes after replication. A result that appears smaller, more conditional, or theoretically different from the original report may not be best described as simply invalidated. It may be partially supported, narrowed, or replaced by a more cautious interpretation.
Future assessments would benefit from systematic reviews that define a set of landmark findings in advance, rate the quality of original and replication evidence, and distinguish failed exact replication, smaller effect size, limited generalizability, and continued support.
The three parts of the claim
The umbrella claim is actually several claims bundled into one. Each needs its own evaluation.
Model comparison
How each panel model rated the three parts of the claim

| Model | Part 1 | Part 2 | Part 3 | Overall |
|---|---|---|---|---|
| Grok 4.3 | Yes · 88% | Mixed · 54% | Yes · 78% | Mixed · 70% |
| OpenAI GPT-5.4 | Yes · 88% | Mixed · 54% | Yes · 78% | Mixed · 70% |
| Mistral Medium 3.5 | Yes · 88% | Mixed · 54% | Yes · 78% | Mixed · 75% |
| Llama 4 Maverick | Yes · 88% | Mixed · 54% | Yes · 78% | Mixed · 80% |
| Claude Opus 4.7 | Yes · 88% | Mixed · 54% | Yes · 78% | Mixed · 60% |
| Gemini 3.1 Pro | — | — | — | Incomplete |
| DeepSeek V4 Pro | Yes · 88% | Mixed · 54% | Yes · 78% | Mixed · 70% |
| GLM 5.1 | Yes · 88% | Mixed · 54% | Yes · 78% | Mixed · 85% |
| Kimi K2.6 | — | — | — | Incomplete |
| Qwen 3.7 Max | Yes · 88% | Mixed · 54% | Yes · 78% | Mixed · 85% |
What would change our mind
The current evidence leans one way. But we're not committed to the conclusion; we're committed to the evidence. Any of the following would shift our assessment:
- A systematic, preregistered review defining a representative set of landmark psychology findings and rating their replication status using consistent criteria.
- New large-scale multi-lab replication projects showing that a clear majority of landmark findings across multiple psychology subfields either do or do not reproduce under well-powered conditions.
- Meta-analyses that account for publication bias and show whether key famous effects remain practically meaningful after correction.
- Evidence that current reforms such as preregistration, registered reports, and open data substantially change replication rates for newly published landmark studies.
- Clearer disciplinary consensus on how to classify findings that become smaller, more conditional, or theoretically revised after replication attempts.
What each model said
Probably not. The strongest defensible judgment is that the replication crisis seriously weakened confidence in many published psychology findings, including some famous ones, but it did not invalidate “most la...
The replication crisis has not invalidated *most* landmark psychology findings, but it has significantly undermined confidence in a substantial portion of them. Large-scale replication projects, such as the Ope...
The replication crisis has raised concerns about the validity of some landmark psychology findings, but it has not invalidated most of them across psychology as a whole. Confidence: 80% The evidence is mixed, w...
No, the replication crisis has not invalidated "most" landmark psychology findings, though it has substantially weakened confidence in a significant subset of them and exposed serious methodological problems across the field. The accurate characterization is more nuanced: replication rates in major projects have been concerning but not catastrophic, and the impact has been uneven across subfields. Confidence: Moderately high (around 80%) on the overall judgment that "most landmark findings invalidated" overstates t...
Incomplete response: output limit reached before the answer finished.
No, the replication crisis has not invalidated most landmark psychology findings. While large-scale replication efforts have revealed that many published effects are weaker, more context-dependent, or less stat...
The claim that the replication crisis has invalidated most landmark psychology findings is overstated and misleading. Confidence: High Key evidence: - The 2015 Open Science Collaboration study in Science found...
Incomplete response: output limit reached before the answer finished.
No, the replication crisis has not invalidated most landmark psychology findings across the discipline as a whole, though it has overturned, weakened, or heavily qualified a substantial and highly visible subse...