
Contested claim · Science & research · §0219

Has the replication crisis invalidated most landmark psychology findings?

The replication crisis has raised serious concerns about the reliability, effect sizes, and research practices behind some prominent psychology findings. However, the available evidence is mixed: many effects have weakened or failed in replication attempts, while others remain supported, have been refined, or depend on context and methodology.

Reviewed by 10 models · 3 countries · 7 curated references · 23 revisions · Updated 19 hours ago · 5 min read

Panel verdict

8/10 agreement · 74% confidence · 25% spread · Filed 29 May 2026

The 8 models that completed a review rated the claim as mixed on the available evidence.

The Adjudged panel has not yet completed its review of this claim. This draft summarizes major lines of evidence and identifies the key issues the panel would need to evaluate before reaching a final judgment.

Panel synthesis

Consensus & disagreement

Where the panel agreed

8 of 10 models: The claim asks whether the replication crisis has invalidated most landmark psychology findings. This is a broad statement, because “landmark findings” can refer to highly cited so...

8 of 10 models: Large replication initiatives have reported mixed but concerning results. The Open Science Collaboration’s 2015 project attempted to replicate 100 psychology studies and found that...

8 of 10 models: A major uncertainty is the denominator: there is no single agreed list of “landmark psychology findings.” If the list is limited to highly publicized social psychology claims from...

Where the panel diverged

1 model noted: Claude Opus 4.7 gave the lowest confidence, while still reaching the same overall direction.

Why this question matters

The claim being judged

The claim asks whether the replication crisis has invalidated most landmark psychology findings. This is a broad statement, because “landmark findings” can refer to highly cited social psychology experiments, widely taught textbook examples, influential clinical findings, cognitive psychology results, developmental studies, or entire theoretical traditions.

The replication crisis refers to a period of heightened scrutiny that began mainly in the 2010s, when large-scale replication projects found that many published psychology studies did not produce similar results when repeated with new samples. The concerns raised included small sample sizes, flexible statistical choices, publication bias, selective reporting, and incentives favoring surprising findings.

The central question is not only whether individual findings replicated exactly, but whether the original claims remain useful after considering newer evidence. Some findings may have smaller effects than first reported, may apply only in narrower circumstances, or may require revised explanations. Others may remain comparatively robust across settings.

What the evidence shows

Large replication initiatives have reported mixed but concerning results. The Open Science Collaboration’s 2015 project attempted to replicate 100 psychology studies and found that only a minority (about 36%) produced statistically significant results in the same direction under the replication conditions, with average effect sizes roughly half those reported in the original studies. Other multi-lab projects have also found substantial variation in which findings repeat consistently.

These results have especially affected parts of social psychology where dramatic, counterintuitive, or context-sensitive effects became well known. Examples often discussed in replication debates include social priming, ego depletion, stereotype threat, power posing, and some judgment-and-decision-making effects. The status of these areas varies: some have seen major revisions, some remain disputed, and some have shifted toward more precise boundary conditions.

At the same time, it would be an overreach to treat the replication crisis as applying uniformly to all of psychology. Cognitive psychology, psychometrics, behavioral genetics, perception research, and parts of clinical and developmental psychology include many findings supported by converging methods, large datasets, or repeated observation. Even within social psychology, some effects have replicated more reliably than others.

The most defensible summary is that the replication crisis has substantially lowered confidence in a notable share of famous psychology findings, particularly where the original evidence came from small, flexible, or isolated studies. It has not, by itself, shown that most landmark findings across the entire field should be discarded.

Where uncertainty remains

A major uncertainty is the denominator: there is no single agreed list of “landmark psychology findings.” If the list is limited to highly publicized social psychology claims from the late twentieth and early twenty-first centuries, the affected share may look large. If it includes broader areas such as memory, attention, perception, learning, personality measurement, and clinical assessment, the picture is more mixed.

Another uncertainty is how to classify a finding that changes after replication. A result that appears smaller, more conditional, or theoretically different from the original report may not be best described as simply invalidated. It may be partially supported, narrowed, or replaced by a more cautious interpretation.

Future assessments would benefit from systematic reviews that define a set of landmark findings in advance, rate the quality of original and replication evidence, and distinguish failed exact replication, smaller effect size, limited generalizability, and continued support.
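The four-way distinction proposed above can be made concrete with a small sketch. Everything in it is an illustrative assumption rather than an established rubric: the `ReplicationRecord` fields, the `classify` function, and both thresholds (`shrink_threshold`, `site_threshold`) are hypothetical, and a real review would need to define and preregister its own criteria.

```python
from dataclasses import dataclass
from enum import Enum


class Outcome(Enum):
    FAILED = "failed exact replication"
    SMALLER = "smaller effect size"
    LIMITED = "limited generalizability"
    SUPPORTED = "continued support"


@dataclass
class ReplicationRecord:
    """Hypothetical summary of one finding and its replication evidence."""
    original_effect: float     # standardized effect in the original study
    replication_effect: float  # pooled standardized effect across replications
    significant: bool          # replication significant in the original direction
    sites_total: int           # number of replication sites or samples
    sites_replicated: int      # sites where the effect was observed


def classify(rec, shrink_threshold=0.5, site_threshold=0.5):
    """Assign one of the four categories; thresholds are assumed, not standard."""
    if rec.sites_total <= 0:
        raise ValueError("sites_total must be positive")
    same_direction = rec.original_effect * rec.replication_effect > 0
    if not (rec.significant and same_direction):
        return Outcome.FAILED
    # Effect holds overall but only in a minority of sites or samples.
    if rec.sites_replicated / rec.sites_total < site_threshold:
        return Outcome.LIMITED
    # Effect holds broadly but has shrunk well below the original estimate.
    if abs(rec.replication_effect) / abs(rec.original_effect) < shrink_threshold:
        return Outcome.SMALLER
    return Outcome.SUPPORTED
```

Fixing the rule before looking at the data is the point of the sketch: a finding that shrinks is recorded as "smaller effect size" rather than folded into a single "invalidated" bucket.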

The three parts of the claim

The umbrella claim is actually several claims bundled into one. Each needs its own evaluation.

PART 1 / 3

Large-scale replication projects found that many published psychology studies produced weaker or non-significant results when repeated.

Yes · 88%

PART 2 / 3

The replication crisis has invalidated most landmark findings across psychology as a whole.

Mixed · 54%

PART 3 / 3

Some prominent psychology findings remain supported or have been refined rather than abandoned after replication scrutiny.

Yes · 78%

Model comparison

How each panel model rated the three parts of the claim

Model	Part 1	Part 2	Part 3	Overall
Grok 4.3	Yes · 88%	Mixed · 54%	Yes · 78%	Mixed · 70%
OpenAI GPT-5.4	Yes · 88%	Mixed · 54%	Yes · 78%	Mixed · 70%
Mistral Medium 3.5	Yes · 88%	Mixed · 54%	Yes · 78%	Mixed · 75%
Llama 4 Maverick	Yes · 88%	Mixed · 54%	Yes · 78%	Mixed · 80%
Claude Opus 4.7	Yes · 88%	Mixed · 54%	Yes · 78%	Mixed · 60%
Gemini 3.1 Pro	—	—	—	Incomplete
DeepSeek V4 Pro	Yes · 88%	Mixed · 54%	Yes · 78%	Mixed · 70%
GLM 5.1	Yes · 88%	Mixed · 54%	Yes · 78%	Mixed · 85%
Kimi K2.6	—	—	—	Incomplete
Qwen 3.7 Max	Yes · 88%	Mixed · 54%	Yes · 78%	Mixed · 85%
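The page does not state how its headline panel figures are derived from this table. The sketch below assumes one plausible reading: agreement counts the models sharing the most common overall verdict out of all ten panel members, confidence is the unweighted mean of the completed models' overall percentages, and spread is the gap between the highest and lowest of them. The `summarize` helper and the dictionary layout are hypothetical.

```python
from statistics import mean

# Overall verdicts copied from the table above; None marks an incomplete response.
overall = {
    "Grok 4.3": ("Mixed", 70),
    "OpenAI GPT-5.4": ("Mixed", 70),
    "Mistral Medium 3.5": ("Mixed", 75),
    "Llama 4 Maverick": ("Mixed", 80),
    "Claude Opus 4.7": ("Mixed", 60),
    "Gemini 3.1 Pro": None,
    "DeepSeek V4 Pro": ("Mixed", 70),
    "GLM 5.1": ("Mixed", 85),
    "Kimi K2.6": None,
    "Qwen 3.7 Max": ("Mixed", 85),
}


def summarize(verdicts):
    """Aggregate per-model verdicts under the assumptions stated above."""
    completed = [v for v in verdicts.values() if v is not None]
    labels = [label for label, _ in completed]
    confidences = [conf for _, conf in completed]
    top_label = max(set(labels), key=labels.count)
    return {
        "verdict": top_label,
        "agreement": f"{labels.count(top_label)}/{len(verdicts)}",
        "mean_confidence": round(mean(confidences)),
        "spread": max(confidences) - min(confidences),
    }


summary = summarize(overall)
print(summary)
```

Under these assumptions the sketch gives 8/10 agreement, a mean confidence of 74, and a spread of 25, which matches the headline figures. That agreement does not confirm the site's actual method.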

An honest commitment

What would change our mind

The current evidence leans one way. But we're committed to the evidence, not to the conclusion.

A systematic, preregistered review defining a representative set of landmark psychology findings and rating their replication status using consistent criteria.
New large-scale multi-lab replication projects showing that a clear majority of landmark findings across multiple psychology subfields either do or do not reproduce under well-powered conditions.
Meta-analyses that account for publication bias and show whether key famous effects remain practically meaningful after correction.
Evidence that current reforms such as preregistration, registered reports, and open data substantially change replication rates for newly published landmark studies.
Clearer disciplinary consensus on how to classify findings that become smaller, more conditional, or theoretically revised after replication attempts.

Common questions

Does a failed replication mean the original study was fraudulent?

No. A replication attempt can produce a different result because of sampling variation, context, measurement choices, statistical power, or ordinary research limitations. Fraud is a separate issue and requires different evidence.

Are all areas of psychology affected equally?

No. Concerns have been especially visible in some areas of social psychology and experimental behavioral research. Other areas have different evidence bases, including repeated measurement, large datasets, clinical trials, longitudinal studies, or well-established psychometric methods.

What does it mean if an effect replicates but is smaller?

It usually means confidence in the original effect size should be adjusted. The finding may still be meaningful, but its practical importance, theoretical interpretation, or policy relevance may need revision.
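One practical consequence can be sketched with a standard normal-approximation formula for a two-sided, two-group comparison of means: the sample size needed per group scales with 1/d², where d is the standardized effect size. The function name `n_per_group` and the defaults (alpha of 0.05, power of 0.80) are illustrative assumptions, and the approximation slightly understates what an exact t-test calculation would require.

```python
from math import ceil
from statistics import NormalDist


def n_per_group(d, alpha=0.05, power=0.80):
    """Approximate per-group sample size to detect a standardized effect d.

    Uses the normal approximation n = 2 * ((z_{1-alpha/2} + z_{power}) / d) ** 2.
    """
    if d <= 0:
        raise ValueError("d must be positive")
    z = NormalDist()  # standard normal distribution
    z_alpha = z.inv_cdf(1 - alpha / 2)
    z_beta = z.inv_cdf(power)
    return ceil(2 * ((z_alpha + z_beta) / d) ** 2)


original = n_per_group(0.5)    # effect size as first reported (hypothetical value)
replicated = n_per_group(0.25) # effect half as large after replication (hypothetical)
print(original, replicated)
```

Halving the effect size roughly quadruples the required sample, which is one reason a smaller replicated effect can change how a finding is studied and applied even when it remains real.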

Has psychology improved because of the replication crisis?

Many researchers argue that the field has adopted stronger practices since the crisis became widely discussed. These include preregistration, larger samples, open data, registered reports, multi-lab collaborations, and clearer separation between exploratory and confirmatory analysis.

References

Journal Article

[OSC2015] Estimating the reproducibility of psychological science. Science. Major replication project involving 100 psychology studies, central to the replication-crisis discussion.

[ManyLabs2] Many Labs 2: Investigating variation in replicability across samples and settings. Advances in Methods and Practices in Psychological Science. Large multi-lab replication effort examining how effects vary across populations and research sites.

[Camerer2018] Evaluating the replicability of social science experiments in Nature and Science between 2010 and 2015. Nature Human Behaviour. Replication assessment of high-profile social science experiments published in leading journals.

[Nosek2018] The preregistration revolution. Proceedings of the National Academy of Sciences. Explains preregistration and why it became a major response to replication concerns.

Review

[NelsonSimmonsSimonsohn2018] Psychology’s Renaissance. Annual Review of Psychology. Review of methodological reforms in psychology, including preregistration, transparency, and improved statistical practice.

Organization

[OpenScienceFramework] Open Science Framework. Center for Open Science. Platform used for preregistration, data sharing, and replication materials in many psychology projects.

Publisher Resource

[APSReplication] Registered Replication Reports. Association for Psychological Science. Collection of coordinated replication reports that illustrate reforms in replication methodology.

What each model said

Grok 4.3 Success

No, the replication crisis has not invalidated most landmark psychology findings as a whole, though it has substantially weakened many effects, exposed questionable practices, and prompted lasting reforms. Larg...

x-ai/grok-4.3-20260430 29 May 2026 10:24 stop

OpenAI GPT-5.4 Success

Probably not. The strongest defensible judgment is that the replication crisis seriously weakened confidence in many published psychology findings, including some famous ones, but it did not invalidate “most la...

openai/gpt-5.4-20260305 29 May 2026 10:24 stop

Mistral Medium 3.5 Success

The replication crisis has not invalidated *most* landmark psychology findings, but it has significantly undermined confidence in a substantial portion of them. Large-scale replication projects, such as the Ope...

mistralai/mistral-medium-3.5-20260430 29 May 2026 10:24 stop

Llama 4 Maverick Success

The replication crisis has raised concerns about the validity of some landmark psychology findings, but it has not invalidated most of them across psychology as a whole. Confidence: 80% The evidence is mixed, w...

meta-llama/llama-4-maverick-17b-128e-instruct 29 May 2026 10:25 stop

Claude Opus 4.7 Divergent view

No, the replication crisis has not invalidated "most" landmark psychology findings, though it has substantially weakened confidence in a significant subset of them and exposed serious methodological problems across the field. The accurate characterization is more nuanced: replication rates in major projects have been concerning but not catastrophic, and the impact has been uneven across subfields. Confidence: Moderately high (around 80%) on the overall judgment that "most landmark findings invalidated" overstates t...

anthropic/claude-4.7-opus-20260416 29 May 2026 10:25 stop

Gemini 3.1 Pro Incomplete

Incomplete response: output limit reached before the answer finished.

google/gemini-3.1-pro-preview-20260219 29 May 2026 10:25 length

DeepSeek V4 Pro Success

No, the replication crisis has not invalidated most landmark psychology findings. While large-scale replication efforts have revealed that many published effects are weaker, more context-dependent, or less stat...

deepseek/deepseek-v4-pro-20260423 29 May 2026 10:25 stop

GLM 5.1 Success

The claim that the replication crisis has invalidated most landmark psychology findings is overstated and misleading. Confidence: High Key evidence: - The 2015 Open Science Collaboration study in Science found...

z-ai/glm-5.1-20260406 29 May 2026 10:25 stop

Kimi K2.6 Incomplete

Incomplete response: output limit reached before the answer finished.

moonshotai/kimi-k2.6-20260420 29 May 2026 10:25 length

Qwen 3.7 Max Success

No, the replication crisis has not invalidated most landmark psychology findings across the discipline as a whole, though it has overturned, weakened, or heavily qualified a substantial and highly visible subse...

qwen/qwen3.7-max-20260520 29 May 2026 10:25 stop



Coverage map

Regions shown: N. America, Europe, Asia, S. America, Africa, Oceania

Models from 3 continents contributed, which provided fair regional balance to this review. Grey regions have no suitable regional model participants on OpenRouter.

Confidence cluster (chart; axis from 0% to 100%)