
Contested claim · Science & research · §0220

Is implicit bias measured by the IAT predictive of discriminatory behavior?

The current research record suggests that IAT scores are, at most, weak and context-dependent predictors of discriminatory behavior. Evidence is stronger that the IAT captures some form of automatic association than that it reliably forecasts how a specific person will act.

Reviewed by 10 models · 3 countries · 6 curated references · 23 revisions · Updated 19 hours ago · 5 min read

Panel verdict

7/10 agreement · 76% confidence · 15% spread · filed 29 May 2026

7 reviewing models concluded that the available evidence for the claim is mixed.

The Adjudged panel has not yet completed its review of this claim. This draft summarizes likely lines of evidence, major disagreements in the literature, and the kinds of findings that would be needed for a final assessment.

Panel synthesis

Consensus & disagreement

Where the panel agreed

8 of 10 models: The claim asks whether implicit bias as measured by the Implicit Association Test, or IAT, predicts discriminatory behavior. The IAT is a reaction-time task that compares how quick...

8 of 10 models: Meta-analyses generally report that IAT scores have small associations with behavioral measures. Some studies find relationships between IAT scores and outcomes such as seating dis...

8 of 10 models: Uncertainty remains because studies use different IAT versions, different behavioral outcomes, and different statistical models. Some outcomes called discriminatory behavior are br...

Where the panel diverged

1 model noted: GLM 5.1 gave the lowest confidence, while still reaching the same overall direction.

Why this question matters

The claim being judged

The claim asks whether implicit bias as measured by the Implicit Association Test, or IAT, predicts discriminatory behavior. The IAT is a reaction-time task that compares how quickly people sort concepts, such as social groups and positive or negative words, when those concepts are paired in different ways.

A narrow version of the claim is that an individual person's IAT score can be used to meaningfully forecast whether that person will discriminate in hiring, policing, medical treatment, classroom evaluation, or other real-world settings. A broader version is that average IAT scores across groups, places, or institutions correlate with disparities in behavior.

These versions are important to separate. A measure can show a small statistical association across many people while still being too noisy to predict the actions of any one person. It can also relate to some laboratory behaviors while offering less information about complex real-world decisions.
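The gap between a group-level association and individual-level prediction can be made concrete with a short sketch. It assumes, purely for illustration, that IAT scores and a behavioral measure are bivariate normal, and it tries correlations in the r = .10 to .20 range quoted in one of the model statements on this page. The function names are hypothetical.

```python
import math


def variance_explained(r: float) -> float:
    """Share of variance in one variable accounted for by the other (r squared)."""
    return r * r


def same_side_of_median(r: float) -> float:
    """Chance that two bivariate-normal scores fall on the same side of their medians.

    Uses the closed form 1/2 + asin(r) / pi, so r = 0 gives a coin flip (0.5).
    The bivariate-normal shape is an assumption made for this illustration.
    """
    return 0.5 + math.asin(r) / math.pi


if __name__ == "__main__":
    for r in (0.10, 0.15, 0.20):  # illustrative correlations, not study estimates
        print(
            f"r={r:.2f}  variance explained={variance_explained(r):.1%}  "
            f"same side of median={same_side_of_median(r):.1%}"
        )
```

Under these assumptions, a correlation of 0.15 accounts for roughly 2% of the variance, and knowing that a person scores above the median on the IAT raises the chance that they are also above the median on the behavioral measure from 50% to only about 55%.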

What the evidence shows

Meta-analyses generally report that IAT scores have small associations with behavioral measures. Some studies find relationships between IAT scores and outcomes such as seating distance, résumé ratings, clinical recommendations, or other judgment tasks, but the effect sizes are often modest and vary across domains.

Several reviews have raised concerns about whether the IAT adds much predictive value beyond explicit attitudes, demographics, social context, or other variables. Test-retest reliability is also imperfect, meaning a person's score can shift across time and testing conditions. That limits the usefulness of the IAT as a stand-alone individual risk assessment tool.
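One way to see why imperfect reliability limits prediction is the classical attenuation relationship, in which measurement noise shrinks an observed correlation by the square root of the product of the two measures' reliabilities. The sketch below uses made-up reliability values chosen only for illustration; they are not estimates from any of the cited studies, and the function names are hypothetical.

```python
import math


def max_observable_r(reliability_x: float, reliability_y: float) -> float:
    """Upper bound on an observed correlation given the reliability of each measure."""
    return math.sqrt(reliability_x * reliability_y)


def attenuated_r(true_r: float, reliability_x: float, reliability_y: float) -> float:
    """Observed correlation expected when both measures contain random error."""
    return true_r * max_observable_r(reliability_x, reliability_y)


if __name__ == "__main__":
    # Hypothetical inputs: a test-retest reliability of 0.5 for the predictor,
    # 0.8 for the behavioral outcome, and an underlying correlation of 0.4.
    print(f"ceiling on observed r: {max_observable_r(0.5, 0.8):.3f}")
    print(f"expected observed r:   {attenuated_r(0.4, 0.5, 0.8):.3f}")
```

With these hypothetical inputs, an underlying correlation of 0.4 would show up as an observed correlation of about 0.25, which illustrates how unstable scores weaken prediction even before other sources of error are considered.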

Evidence appears somewhat more favorable when IAT scores are analyzed in aggregate, such as comparing average regional implicit associations with group-level disparities. Even there, interpretation is difficult because many social, institutional, and historical factors can produce the same patterns.

A cautious reading is that the IAT may capture psychological associations that sometimes relate to behavior, especially in constrained research settings, but the available evidence does not support treating the IAT as a strong or reliable predictor of discriminatory conduct by individuals.

Where uncertainty remains

Uncertainty remains because studies use different IAT versions, different behavioral outcomes, and different statistical models. Some outcomes called discriminatory behavior are brief laboratory tasks, while others are closer to consequential real-world decisions. Combining these outcomes can obscure where prediction is stronger or weaker.

There is also disagreement about what standard the IAT should be held to. Supporters may argue that even small predictive relationships matter at population scale. Critics may argue that weak individual-level prediction, limited stability, and unclear incremental value make it unsuitable for many applied uses.

Future evidence would be most useful if it preregistered behavioral outcomes, used large and diverse samples, tested whether IAT scores add predictive value beyond explicit measures, and followed participants in realistic decision settings.

The three parts of the claim

The umbrella claim is actually several claims bundled into one. Each needs its own evaluation.

PART 1 / 3

An individual's IAT score reliably predicts whether that person will engage in discriminatory behavior.

Not supported · 78%

PART 2 / 3

IAT scores show small statistical associations with some behavioral measures in research studies.

Mixed · 70%

PART 3 / 3

The IAT consistently improves prediction of discriminatory behavior beyond explicit attitudes and contextual factors.

Not supported · 72%

Model comparison

How each panel model rated the three parts of the claim

Model	Part 1	Part 2	Part 3	Overall
Grok 4.3	No · 78%	Mixed · 70%	No · 72%	Mixed · 70%
Mistral Medium 3.5	No · 78%	Mixed · 70%	No · 72%	Mixed · 85%
Gemini 3.1 Pro	—	—	—	Incomplete
Llama 4 Maverick	No · 78%	Mixed · 70%	No · 72%	Mixed · 70%
OpenAI GPT-5.4	No · 78%	Mixed · 70%	No · 72%	Mixed · 70%
GLM 5.1	No · 78%	Mixed · 70%	No · 72%	Mixed · 75%
Claude Opus 4.7	No · 78%	Mixed · 70%	No · 72%	Mixed · 85%
DeepSeek V4 Pro	No · 78%	Mixed · 70%	No · 72%	Mixed · 70%
Qwen 3.7 Max	No · 78%	Mixed · 70%	No · 72%	No · 85%
Kimi K2.6	—	—	—	Incomplete
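
The headline panel figures can be cross-checked against the Overall column of this table. The sketch below assumes, since the page does not state its formulas, that agreement counts the completed models sharing the most common overall verdict, that confidence is the mean of the completed models' overall confidences, and that spread is the gap between the highest and lowest of them.

```python
from statistics import mean

# Overall column from the table: (verdict, confidence in percent).
# None marks a model whose response was incomplete.
overall = {
    "Grok 4.3": ("Mixed", 70),
    "Mistral Medium 3.5": ("Mixed", 85),
    "Gemini 3.1 Pro": None,
    "Llama 4 Maverick": ("Mixed", 70),
    "OpenAI GPT-5.4": ("Mixed", 70),
    "GLM 5.1": ("Mixed", 75),
    "Claude Opus 4.7": ("Mixed", 85),
    "DeepSeek V4 Pro": ("Mixed", 70),
    "Qwen 3.7 Max": ("No", 85),
    "Kimi K2.6": None,
}

completed = [rating for rating in overall.values() if rating is not None]
confidences = [confidence for _, confidence in completed]

# Assumed definitions; the page does not document how it derives these numbers.
agreement = sum(1 for verdict, _ in completed if verdict == "Mixed")
mean_confidence = mean(confidences)
spread = max(confidences) - min(confidences)

print(f"{agreement}/{len(overall)} agreement")
print(f"{round(mean_confidence)}% confidence")
print(f"{spread}% spread")
```

Under those assumed definitions, the table reproduces the 7/10 agreement, 76% confidence, and 15% spread reported in the panel verdict.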

An honest commitment

What would change our mind

The current evidence leans one way. But we're not committed to the conclusion; we're committed to the evidence.

Large preregistered studies showing that IAT scores strongly predict consequential discriminatory behavior at the individual level.
Evidence that IAT scores add substantial predictive value beyond explicit attitudes, demographics, prior behavior, and situational variables.
High test-retest reliability across time and settings for the same IAT measures used to predict behavior.
Field studies connecting IAT scores to real decisions in hiring, medicine, education, housing, or policing with transparent outcome definitions.
Consistent results across independent research teams, populations, and behavioral domains.

Common questions

Does a high IAT score mean someone will discriminate?

Not by itself. The evidence suggests that individual IAT scores are too noisy and context-dependent to forecast a specific person's conduct with high reliability. They should not be treated as a stand-alone diagnosis of future behavior.

Does this mean the IAT measures nothing?

No. Many researchers view the IAT as measuring automatic associations or relative response patterns under time pressure. The disputed point is how much those scores predict consequential discriminatory actions.

Can small effects still matter?

Yes. A small statistical association can matter across large populations or repeated decisions. But that is different from saying the test can reliably identify which individuals will discriminate.

Is explicit bias more useful than the IAT?

In some studies, explicit attitudes predict behavior as well as or better than IAT scores. The key question is whether the IAT adds meaningful predictive value beyond what is already known from explicit attitudes, incentives, norms, and institutional context.
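
The idea of added predictive value can be sketched as the gain in variance explained when the IAT is added to a model that already contains an explicit measure. The example below uses the standard two-predictor formula for R squared on standardized variables. All three correlations are hypothetical values chosen for illustration, not estimates from the cited meta-analyses, and the function names are hypothetical.

```python
def r_squared_two_predictors(r_y1: float, r_y2: float, r_12: float) -> float:
    """R squared for an outcome regressed on two predictors.

    r_y1 and r_y2 are each predictor's correlation with the outcome;
    r_12 is the correlation between the two predictors.
    """
    return (r_y1**2 + r_y2**2 - 2 * r_y1 * r_y2 * r_12) / (1 - r_12**2)


def incremental_r_squared(r_y_explicit: float, r_y_iat: float, r_explicit_iat: float) -> float:
    """Extra variance explained by adding the IAT to an explicit-only model."""
    full = r_squared_two_predictors(r_y_explicit, r_y_iat, r_explicit_iat)
    return full - r_y_explicit**2


if __name__ == "__main__":
    # Hypothetical correlations: explicit-behavior 0.20, IAT-behavior 0.15,
    # explicit-IAT 0.25.
    gain = incremental_r_squared(0.20, 0.15, 0.25)
    print(f"incremental R squared: {gain:.4f}")
```

With these made-up inputs the IAT adds about one percentage point of explained variance beyond the explicit measure. The gain falls to zero whenever the IAT's correlation with behavior is fully accounted for by its overlap with the explicit measure.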

References

Meta Analysis

GREENWALD_2009. Understanding and Using the Implicit Association Test: III. Meta-Analysis of Predictive Validity. Journal of Personality and Social Psychology. Often cited as evidence that IAT scores have measurable but generally modest relationships with behavior.

OSWALD_2013. Predicting Ethnic and Racial Discrimination: A Meta-Analysis of IAT Criterion Studies. Journal of Personality and Social Psychology. Critical review focusing on how well IAT scores predict discriminatory outcomes.

FORSCHER_2019. A Meta-Analysis of Procedures to Change Implicit Measures. Journal of Personality and Social Psychology. Relevant to whether changes in implicit measures translate into changes in explicit measures or behavior.

KURDI_2019. Relationship Between the Implicit Association Test and Intergroup Behavior: A Meta-Analysis. American Psychologist. Provides a broad synthesis of IAT-behavior relationships across intergroup domains.

Critical Review

BLANTON_2009. Strong Claims and Weak Evidence: Reassessing the Predictive Validity of the IAT. Journal of Applied Psychology. Raises methodological concerns about interpreting IAT scores as predictors of behavior.

Review

NOSEK_2007. Pervasiveness and Correlates of Implicit Attitudes and Stereotypes. European Review of Social Psychology. Useful background on what IAT measures capture and how they relate to explicit attitudes.

What each model said

Grok 4.3 Success

The Implicit Association Test shows only weak, inconsistent correlations with discriminatory behavior in meta-analyses of criterion studies, typically with small effect sizes that vary sharply by context and do...

x-ai/grok-4.3-20260430 29 May 2026 10:25 stop

Mistral Medium 3.5 Success

The Implicit Association Test (IAT) does not reliably predict discriminatory behavior at the individual level. While meta-analyses (e.g., *Understanding and Using the Implicit Association Test: III*) show small...

mistralai/mistral-medium-3.5-20260430 29 May 2026 10:25 stop

Gemini 3.1 Pro Incomplete

Response completed normally.

google/gemini-3.1-pro-preview-20260219 29 May 2026 10:25 unknown

Llama 4 Maverick Success

The Implicit Association Test (IAT) does not reliably predict discriminatory behavior at an individual level, though it shows some statistical association with certain behaviors in research studies. Confidence...

meta-llama/llama-4-maverick-17b-128e-instruct 29 May 2026 10:25 stop

OpenAI GPT-5.4 Success

Mostly true. The best-supported reading of the claim is that IAT scores have, on average, only weak predictive value for discriminatory or intergroup-related behavior, and that the test is better established as...

openai/gpt-5.4-20260305 29 May 2026 10:25 stop

GLM 5.1 Divergent view

The IAT shows only weak and context-dependent associations with discriminatory behavior, and does not reliably predict whether a specific individual will discriminate.
Confidence: Moderate to high
Key evidence:
- Multiple meta-analyses (including those by Greenwald et al. and Oswald et al.) consistently find small correlations between IAT scores and discriminatory behavior measures, typically in the range of r = .10 to .20
- The Oswald et al. meta-analysis specifically found that IAT scores added minimal predictive...

z-ai/glm-5.1-20260406 29 May 2026 10:25 stop

Claude Opus 4.7 Success

The Implicit Association Test (IAT) is, at best, a weak and inconsistent predictor of discriminatory behavior at the individual level. While it reliably measures something — typically described as automatic or...

anthropic/claude-4.7-opus-20260416 29 May 2026 10:25 stop

DeepSeek V4 Pro Success

Based on multiple meta-analyses, the Implicit Association Test does not reliably predict discriminatory behavior at the individual level, and its overall predictive validity is weak and context-dependent. While...

deepseek/deepseek-v4-pro-20260423 29 May 2026 10:26 stop

Qwen 3.7 Max Success

The Implicit Association Test (IAT) is a weak and unreliable predictor of individual discriminatory behavior, though it demonstrates small, statistically significant associations with behavioral measures at the...

qwen/qwen3.7-max-20260520 29 May 2026 10:26 stop

Kimi K2.6 Incomplete

Incomplete response: output limit reached before a parseable answer.

moonshotai/kimi-k2.6-20260420 29 May 2026 10:26 length



Coverage map

Regions: N. America, Europe, Asia, S. America, Africa, Oceania

Models from 3 continents contributed, providing fair regional balance to this review. Grey regions on the map have no suitable OpenRouter-regional model participants.
