Psychology's Replication Crisis: Update

For the past decade, the field of social psychology has faced a reckoning known as the “replication crisis.” It began with a simple question: If we run the same famous experiments again, will we get the same results? The answer, alarming to many, was often “no.” Recent massive projects and updated methodologies are now quantifying exactly how deep this issue runs and, more importantly, how scientists are fixing it.

The Reproducibility Project: By the Numbers

The catalyst for this movement was the Reproducibility Project: Psychology, led by Brian Nosek and the Center for Open Science. This massive undertaking involved 270 researchers attempting to replicate 100 experimental and correlational studies published in three top-tier psychology journals.

The results, published in Science, were sobering:

  • Original Success: 97% of the original studies claimed statistically significant results.
  • Replication Reality: Only 36% of the replications yielded significant findings.
  • Effect Size: Even when results held up, the strength of the effect was roughly half of what was originally reported.

This wasn’t just a few minor errors. It suggested that a large portion of published psychological literature might be built on false positives, statistical noise, or flawed methodology.

Famous Findings That Failed to Replicate

To understand the impact, you have to look at the specific, high-profile theories that have crumbled or been severely weakened under scrutiny. These are concepts that made their way into textbooks, TED Talks, and popular self-help advice.

Ego Depletion

For years, the theory of “ego depletion” was dominant. It suggested that willpower is like a muscle that gets tired; if you resist a cookie now, you will have less self-control to solve a puzzle later. In 2016, a massive registered replication report involving 23 laboratories worldwide attempted to reproduce this effect. The result was a null finding. There was essentially zero evidence that the ego depletion effect existed in the way it was famously described.

The Facial Feedback Hypothesis

This classic textbook study claimed that if you hold a pen between your teeth (forcing a smile), you will find cartoons funnier than if you hold it with your lips (forcing a pout). In 2016, 17 labs attempted to replicate the experiment. The combined result from nearly 1,900 participants failed to replicate the original finding.

Power Posing

Popularized by a viral TED Talk, the idea was that standing in a “high-power” pose (like Wonder Woman) for two minutes would change your hormone levels and make you more risk-tolerant. While the feeling of confidence might be real, subsequent larger studies failed to find the physiological changes (hormone shifts) or behavioral changes originally claimed.

The "Many Labs 2" Project

Following the initial shock of 2015, researchers launched “Many Labs 2” to see if the failures were due to differences in culture or demographics. Perhaps a study done on American college students just didn’t work on Dutch adults?

Many Labs 2 involved 60 labs across 36 nations and territories, testing a total of 15,000 participants. They attempted to replicate 28 classic and contemporary findings.

  • The Outcome: Only 14 of the 28 studies (50%) were successfully replicated.
  • The Insight: The failure to replicate was rarely because of the sample population. If a study worked, it usually worked everywhere. If it failed, it failed everywhere. This pointed to flaws in the original studies, not the cultural context.

The Culprit: P-Hacking and Publication Bias

Why did this happen? Follow-up analyses point to systemic incentives rather than malicious fraud.

P-Hacking: This involves manipulating data analysis until a pattern emerges. For example, a researcher might measure 20 different variables but only report the two that show a “significant” link, ignoring the 18 that didn’t.
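The multiple-comparisons trap behind p-hacking is easy to demonstrate. The sketch below is a minimal simulation (all parameters and function names are invented for illustration, not drawn from any cited study): it runs studies with no true effect at all, tests 20 unrelated outcome variables in each, and counts how often at least one variable looks “significant” at p < 0.05. In theory that happens roughly 1 − 0.95²⁰ ≈ 64% of the time.

```python
import math
import random

random.seed(42)

def two_tailed_p(group_a, group_b):
    """Two-sample z-test p-value (normal approximation, for illustration)."""
    n_a, n_b = len(group_a), len(group_b)
    mean_a, mean_b = sum(group_a) / n_a, sum(group_b) / n_b
    var_a = sum((x - mean_a) ** 2 for x in group_a) / (n_a - 1)
    var_b = sum((x - mean_b) ** 2 for x in group_b) / (n_b - 1)
    z = (mean_a - mean_b) / math.sqrt(var_a / n_a + var_b / n_b)
    # Two-tailed p-value from the standard normal CDF.
    return 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))

def significant_hits(n_variables=20, n_per_group=30, alpha=0.05):
    """One simulated study: both groups are drawn from the SAME distribution,
    so every 'significant' variable is a false positive."""
    hits = 0
    for _ in range(n_variables):
        a = [random.gauss(0, 1) for _ in range(n_per_group)]
        b = [random.gauss(0, 1) for _ in range(n_per_group)]
        if two_tailed_p(a, b) < alpha:
            hits += 1
    return hits

# Fraction of null studies where a p-hacker could still report "a finding".
n_studies = 2000
false_discovery_rate = sum(significant_hits() > 0 for _ in range(n_studies)) / n_studies
print(f"Studies with at least one spurious 'significant' result: "
      f"{false_discovery_rate:.0%}")
```

Reporting only the variables that cleared the threshold, while silently dropping the rest, turns that built-in error rate into a publishable “discovery.”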

Publication Bias: Journals prefer to publish “new” and “exciting” discoveries. A study that says “we found nothing” rarely gets published. This creates a “file drawer problem” where negative results are hidden away, leaving the public to see only the statistical anomalies that look like breakthroughs.

The Reform: How Science is Fixing Itself

The term “crisis” is slowly being replaced by “revolution.” The field is adopting rigorous new standards to ensure future research is solid.

Pre-Registration

This is the most significant change. Before collecting a single piece of data, scientists now register their hypothesis and analysis plan on platforms like the Open Science Framework (OSF). This prevents them from changing the rules halfway through the game to get a positive result.

Registered Reports

Some journals have changed their publishing model. They now review the study’s design before the experiment is run. If the method is solid, the journal commits to publishing the results regardless of whether they are positive or negative. This eliminates the pressure to find a “sexy” result.

Larger Sample Sizes

The era of drawing broad conclusions from 20 undergraduates is ending. Modern standards demand hundreds, sometimes thousands, of participants to ensure that statistical findings are not just random noise.
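A quick simulation makes the sample-size point concrete (a hedged sketch; the 0.2-standard-deviation effect and the group sizes are illustrative, not taken from any cited study): with a small true effect, estimates from 20-person groups swing wildly from study to study, while 500-person groups pin the effect down.

```python
import math
import random

random.seed(0)

TRUE_EFFECT = 0.2  # a small true group difference, in standard-deviation units

def estimated_effect(n_per_group):
    """Run one simulated study and return the observed mean difference."""
    control = [random.gauss(0.0, 1.0) for _ in range(n_per_group)]
    treated = [random.gauss(TRUE_EFFECT, 1.0) for _ in range(n_per_group)]
    return sum(treated) / n_per_group - sum(control) / n_per_group

def spread(n_per_group, n_studies=1000):
    """Standard deviation of the effect estimate across many simulated studies."""
    estimates = [estimated_effect(n_per_group) for _ in range(n_studies)]
    mean = sum(estimates) / n_studies
    return math.sqrt(sum((e - mean) ** 2 for e in estimates) / n_studies)

small_n_spread = spread(20)    # 20 participants per group
large_n_spread = spread(500)   # 500 participants per group
print(f"Spread of estimates, n=20 per group:  {small_n_spread:.2f}")
print(f"Spread of estimates, n=500 per group: {large_n_spread:.2f}")
```

With 20 per group the estimate routinely lands far from the true 0.2 (sometimes negative, sometimes several times too large), which is exactly the noise that small classic studies mistook for strong effects.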

Frequently Asked Questions

What does “p-value” mean in this context?

The p-value is a statistical measure used to determine if a result is significant. Traditionally, if a p-value is less than 0.05, the result is considered significant. However, without pre-registration, it is easy to manipulate data to get below this threshold, which contributed to the false positives in the replication crisis.
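The 0.05 threshold can be demonstrated directly (an illustrative simulation; the test statistic and parameters are invented for the sketch): when there is truly no effect, a correctly run test should still dip below p = 0.05 about 1 time in 20. That baseline is the false-positive rate p-hacking quietly multiplies.

```python
import math
import random

random.seed(7)

def two_tailed_p(group_a, group_b):
    """Two-sample z-test p-value (normal approximation, for illustration)."""
    n_a, n_b = len(group_a), len(group_b)
    mean_a, mean_b = sum(group_a) / n_a, sum(group_b) / n_b
    var_a = sum((x - mean_a) ** 2 for x in group_a) / (n_a - 1)
    var_b = sum((x - mean_b) ** 2 for x in group_b) / (n_b - 1)
    z = (mean_a - mean_b) / math.sqrt(var_a / n_a + var_b / n_b)
    return 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))

# Two groups drawn from identical distributions: any p < 0.05 is a false positive.
n_trials = 5000
false_positives = 0
for _ in range(n_trials):
    a = [random.gauss(0, 1) for _ in range(50)]
    b = [random.gauss(0, 1) for _ in range(50)]
    if two_tailed_p(a, b) < 0.05:
        false_positives += 1

null_rate = false_positives / n_trials
print(f"False-positive rate under the null: {null_rate:.3f}")
```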

Does this mean all psychology research is wrong?

No. Many robust findings in psychology remain valid, particularly in personality psychology and cognitive psychology. The crisis has mostly affected social psychology (how people interact) and specific “priming” studies. The goal of the current movement is to weed out the weak science to leave the strong science standing.

How can I tell if a psychology study is trustworthy?

Look for “pre-registered” badges on the study. Check if the sample size is large (hundreds of people is better than dozens). Also, be skeptical of “sensational” claims that suggest a small intervention (like a power pose) causes a massive life change.

Is the crisis over?

Not yet, but the corner has been turned. While older studies are still being re-evaluated, new research coming out today is generally subjected to much higher standards of transparency and rigor than research published ten years ago.