Paradigm Shift in Social Science: The Heterogeneity Revolution

A new essay by Elizabeth Tipton, Christopher Bryan and David Yeager describes a paradigm shift that is underway within the social sciences, which they call the heterogeneity revolution.

The Replication Crisis in the Social Sciences

Over the past decades, psychology and other social sciences have searched for insights and interventions that can help solve all kinds of social and psychological problems and support human well-being.
However, over the last 10 years, the credibility of the social sciences has been hurt by the so-called replication crisis. The term "replication crisis" refers to the phenomenon that many effects initially found in studies were later not found, or found only to a lesser extent, in replication studies, which often used much larger samples.

This apparent poor replicability is generally seen as a major problem because it raised doubts about whether many previously reported effects actually exist and are therefore relevant. The lack of replicability has been blamed on both inadequate research practices and deliberate manipulation by researchers and institutes.

Paradigm Change: Heterogeneity of Treatment Effects

Tipton et al. argue that this focus on whether the effects are real is far too limited and distracts from much more important questions. They argue for a paradigm shift in social science research which they say has already begun. This change revolves around the importance of heterogeneity of treatment effects.

They explain that interventions rarely work for everyone and in all circumstances. In general, it is therefore not very useful to administer an intervention to a random group and see whether it works. Usually, there is also little point in administering an intervention to a gigantic sample and looking only at its average effect. The authors explain this on the basis of the picture below.

In the four pictures, a point cloud has been drawn, with each point representing one person. The Y-axis represents the strength of the effect; the X-axis represents a moderator. A moderator is a variable that affects the strength of the relation between treatment and outcome. Moderator variables could refer to something in the person, something in the situation, or something in the way the intervention was applied.
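
One way to make the idea of a moderator concrete is a simple linear sketch. The linear form and all symbols here are assumptions for illustration, not taken from the essay: tau_i is the effect of the intervention on person i, M_i is that person's value on the moderator, beta_0 and beta_1 are constants, epsilon_i is individual noise, and S is the sample that was studied.

```latex
\tau_i = \beta_0 + \beta_1 M_i + \varepsilon_i,
\qquad
\bar{\tau}_S = \frac{1}{|S|} \sum_{i \in S} \tau_i \approx \beta_0 + \beta_1 \bar{M}_S
```

If beta_1 is not zero, the average effect found in a sample depends on the average moderator value in that sample, so two well-run studies with different kinds of participants can report different average effects.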

Four Scenarios

In picture A you can see that the original experiment showed a significant average effect. Picture B shows that a larger sample was used and that the average effect found is smaller. Picture C also works with a larger sample consisting of other types of test subjects and on average finds no effect at all. Picture D uses a sample that is representative of the entire population and finds only a very small mean effect.

Thinking in the traditional way, researchers would say that the effect originally found is apparently not a true effect and is therefore not relevant to real-world situations (for example, Sisk et al., 2018 say something like this about a growth mindset intervention based on two meta-analyses they performed).

But arguing from the heterogeneity paradigm you arrive at a different and more realistic view. The differences in effect size have to do with the moderator. For some of the individuals the intervention works very well, for others it does not work. Or: the intervention works well in one context, but not in another. Or: it works well if administered in one way, but not in another way.
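
The reasoning above can be illustrated with a small simulation. This is only a sketch under invented assumptions: the rule in individual_effect (no effect for moderator values below zero, a linearly growing effect above zero), the sample sizes and the moderator ranges are all hypothetical and are not taken from the essay or from any real study.

```python
import random
from statistics import mean


def individual_effect(moderator, slope=0.5):
    """Hypothetical rule: no effect when the moderator is below zero,
    and an effect that grows linearly with the moderator above zero."""
    return max(0.0, slope * moderator)


def average_effect(sample_size, mod_low, mod_high, rng):
    """Average observed effect in a sample whose moderator values are
    drawn uniformly from [mod_low, mod_high], plus individual noise."""
    moderators = [rng.uniform(mod_low, mod_high) for _ in range(sample_size)]
    return mean(individual_effect(m) + rng.gauss(0, 0.1) for m in moderators)


# Assumed sample compositions, loosely mirroring pictures A to D:
# (sample size, lowest moderator value, highest moderator value)
scenarios = {
    "A": (50, 0.6, 1.0),     # small original sample, high moderator values
    "B": (2000, 0.2, 1.0),   # larger sample, broader moderator range
    "C": (2000, -1.0, 0.0),  # larger sample, other types of participants
    "D": (5000, -1.0, 1.0),  # sample representative of the whole population
}

rng = random.Random(42)  # fixed seed so the run is repeatable
results = {
    name: average_effect(n, low, high, rng)
    for name, (n, low, high) in scenarios.items()
}

for name, avg in results.items():
    print(f"Scenario {name}: average effect = {avg:.3f}")
```

Under these assumed ranges the expected averages are roughly 0.4, 0.3, 0 and 0.125. The same intervention, with the same underlying rule, therefore looks strong, weaker, absent or very small depending only on who is in the sample.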


We must get used to the idea that interventions are generally not intended for everyone and all situations. In many cases, it is not reasonable to think that everyone should benefit from a psychological intervention. This is actually common sense. We also don't think it would be good if everyone started taking a blood pressure medication from now on, do we?

An example of a study that does justice to the insights of heterogeneity thinking is the National Study of Learning Mindsets (NSLM). This study found an overall effect for growth mindset interventions. But more interestingly, the study found greater effects for specific groups of students and specific contexts. Armed with this kind of knowledge, targeted groups of individuals can receive the appropriate interventions and efforts can be made to change contexts when these impede the effectiveness of interventions.

We must get rid of the belief that failed replications necessarily indicate incompetence or dishonesty on the part of researchers or institutions. We should stop obsessing about main effects and about making samples as large as possible. Much more emphasis should be placed on discovering relevant moderators. In other words: which variables within the person, the context and the intervention determine the effectiveness?

To meet the demands of the new paradigm, social science must become more of a team-based activity. Research will undoubtedly become more challenging and large-scale (see the NSLM study). But this doesn't mean it can't be done. The social sciences could draw inspiration from physicists who have joined forces to build the particle accelerator at CERN.


  1. I'm surprised that it has not always been self-evident that an intervention has different effects on different people. If you took 5000 people and put their arm in a cast for a month and then measured whether their arm was better after the intervention you'd get a moderate negative outcome. If you took 100 people with broken arms and put them in a cast, you'd get a large positive outcome.

    So am I wrong that this heterogeneity issue was self-evident?
    Secondly, is the reason we ignored this that it was impractical to do the kind of research that looks at many factors in a large population while assessing an intervention, and hence we just hoped our overly simple research would work?

    1. Hi David, I agree. It probably should have been more obvious than it apparently was. I think that the reason you mention is part of the explanation. It is much easier, and more practical, to use convenience samples and simple designs than the more specific and complex designs which are required from a heterogeneity perspective. In addition to that, I think that we may tend to (sincerely) believe in universal psychology a bit too strongly. While there are some universals in psychology, there is also a lot of differentiation, depending on differences in people and contexts.

