
To Fix the Social Sciences, Look to the “Dark Ages” of Medicine


If medicine could break with its barbarous past, why shouldn’t the same path be open to the social sciences?

The analogies between 19th-century medicine and 21st-century social sciences are compelling. Image: “Ignaz Philipp Semmelweis. Photograph after a frieze in the Social Hygiene Museum, Budapest.” Source: Wellcome Library 

By: Lee McIntyre

In 1847, Ignaz Semmelweis, then a lowly assistant physician in the world’s largest maternity clinic, made a discovery that would save the lives of countless women. When one of his colleagues received a puncture wound during an autopsy on a woman with childbed fever — a fever caused by postpartum infection — and died of an illness that presented the same symptoms, Semmelweis had an aha moment. He realized that medical students who came directly to the maternity ward after performing autopsies were probably transferring “cadaveric matter” to pregnant women. This was, after all, before antisepsis and the germ theory of disease, before routine handwashing and the sterilization of medical instruments. As a test, he ordered the students to wash their hands in chlorinated water before performing their deliveries. The mortality rate plummeted.

Looking back at the state of medicine in Semmelweis’s time — that is, the prescientific “dark ages” of the 19th century — one finds compelling analogies with the 21st-century social sciences (economics, psychology, sociology, anthropology, history, and political science). Knowledge and procedures were based on folk wisdom, intuition, and custom. Experiments were few. When someone had a theory, it was thought enough to consider whether it “made sense,” even if there was no empirical evidence in its favor. Indeed, the very idea of gathering evidence to test a theory flew in the face of the belief that medical practitioners already knew what lay behind most illnesses. Despite the shocking ignorance and backward practices of medicine throughout most of its history, theories were abundant, and ideas were rarely challenged or put to the test.

Today, too much social research remains embarrassingly unrigorous. The methods are sometimes poor, but far more damning is the non-empirical attitude that lies behind them. Many allegedly scientific studies on immigration, guns, the death penalty, and other important social topics are infected by their investigators’ political or ideological views, so that it is all but expected that some researchers will discover results squarely in line with liberal political beliefs, while others will produce conservative results directly opposed to them. A good example is the question of whether immigrants “pay their own way” or are a “net drag” on the American economy. If this is truly an empirical question, then why is the literature mixed? These are purportedly rigorous social scientific studies performed by well-respected scholars; yet their findings on factual matters flatly contradict one another. This would not be tolerated in physics, so why is it tolerated in sociology?


The truth is that such questions are open to empirical study and it is possible for social science to study them scientifically. There are right and wrong answers to our questions about human behavior. Do humans experience a “backfire effect” when exposed to evidence that contradicts their opinion on an empirical (rather than a normative) question, such as whether there were weapons of mass destruction in Iraq or whether President George W. Bush proposed a complete ban on stem cell research? Is there such a thing as implicit bias, and if so, how can it be measured? Such questions can be, and have been, studied scientifically. Although social scientists may continue to disagree (and indeed, this is a healthy sign in ongoing research), their disagreements should focus on the best way to investigate these questions, not whether the answers produced are politically acceptable. Having the scientific attitude toward evidence — the willingness to change theories on the basis of new findings — is just as necessary in the study of human behavior as it is in the study of nature.

When so many studies fail to replicate, or draw different conclusions from the same set of facts, it does not instill confidence in the social sciences. Whether this is because of sloppy methodology, ideological infection, or other problems, the result is that even if there are right and wrong answers to many of our questions about human action, most social scientists are not yet in a position to find them. This is not to say that no work in social science is rigorous; rather, when policymakers (and sometimes even other researchers) cannot tell which results are reliable, the status of the entire field suffers. If medicine could break with its barbarous past, isn’t the same path open to the social sciences?


For years, many social scientists have argued that if they could only emulate the “scientific method” of the natural sciences, they too could become more scientific. But this simple advice faces several problems. Among the issues that plague contemporary social scientific research:

  1. Too much theory: A number of social scientific studies propose answers that have not been tested against evidence. The classic example here is neoclassical economics, where a number of simplifying assumptions — perfect rationality, perfect information — resulted in beautiful quantitative models that had little to do with actual human behavior.

  2. Lack of experimentation/data: Except for social psychology and the newly emerging field of behavioral economics, much of social science still does not rely on experimentation, even where it is possible. For example, it is sometimes offered as justification for putting sex offenders on a public database that doing so reduces the recidivism rate. This must be measured, though, against what the recidivism rate would have been absent the Sex Offender Registry Board (SORB), a counterfactual that is difficult to estimate and that has produced varying answers. This exacerbates the difficulty in (1), whereby favored theoretical explanations are accepted even when they have not been tested against any experimental evidence.

  3. Fuzzy concepts: Some social scientific studies can lead to misleading conclusions because of the use of “proxy” concepts for what one really wishes to measure. A recent example is the use of “warmth” as a proxy for “trustworthiness,” in which researchers assumed — on the basis of studies showing that we are more likely to trust someone whom we perceive to be “on our side” — that perceptions of scientists as “cold” meant that they would be less trusted as well. But the two concepts may not be interchangeable.

  4. Ideological infection: This problem is rampant throughout the social sciences, especially on topics that are politically charged. Two ongoing examples are the bastardization of empirical work on the deterrence effect of capital punishment and the effectiveness of gun control on mitigating crime. If one knows in advance what one wants to find, one will likely find it.

  5. Cherry picking: The use of statistics allows researchers multiple “degrees of freedom,” and these are easily abused. In studies on immigration, for instance, a great deal of the difference between findings is the result of alternative ways of counting the “costs” incurred by immigration. This is obviously also related to (4) above: If we know our conclusion, we may shop for the data to support it.

  6. Lack of data sharing: As the evolutionary biologist Robert Trivers reports in Psychology Today, there are numerous documented cases of researchers failing to share the data behind their psychological studies, despite a requirement to do so from APA-sponsored journals. When data were later reanalyzed, the errors uncovered were most commonly in the direction of the researcher’s hypothesis.

  7. Lack of replication: Psychology is undergoing a reproducibility crisis. One might validly argue that the initial finding that nearly two-thirds of psychology studies were irreproducible was overblown, but it is nonetheless shocking that for most studies replication is never even attempted. This leaves room for errors to slip through undetected.

  8. Questionable causation: It is gospel in statistical research that “correlation does not equal causation,” yet some social scientific studies continue to highlight provocative results of questionable value. One recent sociological study, for instance, found that matriculating at a selective college was correlated with visiting art museums with one’s parents, without explicitly acknowledging that this was likely an artifact of parental income (a confounding pattern illustrated in the sketch after this list).
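
To see how easily a confounder can manufacture such a correlation, consider a minimal simulation. The numbers below are made up purely for illustration, not drawn from the study: parental income drives both museum visits and admission to a selective college, the visits have no causal role at all, and yet the two variables correlate strongly until income is accounted for.

```python
# Minimal confounding simulation (hypothetical numbers, not the study's data):
# parental income drives both museum visits and college admission, while the
# visits themselves have no causal effect on admission.
import numpy as np

rng = np.random.default_rng(0)
n = 10_000

income = rng.normal(size=n)                  # standardized parental income
museum_visits = income + rng.normal(size=n)  # driven by income, not by college plans
admission = income + rng.normal(size=n)      # also driven by income; visits play no role

# The raw correlation looks impressive (about 0.5)...
print(np.corrcoef(museum_visits, admission)[0, 1])

# ...but it vanishes (about 0.0) once income is controlled for via residuals.
print(np.corrcoef(museum_visits - income, admission - income)[0, 1])
```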

So what does an example of good social scientific work — one that is based firmly in the scientific attitude, uses empirical evidence to challenge an intuitive theoretical hypothesis, and employs experimental methods to measure human motivation directly through human action — look like? For this, we need look no further than Sheena Iyengar’s investigation of the paradox of choice. Here we face a classic social scientific dilemma: How can something as amorphous as human motivation be measured through empirical evidence? According to neoclassical economics, we measure consumer desire directly through marketplace behavior. People will buy what they want, and the price is a reflection of how much the good is valued. To work out the mathematical details, however, a few “simplifying assumptions” are required.

First, we assume that our preferences are rational. If I like cherry pie more than apple pie, and apple more than blueberry, it is assumed that I like cherry more than blueberry. (For more on the classic assumptions that economists and others have made about human rationality, and how they break down in the face of experimental evidence, see Daniel Kahneman’s book “Thinking, Fast and Slow.”) Second, we assume that consumers have perfect information about prices. Although this is widely known to be untrue in individual cases, it is a core assumption of neoclassical economics, for it is needed to explain how the market as a whole performs the magical task of ordering preferences through prices. Although it is acknowledged that actual consumers may make “mistakes” in the marketplace (for instance, they did not know that cherry pie was on sale at a nearby market), the model purports to work because if they had known this, they would have changed their behavior. Finally, the neoclassical model assumes that “more is better.” This is not to say that there is no such thing as diminishing marginal utility — that last bite of cherry pie probably does not taste as good as the first one — but it is to say that for consumers it is better to have more choices in the marketplace, for this is how one’s preferences can be maximized.
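
To make the first assumption concrete, here is a toy sketch; the preference pairs and the helper function are purely illustrative, not part of any economist’s toolkit. It checks whether a set of pairwise preferences is transitive, that is, whether it avoids cycles such as cherry over apple, apple over blueberry, blueberry over cherry.

```python
# Toy illustration of the transitivity (rationality) assumption: given pairwise
# preferences as (preferred, less_preferred) tuples, detect any violation where
# a > b and b > c hold but a > c does not. All names here are hypothetical.
from itertools import permutations

def is_transitive(prefers):
    """Return False if some a > b and b > c hold without a > c."""
    items = {x for pair in prefers for x in pair}
    for a, b, c in permutations(items, 3):
        if (a, b) in prefers and (b, c) in prefers and (a, c) not in prefers:
            return False
    return True

rational = {("cherry", "apple"), ("apple", "blueberry"), ("cherry", "blueberry")}
cyclic = {("cherry", "apple"), ("apple", "blueberry"), ("blueberry", "cherry")}

print(is_transitive(rational))  # True: the kind of preferences the model assumes
print(is_transitive(cyclic))    # False: a preference cycle the model rules out
```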

“We choose not to choose,” explains social scientist Sheena Iyengar, “even when it goes against our best self-interests.”

Iyengar sought to test this last assumption directly through experiment. The stakes were high, for if she could show that this simplifying assumption was wrong, then, together with the economist Herbert Simon’s earlier work undermining “perfect information,” the neoclassical model might be in jeopardy. Iyengar and her colleague Mark Lepper set up a controlled consumer choice experiment in a grocery store where shoppers were offered the chance to taste different kinds of jam. In the control condition, shoppers were offered twenty-four different choices. In the experimental condition, this was decreased to six options. To ensure that different shoppers were present for the two conditions, the displays were rotated every two hours and other scientific controls were put in place. Iyengar and Lepper sought to measure two things: (1) how many different flavors of jam the shoppers chose to taste and (2) how much total jam they actually bought when they checked out of the store. To measure the latter, everyone who stopped by to taste was given a coded coupon, so that the experimenters could track whether the number of jams in the display affected later purchasing behavior. And did it ever. Even though the initial display of twenty-four jams attracted slightly more customer interest, those shoppers later bought far less than those who had visited the booth with only six jams. Although each display attracted an equal number of jam tasters (thus removing the act of tasting as a causal variable that might explain the difference), the shoppers who had visited the display with twenty-four jams used their coupons only 3 percent of the time, whereas those who visited the display with only six jams used theirs 30 percent of the time.
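
For a sense of how decisive that gap is, here is a back-of-the-envelope significance check. The sample sizes below are hypothetical, chosen only to be consistent with the reported 3 percent and 30 percent redemption rates; they are not the experiment’s actual counts.

```python
# Two-proportion z-test on the jam result. The counts are hypothetical,
# matching only the reported 3 percent and 30 percent redemption rates.
from math import erf, sqrt

def two_proportion_ztest(x1, n1, x2, n2):
    """Two-sided z-test for the difference between two proportions."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    p_value = 2 * (1 - 0.5 * (1 + erf(abs(z) / sqrt(2))))  # normal approximation
    return z, p_value

# 24-jam display: ~3% redeemed; 6-jam display: ~30% redeemed (hypothetical n's).
z, p = two_proportion_ztest(x1=4, n1=150, x2=30, n2=100)
print(f"z = {z:.2f}, p = {p:.2g}")

# Even with modest sample sizes, a tenfold difference in redemption rates
# lies far outside what sampling noise alone could plausibly produce.
```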

What might account for this? In their analysis, Iyengar and Lepper speculated that the shoppers might have been overwhelmed in the first condition. Even when they tasted a few jams, this was such a small percentage of the total display that they perhaps felt they could not be sure they had chosen the best one, so they chose not to buy any at all. In the second condition, however, shoppers might have been better able to rationalize making a choice based on a proportionally larger sampling. As it turned out, people wanted fewer choices. Although they might not have realized it, their own behavior revealed a surprising fact about human motivation.


Although this may sound like a trivial experiment, the implications are far-reaching. One of the most important direct applications of Iyengar and Lepper’s finding was to the problem of undersaving in 401(k) plans, where new employees are routinely so overwhelmed by the number of options for investing their money that they put off the decision, which effectively means choosing not to invest any money at all. Not only is this good social science, but its positive impact on human lives has been considerable.

For present purposes, the point is this: Even in a situation where we may feel most in touch with our subject matter — human preference and desire — we can be wrong about what influences our behavior. If you ask people whether they want more or fewer choices, most will say they want more. But their actual behavior belies this. The results of experimental evidence in the study of human action can surprise us. Even concepts as seemingly qualitative as desire, motivation, and human choice can be measured by experimentation rather than mere intuition, theory, or verbal report.

Here again we are reminded of Semmelweis. How can we know what is true before we have conducted an experiment? Our intuitions may feel solid, but an experiment shows that they can fail us. And this is as true in social science as it is in medicine. Having the facts about human behavior can be just as useful in public policy as in the diagnosis and treatment of human disease. Thus the scientific attitude is to be recommended just as heartily in social science as it is in any empirical subject. If we care about evidence and are willing to change our minds about a theory based on evidence, what better example might we have before us than the success of Iyengar and Lepper’s experiment? Just as the elegance of Louis Pasteur’s experimental model allowed him to overthrow the outdated idea of spontaneous generation, might economics now move forward by recognizing the impact of cognitive bias and irrationality on human choice?


Medicine was once held in low repute, but it broke out of its prescientific “dark ages” through individual breakthroughs that became the standard for group practice and through some degree of standardization of what counted as evidence. To date, the social sciences have yet to complete their evidence-based revolution. We can find some examples today of the scientific attitude at work in social inquiry that have enjoyed some success — Iyengar is not entirely alone — but there has not yet been a discipline-wide acceptance of the notion that the study of human behavior needs to be based on theories and explanations that are relentlessly tested against what we have learned through experiment and observation. As in prescientific medicine, too much of today’s social science relies on ideology, hunches, and intuition.

In its subject matter, medicine is in many ways like social science. We have irreducible values that will inevitably guide our inquiry: we value life over death, health over disease. We cannot even begin to embrace the “disinterested” pose of the scientist who does not care about his or her inquiry beyond finding the right answer. Medical scientists desperately hope that some theories will work because lives hang in the balance. But how do they deal with this? Not by throwing up their hands and admitting defeat, but rather by relying on good scientific practices like randomized double-blind clinical trials, peer review, and disclosure of conflicts of interest. The placebo effect is real, for both patients and their doctors. If we want a medicine to work, we might subtly influence the patient to think that it does. But whom would this serve? When dealing with factual matters, medical researchers realize that influencing their results through their own expectations is nearly as bad as fudging them. So they guard against the hubris of thinking that they already know the answer by instituting methodological safeguards. They protect what they care about by recognizing the danger of bias.


The mere presence of values or caring about what you study does not undercut the possibility of science. We can still learn from experience, even if we are fully invested in hoping that one medicine will work or one theory is true, as long as we do not let this get in the way of good scientific practice. We can still have the scientific attitude, even in the presence of other values that may exist alongside it. Indeed, it is precisely because medical researchers and physicians recognize that they may be biased that they have instituted the sorts of practices that are consonant with the scientific attitude. They do not wish to stop caring about human life, they merely want to do better science so that they can promote health over disease. In fact, if we truly care about human outcomes, it is better to learn from experience, as the history of medicine so clearly demonstrates. It is only when we take steps to preserve our objectivity — instead of pretending that this is not necessary or that it is impossible — that we can do better science.

Like medicine, social science is subjective. And it is also normative: We have a stake not just in knowing how things are but also in using this knowledge to make things the way we think they should be. We study voting behavior in the interest of preserving democratic values. We study the relationship between inflation and unemployment in order to mitigate the next recession. Yet unlike their counterparts in medicine, social scientists have so far not proven very effective at walling off positive inquiry from normative expectations, with the result that instead of acquiring objective knowledge we may only be indulging in confirmation bias and wishful thinking. This is the real barrier to a better social science. It is not just that we have ineffective tools or a recalcitrant subject matter; it is that at some level we do not yet have enough respect for our own ignorance to keep ourselves honest by comparing our ideas relentlessly against the data. The challenge in social science, then, is to find a way to preserve our values without letting them interfere with empirical investigation. To change the world, we need first to understand it.


Lee McIntyre is a Research Fellow at the Center for Philosophy and History of Science at Boston University. He is the author of “The Scientific Attitude,” “Dark Ages,” and “Post-Truth,” all published by the MIT Press.
