Feb. 11, 2022
On Sept. 14, 1986, first lady Nancy Reagan appeared on national television to address the nation from the West Hall of the White House. She sat on a sofa next to her husband, President Ronald Reagan, and gazed into the camera.
“Today there’s a drug and alcohol abuse epidemic in this country and no one is safe from it,” she said. “Not you, not me, and certainly not our children.”
This broadcast was the culmination of the last five years the first lady had spent traveling the country to raise awareness among American youth about the dangers of drug use.
She had become the public face of the preventative side of her husband’s War on Drugs, and her message hinged on a catchphrase that millions of people still remember, which she employed once again that evening on television.
“Not long ago, in Oakland, California,” Nancy Reagan told viewers, “I was asked by a group of children what to do if they were offered drugs. And I answered, ‘Just say no.’”
That program eventually became what government and law enforcement officials saw as the crown jewel of the Just Say No campaign: Drug Abuse Resistance Education, or D.A.R.E. I can still remember vividly when they visited my classroom at Sun Prairie High School in 1987.
There was only one problem: D.A.R.E. didn’t actually work.
In the decades since Nancy Reagan addressed the nation about combating drug use from the West Hall of the White House, numerous studies have demonstrated that D.A.R.E. in fact didn’t effectively lead kids to just say no.
It is hard to overstate the impact of D.A.R.E.’s voltage drop at scale. For years the program leveraged the time and effort of thousands of teachers and law enforcement officers who were deeply invested in the wellbeing of our greatest natural resource: future generations.
Yet all of this hard work, never mind money, was wasted on scaling D.A.R.E. because of a fundamentally erroneous premise: that the program reduced drug use in children.
As a result, it took away support and resources from other initiatives that might have actually had the intended effect of winning the so-called drug war through prevention efforts. Why D.A.R.E. became the scaling disaster that it did reveals the first pitfall every enterprise hoping to grow must avoid.
A false positive: the data were lying.
In medicine, a false positive is a testing error indicating that a person has a condition which they don’t in fact have. This term gained more attention than perhaps ever before during the COVID-19 crisis, as some test results for the virus turned out to be unreliable and showed people had contracted the virus when in reality they had not.
A diagnostic false positive of this sort can arise for a variety of reasons, from a mechanical malfunction (the testing device fails to operate properly) to human error (the person administering the test makes a mistake).
In either case, a false positive is bad because it leads to misinformed decisions with downstream consequences. The same goes for false positives in programs being scaled. They set you up not just for future failures, but for big ones.
In the science of scaling, a false positive is quite simply the data telling a lie: a seemingly good result at an early stage of an endeavor that turns out to be untrue.
Disaster ensues—along with squandered time and resources—when the error isn’t detected early on, causing enterprises that were never actually successful to begin with to suffer an inevitable voltage drop at scale. In other words, eventually the truth will come out.
This was what happened with D.A.R.E. The National Institute of Justice’s glowing 1986 assessment of the program was a false positive. Its seemingly strong positive results drove schools, police departments, and the federal government to just say yes to expanding D.A.R.E. nationwide when in reality it had never had a chance at fulfilling its important mission.
Think of it this way: A rocket ship without working thrusters is never going to blast off. Likewise, an idea launched on a false premise, like D.A.R.E., won’t make a difference.
Failures of this sort are alarmingly common among public policy initiatives that rely on scaling up promising pilot studies.
However, false positives don’t only plague government, academia, and the non-profit sector. Recent estimates from academia suggest that between 50 and 90 percent of results will lose voltage at scale.
I have witnessed failures of this sort firsthand in my own work in the White House and the business world. What can be done to combat false positives?
Some years ago, I began a project to develop a program to improve employee health and productivity at Chrysler. We called this pilot study “ANewHealthyLife” and conducted it at one of Chrysler’s 31 factories.
The initiative used financial incentives—we paid employees to engage in healthy activities—and the initial results were promising.
People in our wellness program engaged in more healthy behaviors, had lower medical expenditures, and were absent from work less often than employees who did not participate in the program.
In short, it was a success.
Our experiment appeared to have saved Chrysler a lot of money in a fairly short amount of time. The CEO at the time, Tom LaSorda, was impressed enough to commit resources for an expansion of the program, optimistic that this could be a game-changer worth expanding across the remaining 30 plants.
Although my team and I were pleased with our results, we were more cautious. During my many years of conducting fieldwork and reviewing the research of others, I had seen my share of false positives.
We argued that, since the evidence came from just one plant, we should make sure that the program worked elsewhere before rolling it out broadly.
Given that Chrysler was in cost-cutting mode anyway, the company agreed and we expanded the wellness program more slowly, to two other plants. This time our intervention produced less than thrilling results.
At both plants, the employees who participated in the health initiative and those who didn’t performed similarly across all of the main outcome metrics important to Chrysler: absenteeism, presenteeism, healthcare costs, etcetera.
Uh oh.
It seemed the initial good results were a statistical blip—a false positive. Just to make sure, we ran the program again at two additional plants; twice more, the results came back negative.
The wellness program was just not what the trial-run data had suggested. LaSorda was understandably disappointed, though not as disappointed as he would have been had Chrysler paid to scale up our intervention across all 31 of the company’s plants. This would have put Chrysler in an even worse spot than when LaSorda had first reached out to us.
We ended up pivoting and creating a program that did have true voltage.
Moral of the story? One swallow doesn’t make a summer! Be sure to scale only programs that actually have voltage. As government bodies, we owe that to our citizens, and as firms, we owe that to our stakeholders.
Sun Prairie High School alum John A. List (@Econ_4_Everyone) is the Kenneth C. Griffin Distinguished Service Professor of Economics at the University of Chicago and Chief Economist at Lyft (formerly in the same role at Uber). This essay is adapted from his book, “The Voltage Effect: How to Make Good Ideas Great and Great Ideas Scale,” published Feb. 1 by Currency.