Increasingly powerful computers using ever-more sophisticated programs are challenging human supremacy in areas as diverse as playing chess and making emotionally compelling music. But can digital diagnosticians match, or even outperform, human physicians?
The answer, according to a new study led by researchers at Harvard Medical School, is “not quite.”
The findings, published Oct. 10 in JAMA Internal Medicine, show that physicians vastly outperform 23 commonly used symptom-checker apps, making a correct diagnosis more than twice as often. The analysis is believed to provide the first direct comparison between human-made and computer-based diagnoses.
Diagnostic errors stem from a failure to recognize a disease, or to recognize it in a timely manner. Physicians make such errors roughly 10 to 15 percent of the time, researchers say.
Over the last two decades, computer-based checklists and other fail-safe digital apps have been increasingly used to reduce medication errors or streamline infection-prevention protocols. Lately, experts have wondered whether computers might also help improve clinical diagnoses and reduce diagnostic errors. Each year, hundreds of millions of people use Internet programs or apps to check their symptoms or to self-diagnose. Yet how these computerized symptom-checkers fare against physicians has not been well studied.
In the study, 234 internal medicine physicians were asked to evaluate 45 clinical cases, involving both common and uncommon conditions with varying degrees of severity. For each scenario, physicians had to identify the most likely diagnosis along with two additional possible diagnoses. Each clinical vignette was solved by at least 20 physicians.
The physicians outperformed the symptom-checker apps, listing the correct diagnosis first 72 percent of the time, compared with 34 percent of the time for the digital platforms. Physicians listed the correct diagnosis among their top three possibilities 84 percent of the time, compared with 51 percent for the digital symptom-checkers.
The gap between physician and computer performance was widest for more severe and less common conditions, and narrower for less acute and more common illnesses.
“While the computer programs were clearly inferior to physicians in terms of diagnostic accuracy, it will be critical to study future generations of computer programs that may be more accurate,” said senior investigator Ateev Mehrotra, an associate professor of health care policy at HMS.
Despite outperforming the machines, physicians still made errors in about 15 percent of cases. Researchers say developing computer-based algorithms to be used in conjunction with human decision-making may help further reduce diagnostic errors.
“Clinical diagnosis is currently as much art as it is science, but there is great promise for technology to help augment clinical diagnoses,” Mehrotra said. “That is the true value proposition of these tools.”
Co-investigators included Hannah Semigran, former research assistant in the HMS Department of Health Care Policy; David Levine, HMS research fellow in medicine at Brigham and Women’s Hospital; and Shantanu Nundy, an employee of the Human Diagnosis Project, the creators of Human Dx.