
Big Data Won't Save You From Coronavirus

In an era when everything seems quantifiable, it’s unsettling that the information we have on this outbreak is approximate at best.

Commuters wearing protective masks gather at a station platform in Bangkok, on Feb. 5. (Photographer: Andre Malerba/Bloomberg)

(Bloomberg Opinion) -- How often do you see a piece of economic or financial information revised upward by 45%? And how reliable would you consider a data set that’s subject to such adjustments?

This is the problem confronting epidemiologists trying to make sense of the novel coronavirus spreading from China’s Hubei province. On Thursday, the tally there surged by 45% — or 14,480 cases. The revision was largely due to health authorities adding patients diagnosed on the basis of lung scans to a previous count, which was mostly limited to those whose swab tests came back positive.
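As a rough back-of-envelope check, the scale of that revision can be reconstructed from the two figures above. The sketch below assumes, for illustration only, that the 14,480 added cases account for the whole 45% jump.

```python
# Rough back-of-envelope check of the revision described above.
# Assumption (for illustration only): the 14,480 newly added cases
# account for the entire 45% increase in the tally.
added_cases = 14_480
increase = 0.45

prior_tally = added_cases / increase          # implied count before the revision
revised_tally = prior_tally + added_cases     # implied count after the revision

print(f"Implied pre-revision tally:  {prior_tally:,.0f}")   # ~32,200
print(f"Implied post-revision tally: {revised_tally:,.0f}")  # ~46,700
```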

The medical data emerging from hospitals and clinics around the world are invaluable in determining how this outbreak will evolve — but the picture painted by the information is changing almost as fast as the disease itself, and isn’t always of impeccable provenance. Just as novel infections exploit weaknesses in the body’s immune defenses, epidemics have an unnerving habit of spotting the vulnerabilities of the data-driven society we’ve built for ourselves.


That’s not a comforting thought. We live in an era in which everything seems quantifiable, from our daily movements to our internet search habits and even our heartbeats. At a time when people are scared and seeking certainty, it’s alarming that the knowledge we have about this most important issue is at best an approximate guide to what’s happening.

“It’s so easy these days to capture data on anything, but to make meaning of it is not easy at all,” said John Carlin, a professor at the University of Melbourne specializing in medical statistics and epidemiology. “There’s genuinely a lot of uncertainty, but that’s not what people want to know. They want to know it’s under control.”

That’s most visible in the contradictory information we’re seeing around how many people have been infected, and what share of them have died. While those figures are essential for getting a handle on the situation, as we’ve argued, they’re subject to errors in sampling and measurement that are compounded in high-pressure, strained circumstances. The physical capacity to do timely testing and diagnosis can’t be taken for granted either, as my colleague Max Nisen has written.

Early case fatality rates for Severe Acute Respiratory Syndrome were often 40% or higher before settling down to figures in the region of 15% or less. The age of patients, whether they get sick in the community or in a hospital, and doctors’ capacity and experience in offering treatment can all affect those numbers dramatically.

Even the way that coronavirus cases are defined and counted has changed several times, said Professor Raina MacIntyre, head of the University of New South Wales’s Biosecurity Research Program: From “pneumonia of unknown cause” in the early days, through laboratory-confirmed cases once a virus was identified, to the current standard that includes lung scans. That’s a common phenomenon during outbreaks, she said. 
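That definitional churn matters because the most-watched number, the case fatality rate, is simply deaths divided by counted cases: widen the case definition and the rate falls even if nothing about the disease changes. The toy calculation below uses hypothetical figures, not the actual Hubei tallies, purely to show that mechanical effect.

```python
# Illustration of why a naive case fatality rate moves with the case definition.
# All numbers are hypothetical, chosen only to show the mechanics.
deaths = 1_000

confirmed_by_swab = 30_000        # narrow definition: lab-confirmed cases only
confirmed_incl_scans = 45_000     # broader definition: lung-scan diagnoses added

cfr_narrow = deaths / confirmed_by_swab
cfr_broad = deaths / confirmed_incl_scans

print(f"CFR, swab-confirmed only: {cfr_narrow:.1%}")   # 3.3%
print(f"CFR, scans included:      {cfr_broad:.1%}")    # 2.2%
```

Same deaths, same disease, a noticeably different headline rate.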

Those problems are exacerbated by the fact that China’s government has already shown itself willing to suppress medical information for political reasons. While you’d hope the seriousness of the situation would have changed that instinct, that track record casts a shadow of doubt over everything we know.

How should the world respond amid this fog of uncertainty?

While every piece of information is subject to revision and the usual statistical rule of garbage-in, garbage-out, epidemiologists have ways to make better sense of what is going on. 

Well-established statistical techniques can be used to clean up messy data. A study this week by Imperial College London used screening of passengers flying to Japan and Germany to estimate that the fatality rate for all cases was about 1%, below the 2.7% rate among confirmed cases in Hubei province but higher than the 0.5% seen in the rest of the world.
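For readers curious about the logic, a heavily simplified sketch of that kind of traveller-screening estimate is shown below. It is not the Imperial College model itself, and every number in it is hypothetical: the idea is simply that infection prevalence among screened passengers can be used to scale up the likely number of infections at the source, giving a larger denominator for the fatality rate.

```python
# A highly simplified sketch of the traveller-screening logic, not the Imperial
# College model itself. All numbers below are hypothetical.
screened_passengers = 800
infected_passengers = 8
prevalence = infected_passengers / screened_passengers   # ~1% of travellers

source_population = 11_000_000            # roughly a city the size of Wuhan
estimated_infections = prevalence * source_population

reported_deaths = 1_100
fatality_rate = reported_deaths / estimated_infections

print(f"Estimated infections at source: {estimated_infections:,.0f}")
print(f"Implied fatality rate:          {fatality_rate:.1%}")
```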

When studies from different researchers using varying techniques start to converge toward common conclusions, that’s also a strong if not faultless indication that we’re on the right track. The number of new infections caused by each coronavirus case has now been estimated at around 2.2 to 2.3 by several separate studies, for instance, although that figure itself is likely to shift as people quarantine themselves and limit their contacts to prevent infection.
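To see why that reproduction number matters, consider the toy calculation below, built on deliberately crude assumptions (each case infects exactly r others and nobody is counted twice). Real epidemic models are far more sophisticated; this only illustrates why changes in behaviour that nudge r downward can reshape the trajectory.

```python
# Toy illustration of what a reproduction number around 2.2 implies.
# Crude assumptions: each case infects exactly r others, nobody is counted twice.
def cumulative_cases(r, generations, seed=1):
    """Total cases after a given number of transmission generations."""
    total, current = seed, seed
    for _ in range(generations):
        current *= r      # new infections in this generation
        total += current
    return total

print(cumulative_cases(r=2.2, generations=10))   # roughly 4,900 cases
print(cumulative_cases(r=1.1, generations=10))   # roughly 19 cases
```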

The troubling truth, though, is that in a society that expects to know everything, this most crucial piece of knowledge is still uncertain.

Google can track my every move and tell me where I ate lunch last week, but viruses don’t carry phones. The facts about this disease are hidden in the activity of billions of nanometer-scale particles, spreading through the cells of tens of thousands of humans and the environments we traverse. Big data can barely scratch the surface of solving that problem.

To contact the editor responsible for this story: Rachel Rosenthal at rrosenthal21@bloomberg.net

This column does not necessarily reflect the opinion of Bloomberg LP and its owners.

David Fickling is a Bloomberg Opinion columnist covering commodities, as well as industrial and consumer companies. He has been a reporter for Bloomberg News, Dow Jones, the Wall Street Journal, the Financial Times and the Guardian.

©2020 Bloomberg L.P.