March 13, 2020

Why don’t we know the COVID-19 mortality rate?

There are lots of questions about the current pandemic that need expertise in microbiology or international freight logistics or sociology or whatever, but there’s the occasional one that is basically statistical. In particular, lots of people would like to know how bad COVID-19 actually is: what’s your (or your kid’s, or your grandmother’s) chance of needing hospital treatment or dying? This post will try to explain why we don’t know the answer, and aren’t going to know the answer for a while, although there are some questions that sound similar where we do know the answer or will know fairly soon.

The mortality rate (case fatality rate) for a disease is the number of people who die from it divided by the number of people who catch it. For the initial outbreak in China we have a reasonably good idea of the number of people who died (at least if you trust the PRC statistics), and the rest of the known cases have recovered. What we don’t know is the denominator, the number of people who were infected; the health system had more urgent things to do than test apparently healthy people. The same is likely true for some of the smaller outbreaks in other Asian countries. In the rest of the world we don’t even know the numerator of the rate, because most of the people who have been sick are still sick and we have to wait to see how many recover and how many die.
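To make the numerator problem concrete, here is a minimal sketch in Python. The counts are entirely made up for illustration (they are not real outbreak data), and the function name `interim_fatality_ratios` is just a label for this example. It shows that, partway through an outbreak, the ratio you get depends heavily on how you treat confirmed cases whose outcome isn't known yet.

```python
def interim_fatality_ratios(deaths, recovered, still_sick):
    """Two crude fatality ratios computed while an outbreak is ongoing.

    All inputs are counts of *confirmed* cases; undetected infections
    are not represented here at all.
    """
    confirmed = deaths + recovered + still_sick
    resolved = deaths + recovered
    if resolved == 0:
        raise ValueError("no resolved cases yet, so neither ratio is meaningful")
    # Treats everyone still sick as a survivor: tends to be too low.
    deaths_over_confirmed = deaths / confirmed
    # Ignores everyone still sick: tends to be too high early on,
    # because deaths are often recorded sooner than recoveries.
    deaths_over_resolved = deaths / resolved
    return deaths_over_confirmed, deaths_over_resolved


# Hypothetical counts, chosen only to make the arithmetic easy to follow.
low, high = interim_fatality_ratios(deaths=20, recovered=180, still_sick=800)
print(f"deaths / all confirmed cases: {low:.1%}")   # 2.0%
print(f"deaths / resolved cases:      {high:.1%}")  # 10.0%
```

Neither number is the mortality rate. The gap between them only closes once the people who are still sick have either recovered or died, and even then the denominator still leaves out infections that were never detected.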

To some extent the mathematical epidemic models can work around this problem. If people with few or no symptoms are still infectious, they’ll contribute to the growth of the epidemic, and their number can be estimated from the shape of the epidemic curve. That doesn’t work perfectly, but it works to some extent. However, if people with few or no symptoms are less infectious, they’ll tend to be missed. People who have no symptoms and who don’t pass the virus on are invisible to the models, at least until there are enough people like that to get herd immunity working. This post on Andrew Gelman’s blog looks at two fairly sophisticated modelling attempts, which don’t agree all that closely.

In the long run, it will be possible to get a reliable estimate of the number of people who have been infected, because they will end up with antibodies to the virus, and someone will develop a test for the antibodies and apply it to a suitable population sample. That sort of data goes into the mortality rate estimates for flu: the mortality rate among people who develop classic, serious, flu symptoms is quite high, but there are a lot of people who are infected without ever knowing it (as many as 10% of the population), so the mortality rate among everyone infected is very low. In the same way, the retrospective mortality rate of COVID-19 will likely be lower (by some unknown factor) than the current ratio of deaths to known cases.
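The effect of undetected infections on the denominator can be sketched in a few lines of Python. Everything here is an assumption made for illustration: the counts are invented, `detected_fraction` stands in for the unknown share of infections that were ever confirmed, and the sketch assumes all deaths occur among confirmed cases, which won't be exactly true in practice.

```python
def infection_fatality_ratio(deaths, confirmed_cases, detected_fraction):
    """Deaths divided by estimated total infections.

    detected_fraction is the assumed share of all infections that show up
    as confirmed cases. It is exactly the quantity we can't measure until
    there is antibody testing on a population sample.
    """
    if not 0 < detected_fraction <= 1:
        raise ValueError("detected_fraction must be in (0, 1]")
    estimated_infections = confirmed_cases / detected_fraction
    return deaths / estimated_infections


# Hypothetical counts; only the detected fraction changes between lines.
for fraction in (1.0, 0.5, 0.1):
    ratio = infection_fatality_ratio(
        deaths=20, confirmed_cases=1000, detected_fraction=fraction
    )
    print(f"if {fraction:.0%} of infections were detected: {ratio:.2%}")
```

With the same deaths and the same confirmed cases, the rate among everyone infected falls in proportion to the fraction detected. That fraction is the "unknown factor": the sketch can show the direction of the correction, but not its size.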

We do have reasonably good information on what happens to people who get sick enough to need medical attention, and we know how that number grows with good or not-so-good control efforts. That’s the number that matters if you get sick. But we don’t know as much as we’d like about the structure of the epidemic and how many people will eventually get seriously ill, because we haven’t been able to find and count the subset of basically healthy cases.

Thomas Lumley (@tslumley) is Professor of Biostatistics at the University of Auckland. His research interests include semiparametric models, survey sampling, statistical computing, foundations of statistics, and whatever methodological problems his medical collaborators come up with. He also blogs at Biased and Inefficient.