May 7, 2020

Prediction is hard

From the Twitter account of the White House Council of Economic Advisers

From the Washington Post

Even more optimistic than that, however, is the “cubic model” prepared by Trump adviser and economist Kevin Hassett. People with knowledge of that model say it shows deaths dropping precipitously in May — and essentially going to zero by May 15.

The red curve is, as the Post says, much more optimistic. None of them look that much like the data — they are all missing the weekly pattern of death reporting — but the IHME/UW model now predicts continuing deaths out until August.  Even that is on the optimistic side: Trevor Bedford, a virologist at the University of Washington who has been heavily involved with the outbreak, says he would expect a plateau lasting months rather than an immediate decline.  Now, disagreement in predictions is nothing new and in itself isn’t that noteworthy.  The problem is what the ‘cubic model’ means.

Prediction, as the Danish proverb says, is hard, because we don’t have any data from the future.  We can divide predictive models into three broad classes:

  • Models based on understanding the actual process that’s causing the trends.  The SIR models and their extensions, which we’ve seen a lot of in NZ, are based on a simplified but useful representation of how epidemics work.  Weather forecasting works this way. So do predictions of populations for each NZ region into the future.
  • Models based on simplifying and matching previous inputs.  When Google can distinguish cat pictures from dog pictures, it’s because it has seen a bazillion of each and has worked out a summary of what cat pictures and dog pictures look like. It will compare your picture to those two summaries and see what matches best.  Risk models for heart disease are like this: does your data look like the data of people who had heart attacks? Fraud risk models for banks, insurance companies, and the IRD work this way. It still helps a lot to understand about the process you’re modelling, so you know what sort of data to put in or leave out, and what sort of summaries to try to match.
  • Models based on extrapolating previous inputs.  In business and economics you often need predictions of the near future.  These can be constructed by summarising existing recent trends and the variation around them, then assuming the trends and variation will stay roughly the same in the short term.  Expertise in both statistics and in the process you’re predicting is useful, so that you know what sorts of trends there are and what information is available to model them. A key part of these time-series models is getting the uncertainty right, but even when you do a good job the predictions won’t work when the underlying trends change.

The SIR epidemiology models that you might have seen in the Herald are based on knowledge of how epidemics work.  The IHME/UW models are at least based on knowledge of what epidemics look like. The cubic model isn’t.

The cubic fit is a model of the second type, based just on simplifying and matching the available data.  It could be useful for smoothing the data — as the tweet says, “with irregular data, curve fitting can improve data visualization”.  In particular, the weekly up-and-down pattern comes from limitations in the death reporting process, so filtering it out will give more insight into current trends.

The particular model that produces the red curve is extremely simple (Lucy D’Agostino McGowan duplicated it).  If you write t for day of the year, so t starts at 1 for January 1, the model is

log(number of deaths + 0.5) = −0.0000393 × t³ + 0.00804 × t² − 0.329 × t − 0.865

What you can’t do with a smoothing/matching model like this is to extrapolate outside the data you have.  If you have a model trained to distinguish cat and dog pictures and you give it a picture of a turkey, it is likely to be certain that the picture is a cat, or certain that the picture is a dog, but wrong either way.  If you have a simple matching model where the predicted number of deaths depends only on the date, and the model matches data from dates in March and April, you can’t use it to predict deaths in June. The model has never heard of June. If it gets good predictions in June, that’s entirely an accident.

When you extrapolate the model forward in time, the right-hand side becomes very large and negative, so the predicted number of deaths is zero with extreme certainty.  If you were to extrapolate backwards in time, the predicted number of deaths would explode to millions and billions during December. There’s obviously no rational basis for using the model to extrapolate backwards into December, but there isn’t much more of a basis for using it to extrapolate forward — nothing in the model fitting process cares about the direction of time.
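To see just how differently the fitted cubic behaves inside and outside the data, here is a minimal sketch in Python that plugs dates into the formula above. It uses the quoted coefficients; it is my reconstruction, not the model’s actual code.

```python
# Minimal sketch using the coefficients quoted above; not the original model code.
import math

def cubic_log_deaths(t):
    """The fitted cubic on the log scale; t is day of the year (January 1 = 1)."""
    return -0.0000393 * t**3 + 0.00804 * t**2 - 0.329 * t - 0.865

def predicted_deaths(t):
    """Back-transform, since the model was fitted to log(deaths + 0.5); clamp at zero."""
    return max(0.0, math.exp(cubic_log_deaths(t)) - 0.5)

for label, t in [("mid-April (inside the data)", 105),
                 ("mid-June (forward extrapolation)", 170),
                 ("early December (backward extrapolation)", -30)]:
    print(f"{label:40s} t = {t:4d}: about {predicted_deaths(t):,.1f} deaths per day")
```

Inside the data range the formula gives plausible numbers (a couple of thousand deaths a day in mid-April); two months ahead it is pinned at zero; and run backwards a month into December it predicts tens of millions of deaths a day.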

The Chairman of the Council of Economic Advisers until the middle of last year was Dr Kevin Hassett (there’s no Chairman at the moment). He’s now a White House advisor, and the Washington Post attributes the cubic model to him.  Hassett is famous for having written a book in 1999 predicting that the Dow Jones index would reach 36,000 in the next few years.  It didn’t — though he was a bit unlucky in having his book appear just before the dot-com crash.  Various unkind people on the internet have suggested a connection between these two predictive efforts.  That’s actually completely unfair.  Dow 36,000 was based on a model for how the stock and bond markets worked, in two parts: a theory that stocks were undervalued because of their relative riskiness, and a theory that the markets would realise this underpricing in the very near future.  The predictions were wrong because the theory was wrong, but that’s at least the right way to try to make predictions. Extrapolating a polynomial isn’t.

May 6, 2020

At risk

We often hear groups of people described as ‘high risk’ in the context of COVID.  The problem is that this means three different things, and they often aren’t distinguished clearly:

  • People who have a relatively high exposure to coronavirus, so they are more likely to catch it: nurses, doctors, supermarket staff, police (high probability)
  • People who are more likely to get seriously sick if they do become infected: elderly, immunocompromised, people with chronic lung disease (high consequence)
  • People who are more likely to spread the infection if they get it. Some overlap with the first group, but also migrant workers, prisoners, and at least in the US, meat processing workers. (high transfer)

The second group are very different from the others. Suppose you were doing intensive testing to try to see if there was undetected community transmission of COVID.  You’d definitely want to test the first group, because that’s where you’re most likely to find the virus, and you might want to test the last group, because missing it there would be serious (as it was in Singapore).  You might well not go after the second group, because the safest thing for them is isolation — having a bunch of health workers barge in and stick swabs up their noses is unpleasant and possibly risky.  You’d absolutely want to test the second group if there was any indication of symptoms or exposure, but not just in the ordinary course of business.  The three groups are different.

Even Cory Doctorow confused the first and second groups a bit, in his rant about the risks of contact-notification apps

The proximity sensing they do is going to miss out on people who don’t have smartphones and/or don’t have the technological savvy to install them. That overlaps broadly with the most at-risk groups: elderly people and poor people.

Epidemiology is a team sport and the most vulnerable people are the MVPs on the team. “Our app will tell you if you came in contact with an infected person (but not if that person is from the most likely group of infected people)” is a fundamentally broken premise.

Elderly people are a ‘high consequence’ group — infection is serious for them. They aren’t a ‘high probability’ group — there’s no special reason why elderly people in the community would be more likely to get infected (or in residential facilities, if good care is taken).

May 5, 2020

NZ net excess mortality

Nice story by Farah Hancock at Newsroom, on NZ mortality data.

In places with less successful control of COViD-19, there has been a spike in deaths confirmed as due to coronavirus, and also a spike in other deaths.  In New Zealand, there hasn’t been — there isn’t any clear excess over the average for the time of year.  There undoubtedly have been deaths due to coronavirus, but there have also been deaths prevented by the lockdown (on the roads, for example), and there may well have been deaths caused by the lockdown (eg, people not getting heart attacks treated promptly), but the overall trends are spectacularly unlike those in New York City, which has not quite twice the population of New Zealand and over 20,000 net excess deaths.

There’s still plenty of time for NZ to catch up to outbreaks in other parts of the world. Let’s not do that.

May 3, 2020

What will COVID vaccine trials look like?

There are over 100 potential vaccines being developed, and several are already in preliminary testing in humans.  There are three steps to testing a vaccine: showing that it doesn’t have any common, nasty side effects; showing that it raises antibodies; showing that vaccinated people don’t get COVID-19.

The last step is the big one, especially if you want it fast. I knew that in principle, but I was prompted to run the numbers by hearing (from Hilda Bastian) of a Danish trial in 6000 people looking at whether wearing masks reduces infection risk.  With 3000 people in each group, and with the no-mask people having a 2% infection rate over two months, and with masks halving the infection rate, the trial would still have more than a 1 in 10 chance of missing the effect.  Reality is less favorable:  2% infections is more than 10 times the population percentage of confirmed cases so far in Denmark (more than 50 times the NZ rate), and halving the infection rate seems unreasonably optimistic.
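The ‘1 in 10’ figure can be reproduced with a standard normal approximation for comparing two proportions. Here is a minimal sketch (the trial’s actual analysis plan may well differ):

```python
# Minimal sketch: approximate power of a two-sided z-test comparing two proportions.
# The 2% vs 1% infection rates and 3000 per arm are the figures described above.
from math import sqrt
from statistics import NormalDist

def power_two_proportions(p_control, p_treat, n_per_arm, alpha=0.05):
    diff = p_control - p_treat
    se = sqrt(p_control * (1 - p_control) / n_per_arm +
              p_treat * (1 - p_treat) / n_per_arm)
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    return NormalDist().cdf(diff / se - z_alpha)

power = power_two_proportions(0.02, 0.01, 3000)
print(f"power ≈ {power:.2f}; chance of missing a real halving ≈ {1 - power:.2f}")
```

This prints a power of about 0.89, so roughly an 11% chance of missing a genuine halving of infections.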

That’s what we’re looking at for a vaccine. We don’t expect perfection, and if a vaccine truly reduces the infection rate by 50% it would be a serious mistake to discard it as useless. But if the control-group infection rate over a couple of months is a high-but-maybe-plausible 0.2%, that means 600,000 people in the trial — one of the largest clinical trials in history.

How can that be reduced?  If the trial was done somewhere with out-of-control disease transmission, the rate of infection in controls might be 5% and a moderately large trial would be sufficient. But doing a randomised trial in a setting like that is hard — and ethically dubious if it’s a developing-world population that won’t be getting a successful vaccine any time soon.  If the trial took a couple of years, rather than a couple of months, the infection rate could be 3-4 times lower — but we can’t afford to wait a couple of years.

The other possibility is deliberate infection. If you deliberately exposed trial participants to the coronavirus, you could run a trial with only hundreds of participants, and no more COVID deaths, in total, than a larger trial. But signing people up for deliberate exposure to a potentially deadly infection when half of them are getting placebo is something you don’t want to do without very careful consideration and widespread consultation.  I’m fairly far out on the ‘individual consent over paternalism’ end of the bioethics spectrum, and even I’d be a bit worried that consenting to coronavirus infection could be a sign that you weren’t giving free, informed, consent.

May 2, 2020

Hype

This turned up via Twitter, with the headline “Pitt researchers developing a nasal spray that could prevent covid-19”.

“The nice thing about Q-griffithsin is that it has a number of activities against other viruses and pathogens,” said Lisa Rohan, an associate professor in Pitt’s School of Pharmacy and one of the lead researchers in the collaboration, in a statement. “It’s been shown to be effective against Ebola, herpes and hepatitis, as well as a broad spectrum of coronaviruses, including SARS and MERS.”

The active ingredient is a synthetic form of a protein extracted from a seaweed found in Australia and NZ. Guess how many human studies there have been of this treatment?

Clinicaltrials.gov reports one completed safety study of a vaginal gel, and one ongoing safety study of rectal administration, both aimed at HIV prevention. There appear to have been no studies against coronaviruses in humans, nor against Ebola, herpes, or hepatitis. There appear to have been no studies of a nasal-spray version in humans (and I couldn’t even find any in animals, just studies of tissue samples in a lab). It’s not clear that a nasal spray would work even if the protein did — eg, is preventing infection via the nose enough, or do you need to worry about the mouth?

Researchers should absolutely be trying all these things, but making claims of demonstrated effectiveness is not on.  We don’t want busy journalists having to ask Dr Bloomfield if we should stick seaweed up our noses.

Population density

With NZ’s good-so-far control of the coronavirus, there has been discussion on the internets as to whether New Zealand has high or low population density, and also on whether March/April is summer here or not.  The second question is easy. It’s not summer here. The first is harder.

New Zealand’s average population density is very low.  It’s lower than the USA. It’s lower than the UK. It’s lower than Italy even if you count the sheep.  On the other hand, a lot of New Zealand has no people in it, so the density in places that have people is higher.  Here are a couple of maps: “Nobody Lives Here” by Andrew Douglas-Clifford, showing the 78% of the country’s land area with no inhabitants, and a 3-d map of population density by Alasdair Rae (@undertheraedar)

We really should be looking at the population density of inhabited areas. That’s harder than it sounds, because it makes a big difference where you draw the boundaries. Take Dunedin. The local government area goes on for ever in all directions. You can get on an old-fashioned train at the Dunedin station, and travel 75km through countryside and scenic gorges to the historic town of Middlemarch, and you’ll still be in Dunedin. The average population density is 40 people per square kilometre.  If you look just within the StatsNZ urban boundary, the average population density is 410/square kilometre — ten times higher.

A better solution is population-weighted density, where you work out the population density where each person lives and average them. Population-weighted density tells you how far away the average person’s nearest neighbour is; how far you can go without bumping into someone else’s bubble. The boundary problem doesn’t matter as much: hardly anyone lives in 90% of Dunedin, so it gets little weight — including or excluding the non-urban area doesn’t affect the average.  What does matter is the resolution.
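Here is a toy calculation of the difference, with made-up grid cells (the populations and areas are purely illustrative):

```python
# Toy illustration of ordinary vs population-weighted density; the cells are made up.
cells = [
    {"pop": 5000, "area_ha": 100},    # dense inner suburb: 50 people/ha
    {"pop": 2000, "area_ha": 200},    # outer suburb: 10 people/ha
    {"pop": 10,   "area_ha": 5000},   # near-empty rural fringe
]

total_pop = sum(c["pop"] for c in cells)
total_area = sum(c["area_ha"] for c in cells)

# Ordinary density: all the people divided by all the land
simple = total_pop / total_area

# Population-weighted density: average the local density each person experiences
weighted = sum(c["pop"] * (c["pop"] / c["area_ha"]) for c in cells) / total_pop

print(f"simple density   ≈ {simple:.1f} people/ha")    # dragged down by the empty land
print(f"weighted density ≈ {weighted:.1f} people/ha")  # barely affected by it
```

Dropping the near-empty rural cell changes the simple density a lot but moves the population-weighted density hardly at all, which is the boundary insensitivity described above.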

If you work out population-weighted densities using 1km squares you will get a larger number than if you use 2km squares, because the 1km squares ignore density variation within 1km, and the 2km squares ignore density variation within 2km. If you use census meshblocks you will get a larger number than if you use electorates, and so on.  That can be an issue for international comparisons.

However, this is a graph of population-weighted density across a wide range of European and Australian cities, using 1km square grids, in units of people per hectare:

If you compare the Australian and NZ cities using meshblocks, Auckland slots in just behind Melbourne, with Wellington following, and Christchurch is a little lower than Perth. The New York Metropolitan Area is at about 120.  Greater LA, the San Francisco Bay Area, and Honolulu are in the mid-40s, based on Census Bureau data. New York City is at about 200.  I couldn’t find data for any Asian cities, but here’s a map of Singapore showing that a lot of people live in areas with population density well over 20,000 people per square kilometre, or 200/hectare.

So, yes, New Zealand is more urban than foreigners might think, and Auckland is denser than many US metropolitan areas. But by world standards even Auckland and Wellington are pretty spacious.

May 1, 2020

The right word

Scientists often speak their own language. They sometimes use strange words, and they sometimes use normal words but mean something different by them.  Toby Morris & Siouxsie Wiles have an animation of some examples.

The goal of scientific language is usually to be precise, to make distinctions that aren’t important in everyday speech. Scientists aren’t trying to confuse you or keep you out, though those effects can happen  — and they aren’t always unwelcome.  I’ve written on my blog about two examples: bacteria vs virus (where the scientists are right) and organic (where they need to get over themselves).

This week’s example of conflict between trying to be approachable and trying to be precise is the phrase “false positive rate”.  When someone gets a COVID test, whether looking for the virus itself or looking for antibodies they’ve made in reaction to it, the test could be positive or negative.  We can also divide people up by whether they really have/had COVID infection or no infection. This gives four possibilities

  • True positives:  positive test, have/had COVID
  • True negatives: negative test, really no COVID
  • False positives: positive test, really no COVID
  • False negatives: negative test, have/had COVID

If you encounter something called the “false positive rate”, what is it? It obviously involves the false positives, divided by something, but it could be false positives as a proportion of all positive tests, or false positives as a proportion of people who don’t have COVID, or even false positives as a proportion of all tests.  It turns out that the first two of these definitions are both in common use.

Scientists (statisticians and epidemiologists) would define two pairs of accuracy summaries:

  • Sensitivity:  true positives divided by people with COVID
  • Specificity: true negatives divided by people without COVID
  • Positive Predictive Value (PPV): true positives divided by all positives
  • Negative Predictive Value (NPV): true negatives divided by all negatives

The first ‘false positive rate’ definition is 1 − PPV; the second is 1 − specificity.
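A toy calculation with made-up counts shows how far apart the two versions can be:

```python
# Made-up test counts, just to show that the two "false positive rate" definitions differ.
tp, fp, tn, fn = 45, 5, 940, 10

sensitivity = tp / (tp + fn)   # true positives / people with COVID
specificity = tn / (tn + fp)   # true negatives / people without COVID
ppv = tp / (tp + fp)           # true positives / all positive tests
npv = tn / (tn + fn)           # true negatives / all negative tests

print(f"1 - PPV         = {1 - ppv:.3f}  (false positives as a share of positive tests)")
print(f"1 - specificity = {1 - specificity:.3f}  (false positives as a share of people without COVID)")
```

With these counts one ‘false positive rate’ is 10% and the other is 0.5%, so it matters which one a report means.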

If you write about the antibody studies carried out in the US, you can either use the precise terms, which will put off people who don’t know anything, or use the vague terms, and people who know a bit about the topic may misunderstand and think you’ve got them wrong.

April 27, 2020

Some kind of record

This isn’t precisely statistics, but Bloomberg Opinion may have set some kind of record for the density of easily checkable false statements in a single column that’s still making a basically sound argument.  Joe Nocera was arguing in favour of a strict lockdown policy like the one in New Zealand.

Just looking at the tweets, we have:

  1. “New Zealanders aren’t allowed to drive except in emergencies and can only be out of the house for an hour a day, to get exercise or to buy essentials. This one-hour limit is enforced by the police.” The one-hour limit isn’t true, and so isn’t being enforced by police. The ‘only in emergencies’ is at best misleading: you can drive to buy essentials.
  2. “At the pharmacy, only one person is allowed in at a time, and clerks retrieve the goods so customers never touch anything until they return home. The wait to get in a grocery store is around an hour. If you don’t have a mask and gloves, you won’t get in.” True as to pharmacies.  False as to the wait to get in a grocery store — which also contradicts the alleged one-hour limit.  False as to masks and gloves: the use of masks is encouraged by some groups but they are not required and users are only barely a majority, if that. Gloves aren’t even recommended.
  3. “In New Zealand…
    Every restaurant is closed.
    There’s no take-out.
    There are no deliveries.
    E-commerce has been halted. Food-processing companies still operate, but virtually every other form of blue-collar work is shut down.” True on restaurant closures (until tomorrow).  It’s ambiguous whether ‘deliveries’ goes with the previous bullet or the following one, but e-commerce is certainly not halted. You can’t get prepared food delivered, but groceries, wine and spirits, some office equipment, and a range of other things are being delivered.  I have bought an office chair, because I’m working from home until at least the end of July.  The blue-collar work thing is true-ish — he may underestimate how much of NZ’s economy is in food production for export, but construction and forestry are shut down.
  4. “Citizens are surviving financially with emergency checks from the government. Essential workers in New Zealand are truly essential. Although there are Covid-19 clusters — a church; a rest home; a wedding party — workplaces have largely been virus-free.”  The Marist College cluster and the World Hereford Conference definitely count as ‘workplace’.  The government is primarily sending payments to businesses so they can keep employees or provide them sick leave, and to self-employed people, though there has obviously been an increase in the number of people getting unemployment benefits and applying for hardship grants. And, of course, we don’t use checks for all this.

It’s probably best to think of the column as ‘about New Zealand’ in the same way Gilbert and Sullivan’s Mikado is ‘about Japan’ — it’s really just a background setting to criticise the writer’s own country.

April 25, 2020

Why New York is different

There have been three fairly large American seroprevalence studies recently.  These are studies that sample people from the community and test their blood for antibodies to the COViD-19 virus. Most people who have been infected, even if they recovered weeks ago, will have antibodies to the virus.  And people who have not been infected will not have antibodies to the COVID-19 virus, though the test isn’t perfect and will occasionally be confused by antibodies to something else.

The two studies in Los Angeles and in Santa Clara County (Silicon Valley) estimated that a few percent of people had been exposed to the virus. The New York study estimated far higher rates — in New York City itself, 21%.  The Californian studies have been widely criticised because of questions about the representativeness of the sample and the accuracy of the statistical calculations.  The New York study has had much less pushback.

One difference between the New York study and the others is that it’s not being pushed as a revolutionary change in our understanding of coronavirus, so people aren’t putting as much effort into looking at the details. Much more important, though, is that it is far easier to get a prevalence of 20% roughly right than to get a prevalence of 2-3% roughly right. If you make extraordinary claims based on a prevalence estimate of 2-3%, you need data and analysis of extraordinary quality (in a good way).  If your prevalence estimate of 20% is consistent with the other empirical data and models for coronavirus, it doesn’t need to stand on its own to the same extent.

Getting a good estimate of a prevalence of 2-3% is hard because the number of people who really have been exposed is going to be about the same as the number where the test gets confused and gives the wrong answer.  If you aren’t precisely certain of the accuracy of the test (and you’re not), the uncertainty in the true prevalence can easily be so large as to make the effort meaningless. On top of that, the quality of the sampling is critical:  even a little bit of over-volunteering by people who have been sick and want reassurance can drive up your estimate to be larger than the truth.  You can easily end up with an estimate saying the prevalence is much higher than most people expect, but only very weak evidence for that claim.

It looks as though the antibody test used in New York was less accurate than the one used in Santa Clara; the New York State lab that ran the testing says only that they are confident the rate of positive tests in truly unexposed people is less than 7%; their best-guess estimate will presumably be around 2-3%, in contrast with the best-guess estimate of 0.5% for the test used in Santa Clara. But even in the worst case, if 7% of tests were false positives, that still leaves 14% that were genuine. And since the test will miss some people who were truly exposed, the true prevalence will be higher than 14%. Suppose, for example, that the test picks up antibodies in 90% of people who really have been exposed. The 14% we’re seeing is only 90% of the truth, so the truth would be about 16%, and with a less-sensitive test, the truth would have to be higher.  So, even though the test is imperfect, somewhere between one in five and one in seven people tested had been exposed to the virus.  That’s a narrow enough range to be useful.  You still have to worry about sampling: it’s not clear whether sampling people trying to shop will give you an overestimate or an underestimate relative to the whole population, but the bias would have to be quite large to change the basic conclusions of the study.
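A slightly more careful version of that arithmetic is the standard Rogan-Gladen correction for test error. Here is a minimal sketch, using the worst-case 7% false positive rate and an assumed 90% sensitivity (both illustrative figures from the paragraph above, not the lab’s validated numbers):

```python
# Rogan-Gladen correction: adjust an observed positive rate for imperfect test accuracy.
# The sensitivity and specificity values are illustrative, as in the text above.
def corrected_prevalence(observed_rate, sensitivity, specificity):
    return (observed_rate - (1 - specificity)) / (sensitivity + specificity - 1)

observed = 0.21   # raw positive rate reported for New York City
for specificity in (0.93, 1.00):       # worst-case 7% false positives vs a perfect test
    for sensitivity in (0.90, 1.00):   # assumed 90% sensitivity vs a perfect test
        est = corrected_prevalence(observed, sensitivity, specificity)
        print(f"specificity {specificity:.2f}, sensitivity {sensitivity:.2f}: prevalence ≈ {est:.0%}")
```

Across those assumptions the corrected prevalence runs from roughly 15% to a bit over 20%, much the same range as the rough calculation above.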

The estimate for New York fits reasonably well with the estimate of roughly 0.1% for the proportion of the New York City population that have died because of the pandemic, and the orthodox estimate of around 1% for the proportion of infected people who will die of COViD.  These all have substantial uncertainty: I’ve talked about the prevalence estimate already. The infection fatality rate estimate is based on a mixture of data sets, all unrepresentative in different ways. And the excess mortality figure itself is fairly concrete, but it includes, eg, people who died because they didn’t try to get to hospital for heart attacks, and in the other direction, road deaths that didn’t happen.  It is still important that these three estimates fit together, and it correctly gives researchers more confidence in all the numbers.  The Californian studies imply only about 0.1% of infections are fatal, and that doesn’t fit the excess mortality data or the standard fatality rate estimate at all.

There’s an analogy that science is like a crossword [1]. But it’s not a cryptic crossword, where correct answers are obvious once you see the trick. It’s the other sort of crossword, where potential answers to different clues support or contradict each other.  If the clue is “Controversial NZ party leader (7)” and using “Bridges” would force another clue to be a seven-letter word ending in “b”, you might pencil in “Seymour” instead and await further evidence.

1: Independently invented by multiple people including Susan Haack and Chad Orzel.

April 19, 2020

Counting rare things is hard

As promised, a second ‘prevalence’ post, this time on test accuracy.

In any medical diagnostic or screening setting, what we know is the number of positive and negative tests. What we want to know is the number of people with and without the condition.  It’s easy to slip into talking about these as if they’re the same, but they aren’t.

For the coronavirus, we have two basic sorts of test.  There are ‘PCR’ tests, which are what everyone has been using.  And there are ‘antibody’ tests, which are new.

The PCR tests measure the presence of the virus. They transcribe the genetic material of the virus from RNA to DNA, and then use DNA copying enzymes to amplify it billions of times; the ‘polymerase chain reaction’.  After amplification, there’s enough of the genetic sequence that fluorescent dyes attached to it or to the input materials can produce measurable light.

The copying looks for a unique, fairly short, genetic sequence that’s present in the new coronavirus, but not in the SARS or MERS viruses, or the four coronaviruses that cause common colds (in fact, usually more than one genetic sequence, plus a ‘positive control’ that makes sure the process is working, plus a ‘negative control’ that doesn’t have any RNA). Because of the fidelity of DNA replication, the technical ‘assay error’ of the PCR test is so close to zero as makes no difference: a few copies of the virus are enough for a positive result, and it’s almost impossible to get a positive result without any virus.

Unfortunately, the real-world diagnostic error isn’t quite that good.  The false positive rate is still basically zero, given good lab practice; you can’t get a positive test without viral RNA from some source.  The false negative rate can be appreciable, because the test doesn’t ask if there’s virus somewhere in your system; it asks if there’s virus on the swab.   In early COViD-19 disease, the best way to grab some virus is to stick a swab almost far enough up your nose to do brain surgery, and twist it around a bit.   More often than not, this will pick up some virus. But if you get tested too early, there might not be enough virus, and if you get tested too late the infection might have relocated to your chest.

So how good is the PCR test in practice? Well, we don’t know for sure.  It’s the best test we have, so there isn’t a ‘true answer’ to compare it to.  However, a study that looked at tests using multiple ways of extracting a sample suggests the sensitivity of the test is about 65%: if you have early-stage COViD-19, you’ve got about a two in three chance of testing positive.  There’s a lot of uncertainty around the exact value; fortunately the exact value doesn’t matter all that much.

Antibody tests are new for coronavirus, but are familiar in other settings.  Older HIV tests looked for antibodies to the virus, as do the initial tests for Hepatitis C (which are followed up by PCR).  These antibody tests rely on the highly selective binding of antibodies to the antigens they recognise. Because antibody tests detect your body’s reaction to the virus, a positive reaction takes time — at least a few days, maybe a week — and it stays around at least a short time after you recover.  Antibody tests are amazingly accurate, but not quite as amazingly accurate as PCR. Everyone has exactly the same identifying genetic tags in their virus, but everyone makes slightly different antibodies to the virus.  An antibody test is trying to pick up everyone’s diverse antibodies to the new coronavirus, but not pick up anyone’s antibodies to the nearly infinite diversity of other antigens in the world, including other coronaviruses. At any point in time, there’s a tradeoff: a test that picks up coronavirus antibodies more sensitively will also pick up more other things, and one that avoids reacting to other things will miss more coronavirus infections.

As I said above, the exact value of the false negative rate doesn’t matter that much when you’re estimating population prevalence.  The false positive rate matters a lot.  Suppose you have an antibody test with a false positive rate of 5%. For every 100 truly-negative people you test, there will be an average of 5 positive tests; for every 1000 people, 50 positive tests.  In New Zealand, we’re sure the population prevalence is less than 1%, and I would expect it to be less than 0.1%.  If you gave this test to 1000 people, there would be an average of 50 positive results and maybe one or two true positives.  It is very much an average, so if you got 53 positive tests you would have no idea whether that was five true positives or three or none at all.  Even if the false positive rate were as low as 0.5%, you’d expect more false positives than true positives in New Zealand. And it’s worse than that: the error rates aren’t known accurately yet, so even if the manufacturer’s estimate was 0.5% false positives, it could easily be 1% or maybe even 2%.
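The arithmetic is worth seeing laid out (the 5% false positive rate and 0.1% prevalence are the illustrative figures from above):

```python
# Expected false vs true positives when the condition is rare; illustrative numbers only.
n_tested = 1000
false_positive_rate = 0.05   # hypothetical 5% false positive rate
prevalence = 0.001           # illustrative 0.1% prevalence; NZ is probably even lower

expected_false = n_tested * (1 - prevalence) * false_positive_rate
expected_true = n_tested * prevalence        # ignoring missed cases, for simplicity
print(f"expected false positives ≈ {expected_false:.0f}, expected true positives ≈ {expected_true:.0f}")
```

That is about 50 false positives for every one true positive, before we even worry about uncertainty in the 5% figure.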

There’s a new study out of Stanford (preprint) that tested 3330 people and found 50 positives. A helpful StatsChat reader posted a link to a review of this study. What I’m writing here agrees pretty closely with that review.

A rate of 50 positives out of 3330 healthy people is high: if true, it would imply COViD-19 was much more common and therefore much less serious than we thought. The researchers used a test that had given 2 positive results out of 401 samples known to be negative (because they were taken before the pandemic started).  If the false positive rate was exactly 2/401, you’d get 0.005×3330 false positives on average, or only about 17, leaving 33 true positives.  But 2/401 is an estimate, with uncertainty.  If we assume the known samples were otherwise perfectly representative, what we can be confident of with 2 positives out of 401 is only that the false positive rate is no greater than about 1.5%. But 1.5% of 3330 is 50, so a false positive rate of 1.5% is already enough to explain the results! We don’t even have to worry if, say, the researchers chose this test from a range of competitors because it had the best supporting evidence and thereby introduced a bit of bias.
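The confidence-bound step can be sketched like this, assuming a one-sided 95% Clopper-Pearson bound (the study’s own calculation may differ in detail):

```python
# Upper confidence bound on a false positive rate estimated from 2 positives in 401 known negatives.
from scipy.stats import beta

x, n = 2, 401
upper = beta.ppf(0.95, x + 1, n - x)   # one-sided 95% Clopper-Pearson upper bound
print(f"false positive rate could plausibly be as high as {upper:.4f}")      # roughly 1.5-1.6%
print(f"which alone would produce about {upper * 3330:.0f} positives in 3330 tests")
```

That upper bound, applied to 3330 tests, accounts for essentially all 50 of the observed positives.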

On top of that, the 3330 people were tested because they responded to Facebook ads.  Because infection is rare, you don’t need to assume much self-selection of respondents to bias the prevalence estimate upwards.  You might be surprised to see me say this, because yesterday I thought voluntary supermarket surveys were a pretty good idea. They are, but they will still have bias, which could be upwards or downwards. We wouldn’t use the results of a test in a few supermarkets to overturn the other evidence about disease severity; we want to use them to start finding undetected cases — any undetected cases.

Counting rare things is hard, and false positives are overwhelmingly more important than false negatives, which is currently a problem for antibody tests.  PCR tests based on a swab are unpleasant for the person being tested and risky for the person holding the swab, but they are the best we have now. There might be other ways to use antibody tests, for example if true infections cluster more strongly within household than false positives, or if two tests with different characteristics can be combined, or if more accurate ones become available. But it’s not easy.