May 1, 2020

The right word

Scientists often speak their own language. They sometimes use strange words, and they sometimes use normal words but mean something different by them.  Toby Morris & Siouxsie Wiles have an animation of some examples.

The goal of scientific language is usually to be precise, to make distinctions that aren’t important in everyday speech. Scientists aren’t trying to confuse you or keep you out, though those effects can happen  — and they aren’t always unwelcome.  I’ve written on my blog about two examples: bacteria vs virus (where the scientists are right) and organic (where they need to get over themselves).

This week’s example of conflict between trying to be approachable and trying to be precise is the phrase “false positive rate”. When someone gets a COVID test, whether looking for the virus itself or for antibodies they’ve made in reaction to it, the test could be positive or negative. We can also divide people up by whether they really have/had COVID infection or no infection. This gives four possibilities:

  • True positives:  positive test, have/had COVID
  • True negatives: negative test, really no COVID
  • False positives: positive test, really no COVID
  • False negatives: negative test, have/had COVID

If you encounter something called the “false positive rate”, what is it? It obviously involves the false positives, divided by something. But it could be false positives as a proportion of all positive tests, false positives as a proportion of people who don’t have COVID, or even false positives as a proportion of all tests. It turns out that the first two of these definitions are both in common use.
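
A minimal Python sketch makes the ambiguity concrete. The counts are made up purely for illustration (1,000 hypothetical people tested, 200 of whom really have/had COVID). The function names are hypothetical labels for the three candidate definitions, not standard terminology.

```python
from dataclasses import dataclass


@dataclass
class Counts:
    """The four cells of a test-vs-truth table (hypothetical numbers)."""
    tp: int  # true positives: positive test, have/had COVID
    tn: int  # true negatives: negative test, really no COVID
    fp: int  # false positives: positive test, really no COVID
    fn: int  # false negatives: negative test, have/had COVID


def fp_over_positive_tests(c: Counts) -> float:
    """False positives as a proportion of all positive tests."""
    return c.fp / (c.tp + c.fp)


def fp_over_uninfected(c: Counts) -> float:
    """False positives as a proportion of people who don't have COVID."""
    return c.fp / (c.fp + c.tn)


def fp_over_all_tests(c: Counts) -> float:
    """False positives as a proportion of all tests."""
    return c.fp / (c.tp + c.tn + c.fp + c.fn)


# Made-up example: 1,000 people tested, 200 of whom have/had COVID.
counts = Counts(tp=180, tn=760, fp=40, fn=20)
for rate in (fp_over_positive_tests, fp_over_uninfected, fp_over_all_tests):
    print(f"{rate.__name__}: {rate(counts):.3f}")
# fp_over_positive_tests: 0.182
# fp_over_uninfected: 0.050
# fp_over_all_tests: 0.040
```

All three use the same 40 false positives and differ only in the denominator. None of them is the "wrong" arithmetic, which is why the phrase needs to be defined before it means anything.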

Scientists (statisticians and epidemiologists) would define two pairs of accuracy summaries:

  • Sensitivity:  true positives divided by people with COVID
  • Specificity: true negatives divided by people without COVID
  • Positive Predictive Value (PPV): true positives divided by all positive tests
  • Negative Predictive Value (NPV): true negatives divided by all negative tests

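The first ‘false positive rate’ definition (false positives as a proportion of all positive tests) is 1-PPV; the second (false positives as a proportion of people without COVID) is 1-specificity.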
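
A quick numerical check shows which summaries the two definitions correspond to. It uses hypothetical counts, and `summaries` is just an illustrative helper, not a library function. The first definition matches 1-PPV and the second matches 1-specificity, while 1-NPV turns out to be a different quantity altogether:

```python
import math


def summaries(tp: int, tn: int, fp: int, fn: int) -> dict:
    """Sensitivity, specificity, PPV and NPV from the four counts."""
    return {
        "sensitivity": tp / (tp + fn),  # true positives / people with COVID
        "specificity": tn / (tn + fp),  # true negatives / people without COVID
        "ppv": tp / (tp + fp),  # true positives / all positive tests
        "npv": tn / (tn + fn),  # true negatives / all negative tests
    }


# Hypothetical counts, not from any real study.
tp, tn, fp, fn = 180, 760, 40, 20
s = summaries(tp, tn, fp, fn)

first_definition = fp / (tp + fp)   # false positives / all positive tests
second_definition = fp / (fp + tn)  # false positives / people without COVID

assert math.isclose(first_definition, 1 - s["ppv"])
assert math.isclose(second_definition, 1 - s["specificity"])

# 1 - NPV is a different quantity: false negatives / all negative tests.
print(f"1-NPV = {1 - s['npv']:.3f}, first definition = {first_definition:.3f}")
# 1-NPV = 0.026, first definition = 0.182
```

The checks use `math.isclose` rather than `==` because floating-point subtraction such as `1 - 0.95` is not exact.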

If you write about the antibody studies carried out in the US, you can either use the precise terms, which will put off readers who don’t already know them, or use the vague terms, in which case people who know a bit about the topic may misunderstand and think you’ve got them wrong.


Thomas Lumley (@tslumley) is Professor of Biostatistics at the University of Auckland. His research interests include semiparametric models, survey sampling, statistical computing, foundations of statistics, and whatever methodological problems his medical collaborators come up with. He also blogs at Biased and Inefficient.

Comments

  • Andrew Matheson

    An interesting post, and very helpful in understanding some of the current discussion.

    I’m not a statistician of any sort, but this statement seems odd: “The first ‘false positive rate’ definition is 1-NPV”. Shouldn’t it be 1-PPV?

    Thanks

    • Thomas Lumley

      No, it’s false positives/(false positives+ true negatives)

      • Rob Sagetti

        Would FP / (FP + TN) not be 1 – specificity, which is the second definition?
        The first definition, FP as a proportion of all positives, or FP / (TP + FP), would be 1 – PPV, no?

      • Tommy Jones

        I think Andrew is right unless I’m misunderstanding something fundamental. (Always probable in my case.)

        1 – NPV is the false omission rate, a type of false negative. 1 – PPV is the false discovery rate, a type of false positive.

        For reference: https://en.wikipedia.org/wiki/Sensitivity_and_specificity

        There’s a long table down and to the right that has the definitions of just about anything one would care to calculate from a confusion matrix.