December 9, 2020

Election hypothesis testing

As you may have heard, there are people who are unhappy that Joe Biden is president-elect and think the courts should do something. In today’s most statistically interesting lawsuit, Texas is suing Georgia, Michigan, Pennsylvania, and Wisconsin, asking the Supreme Court to overturn their election results.  Legal Twitter does not appear convinced (on legal grounds).

There’s also a Declaration from an Expert arguing that the results are statistically impossible without fraud. This is statistics, so we can look at some of it here. It’s straightforward hypothesis testing, of the type we teach in high school.

Starting at paragraph 10, he looks at votes in Georgia and does hypothesis tests on binary data. The tests are:

  1. Comparing the total number of votes for Joe Biden with the total number of votes for Hillary Clinton in 2016
  2. Comparing the proportion of votes for Joe Biden (as a fraction of the 2020 vote) with the proportion of votes for Hillary Clinton (as a fraction of the 2016 vote)
  3. Comparing the proportion of votes for Biden (vs Trump) in ballots counted before and after 3:10am on election night

In all three cases, he finds very strong evidence that the two groups being compared are more different than they would be if they were sampled independently from the same probability distribution.  The idea is that while massive undetected fraud is unlikely, if the observed data are even more unlikely we need to consider fraud as an explanation.  Clearly, this only makes sense if the mathematical null hypothesis being tested would be plausible in the absence of fraud, so that a departure from it really would be unlikely.

Straw-man null hypotheses can be a problem in science: people will set up a null hypothesis that there’s no difference (or no important difference) between  two groups, even when no reasonable person would have entertained the possibility that the groups are the same, and the real question is how much they differ.   This election analysis has the same problem.

In test 1, we know that 2016 was four years ago, so the population has grown. We also know turnout was higher all over the US, including in states/counties/precincts won by Trump. For example, in Texas (where Texas is not seeking to overturn the results), 8.56 million people voted for Trump or Clinton in 2016 and 11.15 million voted for Trump or Biden in 2020.  The null hypothesis never had any reasonable chance of being true; finding that it actually is false is not surprising and provides no motivation for considering more esoteric explanations.
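To see how little a rejection tells us here, the Texas totals quoted above can be put through a simple count comparison. This is a minimal sketch in Python that assumes, purely for illustration, a null hypothesis in which both totals are draws from the same Poisson distribution; the function name is made up for this example, and this is not necessarily the calculation used in the Declaration.

```python
import math

def count_difference_z(count_a: float, count_b: float) -> float:
    """z statistic for H0: both counts come from the same Poisson distribution.

    Under H0 the difference has mean 0 and variance estimated by count_a + count_b.
    """
    return (count_b - count_a) / math.sqrt(count_a + count_b)

# Two-party vote totals for Texas as quoted in the post.
texas_2016 = 8.56e6
texas_2020 = 11.15e6

z = count_difference_z(texas_2016, texas_2020)
print(f"z = {z:.0f}")
```

The z statistic comes out in the hundreds, in a state whose result nobody is contesting. The test is rejecting "nothing changed in four years", not "there was no fraud".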

In test 2, the overall turnout and population change are taken into account.  A difference between Biden’s and Clinton’s percentages would be hard to explain unless Biden were actually more popular than Clinton with Georgia voters.  There are at least two reasons this would not be astonishing. Biden is more popular generally, and he’s specifically more popular with Black voters, who are making up an increasing fraction of the Georgia population. So, finding that Biden was more popular than Clinton with Georgia voters is not surprising, and provides no motivation for considering more esoteric explanations.
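A rough sketch of why a test on proportions is so easy to "pass" at this scale: with millions of ballots, even a modest shift in popularity gives an enormous test statistic. The code below is a standard pooled two-proportion z test; the vote counts are hypothetical round numbers chosen for illustration, not the Georgia results.

```python
import math

def two_proportion_z(successes_a, n_a, successes_b, n_b):
    """Pooled two-sample z statistic for H0: both groups share one proportion."""
    p_a = successes_a / n_a
    p_b = successes_b / n_b
    pooled = (successes_a + successes_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    return (p_b - p_a) / se

# Hypothetical figures: 49% vs 50% support, at two very different sample sizes.
z_poll = two_proportion_z(490, 1_000, 500, 1_000)
z_state = two_proportion_z(1_960_000, 4_000_000, 2_000_000, 4_000_000)

print(f"poll-sized samples:  z = {z_poll:.2f}")
print(f"state-sized samples: z = {z_state:.1f}")
```

The same one-point difference is unremarkable in samples of a thousand and overwhelming in samples of four million. The statistic measures how sure we are that the proportions differ, not whether the difference needs a sinister explanation.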

In test 3, the comparison is between votes counted earlier and votes counted later than 3:10am.  The statistical test provides strong evidence that votes counted early had different preferences from those counted later.  This would be surprising if you expected the two sets of votes to be identical, e.g., if you mixed all the ballots together and counted them in random order. It turns out that this is not what happened.  The early votes were primarily those cast on election day; the later votes primarily those cast in advance.  The statistical test provides strong evidence that people voting in person on election day were different from those voting in advance. Again, this is not remotely surprising given the different perspectives on the pandemic offered by the two campaigns.
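A small simulation makes the point concrete. In this sketch every ballot is honest, but election-day and advance voters have different preferences and the two kinds of ballot are counted at different times. All of the numbers (batch sizes, the 45% and 60% support levels, the mix of ballot types in each batch) are invented for illustration.

```python
import math
import random

def count_batch(n_ballots, share_election_day, p_election_day, p_advance, rng):
    """Count votes for candidate A in a batch mixing two kinds of ballots."""
    votes_a = 0
    for _ in range(n_ballots):
        is_election_day = rng.random() < share_election_day
        p = p_election_day if is_election_day else p_advance
        votes_a += rng.random() < p  # True counts as 1
    return votes_a

rng = random.Random(2020)  # fixed seed so the run is repeatable
n = 100_000                # ballots per batch (hypothetical)

# Early batch: mostly election-day ballots. Late batch: mostly advance ballots.
early_a = count_batch(n, 0.9, 0.45, 0.60, rng)
late_a = count_batch(n, 0.2, 0.45, 0.60, rng)

# Pooled two-proportion z statistic for "early and late batches are the same".
pooled = (early_a + late_a) / (2 * n)
se = math.sqrt(pooled * (1 - pooled) * (2 / n))
z = (late_a / n - early_a / n) / se

print(f"early share for A: {early_a / n:.3f}")
print(f"late share for A:  {late_a / n:.3f}")
print(f"z = {z:.1f}")
```

With no fraud anywhere in the simulation, the test still rejects the null hypothesis decisively, simply because the counting order was not random.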

There are actually some technical problems with the statistical testing, but these pale in comparison to the problem of not testing hypotheses that have any real bearing on the fraud question.  It’s hardly worth mentioning the technical problems, except that this is a statistics blog.   The analysis treats the votes in each comparison as independent observations. In fact, the comparison in test 3 will be subject to clumping: groups of people will affect each other’s voting preferences, and the percentages will have more variability than if they were from five million independent coin tosses. The evidence (against the straw-man null hypothesis) will be weaker than you’d compute from a model of independent coin tosses.
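One common way to get a feel for the size of the clumping effect is the survey-sampling design effect, 1 + (m - 1)ρ, for clusters of equal size m with intraclass correlation ρ. The sketch below applies it to a z statistic; treating voters as equal-sized clusters is a simplifying assumption, and the cluster size, correlation, and naive z value are hypothetical.

```python
import math

def design_effect(cluster_size, intraclass_corr):
    """Variance inflation for equal-sized clusters: 1 + (m - 1) * rho."""
    return 1 + (cluster_size - 1) * intraclass_corr

def adjusted_z(naive_z, cluster_size, intraclass_corr):
    """Shrink a z statistic computed as if all votes were independent."""
    return naive_z / math.sqrt(design_effect(cluster_size, intraclass_corr))

# Hypothetical: clusters of 101 voters with a within-cluster correlation of 0.05.
naive_z = 30.0
print(f"design effect: {design_effect(101, 0.05):.1f}")
print(f"adjusted z:    {adjusted_z(naive_z, 101, 0.05):.1f}")
```

Even a small correlation within moderately large groups inflates the variance several-fold, so a z statistic computed from independent coin tosses overstates the evidence.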

In tests 1 and 2 there will be this clumping, but in the other direction there’s the problem that the 2016 and 2020 votes are mostly from the same people.  If you asked people their vote today and tomorrow you’d expect the same answer from most people. If you asked in 2016 and 2020 the concordance would be weaker, but you’d expect it to still be there.  So, the statistical test would not actually be valid even for the straw-man null hypotheses, but it’s hard to say precisely how misleading it would be.
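The effect of the overlap can be sketched with the usual variance formula for a difference of two correlated proportions. This assumes, unrealistically, that exactly the same voters took part in both elections, with a correlation between each voter's two choices; the proportions, number of voters, and correlation are all hypothetical.

```python
import math

def se_difference(p_2016, p_2020, n_voters, corr):
    """Standard error of (p_2020 - p_2016) when the same voters vote twice.

    corr is the correlation between one voter's 2016 and 2020 choices (0/1).
    """
    var_2016 = p_2016 * (1 - p_2016) / n_voters
    var_2020 = p_2020 * (1 - p_2020) / n_voters
    cov = corr * math.sqrt(var_2016 * var_2020)
    return math.sqrt(var_2016 + var_2020 - 2 * cov)

n = 4_000_000  # hypothetical number of voters
se_independent = se_difference(0.46, 0.50, n, corr=0.0)
se_concordant = se_difference(0.46, 0.50, n, corr=0.8)

print(f"SE assuming independence: {se_independent:.6f}")
print(f"SE with correlation 0.8:  {se_concordant:.6f}")
```

Positive concordance shrinks the standard error, which is the opposite direction from clumping, so the two problems partly offset each other by an amount that is hard to pin down.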


Thomas Lumley (@tslumley) is Professor of Biostatistics at the University of Auckland. His research interests include semiparametric models, survey sampling, statistical computing, foundations of statistics, and whatever methodological problems his medical collaborators come up with. He also blogs at Biased and Inefficient.

Comments

  • Steve Curtis

    The Texas case is interesting as they seem to be using the procedure where the Supreme Court acts as the initial trial court for disputes between states, and thus they leapfrog all the lower federal courts, where previous election challenges have failed.
    I’m sure this case will fail too, for obvious reasons. The Supreme Court declined today to hear a further appeal in an existing federal court case in Pennsylvania, the one where Giuliani had a bad hair day.

    4 years ago