December 17, 2020

Currie Cup Predictions for Round 4

Team Ratings for Round 4

The basic method is described on my Department home page.
Here are the team ratings prior to this week’s games, along with the ratings at the start of the season.

Team              Current Rating  Rating at Season Start  Difference
Bulls                       7.63                    6.16        1.50
Sharks                      6.78                    5.63        1.10
Western Province            4.31                    5.26       -1.00
Lions                       2.44                    1.46        1.00
Cheetahs                   -4.90                   -2.96       -1.90
Pumas                      -7.64                   -6.66       -1.00
Griquas                    -8.62                   -8.90        0.30

 

Performance So Far

So far there have been 9 matches played, 8 of which were correctly predicted, a success rate of 88.9%.
Here are the predictions for last week’s games.

   Game                        Date    Score    Prediction  Correct
1  Western Province vs. Pumas  Dec 11  28 – 14  17.00       TRUE
2  Cheetahs vs. Lions          Dec 12  23 – 39  -0.70       TRUE
3  Sharks vs. Bulls            Dec 12  32 – 29  3.80        TRUE
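
For anyone who wants to check the Correct column, here is a minimal Python sketch. It assumes a prediction counts as correct when the predicted and actual home-minus-away margins have the same sign. That rule, the treatment of a draw as a miss, and every name in the code are assumptions made for illustration, not taken from the method itself.

```python
from dataclasses import dataclass


@dataclass
class Game:
    home: str
    away: str
    home_score: int
    away_score: int
    predicted_margin: float  # predicted home score minus away score


def is_correct(game: Game) -> bool:
    """Assumed rule: correct when predicted and actual margins share a sign."""
    actual_margin = game.home_score - game.away_score
    # A drawn game gives a product of zero, so it is counted as a miss here.
    return actual_margin * game.predicted_margin > 0


def success_rate(correct: int, played: int) -> float:
    """Percentage of matches predicted correctly."""
    return 100.0 * correct / played


# Last week's Currie Cup games, copied from the table above.
games = [
    Game("Western Province", "Pumas", 28, 14, 17.00),
    Game("Cheetahs", "Lions", 23, 39, -0.70),
    Game("Sharks", "Bulls", 32, 29, 3.80),
]

flags = [is_correct(g) for g in games]
season_rate = success_rate(8, 9)  # 8 of 9 matches so far

print(flags)
print(f"{season_rate:.1f}%")
```

The same two functions could be pointed at any of the other competitions' result tables.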

 

Predictions for Round 4

Here are the predictions for Round 4. The prediction is my estimated expected points difference with a positive margin being a win to the home team, and a negative margin a win to the away team.

   Game                Date    Winner  Prediction
1  Pumas vs. Cheetahs  Dec 18  Pumas   1.80
2  Griquas vs. Bulls   Dec 19  Bulls   -11.80
3  Lions vs. Sharks    Dec 19  Lions   0.20

 

December 16, 2020

Covid vaccine #2

We now have fairly detailed data on the second Covid vaccine, the one from Moderna.

Again, I’ll add more as I see them.

Causes and implications

From the Herald

A coroner has warned of the dangers of driving while impaired by drugs after reviewing nine fatal vehicle crashes and finding cannabis use was implicated in six of them.

but further down in the story

He was found to be almost five times the legal drink-drive limit while also testing positive for cannabis and its constituent element tetrahydrocannabinol (THC).

At a blood alcohol concentration of 0.25%, your risk of a crash has increased by more than a factor of 100.  That is, more than 99 out of 100 crashes in people with blood alcohol of 0.25% will be caused by the alcohol. You don’t need cannabis use to explain a crash like this, and it’s not clear that there was any relevant cannabis use.  The report says the driver was at a BBQ where people were drinking. There is no suggestion in the story that anyone was smoking weed (though it isn’t specifically ruled out).
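
The step from "a factor of 100" to "99 out of 100" is the usual attributable-fraction calculation: if an exposure multiplies the crash rate by RR, then among exposed drivers a fraction (RR − 1)/RR of crashes would not have happened without it. A small Python sketch, with a function name invented for illustration:

```python
def attributable_fraction(relative_risk: float) -> float:
    """Fraction of crashes among exposed drivers attributable to the exposure.

    Uses the standard formula (RR - 1) / RR, which assumes the relative
    risk reflects a causal effect of the exposure.
    """
    if relative_risk <= 0:
        raise ValueError("relative risk must be positive")
    return (relative_risk - 1.0) / relative_risk


# A relative risk of 100 is the lower bound quoted in the text.
fraction = attributable_fraction(100.0)
print(f"{fraction:.2%}")
```

With a relative risk above 100, the fraction is above 99%.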

The description of testing positive for “cannabis and its constituent element tetrahydrocannabinol” is also a bit weird.  I think they just tested for THC; that’s what I’ve seen in previous reports from ESR, who do the testing.  And they can pick up very small amounts of THC; a 2012 publication on tests after fatal crashes reported a range from approximately 0.1 ng/mL to 44 ng/mL (mean 5.6 ng/mL) concentration in blood samples.  Some of these people will have been impaired by THC, but by no means all of them.  ESR are careful about how they report this sort of thing, and don’t write things like “cannabis use was implicated” when they mean “THC was detectable”, but the coroner seems to be less careful.

The coroner didn’t say anything about alcohol use in the other eight fatal vehicle crashes. You’d hope, if he’s making those sorts of statements about drugs, that some of the crashes didn’t involve large amounts of alcohol and had evidence of recent consumption of cannabis or some other reason to think it was really implicated, but you’d also hope he’d make that clear if it was true.

 

December 15, 2020

New, improved Covid?

From Radio NZ

A new variant of coronavirus has been found which is growing faster in some parts of England, MPs have been told.

From the Herald (from the Telegraph)

A new variant of coronavirus has been identified in England and is spreading rapidly.

There is a sense in which this is true. The virus is spreading rapidly in southern England. And, because new mutations arise all the time and get passed on as the virus is spread, there is a new mutation that is more common in these new cases. However, there’s currently no evidence that there’s anything about this mutation that affects the spread of the virus at all.

Over the year, thousands of genetic variants have been seen in the coronavirus (a paper a few weeks ago looked at 12,000). Some of these have become common, and so might possibly be better at spreading. Some mutations have arisen more than once, and so might possibly be better at spreading. Mostly, though, variants have become common because they’ve found themselves in favorable circumstances: in people who don’t wear masks, or people who live in crowded situations, or people who go to church and sing, or who attend motorcycle rallies, or whatever. These variants are common because they won the lottery, not because they worked smarter and harder or had a #8-wire can-do attitude. And, to be fair, the stories do later go on to admit this, or at least raise it as a contrasting view.

So far, there is one variant, called D614G, with reasonable (though not overwhelming) evidence that it makes a bit of difference to coronavirus transmission.  For example, a recent paper on it says

Although evidence is still accumulating, the increasing predominance of D614G in humans raises the possibility that viruses with this mutation have a fitness advantage, perhaps allowing more efficient person-to-person transmission. Our virological data are consistent with, but do not themselves demonstrate, this hypothesis. Interestingly, this mutation does not appear to significantly impact disease severity

In contrast, back in July the Herald (from news.com.au) wrote about the D614G variant

The worst fears of epidemiologists have been realised: Covid-19 has mutated, and the strain now dominating the world is up to six times more infectious.

Fortunately, it wasn’t anything like six times more infectious.  This new one probably won’t amount to much either, though the people who look at Covid genomics will keep track of it like they do with all the other variants.

 

Update: useful Twitter thread for people who want nerdy details.

December 10, 2020

Rugby Predictions for the Weekend of December 19-20

I will be out of internet contact, and even without a computer, until Wednesday December 16, so I won’t be able to follow my usual practice of posting predictions on a Tuesday. As it happens, this only affects predictions for the Currie Cup.

I expect to post predictions for Round 4 of the Currie Cup either late on December 16 or on the morning of December 17, New Zealand time.

December 9, 2020

Election hypothesis testing

As you may have heard, there are people who are unhappy that Joe Biden is president-elect and think the courts should do something. In today’s most statistically interesting lawsuit, Texas is suing Georgia, Michigan, Pennsylvania, and Wisconsin, asking the Supreme Court to overturn their election results.  Legal Twitter does not appear convinced (on legal grounds).

There’s also a Declaration from an Expert arguing that the results are statistically impossible without fraud. This is statistics, so we can look at some of it here. It’s straightforward hypothesis testing, of the type we teach in high school.

Starting on paragraph 10, he’s looking at votes in Georgia and doing hypothesis tests on binary data. The tests being done are

  1. Comparing the total number of votes for Joe Biden with the total number of votes for Hillary Clinton in 2016
  2. Comparing the proportion of votes for Joe Biden (as a fraction of the 2020 vote) with the proportion of votes for Hillary Clinton (as a fraction of the 2016 vote)
  3. Comparing the proportion of votes for Biden (vs Trump) in ballots counted before and after 3:10am on election night

In all three cases, he finds very strong evidence that the two groups being compared are more different than if they were sampled independently from the same probability distribution. The idea is that while massive undetected fraud is unlikely, if the observed data are even more unlikely we need to consider fraud as an explanation. Clearly, this only makes sense if the mathematical null hypothesis being tested really would be plausible in the absence of fraud.
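
To see why tests like these give such extreme results, here is a minimal Python sketch of a standard pooled two-proportion z-test. It may not be exactly the calculation in the Declaration, and the vote counts are hypothetical round numbers chosen for illustration, not Georgia data.

```python
import math


def two_proportion_z(x1: int, n1: int, x2: int, n2: int) -> float:
    """Pooled two-proportion z-statistic, treating every vote as independent."""
    p1 = x1 / n1
    p2 = x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    standard_error = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    return (p2 - p1) / standard_error


# Hypothetical example: 49% of 2.5 million votes versus 50% of 2.5 million.
z = two_proportion_z(1_225_000, 2_500_000, 1_250_000, 2_500_000)
print(round(z, 1))
```

With millions of votes, a one percentage point difference is already more than 20 standard errors from zero, so rejecting "no difference at all" says very little by itself.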

Straw-man null hypotheses can be a problem in science: people will set up a null hypothesis that there’s no difference (or no important difference) between  two groups, even when no reasonable person would have entertained the possibility that the groups are the same, and the real question is how much they differ.   This election analysis has the same problem.

In test 1, we know that 2016 was four years ago, so the population has grown. We also know turnout was higher all over the US, including in states/counties/precincts won by Trump. For example, in Texas (where Texas is not seeking to overturn the results), 8.56 million people voted for Trump or Clinton in 2016 and 11.15 million voted for Trump or Biden in 2020.  The null hypothesis never had any reasonable chance of being true; finding that it actually is false is not surprising and provides no motivation for considering more esoteric explanations.
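
As a rough check, the same kind of comparison can be run on the Texas totals above. The Python sketch below assumes a simple null hypothesis in which the two totals are independent Poisson counts with the same mean, so the difference has variance roughly equal to their sum. That model is an assumption for illustration and not necessarily the Declaration's method.

```python
import math

# Two-party vote totals for Texas quoted in the text.
votes_2016 = 8.56e6
votes_2020 = 11.15e6

# Relative growth in the two-party vote between the elections.
growth = votes_2020 / votes_2016

# z-statistic for "same expected count", under the assumed Poisson model.
z = (votes_2020 - votes_2016) / math.sqrt(votes_2020 + votes_2016)

print(f"growth factor: {growth:.2f}")
print(f"z-statistic: {z:.0f}")
```

The null hypothesis is rejected just as overwhelmingly in a state whose results nobody is disputing.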

In test 2, the overall turnout and population change are taken into account.  A difference between Biden and Clinton’s percentage would be hard to explain unless Biden were actually more popular than Clinton with Georgia voters.  There are at least two reasons this would not be astonishing. Biden is more popular generally, and he’s specifically more popular with Black voters, who are making up an increasing fraction of the Georgia population. So, finding that Biden was more popular than Clinton with Georgia voters is not surprising, and provides no motivation for considering more esoteric explanations.

In test 3, the comparison is between votes counted earlier and votes counted later than 3:10am.  The statistical test provides strong evidence that votes counted early had different preferences from those counted later.  This would be surprising if you’d expect the two sets of votes to be identical — eg, if you mixed all the ballots together and counted them in random order. It turns out that this is not what happened.  The early votes were primarily those cast on election day; the later votes primarily those cast in advance.  The statistical test provides strong evidence that people voting in person on election day were different from those voting in advance. Again, this is not remotely surprising given the different perspectives on the pandemic offered by the two campaigns.

There are actually some technical problems with the statistical testing, but these pale in comparison to the problem of not testing hypotheses that have any real bearing on the fraud question. It’s hardly worth mentioning the technical problems, except that this is a statistics blog. The analysis treats the votes in each comparison as independent observations. In fact, the comparison in test 3 will be subject to clumping: groups of people will affect each other’s voting preferences, and the percentages will have more variability than if they were from five million independent coin tosses. The evidence (against the straw-man null hypothesis) will be weaker than you’d compute from a model of independent coin tosses.
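
One common way to put a number on clumping is the design effect from survey sampling: with equal-sized clusters of size m and intra-cluster correlation ρ, variances are inflated by about 1 + (m − 1)ρ, so a z-statistic shrinks by the square root of that. The Python sketch below uses hypothetical values for m, ρ and the naive z-statistic. Nothing in it is estimated from the election data.

```python
import math


def design_effect(cluster_size: float, icc: float) -> float:
    """Variance inflation 1 + (m - 1) * rho for equal-sized clusters."""
    return 1.0 + (cluster_size - 1.0) * icc


def adjusted_z(naive_z: float, cluster_size: float, icc: float) -> float:
    """Shrink a z-statistic computed under independence to allow for clumping."""
    return naive_z / math.sqrt(design_effect(cluster_size, icc))


# Hypothetical values: clusters of 101 voters with correlation 0.01.
deff = design_effect(101, 0.01)
z_adj = adjusted_z(20.0, 101, 0.01)

print(round(deff, 2), round(z_adj, 2))
```

Even a small correlation within moderately large groups doubles the variance here, which is the sense in which the independent coin-toss model overstates the evidence.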

In tests 1 and 2 there will be this clumping, but in the other direction there’s the problem that the 2016 and 2020 votes are mostly from the same people.  If you asked people their vote today and tomorrow you’d expect the same answer from most people. If you asked in 2016 and 2020 the concordance would be weaker, but you’d expect it to still be there.  So, the statistical test would not actually be valid even for the straw-man null hypotheses, but it’s hard to say precisely how misleading it would be.

Vaccine data

The FDA has released its briefing document and Pfizer’s briefing document for their external advisory committee meeting on Friday.  Lots and lots of lovely detail.

Useful summaries and interpretations (I’ll add more as I come across them):

You can watch the FDA advisory committee meeting, from 4am to 1pm Friday morning NZ time, and I assume there will be a recording available afterwards. It will be very boring, but transparency is like that.

 

PS: there is also a new publication from the Oxford group about some of the Oxford/AstraZeneca trials. It’s not really going to make anyone happy.

PPS: Next week, the FDA does the Moderna vaccine, but that’s less interesting for NZ since we didn’t buy any.

December 8, 2020

Top 14 Predictions for Postponed Games

Team Ratings for Postponed Games

The basic method is described on my Department home page.
Here are the team ratings prior to this week’s games, along with the ratings at the start of the season.

Team                  Current Rating  Rating at Season Start  Difference
Racing-Metro 92                 6.73                    6.21        0.50
Stade Toulousain                5.63                    4.80        0.80
Lyon Rugby                      5.31                    5.61       -0.30
La Rochelle                     4.72                    2.32        2.40
Clermont Auvergne               4.54                    3.22        1.30
RC Toulonnais                   3.39                    3.56       -0.20
Bordeaux-Begles                 2.77                    2.83       -0.10
Montpellier                     2.26                    2.30       -0.00
Stade Francais Paris           -0.16                   -3.22        3.10
Castres Olympique              -2.21                   -0.47       -1.70
Section Paloise                -3.37                   -4.48        1.10
Brive                          -3.98                   -3.26       -0.70
Aviron Bayonnais               -4.85                   -4.13       -0.70
SU Agen                       -10.22                   -4.72       -5.50

 

Performance So Far

So far there have been 70 matches played, 48 of which were correctly predicted, a success rate of 68.6%.
Here are the predictions for last week’s games.

   Game                                    Date    Score    Prediction  Correct
1  Aviron Bayonnais vs. Stade Toulousain   Dec 05  20 – 24  -5.10       TRUE
2  Bordeaux-Begles vs. Racing-Metro 92     Dec 05  12 – 17  2.60        FALSE
3  Clermont Auvergne vs. Montpellier       Dec 05  15 – 21  8.80        FALSE
4  Lyon Rugby vs. La Rochelle              Dec 05  22 – 18  6.40        TRUE
5  Section Paloise vs. Castres Olympique   Dec 05  13 – 17  5.10        FALSE
6  Stade Francais Paris vs. RC Toulonnais  Dec 05  24 – 23  2.10        TRUE
7  SU Agen vs. Brive                       Dec 05  6 – 15   -0.00       TRUE

 

Predictions for Postponed Games

Here are the predictions for Postponed Games. The prediction is my estimated expected points difference with a positive margin being a win to the home team, and a negative margin a win to the away team.

   Game                         Date    Winner             Prediction
1  Castres Olympique vs. Brive  Dec 23  Castres Olympique  7.30

 

Rugby Premiership Predictions for Round 4

Team Ratings for Round 4

The basic method is described on my Department home page.
Here are the team ratings prior to this week’s games, along with the ratings at the start of the season.

Team                Current Rating  Rating at Season Start  Difference
Exeter Chiefs                10.47                    7.35        3.10
Sale Sharks                   4.22                    4.96       -0.70
Wasps                         3.01                    5.66       -2.70
Bristol                       1.43                    1.28        0.10
Bath                          0.50                    2.14       -1.60
Harlequins                   -0.37                   -1.08        0.70
Gloucester                   -1.99                   -1.02       -1.00
Northampton Saints           -2.95                   -2.48       -0.50
Leicester Tigers             -6.21                   -6.14       -0.10
Newcastle Falcons            -6.89                  -10.00        3.10
Worcester Warriors           -7.15                   -5.71       -1.40
London Irish                 -7.17                   -8.05        0.90

 

Performance So Far

So far there have been 18 matches played, 11 of which were correctly predicted, a success rate of 61.1%.
Here are the predictions for last week’s games.

   Game                                Date    Score    Prediction  Correct
1  Bristol vs. Northampton Saints      Dec 05  18 – 17  9.90        TRUE
2  Leicester Tigers vs. Exeter Chiefs  Dec 06  13 – 35  -10.90      TRUE
3  Wasps vs. Newcastle Falcons         Dec 06  17 – 27  17.00       FALSE
4  Worcester Warriors vs. Bath         Dec 06  17 – 33  -1.60       TRUE
5  London Irish vs. Sale Sharks        Dec 07  13 – 21  -6.70       TRUE
6  Gloucester vs. Harlequins           Dec 07  24 – 34  4.40        FALSE

 

Predictions for Round 4

Here are the predictions for Round 4. The prediction is my estimated expected points difference with a positive margin being a win to the home team, and a negative margin a win to the away team.

   Game                                       Date    Winner              Prediction
1  Bath vs. London Irish                      Dec 27  Bath                12.20
2  Harlequins vs. Bristol                     Dec 27  Harlequins          2.70
3  Newcastle Falcons vs. Leicester Tigers     Dec 27  Newcastle Falcons   3.80
4  Exeter Chiefs vs. Gloucester               Dec 27  Exeter Chiefs       17.00
5  Northampton Saints vs. Worcester Warriors  Dec 27  Northampton Saints  8.70
6  Sale Sharks vs. Wasps                      Dec 28  Sale Sharks         5.70

 

Pro14 Predictions for Round 9

Team Ratings for Round 9

The basic method is described on my Department home page.
Here are the team ratings prior to this week’s games, along with the ratings at the start of the season.

Team              Current Rating  Rating at Season Start  Difference
Leinster                   19.39                   16.52        2.90
Munster                    10.26                    9.90        0.40
Ulster                      7.88                    4.58        3.30
Edinburgh                   3.72                    5.49       -1.80
Glasgow Warriors            3.12                    5.66       -2.50
Scarlets                    1.88                    1.98       -0.10
Connacht                    1.04                    0.70        0.30
Cardiff Blues              -0.41                    0.08       -0.50
Cheetahs                   -0.46                   -0.46        0.00
Ospreys                    -3.25                   -2.82       -0.40
Treviso                    -4.26                   -3.50       -0.80
Dragons                    -7.28                   -7.85        0.60
Southern Kings            -14.92                  -14.92        0.00
Zebre                     -16.71                  -15.37       -1.30

 

Performance So Far

So far there have been 43 matches played, 29 of which were correctly predicted, a success rate of 67.4%.
Here are the predictions for last week’s games.

   Game                          Date    Score    Prediction  Correct
1  Connacht vs. Treviso          Dec 05  31 – 24  11.50       TRUE
2  Glasgow Warriors vs. Dragons  Dec 06  22 – 23  19.10       FALSE

 

Predictions for Round 9

Here are the predictions for Round 9. The prediction is my estimated expected points difference with a positive margin being a win to the home team, and a negative margin a win to the away team.

   Game                            Date    Winner            Prediction
1  Connacht vs. Ulster             Dec 27  Ulster            -1.80
2  Dragons vs. Cardiff Blues       Dec 27  Cardiff Blues     -1.90
3  Glasgow Warriors vs. Edinburgh  Dec 27  Glasgow Warriors  4.40
4  Munster vs. Leinster            Dec 27  Leinster          -4.10
5  Ospreys vs. Scarlets            Dec 27  Scarlets          -0.10
6  Zebre vs. Treviso               Dec 27  Treviso           -7.50