Posts from January 2020 (20)

January 29, 2020

Pro14 Predictions for Round 9 Delayed Match

Team Ratings for Round 9 Delayed Match

The basic method is described on my Department home page.
Here are the team ratings prior to this week’s games, along with the ratings at the start of the season.

Current Rating Rating at Season Start Difference
Leinster 16.11 12.20 3.90
Munster 7.81 10.73 -2.90
Glasgow Warriors 6.69 9.66 -3.00
Ulster 5.09 1.89 3.20
Edinburgh 4.82 1.24 3.60
Scarlets 3.47 3.91 -0.40
Connacht 0.45 2.68 -2.20
Cardiff Blues 0.41 0.54 -0.10
Cheetahs -0.95 -3.38 2.40
Ospreys -3.65 2.80 -6.50
Treviso -4.04 -1.33 -2.70
Dragons -8.38 -9.31 0.90
Southern Kings -13.32 -14.70 1.40
Zebre -14.51 -16.93 2.40

 

Performance So Far

So far there have been 69 matches played, 53 of which were correctly predicted, a success rate of 76.8%.
Here are the predictions for last week’s games.

Game Date Score Prediction Correct
1 Southern Kings vs. Cheetahs Jan 26 30 – 31 -8.80 TRUE

 

Predictions for Round 9 Delayed Match

Here are the predictions for Round 9 Delayed Match. The prediction is my estimated expected points difference with a positive margin being a win to the home team, and a negative margin a win to the away team.

Game Date Winner Prediction
1 Cheetahs vs. Southern Kings Feb 02 Cheetahs 17.40

 

Rugby Premiership Predictions for Round 10

Team Ratings for Round 10

The basic method is described on my Department home page.
Here are the team ratings prior to this week’s games, along with the ratings at the start of the season.

Current Rating Rating at Season Start Difference
Saracens 9.84 9.34 0.50
Exeter Chiefs 7.53 7.99 -0.50
Sale Sharks 3.88 0.17 3.70
Northampton Saints 1.67 0.25 1.40
Gloucester 1.37 0.58 0.80
Bath 0.16 1.10 -0.90
Harlequins -1.06 -0.81 -0.30
Bristol -2.00 -2.77 0.80
Wasps -2.00 0.31 -2.30
Leicester Tigers -2.99 -1.76 -1.20
Worcester Warriors -4.94 -2.69 -2.30
London Irish -5.24 -5.51 0.30

 

Performance So Far

So far there have been 54 matches played, 37 of which were correctly predicted, a success rate of 68.5%.
Here are the predictions for last week’s games.

Game Date Score Prediction Correct
1 Bath vs. Leicester Tigers Jan 25 13 – 10 8.30 TRUE
2 Bristol vs. Gloucester Jan 25 34 – 16 -0.80 FALSE
3 Exeter Chiefs vs. Sale Sharks Jan 25 19 – 22 9.50 FALSE
4 Harlequins vs. Saracens Jan 25 41 – 14 -9.80 FALSE
5 Northampton Saints vs. London Irish Jan 25 16 – 20 13.20 FALSE
6 Worcester Warriors vs. Wasps Jan 25 26 – 30 2.30 FALSE
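
The Correct column in the table above can be reproduced by comparing the sign of the predicted margin with the sign of the actual home-minus-away margin. The following is a minimal sketch, not the code used to produce these posts; the function name is made up for illustration, and counting a drawn game as an incorrect pick is an assumption.

```python
def predicted_correctly(home_score: int, away_score: int, prediction: float) -> bool:
    """True when the predicted and actual margins favour the same side.

    Assumption: a draw (margin of zero) never counts as a home win.
    """
    actual_margin = home_score - away_score
    return (actual_margin > 0) == (prediction > 0)


# (home team, away team, home score, away score, predicted margin),
# copied from the table of last week's games.
last_week = [
    ("Bath", "Leicester Tigers", 13, 10, 8.30),
    ("Bristol", "Gloucester", 34, 16, -0.80),
    ("Exeter Chiefs", "Sale Sharks", 19, 22, 9.50),
    ("Harlequins", "Saracens", 41, 14, -9.80),
    ("Northampton Saints", "London Irish", 16, 20, 13.20),
    ("Worcester Warriors", "Wasps", 26, 30, 2.30),
]

results = [predicted_correctly(hs, as_, pred) for _, _, hs, as_, pred in last_week]

# Season-to-date success rate quoted in the text: 37 correct out of 54.
season_rate = round(100 * 37 / 54, 1)

print(results)
print(season_rate)
```

Only the first game of the six comes out as a correct pick, matching the TRUE/FALSE column.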

 

Predictions for Round 10

Here are the predictions for Round 10. The prediction is my estimated expected points difference with a positive margin being a win to the home team, and a negative margin a win to the away team.

Game Date Winner Prediction
1 Gloucester vs. Exeter Chiefs Feb 15 Exeter Chiefs -1.70
2 Harlequins vs. London Irish Feb 15 Harlequins 8.70
3 Leicester Tigers vs. Wasps Feb 15 Leicester Tigers 3.50
4 Northampton Saints vs. Bristol Feb 15 Northampton Saints 8.20
5 Saracens vs. Sale Sharks Feb 15 Saracens 10.50
6 Worcester Warriors vs. Bath Feb 15 Bath -0.60

 

Briefly

January 26, 2020

Coronavirus news

One reputable source of moderately technical information about the new coronavirus is the MRC Centre for Global Infectious Disease Analysis, at Imperial College, London.  They’ve worked on outbreak modelling and control advice for a long time, across a wide range of epidemics (including both previous coronavirus outbreaks: SARS and MERS).

Their latest (third) report says that it’s clear there has been sustained person-to-person transmission — a spreading epidemic — in China, and that basically nothing else is clear.

From the Discussion section

Whether transmission continues at the same rate now critically depends on the effectiveness of the intense control effort now underway in Wuhan and across China. We note the large body of evidence that suggests that the reproduction number for SARS changed considerably when populations became fully aware of the threat. If a similar change to contact patterns is occurring in this outbreak, rates of transmission are likely to be lower now than during the period for which these estimates were made, due to control measures and risk avoidance in the population. Whether the reduction in transmission is sufficient to reduce R to below 1 – and thus end the outbreak – remains to be seen. Reports point to mildly symptomatic but infectious cases of 2019-nCoV, which were not a feature of SARS. Prompt detection and isolation of such cases will be extremely challenging, given the larger number of other diseases (e.g. influenza) which can cause such non-specific respiratory symptoms. While more severe cases will always need to be prioritised, control may depend upon successful detection, testing and isolation of suspect cases with the broadest possible range of symptom severity. Our results emphasise the need to track transmission rates over the next few weeks, especially in Wuhan. If a clear downwards trend is observed in the numbers of new cases, that would indicate that control measures and behavioural changes can substantially reduce the transmissibility of 2019-nCoV. Genetic data from Wuhan after the implementation of strong public health measures may also provide valuable insight into the patterns and rate of transmission.

Despite the recent decision of the WHO Emergency Committee to not declare this a Public Health Emergency of International Concern at this time, this epidemic represents a clear and ongoing global health threat. It is uncertain at the current time whether it is possible to contain the continuing epidemic within China. In addition to monitoring how the epidemic evolves, it is critical that the magnitude of the threat is better understood. Currently, we have only a limited understanding of the spectrum of severity of symptoms that infection with this virus causes, and no reliable estimates of the case fatality ratio – the proportion of cases who will die as a result of the disease. Characterising the severity spectrum, and how severity of symptoms relates to infectiousness, will be critical to evaluating the feasibility of control and the likely public health impact of this epidemic.

When they talk about ‘containing’ the epidemic within China, they don’t mean whether or not there are cases outside China (there already are); they mean whether or not there’s sustained transmission from person to person outside China.

Not all algorithms wear computers

Via maths teacher Twitter, two graphs, from the economics PhD research of Cody Tuttle, showing data from the United States Sentencing Commission on recorded drug amounts in federal drug cases.

The first one shows amounts of crack cocaine 1990-2010

The second shows crack cocaine for the years 2011-2015

For some reason, people suddenly started getting caught with 280g of crack in 2011. Now, 280g is a rounder number than it looks — it’s 20 times 14g, or in other words, 10 ounces.  Even so, you’d wonder why 10oz loads of cocaine suddenly started being popular.

It turns out that there was a change in the law. Up to 2010, a relic of the ‘war on drugs’ period meant there was a mandatory 10-year minimum sentence for more than 50g. From 2011, the threshold for the mandatory minimum was 280g.  Suddenly, the proportion of people convicted of having 280-290g shot up.  Further graphs and analyses show that the increase was much more pronounced for Black and Hispanic defendants than White.

Interestingly, the paper says “However, the data on drug seizures made by local and federal agencies do not show increased bunching at 280g after 2010.” The conclusion reached in the analysis is that a substantial minority of prosecutors, who have some discretion in deciding what quantity of drugs to list in the charges, misused this discretion.

The analysis is a good example of the sort of auditing you’d like for high-stakes computer algorithms, and it shows how you can bias the outputs of a decision-making system (such as the court) by biasing the data you feed it.

One of the advantages of computerised algorithms is that this sort of auditing is much easier (in principle). It’s because you can’t force the US Federal court system to run on your choice of simulated data that you need to rely on ‘natural experiments’ like this one.

January 23, 2020

Gender pay gaps

New Zealand and international media are reporting a new analysis of the gender pay gap among NZ academics. At one level this isn’t anything very surprising: there’s a gender pay gap, of the same order of magnitude, in percentage terms, as in NZ as a whole (larger in Medicine, smaller in Arts).

As I’ve pointed out before, we know this is caused by gender: it’s not just some sort of correlation caused by confounding factors, since there aren’t any. What’s interesting is how it is that women come to be paid less. You could imagine a range of direct mechanisms:

  • slower promotion
  • lower pay at the same grade
  • less likely to be head of department/school
  • more likely to be at institutions where pay is lower
  • more likely to be in fields where pay is lower

And you could imagine possible factors leading into these

  • lower research ability
  • lower average age, because of past discrimination
  • interested in putting more effort into teaching or into service
  • pushed into putting more effort into teaching or into service
  • interested in putting more effort into childcare
  • pushed into putting more effort into childcare
  • discrimination in salary assignments
  • discrimination in promotion

and so on.

While many people have more or less informed opinions about these mechanisms, it’s often hard to get good data.  The research (by Associate Professors Ann Brower and Alex James of the University of Canterbury) takes advantage of the 2018 PBRF evaluations of NZ academics.  These evaluations were based on research portfolios selected to show the best research from each person (quality rather than quantity) and were evaluated by panels of NZ and overseas experts in each field.

In this paper, Brower and James got access to PBRF ratings and salary data for NZ academics, and so could look at whether women of similar age with similar PBRF scores had similar pay. As will surely astonish you, they didn’t.  In particular, it appears that women are less likely than men with similar PBRF ratings to be promoted to Associate Professor and Professor.  Differences in age distribution and research performance explain about half the gender pay gap; the other half remains.

The big limitation of any analysis of this sort is the quality of the performance data.  If performance is measured poorly, then even if it really does completely explain the outcomes, it will look as if there’s an unexplained gap.  The point of this paper is that PBRF is quite a good measurement of research performance: assessed by scientists in each field, by panels convened with at least some attention to gender representation, using individual, up-to-date information.  If you believed that PBRF was pretty random and unreliable, you wouldn’t be impressed by these analyses: if PBRF scores don’t describe research performance well, they can’t explain its effect on pay and promotion well.
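
The measurement-error point can be checked with a small simulation. This is a sketch under invented assumptions, not the analysis in the paper: two hypothetical groups (coded 0 and 1) whose true performance differs on average, pay that depends on nothing but true performance, and a score that measures performance with noise. Every name and number here is made up for illustration.

```python
import random


def residuals(y, x):
    """Residuals from a least-squares line of y on x (with intercept)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx
    return [b - my - slope * (a - mx) for a, b in zip(x, y)]


def adjusted_gap(pay, group, score):
    """Coefficient on group when pay is regressed on group and score.

    Uses the Frisch-Waugh-Lovell identity: remove the score from both
    pay and group, then regress one set of residuals on the other.
    """
    pay_r = residuals(pay, score)
    grp_r = residuals(group, score)
    return sum(p * g for p, g in zip(pay_r, grp_r)) / sum(g * g for g in grp_r)


rng = random.Random(2020)  # fixed seed so the run is repeatable
n = 20000
group = [i % 2 for i in range(n)]

# Hypothetical: group 1 averages 0.5 units higher in true performance.
true_perf = [rng.gauss(0.5 * g, 1.0) for g in group]
pay = list(true_perf)  # pay is completely explained by performance
noisy_score = [t + rng.gauss(0.0, 1.0) for t in true_perf]

gap_exact = adjusted_gap(pay, group, true_perf)    # adjust for the truth
gap_noisy = adjusted_gap(pay, group, noisy_score)  # adjust for a noisy score

print(gap_exact, gap_noisy)
```

Adjusting for the exact score leaves no gap. Adjusting for the noisy score leaves roughly half of the 0.5 difference apparently "unexplained", even though by construction there is nothing else going on.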

There could be bias in the other direction, too.  Suppose PBRF were biased in favour of men, and promotions were biased in favour of men in exactly the same way.  Adjusting for PBRF would then completely reproduce the bias in promotion, and make it look as if pay was completely fair.

Now, I’m potentially biased, since I was on a PBRF panel in 2013 (and since I got a good PBRF score), but I think PBRF is a fairly good assessment. I think the true residual pay gap could easily be quite a bit smaller or larger than this analysis estimates, but it’s as good as you’re likely to be able to do, and it certainly does not support the view that the pay gap is zero.

What does that mean?

There’s a nice piece on Stuff about earthquake risk in New Zealand (basically, the geology is out to get us, and we should be prepared).

It includes this map, which comes from “SUPPLIED”

I wondered what the risk numbers (0.15, 0.3) actually meant.  Are they some kind of probability of a quake? Over what period of time?

It’s surprisingly hard to find out.  The first step is easy: the numbers are seismic risk factors used in the building code; see, for example, this map from Radio NZ in 2017.

Searching a bit more, you can readily find that (eg, at building.govt.nz) these are the “Z-values to determine seismic risk”, and that there’s

  • a low seismic risk if the area has a Z factor that is less than 0.15; and
  • a medium seismic risk if the area has a Z factor that is greater than or equal to 0.15 and less than 0.3; and
  • a high seismic risk if the area has a Z factor that is greater than or equal to 0.3.
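
The three bands above amount to a simple threshold rule. A minimal sketch (the function name is made up; the cut-offs are the ones quoted in the list):

```python
def seismic_risk_category(z: float) -> str:
    """Map a Z factor to the low/medium/high bands quoted above."""
    if z < 0.15:
        return "low"
    if z < 0.3:
        return "medium"
    return "high"


# The two values shown on the map sit exactly on the band boundaries.
print(seismic_risk_category(0.15))
print(seismic_risk_category(0.3))
```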

This isn’t getting us much further forward, but there is a reference to a Standard.  Now, there are (for some reason) serious penalties over and above the copyright law for being too explicit about the contents of a standard, but you can go and read it for yourself, and verify that Z is a hazard factor that you look up in a table, and that it gets combined with other information about soil and so on to give you a number that goes into how strong your building needs to be. But there isn’t any more explicit explanation of what a Z is.

Searching further, I found a research paper which describes the Australian standards as having a Z that looks very much like the NZ one, defined as the “effective peak ground acceleration with a return period of 500 years”. So, it’s not the probability of a quake, it’s the intensity of the largest quake expected over a 500 year period, in units of the acceleration due to gravity.  The US also has a Z with the same definition, though they probably say it ‘zee’.
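
To put rough numbers on that definition, here is a small sketch. It rests on an assumption that is not stated in the Standard or the research paper: that shaking above the 500-year level arrives as a Poisson process with rate 1/500 per year. The function names are made up for illustration.

```python
import math

STANDARD_GRAVITY = 9.80665  # metres per second squared


def exceedance_probability(return_period_years: float, window_years: float) -> float:
    """Chance of at least one exceedance in the window, assuming Poisson arrivals."""
    return 1.0 - math.exp(-window_years / return_period_years)


def z_to_acceleration(z: float) -> float:
    """Convert a Z factor (in units of g) to metres per second squared."""
    return z * STANDARD_GRAVITY


p_50_years = exceedance_probability(500, 50)
accel_high = z_to_acceleration(0.3)

print(round(p_50_years, 3))
print(round(accel_high, 2))
```

Under that assumption, a 500-year return period corresponds to a bit under a 10% chance of being exceeded in any 50-year window, and Z = 0.3 is a peak ground acceleration of just under 3 m/s².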

So, two points: first, it shouldn’t be this much work to find out what the numbers mean on a map published on a major news website. Second, I’m not convinced that ‘seismic risk’ is a good name for this thing, since ‘seismic risk’ sounds more like it should involve probabilities.

January 22, 2020

They say the neon lights are bright

The Herald (on Twitter, today)

An international study has revealed why Auckland has been ranked low for liveability compared to other cities in New Zealand and the rest of the world.

Stuff (March last year)

The City of Sails continues to be ranked the world’s third most liveable city for quality of life.

Phil Goff, as Mayor of Auckland, obviously prefers the pro-Auckland survey and the Herald said he ‘rubbished’ the other survey, saying it “defies all the evidence which shows Auckland is a growing, highly popular city for all people.”

There’s no need for one survey to be wrong and the other one right, though. It depends on what you’re looking for.  Mercer’s rankings (where Auckland does well) are aimed at companies moving employees to overseas postings. One of their intended uses is to work out how much extra executives need to be paid in compensation for living in, say, Houston or Birmingham rather than Auckland or Vancouver.   Cost of living doesn’t factor into this; it’s quality of life if you’re rich enough.

Some of the quality of life features (good climate, safe streets, lack of air pollution) can be enjoyed by most people, but some really are more relevant if you’ve got money.

The Movinga ranking that the Herald quotes is about suitability for families.  Two important components are housing costs, and total cost of living, in terms of local incomes.  Auckland does fairly badly there, and also on a public transport/road congestion component.  The ranking of 94 out of 150 overstates the issue a bit: here’s the distribution of scores, with Auckland in yellow. Auckland is part of the main clump, where the rankings will be sensitive to exactly how each component is weighted.

There are important reasons to want to live in a particular city that neither ranking considers.  An obvious one is employment or business opportunities.  If your speciality is teaching statistics or installing HVAC in tall buildings or running a Shanxi restaurant, you’ll probably do better in Auckland than in New Plymouth.  Movinga also rates cities on suitability for entrepreneurs, jobseekers, and as places to find love (Auckland is 34th out of 100 on that last one).

 

January 21, 2020

Can you win a million dollars?

As Betteridge’s Law of Headlines says, the correct response to any headline ending in a question mark is “No”.

From the Herald: “Kogan Mobile – a relative newcomer to NZ – is offering $1 million if you can correctly pick the result of every game in the first six rounds of the new Super Rugby seasons.”

This is a good deal for Kogan Mobile. They get coverage in the Herald and on NewstalkZB (albeit at 5:20am), and they get a lot of email addresses, many of which will be genuine.  There will be people who hadn’t heard of the company last week who now have heard of them.

I shouldn’t think the publicity is worth a million dollars, but they’re very unlikely to have to pay out.  As I told Chris Keall and Kate Hawkesby, if you typically average 2 out of 3 correct predictions (roughly what StatsChat’s David Scott gets), you’ve got one chance in ten million of getting forty correct predictions: to be precise, (2/3)^40.

It’s actually a bit worse than that. David gets two out of three correct, but there are easy picks and hard picks, and he does better on the easy ones than the hard ones.   Variation in difficulty makes it harder to get all the picks right: imagine someone who was 100% wrong whenever the Hurricanes played the Chiefs; they could still average two out of three, but they couldn’t get 40 out of 40.

The variation in difficulty doesn’t make that much difference. Suppose you had a 50:50 chance for 29 of the games and the other 11 were so easy you had a 100% chance. That averages close to 2/3, and the variation is more extreme than really plausible, but the chance of getting all forty correct is (0.5)^29 × (1)^11, or one in 500 million, only lower by a factor of 50.
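
The arithmetic for the two scenarios is easy to check. A short sketch (the variable names are made up, and treating the games as independent is an assumption):

```python
# Scenario 1: every one of the 40 picks has a 2/3 chance of being right.
p_uniform = (2 / 3) ** 40

# Scenario 2: 29 coin-flip games and 11 certain ones.
p_mixed = 0.5 ** 29 * 1.0 ** 11
average_mixed = (29 * 0.5 + 11 * 1.0) / 40  # average per-game accuracy

ratio = p_uniform / p_mixed  # how much worse the mixed scenario is

print(f"uniform: one in {1 / p_uniform:,.0f}")
print(f"mixed:   one in {1 / p_mixed:,.0f}")
print(f"average accuracy in mixed scenario: {average_mixed:.4f}")
print(f"ratio: {ratio:.1f}")
```

The uniform scenario comes out at about one in 11 million, the mixed one at about one in 537 million, and the ratio between them is a little under 50.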

If Kogan Mobile get a million entries, which seems quite a lot for New Zealand, they’d still have probably less than one chance in ten of paying out.   They might have decided to wear that chance, or they might have bought insurance — in 2003, when Pepsi offered a billion dollar lottery-style prize, they insured the risk of paying out, for less than $10 million.
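
The "less than one chance in ten" figure can be sketched the same way. This assumes a million entries that are independent of each other, each with a (2/3)^40 chance of being perfect. Real entries would overlap heavily, since many people pick the same favourites, so this is a rough guide only. The variable names are made up for illustration.

```python
import math

p_win = (2 / 3) ** 40  # chance that a single entry is perfect
entries = 1_000_000    # hypothetical number of entries

# P(at least one winner) = 1 - (1 - p_win) ** entries, computed with
# log1p/expm1 to avoid rounding problems with such a small p_win.
p_payout = -math.expm1(entries * math.log1p(-p_win))

print(round(p_payout, 3))
```

Under those assumptions the chance of having to pay out is a little under 9%.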

So should you play? Sure, if you feel like it.  You just shouldn’t expect much of a chance of winning.

Super Rugby Predictions for Round 1

Team Ratings for Round 1

The basic method is described on my Department home page.
Here are the team ratings prior to this week’s games, along with the ratings at the start of the season.

Current Rating Rating at Season Start Difference
Crusaders 17.10 17.10 -0.00
Hurricanes 8.79 8.79 -0.00
Jaguares 7.23 7.23 0.00
Chiefs 5.91 5.91 0.00
Highlanders 4.53 4.53 0.00
Brumbies 2.01 2.01 -0.00
Bulls 1.28 1.28 0.00
Lions 0.39 0.39 -0.00
Blues -0.04 -0.04 -0.00
Stormers -0.71 -0.71 -0.00
Sharks -0.87 -0.87 -0.00
Waratahs -2.48 -2.48 -0.00
Reds -5.86 -5.86 0.00
Rebels -7.84 -7.84 0.00
Sunwolves -18.45 -18.45 0.00

 

Predictions for Round 1

Here are the predictions for Round 1. The prediction is my estimated expected points difference with a positive margin being a win to the home team, and a negative margin a win to the away team.

Game Date Winner Prediction
1 Blues vs. Chiefs Jan 31 Chiefs -1.40
2 Brumbies vs. Reds Jan 31 Brumbies 12.40
3 Sharks vs. Bulls Jan 31 Sharks 2.40
4 Sunwolves vs. Rebels Feb 01 Rebels -4.60
5 Crusaders vs. Waratahs Feb 01 Crusaders 25.60
6 Stormers vs. Hurricanes Feb 01 Hurricanes -3.50
7 Jaguares vs. Lions Feb 01 Jaguares 12.80