Posts from March 2017 (37)

March 14, 2017

Super 18 Predictions for Round 4

Team Ratings for Round 4

The basic method is described on my Department home page.

Here are the team ratings prior to this week’s games, along with the ratings at the start of the season.

Team          Current Rating  Rating at Season Start  Difference
Hurricanes    16.98           13.22                   3.80
Chiefs        11.28           9.75                    1.50
Highlanders   8.08            9.17                    -1.10
Crusaders     8.08            8.75                    -0.70
Lions         6.71            7.64                    -0.90
Waratahs      2.92            5.81                    -2.90
Brumbies      2.89            3.83                    -0.90
Stormers      2.68            1.51                    1.20
Sharks        2.05            0.42                    1.60
Blues         0.88            -1.07                   2.00
Bulls         -0.76           0.29                    -1.00
Jaguares      -2.83           -4.36                   1.50
Cheetahs      -7.01           -7.36                   0.40
Force         -8.10           -9.45                   1.40
Reds          -9.14           -10.28                  1.10
Rebels        -12.37          -8.17                   -4.20
Kings         -19.01          -19.02                  0.00
Sunwolves     -20.42          -17.76                  -2.70

 

Performance So Far

So far there have been 26 matches played, 18 of which were correctly predicted, a success rate of 69.2%.
Here are the predictions for last week’s games.

#  Game                         Date    Score    Prediction  Correct
1  Chiefs vs. Hurricanes        Mar 10  26 – 18  -3.60       FALSE
2  Brumbies vs. Force           Mar 10  25 – 17  15.40       TRUE
3  Sharks vs. Waratahs          Mar 10  37 – 14  0.40        TRUE
4  Blues vs. Highlanders        Mar 11  12 – 16  -3.70       TRUE
5  Reds vs. Crusaders           Mar 11  20 – 22  -14.70      TRUE
6  Cheetahs vs. Sunwolves       Mar 11  38 – 31  18.80       TRUE
7  Kings vs. Stormers           Mar 11  10 – 41  -16.50      TRUE
8  Jaguares vs. Lions           Mar 11  36 – 24  -7.90       FALSE
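As a rough sketch of how the "Correct" column and the success rate can be checked: a prediction counts as correct when the predicted margin and the actual home-minus-away margin have the same sign. The function name `is_correct` is made up for illustration, and the sketch assumes there are no draws and no predictions of exactly zero.

```python
# Each tuple: (home score, away score, predicted home-minus-away margin),
# copied from last week's Super Rugby results.
games = [
    (26, 18, -3.60),
    (25, 17, 15.40),
    (37, 14, 0.40),
    (12, 16, -3.70),
    (20, 22, -14.70),
    (38, 31, 18.80),
    (10, 41, -16.50),
    (36, 24, -7.90),
]


def is_correct(home, away, predicted_margin):
    """True when the predicted and actual margins have the same sign.

    Assumes no draws and no zero predictions (an illustrative simplification).
    """
    actual_margin = home - away
    return (actual_margin > 0) == (predicted_margin > 0)


results = [is_correct(*game) for game in games]
round_correct = sum(results)

# Season totals quoted in the post: 18 correct out of 26 matches.
season_rate = 100 * 18 / 26

print(f"{round_correct} of {len(games)} correct last week")
print(f"Season success rate: {season_rate:.1f}%")
```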

 

Predictions for Round 4

Here are the predictions for Round 4. The prediction is my estimated expected points difference with a positive margin being a win to the home team, and a negative margin a win to the away team.

#  Game                         Date    Winner      Prediction
1  Crusaders vs. Blues          Mar 17  Crusaders   10.70
2  Rebels vs. Chiefs            Mar 17  Chiefs      -19.70
3  Bulls vs. Sunwolves          Mar 17  Bulls       23.70
4  Hurricanes vs. Highlanders   Mar 18  Hurricanes  12.40
5  Waratahs vs. Brumbies        Mar 18  Waratahs    3.50
6  Lions vs. Reds               Mar 18  Lions       19.80
7  Sharks vs. Kings             Mar 18  Sharks      24.60
8  Jaguares vs. Cheetahs        Mar 18  Jaguares    8.20
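The method itself is described elsewhere, so the following is only an observation about the published numbers, not a statement of how they are produced. The Round 4 predictions look consistent with "home rating minus away rating, plus a home-ground allowance". The allowances of 3.5 and 4.0 points below are inferred from the tables (4.0 seems to fit the games where the visitors come from another country) and should be read as assumptions.

```python
# Current ratings copied from the Super Rugby ratings table.
ratings = {
    "Hurricanes": 16.98, "Chiefs": 11.28, "Highlanders": 8.08,
    "Crusaders": 8.08, "Lions": 6.71, "Waratahs": 2.92,
    "Brumbies": 2.89, "Sharks": 2.05, "Blues": 0.88, "Bulls": -0.76,
    "Jaguares": -2.83, "Cheetahs": -7.01, "Reds": -9.14,
    "Rebels": -12.37, "Kings": -19.01, "Sunwolves": -20.42,
}

# (home, away, ASSUMED home allowance, published prediction)
games = [
    ("Crusaders", "Blues", 3.5, 10.70),
    ("Rebels", "Chiefs", 4.0, -19.70),
    ("Bulls", "Sunwolves", 4.0, 23.70),
    ("Hurricanes", "Highlanders", 3.5, 12.40),
    ("Waratahs", "Brumbies", 3.5, 3.50),
    ("Lions", "Reds", 4.0, 19.80),
    ("Sharks", "Kings", 3.5, 24.60),
    ("Jaguares", "Cheetahs", 4.0, 8.20),
]


def predict(home, away, home_allowance):
    """Expected home-minus-away margin and the implied winner."""
    margin = ratings[home] - ratings[away] + home_allowance
    winner = home if margin > 0 else away
    return margin, winner


estimates = [predict(home, away, adv) for home, away, adv, _ in games]
for (home, away, _, published), (margin, winner) in zip(games, estimates):
    print(f"{home} vs. {away}: estimated {margin:.2f}, "
          f"published {published:.2f}, winner {winner}")
```

The estimates agree with the published figures to within rounding (ratings are shown to two decimals and the predictions appear to be rounded to one).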

 

NRL Predictions for Round 3

Team Ratings for Round 3

The basic method is described on my Department home page.

Here are the team ratings prior to this week’s games, along with the ratings at the start of the season.

Team          Current Rating  Rating at Season Start  Difference
Storm         8.88            8.49                    0.40
Cowboys       7.46            6.90                    0.60
Sharks        7.40            5.84                    1.60
Raiders       7.04            9.94                    -2.90
Panthers      5.37            6.08                    -0.70
Broncos       5.15            4.36                    0.80
Eels          1.27            -0.81                   2.10
Roosters      0.12            -1.17                   1.30
Bulldogs      -1.22           -1.34                   0.10
Rabbitohs     -1.70           -1.82                   0.10
Titans        -3.77           -0.98                   -2.80
Wests Tigers  -4.67           -3.89                   -0.80
Sea Eagles    -5.52           -2.98                   -2.50
Dragons       -5.91           -7.74                   1.80
Warriors      -7.31           -6.02                   -1.30
Knights       -14.65          -16.94                  2.30

 

Performance So Far

So far there have been 16 matches played, 7 of which were correctly predicted, a success rate of 43.8%.
Here are the predictions for last week’s games.

#  Game                         Date    Score    Prediction  Correct
1  Roosters vs. Bulldogs        Mar 09  28 – 24  5.00        TRUE
2  Warriors vs. Storm           Mar 10  10 – 26  -11.30      TRUE
3  Broncos vs. Cowboys          Mar 10  20 – 21  1.70        FALSE
4  Knights vs. Titans           Mar 11  34 – 26  -10.20      FALSE
5  Sea Eagles vs. Rabbitohs     Mar 11  18 – 38  3.30        FALSE
6  Raiders vs. Sharks           Mar 11  16 – 42  8.30        FALSE
7  Wests Tigers vs. Panthers    Mar 12  2 – 36   -1.70       TRUE
8  Dragons vs. Eels             Mar 12  16 – 34  -1.00       TRUE

 

Predictions for Round 3

Here are the predictions for Round 3. The prediction is my estimated expected points difference with a positive margin being a win to the home team, and a negative margin a win to the away team.

#  Game                         Date    Winner      Prediction
1  Storm vs. Broncos            Mar 16  Storm       7.20
2  Bulldogs vs. Warriors        Mar 17  Bulldogs    10.10
3  Titans vs. Eels              Mar 17  Eels        -1.50
4  Knights vs. Rabbitohs        Mar 18  Rabbitohs   -9.50
5  Panthers vs. Roosters        Mar 18  Panthers    8.80
6  Cowboys vs. Sea Eagles       Mar 18  Cowboys     16.50
7  Raiders vs. Wests Tigers     Mar 19  Raiders     15.20
8  Sharks vs. Dragons           Mar 19  Sharks      16.80

 

March 13, 2017

But, fear itself

A new research paper from Alastair Woodward and co-workers at the University of Auckland looks at the risks of cycling in New Zealand. Jamie Morton at the Herald has written about it. Basically, cycling isn’t as dangerous as you probably thought: the risk of an injury severe enough to report to ACC or to go to the emergency department is about one incident per 10,000 half-hour trips. Or, for me, about once in 25-30 years.
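The conversion from "one incident per 10,000 half-hour trips" to "once in 25-30 years" is simple division. Here is a small sketch; the trip counts of 330 and 400 per year are hypothetical commuting patterns chosen to bracket that range, not figures from the paper or the post.

```python
# From the paper as quoted: about one reportable injury per 10,000 half-hour trips.
TRIPS_PER_INCIDENT = 10_000


def years_between_incidents(trips_per_year):
    """Average number of years between incidents at a given riding frequency."""
    return TRIPS_PER_INCIDENT / trips_per_year


# Hypothetical riders: roughly two half-hour trips on 165 to 200 days a year.
for trips_per_year in (330, 400):
    years = years_between_incidents(trips_per_year)
    print(f"{trips_per_year} trips/year: about one incident every {years:.0f} years")
```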

There are two caveats for this as a pro-cycling message.  First, there’s some selection bias: the people who currently cycle are more likely to have safe routes available than those who currently don’t cycle — bike paths really work.  So if more people started cycling with the current infrastructure the ‘safety in numbers’ effect would be reduced by the increased use of dangerous roads.

Second, it isn’t just actual injury that’s a problem. The research paper talks about the social context of risk perception, and how the fact that cycling is regarded as weird makes the risks seem higher, which is true and an important factor. But. One morning recently, I stopped at the traffic lights coming off Grafton Bridge, and the bus behind me didn’t. I didn’t come that close to being hit; it’s still not a fun way to start the day. Russell Brown, who can actually write, covers this aspect better than I can. He concludes:

Cycling is much safer than people think. But until things change, fear of cycling will keep many reasonable people off the roads.

Stat of the Week Competition: March 11 – 17 2017

Each week, we would like to invite readers of Stats Chat to submit nominations for our Stat of the Week competition and be in with the chance to win an iTunes voucher.

Here’s how it works:

  • Anyone may add a comment on this post to nominate their Stat of the Week candidate before midday Friday March 17 2017.
  • Statistics can be bad, exemplary or fascinating.
  • The statistic must be in the NZ media during the period of March 11 – 17 2017 inclusive.
  • Quote the statistic, when and where it was published and tell us why it should be our Stat of the Week.

Next Monday at midday we’ll announce the winner of this week’s Stat of the Week competition, and start a new one.


Stat of the Week Competition Discussion: March 11 – 17 2017

If you’d like to comment on or debate any of this week’s Stat of the Week nominations, please do so below!

March 12, 2017

Highchart of the week

[Image: the chart discussed below]

It’s not a pie chart, because the wedges don’t add up to a meaningful whole, and showing parts of a whole is the only possible justification for a pie chart. On the other hand, unlike the pizzachart it is trying to display numerical data.

Also, “51% of Americans have tried marijuana today” is presumably not the intended reading, but the graphic doesn’t make that as clear as it might.

And the source for the data isn’t a guy named Moe. That’s an abbreviation for Margin of Error.  Google suggests the source is a CBS News Poll (PDF report), but that’s from last year.

(via @seanjtaylor)

Briefly

  • False positives: many people who think they are allergic to penicillin actually aren’t, and so don’t need to be given broader-spectrum antibiotics (which have more impact on resistance). Ars Technica, the research paper.
  • Cancer genomics researcher accused of data falsification. Long NY Times story, including very clever animation of Western blot duplication.
  • A bill in the US House of Representatives wouldn’t quite let employers demand genetic data from employees, but it would let employers make employees pay not to give it. (STAT news)
  • President Trump described good employment numbers under the previous government as ‘phony’.  After the first month of his government, the White House press secretary said “They may have been phony in the past, but it’s very real now”.  (via Vox)
  • “Cause of death” is complicated: the BBC has a story “The biggest killer you may not know” about sepsis. The story says it “kills more people in the UK each year than bowel, breast and prostate cancer combined.” But it’s not either/or. A substantial number of sepsis deaths are due to cancer or cancer treatment.
  • Cathy O’Neil on how looking harder for crimes by any group (such as immigrants) is bound to increase the recorded crime rate for that group: if a spurious increase wasn’t the aim, you’d need to be careful about interpreting the data.

March 9, 2017

Causation, correlation, and gaps

It’s often hard to establish whether a correlation between two variables is cause and effect, or whether it’s due to other factors.  One technique that’s helpful for structuring one’s thinking about the problem is a causal graph: bubbles for variables, and arrows for effects.

I’ve written about the correlation between chocolate consumption and number of Nobel prizes for countries.  The ‘chocolate leads to Nobel Prizes’ hypothesis would be drawn like this:

[Figure: causal graph with an arrow from ‘chocolate’ to ‘Nobel Prizes’]

One of several more-reasonable alternatives is that variations in wealth explain the correlation, which looks like

[Figure: causal graph with arrows from ‘wealth’ into both ‘chocolate’ and ‘Nobel Prizes’]

As another example, there’s a negative correlation between the number of pirates operating in the world’s oceans and atmospheric CO2 concentration.  It could be that pirates directly reduce atmospheric CO2 concentration:

[Figure: causal graph with an arrow from ‘pirates’ to ‘CO2’]

but it’s perhaps more likely that both technology and wealth have changed over time, leading to greater CO2 emissions and also to nations with the ability and motivation to suppress piracy:

[Figure: causal graph with arrows from ‘technology’ and ‘wealth’ into both ‘pirates’ and ‘CO2’]

The pictures are oversimplified, but they still show enough of the key relationships to help with reasoning. In particular, in these alternative explanations, there are arrows pointing into both the putative cause and the effect. There are arrows from the same origin into both ‘chocolate’ and ‘Nobel Prizes’; there are arrows from the same origins into both ‘pirates’ and ‘CO2’. Confounding, the confusion of relationships that leads to causes not matching correlations, requires arrows into both variables (or selection based on arrows out of both variables).
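The "arrows into both variables" condition can be checked mechanically. The sketch below stores a causal graph as a dictionary mapping each variable to the variables it points at, and looks for common causes: variables with a directed path into both ends of the proposed relationship. The function names and the exact edges are illustrative assumptions based on the descriptions of the graphs, the graph is assumed to have no cycles, and the check ignores the selection case mentioned in the parenthesis.

```python
def ancestors(graph, node):
    """All variables with a directed path into `node` (graph assumed acyclic)."""
    parents = {src for src, targets in graph.items() if node in targets}
    found = set(parents)
    for parent in parents:
        found |= ancestors(graph, parent)
    return found


def common_causes(graph, a, b):
    """Variables with arrows (directly or indirectly) into both `a` and `b`."""
    return ancestors(graph, a) & ancestors(graph, b)


# Simplified versions of the graphs described in the text.
chocolate_direct = {"chocolate": ["Nobel Prizes"]}
chocolate_wealth = {"wealth": ["chocolate", "Nobel Prizes"]}
pirates_alternative = {
    "technology": ["pirates", "CO2"],
    "wealth": ["pirates", "CO2"],
}

print(common_causes(chocolate_direct, "chocolate", "Nobel Prizes"))
print(common_causes(chocolate_wealth, "chocolate", "Nobel Prizes"))
print(sorted(common_causes(pirates_alternative, "pirates", "CO2")))
```

An empty result means the graph offers no confounder for that pair; a non-empty result lists the candidate confounders.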

So, when we see a causal hypothesis like this one:

[Figure: causal graph with an arrow from ‘gender’ to ‘pay’]

and ask if there’s “really” a gender pay gap, the answer “No” requires finding a variable with arrows into both gender and pay.  Which in your case you have not got. The pay gap really is caused by gender.

There are still interesting and important questions to be asked about mechanisms. For example, consider this graph

[Figure: causal graph with arrows from ‘gender’ to ‘pay’ directly, via childcare, and via occupation]

We’d like to know how much of the pay gap is direct underpayment, how much goes through the mechanism of women doing more childcare, and how much goes through the mechanism of occupations with more women being paid less. Information about mechanisms helps us think about how to reduce the gap, and what the other costs of reducing it might be. The studies I’ve seen suggest that all three of these mechanisms do contribute, so even if you think only the direct effects matter there’s still a problem.
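The three mechanisms correspond to the three directed paths from gender to pay. As a sketch, with edges that are my reading of the description above rather than a copy of the picture, the paths can be listed and the absence of arrows into gender confirmed:

```python
def directed_paths(graph, start, end, path=None):
    """List every directed path from `start` to `end`."""
    path = (path or []) + [start]
    if start == end:
        return [path]
    paths = []
    for nxt in graph.get(start, []):
        if nxt not in path:  # guard against revisiting a variable
            paths.extend(directed_paths(graph, nxt, end, path))
    return paths


# Assumed edges: direct underpayment, childcare, and occupation.
pay_gap = {
    "gender": ["pay", "childcare", "occupation"],
    "childcare": ["pay"],
    "occupation": ["pay"],
}

paths = directed_paths(pay_gap, "gender", "pay")
for p in paths:
    print(" -> ".join(p))

# Variables with an arrow into gender: there are none in this graph.
into_gender = [src for src, targets in pay_gap.items() if "gender" in targets]
print("arrows into gender:", into_gender)
```

Listing the paths says nothing about how large each contribution is; that needs data on each mechanism.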

You can also think of all sorts of things I’ve left out of that graph, and you could put some of them back in:

[Figure: the same graph with further variables added]

But you’re still going to end up with a graph where there are only arrows out of gender.  Women earn less, on average, and this is causation, not mere correlation.

March 8, 2017

Briefly

  • “Exploding boxplots”: although a boxplot is a lot better than just showing a mean, it’s usually worse than showing the data
  • The US state of Michigan used an automated system to detect unemployment benefit fraud. Late last year, an audit of 22427 cases of fraud overturned 93% of them! Now, a class-action lawsuit has been filed (PDF), giving (a one-sided view of) more of the details.
  • StatsChat has been saying for quite some time that people shouldn’t be making generalisations about road crash rates without evaluating the statistical evidence for increases or decreases. It’s good to see someone doing the analysis: the Ministry of Transport has a big long report (PDF, from here) including (p37) [updated link]:

    110. However, since 2013 the fatality and injury rates have begun to increase. We conducted statistical tests (Poisson) to see whether this increase was more than natural variation, and found strong evidence that the fatality and injury rates are actually rising.

  • Fascinating blog by John Grimwade, an infographics (as opposed to data visualisation) expert (via Kieran Healy)
  • “Not only does Google, the world’s preeminent index of information, tell its users that caramelizing onions takes “about 5 minutes”—it pulls that information from an article whose entire point was to tell people exactly the opposite.”  Another problem with Google’s new answer box, less serious than the claims about a communist coup in the US, but likely to be believed by more people.
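On the road-crash item above: the report says only that the tests were "Poisson", so the following is one standard way to do such a comparison, not necessarily what the Ministry did. If counts in two periods are Poisson, then given the total, the count in the later period is binomial, which gives an exact one-sided test for an increase in rate. The counts and exposures used here are hypothetical, chosen only to show the calculation.

```python
import math


def log_binom_pmf(k, n, p):
    """Log of the binomial probability of k successes in n trials."""
    return (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
            + k * math.log(p) + (n - k) * math.log(1 - p))


def poisson_increase_p_value(count_before, exposure_before,
                             count_after, exposure_after):
    """One-sided exact test that the later Poisson rate is higher.

    Conditional on the total count, the later count is
    Binomial(total, exposure_after / total exposure) if the rates are equal.
    """
    total = count_before + count_after
    p = exposure_after / (exposure_before + exposure_after)
    return sum(math.exp(log_binom_pmf(k, total, p))
               for k in range(count_after, total + 1))


# HYPOTHETICAL counts with equal exposure (e.g. the same distance travelled).
p_increase = poisson_increase_p_value(250, 1.0, 320, 1.0)
p_no_change = poisson_increase_p_value(285, 1.0, 285, 1.0)
print(f"250 then 320 events: p = {p_increase:.4f}")
print(f"285 then 285 events: p = {p_no_change:.4f}")
```

A small p-value says the increase is larger than Poisson variation alone would usually produce; exposure matters, because more travel means more crashes even at a constant rate.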

Yes, November 19

[Figure: Google Trends graph of search interest over time]

The graph is from a Google Trends search for  “International Men’s Day“.

There are two peaks each year. In the majority of years, the larger peak is on International Women’s Day, and the smaller peak is on International Men’s Day itself (November 19).