Posts from July 2015 (36)

July 31, 2015

Doesn’t add up

Daniel Croft nominated a story on savings from switching power companies for Stat of the Week.  The story says

The latest Electricity Authority figures show 2.1 million consumers have switched providers since 2010, saving $164 on average for the year. In 2014, 385,596 households switched over, collectively saving $281 million.

and he argues that this level of saving without any real harm to the industry shows there was serious overcharging.  It turns out that there’s another reason the story is relevant to StatsChat. The savings number is wrong, and this is clear based on other numbers in the story.

A basic rule of numbers in journalism is that if you have two numbers, you can usually do arithmetic on them for some basic fact-checking.  Dividing $281 million by 385,596 gives an average saving of over $700 per switching household. I find that a bit hard to believe — it’s a lot bigger than the ads for whatsmynumber.org.nz suggest.

Looking at the end of the story, we can see average savings for people who switched in each region of New Zealand.  The highest is $318 for Bay of Plenty. It’s not possible for the national average to be more than twice the highest regional average. The numbers are wrong somewhere.
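
The check above is easy to reproduce. Here is a minimal sketch in Python, using only the numbers quoted in the story; the variable names are mine, and this is the sanity check described here, not the Electricity Authority's own calculation.

```python
# Figures quoted in the story
total_saving = 281_000_000        # dollars: claimed collective saving in 2014
switching_households = 385_596    # households that switched in 2014
highest_regional_average = 318    # dollars: Bay of Plenty, the largest regional figure

# Average saving implied by the two headline numbers
implied_average = total_saving / switching_households
print(f"Implied average saving: ${implied_average:,.0f}")

# A national average over switchers cannot exceed the largest regional average
consistent = implied_average <= highest_regional_average
print("Consistent with the regional figures:", consistent)
```

The implied average comes out at more than twice the Bay of Plenty figure, so the two sets of numbers cannot both describe the households that switched.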

We can compare with the Electricity Authority report, which is supposed to be the source of the numbers.  The number 281 appears once in the document (ctrl-F is your friend):

If all households had switched to the cheapest deal in 2014 they collectively stood to save $281 million.

So, the $281 million total isn’t the estimated total saving for the 385,596 households who actually switched, it’s the estimated total saving if everyone switched to the cheapest available option — in fact, if they switched every month to the cheapest available option that month — and if they didn’t use more electricity once it was cheaper, and if prices didn’t increase to compensate.

All the quoted savings numbers are like this: they are averages over all households if they switched to the cheapest option, everything else being equal, rather than data on the actual switches of actual households.

 

Reproducibility and journalism

Today’s newspaper wants to tell you what is new today. It turns out that’s a problem.

Felix Salmon, at Fusion

But here’s the thing: it turns out that the scientific world is actually far, far ahead of the journalistic world on these matters. Yes, the world of online journalism is full of parasites, and a lot of those parasites have real value. But all that the parasites have to go on are published articles: no one is transparent about the process that created those articles. No one shows their work, and no one ever tries to replicate anything.

 

Briefly

  • Silk, a tool for publishing data graphics online
  • Figure.nz is a charity devoted to getting people to use data about New Zealand: “We do this by pulling together New Zealand’s public sector, private sector and academic data in one place and making it easy for people to use in simple graphical form for free through this website.”

July 29, 2015

NRL Predictions for Round 21

Team Ratings for Round 21

The basic method is described on my Department home page.

Here are the team ratings prior to this week’s games, along with the ratings at the start of the season.

Team          Current Rating  Rating at Season Start  Difference
Roosters               10.93                    9.09        1.80
Broncos                 9.29                    4.03        5.30
Cowboys                 8.18                    9.52       -1.30
Rabbitohs               6.42                   13.06       -6.60
Storm                   5.12                    4.36        0.80
Bulldogs                1.11                    0.21        0.90
Sea Eagles              0.49                    2.68       -2.20
Warriors               -0.22                    3.07       -3.30
Raiders                -1.16                   -7.09        5.90
Dragons                -1.80                   -1.74       -0.10
Sharks                 -1.93                  -10.76        8.80
Panthers               -3.77                    3.69       -7.50
Eels                   -5.78                   -7.19        1.40
Knights                -6.37                   -0.28       -6.10
Wests Tigers           -8.78                  -13.13        4.30
Titans                -10.39                   -8.20       -2.20

 

Performance So Far

So far there have been 144 matches played, 84 of which were correctly predicted, a success rate of 58.3%.

Here are the predictions for last week’s games.

   Game                       Date    Score    Prediction  Correct
1  Broncos vs. Titans         Jul 24  34 – 0        20.80  TRUE
2  Wests Tigers vs. Roosters  Jul 24  8 – 33       -15.30  TRUE
3  Rabbitohs vs. Knights      Jul 25  52 – 6        11.10  TRUE
4  Storm vs. Dragons          Jul 25  22 – 4         8.60  TRUE
5  Warriors vs. Sea Eagles    Jul 25  12 – 32        7.00  FALSE
6  Bulldogs vs. Sharks        Jul 26  16 – 18        7.40  FALSE
7  Panthers vs. Raiders       Jul 26  24 – 34        2.10  FALSE
8  Cowboys vs. Eels           Jul 27  46 – 4        13.00  TRUE

 

Predictions for Round 21

Here are the predictions for Round 21. The prediction is my estimated expected points difference with a positive margin being a win to the home team, and a negative margin a win to the away team.

   Game                    Date    Winner     Prediction
1  Roosters vs. Bulldogs   Jul 31  Roosters        12.80
2  Wests Tigers vs. Storm  Jul 31  Storm          -10.90
3  Warriors vs. Sharks     Aug 01  Warriors         5.70
4  Cowboys vs. Raiders     Aug 01  Cowboys         12.30
5  Sea Eagles vs. Broncos  Aug 01  Broncos         -5.80
6  Dragons vs. Knights     Aug 02  Dragons          7.60
7  Rabbitohs vs. Panthers  Aug 02  Rabbitohs       13.20
8  Titans vs. Eels         Aug 03  Eels            -1.60
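
The sign convention can be illustrated with a short Python sketch. It assumes the first-listed team in each game is the home team, which is consistent with the tables but is my reading of them, and the function names are hypothetical. It does not handle draws or a predicted margin of exactly zero, and it says nothing about how the ratings themselves are computed.

```python
def predicted_winner(home, away, margin):
    """Return the predicted winner; margin is the expected home score minus away score."""
    return home if margin > 0 else away


def prediction_correct(home_score, away_score, margin):
    """True when the predicted margin and the actual margin favour the same team."""
    actual_margin = home_score - away_score
    return (actual_margin > 0) == (margin > 0)


# Three of the Round 21 predictions: (home, away, predicted margin)
round_21 = [
    ("Roosters", "Bulldogs", 12.80),
    ("Wests Tigers", "Storm", -10.90),
    ("Titans", "Eels", -1.60),
]
for home, away, margin in round_21:
    print(f"{home} vs. {away}: {predicted_winner(home, away, margin)}")

# Two of last week's results, checked against their predictions
print(prediction_correct(34, 0, 20.80))   # Broncos vs. Titans
print(prediction_correct(12, 32, 7.00))   # Warriors vs. Sea Eagles

# Season success rate quoted above: 84 correct out of 144 matches
success_rate = 100 * 84 / 144
print(f"{success_rate:.1f}%")
```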

 

Hadley Wickham

Dan Kopf from Priceonomics has written a nice article about one of Auckland’s famous graduates, Hadley Wickham. The article can be found on the Priceonomics site.

July 28, 2015

Recreational genotyping: potentially creepy?

Two stories from this morning’s Twitter (via @kristinhenry)

  • 23andMe has made available a programming interface (API) so that you can access and integrate your genetic information using apps written by other people.  Someone wrote and published code that could be used to screen users based on sex and ancestry. (Buzzfeed, FastCompany). It’s not a real threat, since apps with more than 20 users need to be reviewed by 23andMe, and since users have to agree to let the code use their data, and since Facebook knows far more about you than 23andMe, but it’s not a good look.
  • Google’s Calico project also does cheap public genotyping and is combining their DNA data (more than a million people) with family trees from Ancestry.com. This is how genetic research used to be done: since we know how DNA is inherited, connecting people with family trees deep into the past provides a lot of extra information. On the other hand, it means that if a few distantly-related people sign up for Calico genotyping, Google will learn a lot about the genomes of all their relatives.

It’s too early to tell whether the people who worry about this sort of thing will end up looking prophetic or just paranoid.

July 27, 2015

Cheat sheet on polling margin of error

The “margin of error” in a poll is the number you add and subtract to get a 95% confidence interval for the underlying proportion (under the simplest possible mathematical model for polling). Pollsters typically quote the “maximum margin of error”, which is the margin of error when the reported value is 50%. When the reported value is 0.7%, reporting the maximum margin of error (3.1%) is not helpful. The Conservative Party is unpopular, but it’s not possible for them to have negative support, and not likely that they have nearly 4%.
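
To see where the 3.1% comes from, here is a small Python sketch of the usual normal-approximation formula for a simple random sample. The function name is mine, and the 1.96 multiplier is the standard 95% value rather than anything specific to a particular polling company.

```python
import math


def max_margin_of_error(n, z=1.96):
    """Maximum margin of error, in percentage points, for a simple random sample of size n."""
    # The margin of error is largest when the true proportion is 50%
    return 100 * z * math.sqrt(0.5 * 0.5 / n)


n = 1000
reported = 0.7                      # reported support, in percent
moe = max_margin_of_error(n)
naive_interval = (reported - moe, reported + moe)

print(f"Maximum margin of error: {moe:.1f}%")
print(f"Naive interval: ({naive_interval[0]:.1f}%, {naive_interval[1]:.1f}%)")
```

Applying the maximum margin of error to a reported 0.7% gives an interval that dips below zero, which is the problem described above.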

Here is a cheat sheet, an expanded version of one I posted last year. The first column is the reported proportion and the remaining columns are the lower and upper ends of the 95% confidence interval for a sample of size 1000 (Here’s the code). The Conservative Party interval is (0.3%, 1.4%), not (-2.4%, 3.8%).

       l    u
0.1  0.0  0.6
0.2  0.0  0.7
0.3  0.1  0.9
0.4  0.1  1.0
0.5  0.2  1.2
0.6  0.2  1.3
0.7  0.3  1.4
0.8  0.3  1.6
0.9  0.4  1.7
1.0  0.5  1.8
1.5  0.8  2.5
2.0  1.2  3.1
2.5  1.6  3.7
3.0  2.0  4.3
3.5  2.4  4.8
4.0  2.9  5.4
4.5  3.3  6.0
5.0  3.7  6.5
10   8.2 12.0
15  12.8 17.4
20  17.6 22.6
25  22.3 27.8
30  27.2 32.9
35  32.0 38.0
50  46.9 53.1

As you can see, the margin downwards is smaller than the margin upwards for small numbers (because you can’t have fewer than no supporters). By the time you get to 30% or so, the interval is pretty close to what you’d get with the maximum margin of error, but below 10% the maximum margin of error is seriously misleading.
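
For readers who want to compute intervals like these themselves, here is a hedged sketch in Python. It assumes the cheat sheet uses the exact (Clopper-Pearson) binomial interval; that is my assumption, not something stated above, although it does appear to reproduce the tabulated rows. The function names are hypothetical, and the sketch uses plain bisection rather than a statistics library.

```python
import math


def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p), with each term computed in log space."""
    if k < 0:
        return 0.0
    if k >= n or p <= 0.0:
        return 1.0
    if p >= 1.0:
        return 0.0
    log_p, log_q = math.log(p), math.log1p(-p)
    total = 0.0
    for i in range(k + 1):
        log_choose = math.lgamma(n + 1) - math.lgamma(i + 1) - math.lgamma(n - i + 1)
        total += math.exp(log_choose + i * log_p + (n - i) * log_q)
    return min(total, 1.0)


def solve_increasing(func, target, iterations=60):
    """Bisection on (0, 1) for func(p) == target, where func increases with p."""
    lo, hi = 0.0, 1.0
    for _ in range(iterations):
        mid = (lo + hi) / 2
        if func(mid) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def exact_interval(x, n, level=0.95):
    """Clopper-Pearson interval for x supporters out of n, returned as proportions."""
    alpha = 1 - level
    # Lower end: the p at which seeing x or more supporters has probability alpha/2
    lower = 0.0 if x == 0 else solve_increasing(
        lambda p: 1 - binom_cdf(x - 1, n, p), alpha / 2)
    # Upper end: the p at which seeing x or fewer supporters has probability alpha/2
    upper = 1.0 if x == n else solve_increasing(
        lambda p: 1 - binom_cdf(x, n, p), 1 - alpha / 2)
    return lower, upper


low, high = exact_interval(7, 1000)   # 0.7% support in a 1000-person poll
print(f"({100 * low:.1f}%, {100 * high:.1f}%)")
```

For 7 supporters out of 1000 this should agree with the 0.7 row of the table after rounding to one decimal place.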

You can get a reasonable approximation to these numbers by taking the number (not percent) of supporters (eg, 0.7% is 7 out of 1000), taking the square root, adding and subtracting 1, then squaring again, and finally converting back into percent (ie, dividing by 10 for a poll of 1000).

    approx l approx u
0.1     0.00     0.40
0.2     0.02     0.58
0.3     0.05     0.75
0.4     0.10     0.90
0.5     0.15     1.05
0.6     0.21     1.19
0.7     0.27     1.33
0.8     0.33     1.47
0.9     0.40     1.60
1       0.47     1.73
1.5     0.83     2.37
2       1.21     2.99
2.5     1.60     3.60
3       2.00     4.20
3.5     2.42     4.78
4       2.84     5.36
4.5     3.26     5.94
5       3.69     6.51
10      8.10    12.10
15     12.65    17.55
20     17.27    22.93
25     21.94    28.26
30     26.64    33.56
35     31.36    38.84
50     45.63    54.57

This is pretty easy on a calculator, or with an Excel formula. For example, for 1000-person polls, if you put the reported percentage in the A1 cell, use =(sqrt(A1*10)-1)^2/10 for the lower end and =(sqrt(A1*10)+1)^2/10 for the upper end.
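
The same square-root rule can be written as a short Python function that works for any poll size. The function name and the `n` parameter are mine. Clamping the lower end at zero when there is less than one supporter is an extra guard I have added; it is not part of the rule as described above.

```python
import math


def approx_interval(percent, n=1000):
    """Square-root approximation to the 95% interval, in percent, for a poll of size n."""
    count = percent / 100 * n          # number (not percent) of supporters
    root = math.sqrt(count)
    # Subtract and add 1 on the square-root scale, then square again
    lower = max(root - 1, 0.0) ** 2    # guard (my addition): no negative roots
    upper = (root + 1) ** 2
    # Convert the counts back into percentages
    return 100 * lower / n, 100 * upper / n


for reported in (0.7, 5, 50):
    lo, hi = approx_interval(reported)
    print(f"{reported}: {lo:.2f} {hi:.2f}")
```

With `n=1000` this is the same calculation as the spreadsheet formulas, since the count is the percentage times 10.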

Briefly

  • Profile of Auckland Stats alumnus Hadley Wickham at Priceonomics
  • The kiwi (Apteryx, not Actinidia) genome was recently sequenced by a non-NZ research group. There’s a push for NZ-led sequencing of nationally-significant genomes: a taonga genomes project
  • Linguist Jack Grieve (@JWGrieve) has been tweeting maps of various swearwords on (US) Twitter. These are relative to total number of tweets, so they don’t have the usual problem
  • From Jonathan Marshall, the age distribution of NZ electorates, and their political hue: there’s a clear trend, and Ilam seems a bit of an outlier.

 

 

 

Stat of the Week Competition: July 25 – 31 2015

Each week, we would like to invite readers of Stats Chat to submit nominations for our Stat of the Week competition and be in with a chance to win an iTunes voucher.

Here’s how it works:

  • Anyone may add a comment on this post to nominate their Stat of the Week candidate before midday Friday July 31 2015.
  • Statistics can be bad, exemplary or fascinating.
  • The statistic must be in the NZ media during the period of July 25 – 31 2015 inclusive.
  • Quote the statistic, when and where it was published and tell us why it should be our Stat of the Week.

Next Monday at midday we’ll announce the winner of this week’s Stat of the Week competition, and start a new one.


Stat of the Week Competition Discussion: July 25 – 31 2015

If you’d like to comment on or debate any of this week’s Stat of the Week nominations, please do so below!