Posts from June 2012 (33)

June 30, 2012

Drug statistics

As you will have heard, the UN says that Kiwis smoke more pot than anyone else.  The figure quoted, e.g., by the Herald is 9.1%-14.6% (they don’t say what the range means).  It’s useful to look at the actual survey data, the NZ Alcohol and Drug Use survey, which is conveniently available online.

The proportion of New Zealanders who used cannabis at least weekly in the past year is about 5.6%, and the proportion who used it more than 1-2 times/week is 3.8%.  For tobacco, the corresponding figure for more than 1-2 times per week is about 20% and for alcohol, 26%.

You can find statistics broken down by age, sex, ethnicity, frequency of use, and indicators of dependency (yes, people do get addicted to cannabis, just like to other drugs).

June 28, 2012

Open data discussions

The political/social science/etc blog Crooked Timber is having a seminar on Open Data, which is highly recommended.  In one of the posts, Steven Berlin Johnson writes about the mid-19th century efforts of William Farr to publish more information on causes of death, information later used by John Snow in attributing cholera to contaminated water.

He concludes:

Yes, information abundance meant that the newspapers lost their local advertising monopolies to Craigslist and Groupon, but it also means that the crucial data they used to have to unearth by hanging around City Hall for months is now available to anyone with a Web browser or an API key. We may well have fewer investigative journalists on the payroll of newspapers, but if we play our Open Data cards right, we might well end up with more investigations.

 

[Update: these are now all collected in a post, and in PDF and epub formats]

Alpine fault: can we panic now?

The Herald has a good report of research to be published in Science tomorrow, studying earthquakes on the Alpine fault.  By looking at a river site where quakes disrupted water flow, interrupting peat deposition with a layer of sediment, the researchers could get a history of large quakes going back 8000 years. They don’t know exactly how big any of the quakes were, but they were big enough to rupture the surface and affect water flow, so at the least they would mess up roads and bridges, and disrupt tourism.

Based on this 8000-year history, it seems that the Alpine fault is relatively regular in how often it has earthquakes: more so than the San Andreas Fault in California, for example.  Since the fault has major earthquakes about every 330 years, and the most recent one was 295 years ago, it’s likely to go off soon.  Of course, ‘soon’ here doesn’t mean “before the Super 15 final”; the time scales are a bit longer than that.

We can look at some graphs to get a rough idea of the risk over different time scales.  I’m going to roughly approximate the distribution of the times between earthquakes by a log-normal distribution: that is, the logarithm of the times has a Normal distribution.

This is a simple and reasonable model for time intervals, and it also has the virtue of giving the same answers that the researchers gave to the press.  Using the estimates of mean and variation in the paper, the distribution of times to the next big quake looks like the first graph.  The quake is relatively predictable, but “relatively” in this sense means “give or take a century”.

Now, by definition, the next big quake hasn’t happened yet, so we can throw away the part of this distribution that’s less than zero, and rescale the distribution so it still adds up to 1, getting the second graph.  The chance of a big quake is a bit less than 1% per year — not very high, but certainly worth doing something about.  For comparison, it’s about 2-3 times the risk per year of being diagnosed with breast cancer for middle-aged women.

The Herald article (and the press release) quote a 30% chance over 50 years, which matches this lognormal model.  At 80 years there’s a roughly 50:50 chance, and if we wait long enough the quake has to happen eventually.
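
To make that concrete, here is a rough Python sketch of the same calculation (not the code behind the published figures).  The 330-year mean recurrence and the 295 years elapsed come from the post; the standard deviation of about 110 years is an illustrative assumption, consistent with “give or take a century”, and it reproduces the quoted numbers reasonably well.

```python
# Rough sketch only; not the code behind the published figures.
# Mean recurrence (~330 years) and elapsed time (295 years) are from the post;
# the ~110-year standard deviation is an illustrative assumption.
import numpy as np
from scipy import stats

mean_interval = 330.0     # mean years between big quakes
sd_interval = 110.0       # assumed spread ("give or take a century")
years_since_last = 295.0  # time since the most recent big quake

# Log-normal parameters: log(interval) ~ Normal(mu, sigma^2)
sigma = np.sqrt(np.log(1 + (sd_interval / mean_interval) ** 2))
mu = np.log(mean_interval) - sigma ** 2 / 2
interval = stats.lognorm(s=sigma, scale=np.exp(mu))

def prob_quake_within(years_ahead):
    """P(next quake within `years_ahead` years, given no quake in the last 295)."""
    # Conditioning on 'no quake yet' truncates the distribution at 295 years
    # and rescales what is left so it still integrates to 1.
    return ((interval.cdf(years_since_last + years_ahead)
             - interval.cdf(years_since_last)) / interval.sf(years_since_last))

print(prob_quake_within(1))    # ~0.007: a bit less than 1% in the next year
print(prob_quake_within(50))   # ~0.3: close to the quoted 30% over 50 years
print(prob_quake_within(80))   # ~0.5: roughly a 50:50 chance over 80 years
```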

The risk of a major quake in any given year isn’t all that high, but the problem isn’t going away and the quake is going to make a serious mess when it happens.


[Update: Stuff also has an article. They quote me (via the Science Media Centre), but I’m just describing one of the graphs in the paper: Figure 3B, if you want to go and check that I can do simple arithmetic]

NRL Predictions, Round 17

Team Ratings for Round 17

Here are the team ratings prior to Round 17, along with the ratings at the start of the season. A brief description of the method I use for predicting rugby games is available on my Department home page.

Team    Current Rating    Rating at Season Start    Difference
Storm 9.09 4.63 4.50
Broncos 6.17 5.57 0.60
Bulldogs 6.13 -1.86 8.00
Sea Eagles 5.70 9.83 -4.10
Cowboys 3.01 -1.32 4.30
Warriors 2.44 5.28 -2.80
Wests Tigers 1.59 4.52 -2.90
Rabbitohs 1.28 0.04 1.20
Sharks -1.95 -7.97 6.00
Dragons -2.07 4.36 -6.40
Titans -2.35 -11.80 9.50
Knights -4.39 0.77 -5.20
Roosters -5.28 0.25 -5.50
Panthers -7.11 -3.40 -3.70
Eels -7.92 -4.23 -3.70
Raiders -8.08 -8.40 0.30

 

Performance So Far

So far there have been 117 matches played, 70 of which were correctly predicted, a success rate of 59.83%.

Here are the predictions for last week’s games.

Game Date Score Prediction Correct
1 Dragons vs. Titans Jun 22 8 – 6 5.31 TRUE
2 Broncos vs. Rabbitohs Jun 23 26 – 12 8.52 TRUE
3 Cowboys vs. Raiders Jun 23 40 – 18 14.36 TRUE
4 Panthers vs. Eels Jun 23 18 – 19 6.52 FALSE
5 Bulldogs vs. Storm Jun 24 20 – 4 -6.58 FALSE
6 Roosters vs. Sea Eagles Jun 24 14 – 52 -0.47 TRUE
7 Knights vs. Wests Tigers Jun 25 38 – 20 -5.19 FALSE
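
For anyone who wants to check these by hand, here is a minimal Python sketch (not the actual prediction code), assuming a prediction counts as correct when the sign of the predicted margin (positive meaning a home win) matches the sign of the actual home-team margin.  The data are the games in the table above.

```python
# Minimal sketch: reproduce the "Correct" column and the running success rate,
# assuming a prediction is correct when the sign of the predicted margin
# (positive = home win) matches the sign of the actual home-minus-away margin.
results = [
    # (home, away, home_score, away_score, predicted_margin)
    ("Dragons", "Titans", 8, 6, 5.31),
    ("Broncos", "Rabbitohs", 26, 12, 8.52),
    ("Cowboys", "Raiders", 40, 18, 14.36),
    ("Panthers", "Eels", 18, 19, 6.52),
    ("Bulldogs", "Storm", 20, 4, -6.58),
    ("Roosters", "Sea Eagles", 14, 52, -0.47),
    ("Knights", "Wests Tigers", 38, 20, -5.19),
]

correct = [(hs - as_) * pred > 0 for _, _, hs, as_, pred in results]
print(correct)                                  # [True, True, True, False, False, True, False]
print(f"{sum(correct)}/{len(correct)} correct this week")
print(f"Season to date: {70/117:.2%}")          # the 59.83% quoted above
```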

 

Predictions for Round 17

Here are the predictions for Round 17.

Game Date Winner Prediction
1 Broncos vs. Sharks Jun 29 Broncos 12.60
2 Eels vs. Knights Jun 30 Eels 1.00
3 Warriors vs. Cowboys Jul 01 Warriors 3.90
4 Rabbitohs vs. Panthers Jul 01 Rabbitohs 12.90
5 Raiders vs. Dragons Jul 02 Dragons -1.50

 

Super 15 Predictions, Week 19

Team Ratings for Week 19

Here are the team ratings prior to Week 19, along with the ratings at the start of the season. A brief description of the method I use for predicting rugby games is available on my Department home page.

Team    Current Rating    Rating at Season Start    Difference
Crusaders 10.50 10.46 0.00
Stormers 6.62 6.59 0.00
Bulls 4.53 4.16 0.40
Chiefs 2.99 -1.17 4.20
Sharks 2.65 0.87 1.80
Hurricanes 2.49 -1.90 4.40
Reds 0.72 5.03 -4.30
Brumbies -0.01 -6.66 6.60
Waratahs -2.01 4.98 -7.00
Cheetahs -2.92 -1.46 -1.50
Highlanders -3.06 -5.69 2.60
Blues -5.97 2.87 -8.80
Force -8.17 -4.95 -3.20
Lions -9.21 -10.82 1.60
Rebels -12.45 -15.64 3.20

 

Performance So Far

So far there have been 100 matches played, 69 of which were correctly predicted, a success rate of 69%.

Here are the predictions for last week’s games.

Game Date Score Prediction Correct
1 Crusaders vs. Highlanders Jun 01 51 – 18 15.20 TRUE
2 Rebels vs. Brumbies Jun 01 19 – 27 -7.90 TRUE
3 Blues vs. Chiefs Jun 02 34 – 41 -4.00 TRUE
4 Waratahs vs. Hurricanes Jun 02 12 – 33 4.00 FALSE
5 Lions vs. Sharks Jun 02 38 – 28 -10.70 FALSE
6 Bulls vs. Stormers Jun 02 14 – 19 3.80 FALSE

 

Predictions for Week 19

Here are the predictions for Week 19. The prediction is my estimated points difference with a positive margin being a win to the home team, and a negative margin a win to the away team.

Game Date Winner Prediction
1 Highlanders vs. Chiefs Jun 29 Chiefs -1.50
2 Rebels vs. Reds Jun 29 Reds -8.70
3 Crusaders vs. Hurricanes Jun 30 Crusaders 12.50
4 Force vs. Brumbies Jun 30 Brumbies -3.70
5 Stormers vs. Lions Jun 30 Stormers 20.30
6 Bulls vs. Cheetahs Jun 30 Bulls 11.90


June 25, 2012

Stat of the Week Competition: June 23-29 2012

Each week, we would like to invite readers of Stats Chat to submit nominations for our Stat of the Week competition and be in with the chance to win an iTunes voucher.

Here’s how it works:

  • Anyone may add a comment on this post to nominate their Stat of the Week candidate before midday Friday June 29 2012.
  • Statistics can be bad, exemplary or fascinating.
  • The statistic must be in the NZ media during the period of June 23-29 2012 inclusive.
  • Quote the statistic, when and where it was published and tell us why it should be our Stat of the Week.

Next Monday at midday we’ll announce the winner of this week’s Stat of the Week competition, and start a new one.

Stat of the Week Competition Discussion: June 23-29 2012

If you’d like to comment on or debate any of this week’s Stat of the Week nominations, please do so below!

June 22, 2012

NRL Predictions, Round 16

Team Ratings for Round 16

Here are the team ratings prior to Round 16, along with the ratings at the start of the season. A brief description of the method I use for predicting rugby games is available on my Department home page.

Team    Current Rating    Rating at Season Start    Difference
Storm 10.90 4.63 6.30
Broncos 5.74 5.57 0.20
Bulldogs 4.32 -1.86 6.20
Wests Tigers 3.44 4.52 -1.10
Sea Eagles 2.69 9.83 -7.10
Warriors 2.44 5.28 -2.80
Cowboys 2.40 -1.32 3.70
Rabbitohs 1.72 0.04 1.70
Dragons -1.81 4.36 -6.20
Sharks -1.95 -7.97 6.00
Roosters -2.28 0.25 -2.50
Titans -2.62 -11.80 9.20
Knights -6.25 0.77 -7.00
Panthers -6.51 -3.40 -3.10
Raiders -7.47 -8.40 0.90
Eels -8.53 -4.23 -4.30

 

Performance So Far

So far there have been 110 matches played, 66 of which were correctly predicted, a success rate of 60%.

Here are the predictions for last week’s games.

Game Date Score Prediction Correct
1 Dragons vs. Bulldogs Jun 15 20 – 28 -0.41 TRUE
2 Cowboys vs. Broncos Jun 15 12 – 0 -0.90 FALSE
3 Sharks vs. Warriors Jun 16 20 – 19 -0.06 FALSE
4 Eels vs. Rabbitohs Jun 16 6 – 24 -8.76 TRUE
5 Titans vs. Panthers Jun 17 36 – 18 6.56 TRUE
6 Wests Tigers vs. Roosters Jun 17 28 – 42 14.83 FALSE
7 Sea Eagles vs. Storm Jun 18 22 – 26 -3.65 TRUE

 

Predictions for Round 16

Here are the predictions for Round 16.

Game Date Winner Prediction
1 Dragons vs. Titans Jun 22 Dragons 5.30
2 Broncos vs. Rabbitohs Jun 22 Broncos 8.50
3 Cowboys vs. Raiders Jun 23 Cowboys 14.40
4 Panthers vs. Eels Jun 23 Panthers 6.50
5 Bulldogs vs. Storm Jun 24 Storm -6.60
6 Roosters vs. Sea Eagles Jun 24 Sea Eagles -0.50
7 Knights vs. Wests Tigers Jun 25 Wests Tigers -5.20


June 21, 2012

Why Nigeria?

A fairly large fraction of the spam advertising the chance to capitalise on stolen riches is sent by Nigerian criminals.  What should be more surprising is the fact that a lot of the spam actually says it’s from Nigeria (or other West African nations).  Since everyone knows about Nigerian scams, why don’t the spammers claim to be from somewhere else? It’s not as if they have an aversion to lying about other things.

A new paper from Cormac Herley at Microsoft Research has a statistically interesting explanation: the largest cost in spam operations is in dealing with the people who respond to the first email.  Some of these people later realise what’s going on and drop out without paying; from the spammer’s point of view these are false positives: they cost time and money to handle, but don’t end up paying off.  A spammer ideally wants only to engage with the most gullible potential victims; the fact that ‘Nigeria’ will spark suspicions in many people is actually a feature, not a bug.
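
A toy calculation makes the point concrete (the numbers below are invented purely for illustration; they are not from Herley’s paper): if every reply costs the scammer effort but only true victims pay, a less plausible pitch that filters out the merely curious can be more profitable even though it attracts far fewer replies.

```python
# Toy numbers, invented for illustration only; not from Herley's paper.
def expected_profit(replies, prop_pay, payoff=1000.0, cost_per_reply=20.0):
    """Profit when `replies` people answer and a fraction `prop_pay` of them pay."""
    return replies * (prop_pay * payoff - cost_per_reply)

# A plausible-sounding pitch: lots of replies, but most realise and drop out.
print(expected_profit(replies=1000, prop_pay=0.01))   # 1000 * (10 - 20) = -10000
# 'Nigeria': far fewer replies, but the ones who do reply mostly pay up.
print(expected_profit(replies=100, prop_pay=0.10))    # 100 * (100 - 20) = 8000
```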

 

If it’s not worth doing, it’s not worth doing well?

League tables work well in sports.  The way the competition is defined means that ‘games won’ really is the dominant factor in ordering teams, that it matters who is at the top, and that people don’t try to use the table for inappropriate purposes such as deciding which team to support.  For schools and hospitals, not so much.

The main problems with league tables for schools (as proposed in NZ) or hospitals (as implemented in the UK) are, first, that a ranking requires you to choose a way of collapsing multidimensional information into a rank, and second, that there is usually massive uncertainty in the ranking, which is hard to convey.   There doesn’t have to be one school in NZ that is better than all the others, but there does have to be one school at the top of the table.  None of this is new: we have looked at the problems of collapsing multidimensional information before, with rankings of US law schools, and the uncertainty problem with rates of bowel cancer across UK local government areas.
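
The uncertainty point is easy to illustrate with a made-up simulation (not real school data): give a hundred identical schools the same true pass rate, add the sampling noise you get from modest cohort sizes, and a league table will still confidently put someone at the top and someone at the bottom.

```python
# Made-up simulation, not real school data: 100 schools with identical true
# performance still produce a full league table once sampling noise is added.
import numpy as np

rng = np.random.default_rng(1)
n_schools, cohort_size, true_pass_rate = 100, 60, 0.80

passes = rng.binomial(cohort_size, true_pass_rate, size=n_schools)
observed_rate = passes / cohort_size

ranking = np.argsort(observed_rate)[::-1]          # "best" school first
print(observed_rate[ranking[0]], observed_rate[ranking[-1]])
# A noticeably 'better' top school and 'worse' bottom school appear,
# even though every school is exactly the same by construction.
```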

This isn’t to say that school performance data shouldn’t be used.  Reporting back to schools how they are doing, and how it compares to other similar schools, is valuable.  My first professional software development project (for my mother) was writing a program (in BASIC, driving an Epson dot-matrix printer) to automate the reports to hospitals from the Victorian Perinatal Data Collection Unit.  The idea was to give each hospital the statewide box plots of risk factors (teenagers, no ante-natal care), adverse outcomes (deaths, preterm births, malformations), and interventions (induction of labor, caesarean section), with their own data highlighted by a line.   Many of the adverse outcomes were not the hospital’s fault, and many of the interventions could be either positive or negative depending on the circumstances, so collapsing to a single ‘hospital quality’ score would be silly, but it was still useful for hospitals to know how they compare.  In that case the data was sent only to the hospital, but for school data there’s a good argument for making it public.

While it’s easy to see why teachers might be suspicious of the government’s intentions, the rationale given by John Key for exploring some form of official league table is sensible.  It’s definitely better not to have a simple ranking, and it might arguably be better not to have a set of official comparative reports, but the data are available under the Official Information Act.  The media may currently be shocked and appalled at the idea of league tables, but does anyone really believe this would stop a plague of incomplete, badly-analyzed, sensationally-reported exposés of “New Zealand’s Worst Schools!!”?  It would be much better for the Ministry of Education to produce useful summaries, preferably not including a league-table ranking, as a prophylactic measure.