Posts from April 2016 (34)

April 17, 2016

Overcounting causes

There’s a long story in the Sunday Star-Times about a 2007 report on cannabis from the National Drug Intelligence Bureau (NDIB):

“Perhaps surprisingly,” Maxwell wrote, “cannabis related hospital admissions between 2001 and 2005 exceeded admissions for opiates, amphetamines and cocaine combined”, with about 2000 people a year ending up in hospital because of the drug.

The problem was with hospital diagnostic codes. Discharge summaries include both the primary cause of admission and a lot of other things to be noted. That’s a good thing — you want to know what all was wrong with a patient both for future clinical care and for research and quality control.  For example, if someone is in hospital for bleeding, you want to know they were on warfarin (which is why the bleeding happened), and perhaps why they were on warfarin. It’s not even always the case that the primary cause is the primary cause — if someone has Parkinson’s Disease and is admitted with pneumonia as a complication, which one should be listed? This is a difficult and complex field, and is even slightly less boring than it sounds.

As a result, if you just count up all the discharge summaries where ‘cannabis dependence’ was somewhere on the laundry list of codes, you’re going to get a lot of people who smoke pot but are in hospital for some completely different reason.  And since there’s a lot of cannabis consumption out there, you will get a lot of these false positives.
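The difference between the two ways of counting can be sketched in a few lines of Python. The discharge records below are invented for illustration (they are not from the NDIB report), and the field names `primary` and `other` are assumptions rather than real coding-system fields.

```python
# Hypothetical discharge summaries: 'primary' is the main reason for admission,
# 'other' holds everything else noted on the summary. All records are made up.
discharges = [
    {"primary": "fracture", "other": ["cannabis dependence"]},
    {"primary": "cannabis dependence", "other": []},
    {"primary": "pneumonia", "other": ["Parkinson's disease"]},
    {"primary": "bleeding", "other": ["warfarin use", "cannabis dependence"]},
    {"primary": "appendicitis", "other": []},
]

CODE = "cannabis dependence"


def count_any_mention(records, code):
    """Count summaries where the code appears anywhere on the list."""
    return sum(1 for r in records if r["primary"] == code or code in r["other"])


def count_primary(records, code):
    """Count summaries where the code is the primary cause of admission."""
    return sum(1 for r in records if r["primary"] == code)


any_mention = count_any_mention(discharges, CODE)
primary_only = count_primary(discharges, CODE)
print(f"any mention: {any_mention}, primary cause: {primary_only}")
```

In this toy data the "any mention" count is three times the "primary cause" count, and the gap grows with how common the condition is in the general population.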

There are some other things to note about this report, though. The National Drug Foundation says (on Twitter) that they made the same point when it first came out. They also claim that the Ministry of Health argued against its being published.

Perhaps now that the multiple-counting problem has been publicised in the context of hospital admissions, the same mistake will be made less often for road crashes, where multiple factors from foreign drivers to speed to alcohol to drugs are repeatedly counted up as ‘the’ cause of any crash where they are present.

April 13, 2016

Super 18 Predictions for Round 8

Team Ratings for Round 8

The basic method is described on my Department home page.

Here are the team ratings prior to this week’s games, along with the ratings at the start of the season.

Team           Current Rating   Rating at Season Start   Difference
Crusaders                8.12                     9.84        -1.70
Chiefs                   6.51                     2.68         3.80
Highlanders              6.25                     6.80        -0.50
Hurricanes               5.62                     7.26        -1.60
Brumbies                 3.47                     3.15         0.30
Waratahs                 2.30                     4.88        -2.60
Stormers                 2.02                    -0.62         2.60
Lions                    0.42                    -1.80         2.20
Bulls                   -0.72                    -0.74         0.00
Sharks                  -1.06                    -1.64         0.60
Blues                   -4.16                    -5.51         1.30
Rebels                  -5.31                    -6.33         1.00
Jaguares                -8.67                   -10.00         1.30
Reds                    -9.17                    -9.81         0.60
Cheetahs                -9.21                    -9.27         0.10
Force                   -9.31                    -8.43        -0.90
Sunwolves              -12.98                   -10.00        -3.00
Kings                  -17.37                   -13.66        -3.70


Performance So Far

So far there have been 55 matches played, 36 of which were correctly predicted, a success rate of 65.5%.
Here are the predictions for last week’s games.

   Game                      Date    Score    Prediction  Correct
1  Chiefs vs. Blues          Apr 08  29 – 23       15.30  TRUE
2  Force vs. Crusaders       Apr 08  19 – 20      -15.10  TRUE
3  Stormers vs. Sunwolves    Apr 08  46 – 19       17.90  TRUE
4  Hurricanes vs. Jaguares   Apr 09  40 – 22       18.30  TRUE
5  Reds vs. Highlanders      Apr 09  28 – 27      -13.10  FALSE
6  Sharks vs. Lions          Apr 09  9 – 24         4.30  FALSE
7  Kings vs. Bulls           Apr 09  6 – 38       -10.60  TRUE


Predictions for Round 8

Here are the predictions for Round 8. The prediction is my estimated expected points difference with a positive margin being a win to the home team, and a negative margin a win to the away team.

   Game                      Date    Winner      Prediction
1  Crusaders vs. Jaguares    Apr 15  Crusaders        20.80
2  Rebels vs. Hurricanes     Apr 15  Hurricanes       -6.90
3  Cheetahs vs. Sunwolves    Apr 15  Cheetahs          7.80
4  Blues vs. Sharks          Apr 16  Blues             0.90
5  Waratahs vs. Brumbies     Apr 16  Waratahs          2.30
6  Bulls vs. Reds            Apr 16  Bulls            12.40
7  Lions vs. Stormers        Apr 16  Lions             1.90
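
As a rough sketch of the sign convention only (not the rating method itself, which is described elsewhere), the following Python checks last week's Super 18 predictions against the scores and updates the running success rate. The function names are made up for illustration, and treating a draw or a zero predicted margin as incorrect is an assumption, since the posts do not say how those cases are scored. The previous running total (31 correct from 48 matches) is taken from the Round 7 post.

```python
# (home, away, home score, away score, predicted margin) from last week's results.
games = [
    ("Chiefs", "Blues", 29, 23, 15.30),
    ("Force", "Crusaders", 19, 20, -15.10),
    ("Stormers", "Sunwolves", 46, 19, 17.90),
    ("Hurricanes", "Jaguares", 40, 22, 18.30),
    ("Reds", "Highlanders", 28, 27, -13.10),
    ("Sharks", "Lions", 9, 24, 4.30),
    ("Kings", "Bulls", 6, 38, -10.60),
]


def predicted_winner(home, away, margin):
    """Positive margin means a home win, negative margin an away win."""
    return home if margin > 0 else away


def is_correct(home_score, away_score, margin):
    """Correct when predicted and actual margins have the same sign.

    Assumption: a draw or a zero predicted margin counts as incorrect.
    """
    return (home_score - away_score) * margin > 0


results = [is_correct(hs, as_, m) for _, _, hs, as_, m in games]

# Running total before this round, as reported in the Round 7 post.
prev_correct, prev_played = 31, 48
total_correct = prev_correct + sum(results)
total_played = prev_played + len(games)
success_rate = round(100 * total_correct / total_played, 1)
print(total_correct, total_played, success_rate)
```

Five of the seven sign checks succeed, which takes the running total to 36 from 55, matching the figure quoted under Performance So Far.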


NRL Predictions for Round 7

Team Ratings for Round 7

The basic method is described on my Department home page.

Here are the team ratings prior to this week’s games, along with the ratings at the start of the season.

Team           Current Rating   Rating at Season Start   Difference
Cowboys                 11.97                    10.29         1.70
Broncos                  9.31                     9.81        -0.50
Roosters                 3.73                    11.20        -7.50
Bulldogs                 2.93                     1.50         1.40
Storm                    2.10                     4.41        -2.30
Sharks                   1.20                    -1.06         2.30
Rabbitohs                1.06                    -1.20         2.30
Eels                     0.10                    -4.62         4.70
Sea Eagles               0.08                     0.36        -0.30
Panthers                -1.29                    -3.06         1.80
Raiders                 -1.73                    -0.55        -1.20
Dragons                 -3.51                    -0.10        -3.40
Titans                  -4.93                    -8.39         3.50
Wests Tigers            -5.25                    -4.06        -1.20
Warriors                -5.74                    -7.47         1.70
Knights                 -8.36                    -5.41        -2.90


Performance So Far

So far there have been 48 matches played, 25 of which were correctly predicted, a success rate of 52.1%.
Here are the predictions for last week’s games.

   Game                      Date    Score    Prediction  Correct
1  Broncos vs. Dragons       Apr 07  26 – 0        14.10  TRUE
2  Rabbitohs vs. Roosters    Apr 08  10 – 17        1.60  FALSE
3  Eels vs. Raiders          Apr 09  36 – 6         0.90  TRUE
4  Warriors vs. Sea Eagles   Apr 09  18 – 34        0.50  FALSE
5  Panthers vs. Cowboys      Apr 09  18 – 23      -11.20  TRUE
6  Sharks vs. Titans         Apr 10  25 – 20        9.90  TRUE
7  Knights vs. Wests Tigers  Apr 10  18 – 16       -0.50  FALSE
8  Storm vs. Bulldogs        Apr 11  12 – 18        3.50  FALSE


Predictions for Round 7

Here are the predictions for Round 7. The prediction is my estimated expected points difference with a positive margin being a win to the home team, and a negative margin a win to the away team.

   Game                      Date    Winner      Prediction
1  Sea Eagles vs. Eels       Apr 14  Sea Eagles        3.00
2  Cowboys vs. Rabbitohs     Apr 15  Cowboys          13.90
3  Titans vs. Dragons        Apr 16  Titans            1.60
4  Bulldogs vs. Warriors     Apr 16  Bulldogs         12.70
5  Broncos vs. Knights       Apr 16  Broncos          20.70
6  Raiders vs. Sharks        Apr 17  Raiders           0.10
7  Wests Tigers vs. Storm    Apr 17  Storm            -4.40
8  Roosters vs. Panthers     Apr 18  Roosters          8.00


April 11, 2016

Missing data

Sometimes…often…practically always… when you get a data set there are missing values. You need to decide what to do with them. There’s a mathematical result that basically says there’s no reliable strategy, but different approaches may still be less completely useless in different settings.

One tempting but usually bad approach is to replace them with the average — it’s especially bad with geographical data.  We’ve seen fivethirtyeight.com get this badly wrong with kidnappings in Nigeria, we’ve seen maps of vaccine-preventable illness at epidemic proportions in the west Australian desert, we’ve seen Kansas misidentified as the porn centre of the United States.

The data problem that attributed porn to Kansas has more serious consequences. There’s a farm not far from Wichita that, according to the major database providing this information, has 600 million IP addresses.  Now think of the reasons why someone might need to look up the physical location of an internet address. Kashmir Hill, at Fusion, looks at the consequences, and at how a better “don’t know” address is being chosen.
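
The way an average-based fill-in creates a fake hot spot can be sketched in Python. The coordinates below are invented, and this is only an illustration of the general mechanism, not a reconstruction of how any of the databases mentioned above actually work.

```python
from collections import Counter
from statistics import mean

# Hypothetical event locations as (latitude, longitude); None = location unknown.
events = [
    (40.7, -74.0), (40.7, -74.0), (34.1, -118.2), (47.6, -122.3),
    None, None, None, None, None,
]

known = [e for e in events if e is not None]

# "Replace with the average": every unknown location lands on the centre point.
centre = (
    round(mean(lat for lat, _ in known), 1),
    round(mean(lon for _, lon in known), 1),
)
imputed = [e if e is not None else centre for e in events]

# The centre point now looks like the busiest place on the map,
# even though no recorded event happened there.
hotspot, n = Counter(imputed).most_common(1)[0]
print(hotspot, n, hotspot in known)

# A less misleading alternative: report the unknowns as their own category.
unknown = sum(e is None for e in events)
print(f"{len(known)} located events, {unknown} with unknown location")
```

Keeping "don't know" as an explicit category does not fix the missing data, but it avoids pinning every unlocated record on one real place.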

Stat of the Week Competition: April 9 – 15 2016

Each week, we would like to invite readers of Stats Chat to submit nominations for our Stat of the Week competition and be in with the chance to win an iTunes voucher.

Here’s how it works:

  • Anyone may add a comment on this post to nominate their Stat of the Week candidate before midday Friday April 15 2016.
  • Statistics can be bad, exemplary or fascinating.
  • The statistic must be in the NZ media during the period of April 9 – 15 2016 inclusive.
  • Quote the statistic, when and where it was published and tell us why it should be our Stat of the Week.

Next Monday at midday we’ll announce the winner of this week’s Stat of the Week competition, and start a new one.


Stat of the Week Competition Discussion: April 9 – 15 2016

If you’d like to comment on or debate any of this week’s Stat of the Week nominations, please do so below!

April 9, 2016

Movie stars broken down by age and sex

The folks at Polygraph have a lovely set of interactive graphics of the number of speaking lines in 2000 movie screenplays, with IMDb look-ups of actor age and gender.  If you haven’t been living in a cave on Mars, the basic conclusion won’t be surprising, but the extent of the differences might. Frozen, for example, gave more than half the lines to male characters.

They’ve also made a lot of data available on Github for other people to use. Here’s a graph combining the age and gender data in a different way than they did: total number of speaking lines by age and gender.

[Graph: total number of speaking lines by age and gender]

Men and women have similar numbers of speaking lines up to about age 30, but after that there’s a huge separation and much less opportunity for female actors.  We can all think of exceptions: Judi “M” Dench, Maggie “Minerva” Smith, Joanna “Absolutely no relation” Lumley, but they are exceptions.

Compared to what?

Two maps via Twitter:

From the Sydney Morning Herald, via @mlle_elle and @rpy

[Map: creative professionals, from the Sydney Morning Herald]

The differences in population density swamp anything else. For the map to be useful we’d need a comparison between ‘creative professionals’ and ‘non-creative unprofessionals’.  There’s an XKCD about this.
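
A small Python sketch of why raw counts mostly show where people live. The three areas and their numbers are made up for illustration and have nothing to do with the Sydney Morning Herald data.

```python
# Hypothetical areas: (name, total residents, creative professionals).
areas = [
    ("Inner city", 200_000, 10_000),
    ("Suburb", 50_000, 4_000),
    ("Rural", 5_000, 500),
]

# Ranking by raw count rewards the area with the most people.
by_count = max(areas, key=lambda a: a[2])[0]

# Ranking by share of residents compares like with like.
by_rate = max(areas, key=lambda a: a[2] / a[1])[0]

for name, residents, creatives in areas:
    print(f"{name}: {creatives} creatives, {100 * creatives / residents:.0f}% of residents")
print("highest count:", by_count, "| highest rate:", by_rate)
```

With these numbers the most populous area tops the count while the least populous one tops the rate, which is the comparison the map would need to show.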

Peter Ellis has another visualisation of the last election that emphasises comparisons. Here’s a comparison of Green and Labour votes (by polling place) across Auckland.

[Map: Green and Labour votes by polling place across Auckland]

There’s a clear division between the areas where Labour and Green polled about the same, and those where Labour did much better.


April 8, 2016

Briefly

  • A lottery in the US rigged by subverting the random number generator.  That’s harder to do with the complicated balls-from-a-machine system we use, and it’s also more obvious when drawing balls from a machine that betting systems based on sophisticated numerical sequences won’t work.
  • The (US) Transportation Security Administration has a ‘fast lane’ for more-trusted travellers, who get chosen for screening randomly. They use a randomizer app to make sure it really is random, which is a good idea: people are very bad at random choices. But perhaps it shouldn’t have cost $50k.
  • The Panama Papers are an example of the importance of data skills to journalists.
  • University of Otago research on microRNA may help with Alzheimer’s Disease diagnosis, which is interesting and potentially very useful, but there have been a lot of ‘potential tests’ recently. Also the research is unpublished and they aren’t disclosing yet which microRNAs are involved, so perhaps the publicity could have waited.

April 6, 2016

Super 18 Predictions for Round 7

Team Ratings for Round 7

The basic method is described on my Department home page.

Here are the team ratings prior to this week’s games, along with the ratings at the start of the season.

Team           Current Rating   Rating at Season Start   Difference
Crusaders                8.97                     9.84        -0.90
Highlanders              7.10                     6.80         0.30
Chiefs                   7.07                     2.68         4.40
Hurricanes               5.64                     7.26        -1.60
Brumbies                 3.47                     3.15         0.30
Waratahs                 2.30                     4.88        -2.60
Stormers                 1.48                    -0.62         2.10
Sharks                   0.10                    -1.64         1.70
Lions                   -0.74                    -1.80         1.10
Bulls                   -2.00                    -0.74        -1.30
Blues                   -4.72                    -5.51         0.80
Rebels                  -5.31                    -6.33         1.00
Jaguares                -8.69                   -10.00         1.30
Cheetahs                -9.21                    -9.27         0.10
Reds                   -10.01                    -9.81        -0.20
Force                  -10.16                    -8.43        -1.70
Sunwolves              -12.43                   -10.00        -2.40
Kings                  -16.08                   -13.66        -2.40


Performance So Far

So far there have been 48 matches played, 31 of which were correctly predicted, a success rate of 64.6%.
Here are the predictions for last week’s games.

   Game                      Date    Score    Prediction  Correct
1  Highlanders vs. Force     Apr 01  32 – 20       22.50  TRUE
2  Lions vs. Crusaders       Apr 01  37 – 43       -5.70  TRUE
3  Blues vs. Jaguares        Apr 02  24 – 16        8.00  TRUE
4  Brumbies vs. Chiefs       Apr 02  23 – 48        3.90  FALSE
5  Kings vs. Sunwolves       Apr 02  33 – 28       -0.30  FALSE
6  Bulls vs. Cheetahs        Apr 02  23 – 18       11.50  TRUE
7  Waratahs vs. Rebels       Apr 03  17 – 21       13.20  FALSE


Predictions for Round 7

Here are the predictions for Round 7. The prediction is my estimated expected points difference with a positive margin being a win to the home team, and a negative margin a win to the away team.

   Game                      Date    Winner      Prediction
1  Chiefs vs. Blues          Apr 08  Chiefs           15.30
2  Force vs. Crusaders       Apr 08  Crusaders       -15.10
3  Stormers vs. Sunwolves    Apr 08  Stormers         17.90
4  Hurricanes vs. Jaguares   Apr 09  Hurricanes       18.30
5  Reds vs. Highlanders      Apr 09  Highlanders     -13.10
6  Sharks vs. Lions          Apr 09  Sharks            4.30
7  Kings vs. Bulls           Apr 09  Bulls           -10.60