Posts from May 2014 (77)

May 19, 2014

Stat of the Week Competition: May 17 – 23 2014

Each week, we would like to invite readers of Stats Chat to submit nominations for our Stat of the Week competition and be in with a chance to win an iTunes voucher.

Here’s how it works:

  • Anyone may add a comment on this post to nominate their Stat of the Week candidate before midday Friday May 23 2014.
  • Statistics can be bad, exemplary or fascinating.
  • The statistic must be in the NZ media during the period of May 17 – 23 2014 inclusive.
  • Quote the statistic, when and where it was published and tell us why it should be our Stat of the Week.

Next Monday at midday we’ll announce the winner of this week’s Stat of the Week competition, and start a new one.


Stat of the Week Competition Discussion: May 17 – 23 2014

If you’d like to comment on or debate any of this week’s Stat of the Week nominations, please do so below!

May 17, 2014

Robustly null

 

A new study pooled data from previous studies of vaccination and autism, and as Emily Willingham writes, it gives what you’d expect.

Five cohort studies involving 1,256,407 children and five case-control studies involving 9920 children were included in this analysis.

  • There was no relationship between vaccination and autism (OR: 0.99; 95% CI: 0.92 to 1.06).
  • There was no relationship between vaccination and [autism spectrum disorder] (OR: 0.91; 95% CI: 0.68 to 1.20).
  • There was no relationship between [autism spectrum disorder] and MMR (OR: 0.84; 95% CI: 0.70 to 1.01).
  • There was no relationship between [autism spectrum disorder] and thimerosal (OR: 1.00; 95% CI: 0.77 to 1.31).
  • There was no relationship between [autism spectrum disorder] and mercury (Hg) (OR: 1.00; 95% CI: 0.93 to 1.07).

Findings of this meta-analysis suggest that vaccinations are not associated with the development of autism or autism spectrum disorder.

These results basically rule out any substantial effect due to vaccination. To the extent that they suggest any effect, it is protective, but that’s probably just chance.
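
As a rough check on that reading, here is a minimal sketch that takes the five pooled odds ratios quoted above and asks whether each 95% interval includes 1 (no association) and whether any point estimate is above 1. The labels and variable names are illustrative and do not come from the paper.

```python
# Pooled odds ratios and 95% confidence intervals as quoted in the abstract:
# (odds ratio, lower limit, upper limit). Labels are illustrative.
results = {
    "vaccination vs autism": (0.99, 0.92, 1.06),
    "vaccination vs ASD": (0.91, 0.68, 1.20),
    "MMR vs ASD": (0.84, 0.70, 1.01),
    "thimerosal vs ASD": (1.00, 0.77, 1.31),
    "mercury vs ASD": (1.00, 0.93, 1.07),
}

# An interval that contains 1 is compatible with no association.
includes_null = {name: lo <= 1.0 <= hi for name, (_, lo, hi) in results.items()}

# True if no point estimate is above 1, i.e. none points towards harm.
none_above_one = all(or_ <= 1.0 for or_, _, _ in results.values())

# Largest upper limit: the biggest increase in odds any interval allows.
largest_upper = max(hi for _, _, hi in results.values())

for name, (or_, lo, hi) in results.items():
    print(f"{name}: OR {or_:.2f} (95% CI {lo:.2f} to {hi:.2f}), "
          f"includes 1: {includes_null[name]}")
```

Every interval includes 1 and no point estimate is above 1, which is what "no effect, or if anything protective" looks like in numbers.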

This sort of result is pretty boring, so it’s unlikely to get anywhere near the same media coverage as the claims that there is an effect.

 

May 16, 2014

On keeping your own score…

“People and institutions cannot keep their own score accurately. Metrics soon become targets and then pitches, and are thereby gamed, corrupted, misreported, fudged…

Examples: premature revenue recognition, Libor rates, beating the quarterly forecast by a single penny, terrorist attacks prevented, Weapons of Mass Destruction, number of Twitter followers, all body counts (crowd sizes, civilians blown up). Sometimes called the Principle of Lake Woebegone, where all children are above average.”

– Edward Tufte

Averages, percentages, nets, and GST

Our only Prime Minister, on Radio NZ

“I utterly reject those propositions. Twelve percent of households pay 76 percent of all net tax in New Zealand,” he said.

I’ve written about “net tax” before, both on StatsChat and elsewhere. It has to be defined and analysed carefully and non-intuitively in order to get these sorts of results.

Suppose we had an imaginary population divided into three groups. The “Low” group, of 10 people, each pays $1000 in income tax, $1000 in GST, and receives $1500 in cash benefits. The “Middle” group, of 3 people, each pays $4000 in income tax, $3000 in GST, and receives no cash in benefits. The one person in the “High” group pays $17000 in income tax, $8000 in GST, and receives no cash in benefits.

According to Mr Key’s definition, the high-income group pays 71% of the “net tax”.  The middle-income group pays 50% of the “net tax”, and the low-income group pays -21% of the “net tax”.  That’s even though every person in this imaginary population pays more in tax than they receive in cash benefits.
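
The arithmetic can be sketched as follows, taking the Middle group as 3 people (the value that reproduces the quoted percentages) and treating “net tax” as income tax minus cash benefits. The function and variable names are illustrative only. The second calculation, with GST counted, is added for comparison.

```python
# Imaginary population from the text, per person:
# (number of people, income tax, GST, cash benefits).
groups = {
    "Low": (10, 1000, 1000, 1500),
    "Middle": (3, 4000, 3000, 0),
    "High": (1, 17000, 8000, 0),
}

def shares(include_gst):
    """Each group's share (in whole percent) of total tax minus cash benefits."""
    net = {}
    for name, (people, income_tax, gst, benefits) in groups.items():
        tax = income_tax + (gst if include_gst else 0)
        net[name] = people * (tax - benefits)
    total = sum(net.values())
    return {name: round(100 * value / total) for name, value in net.items()}

net_tax_shares = shares(include_gst=False)  # "net tax": income tax minus benefits
with_gst_shares = shares(include_gst=True)  # same calculation with GST counted

print(net_tax_shares)   # {'Low': -21, 'Middle': 50, 'High': 71}
print(with_gst_shares)  # {'Low': 10, 'Middle': 41, 'High': 49}
```

Once GST is counted, every share is positive and the High group's share drops from 71% to 49%.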

There are three strange things going on here. The first is that GST is ignored. That’s obviously just wrong — GST is just as real as income tax.  The second is that cash benefits are treated differently from all other categories of government expenditure, even other categories such as subsidised medications that provide a direct, quantifiable individual benefit.  The third is that percentages behave strangely when you have a mixture of negative and positive numbers.  It’s quite possible, by choosing the subsets of the NZ population correctly, to find a group that pays well over 100% of the “net tax”.

Percentages become a lot less useful when they aren’t bounded by 100, and people who want to communicate accurately should avoid them in that situation.  And if you want to distinguish income tax revenue from GST revenue, you should clearly explain what you’re doing and why.

Smarter than the average bear

Online polling company YouGov asked people in the US and Britain about how their intelligence compared to other people.

For the US, the results were

[Image: YouGov chart of US responses, original version (usintel)]

They pulled that graph only seconds after I found it, and replaced it with the more plausible

[Image: YouGov chart of US responses, replacement version (intelligence2)]

The British appear to be slightly more reluctant than the Americans to say they’re smarter than average, though it would be unwise to assume they are less likely to believe it.

[Image: YouGov chart comparing US and British responses (selfassess1-2)]

May 15, 2014

Budget visualisation

Keith Ng has his annual interactive graphic of budget changes up at Public Address, and will soon have a graphic showing how overall forecasts have changed over time.

[Update] And Harkanwal Singh has his version up at the Herald.

Takes two to tango

There’s a Stat-of-the-Week nomination for a Dominion Post article that I haven’t seen, because Stuff has had the good sense not to put it online. The press release is on Scoop, and from what our correspondent says, if you’ve read that, you’ve read the story. It’s about sex at the office, based on a ridiculously small sample selected from members of a dating website.  Since the dating website in question makes a lot of how different its members are from typical people, representativeness is not likely. Also, their infographic disagrees with the text of the release in at least one place.

That’s all standard. What’s interesting is the comparison of the proportions of men and women who have had sex in various situations. Now, for the heterosexual majority, we have a basic accounting constraint in play. The office-sex survey says 20% of men and 3% of women have got it on in a conference room, and 15% of men and 2% of women have done so in a storage room. If these numbers were true there would be only three explanations: there are a lot more gay men around than other data suggest, and they really like the office; the few women who have sex at the office do so with many different men; or we have a Clintonesque definitional problem where the vast majority of the women involved don’t think what they did was sex. More likely, it’s just evidence that the numbers are meaningless.
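
To see how strong the accounting constraint is, here is a minimal sketch of the second explanation. It assumes equal numbers of men and women in the sample and that every encounter involves one man and one woman. Neither assumption comes from the press release, and the function name is made up for illustration.

```python
# Reported percentages from the office-sex survey: (men, women).
reported = {
    "conference room": (20, 3),
    "storage room": (15, 2),
}

def min_partners_per_woman(pct_men, pct_women):
    """Lower bound on the average number of distinct men per woman involved.

    Assumes equal numbers of men and women, and that every encounter
    involves one man and one woman. Each man involved needs at least one
    partner, so the women involved must cover at least this many men each.
    """
    return pct_men / pct_women

implied = {place: min_partners_per_woman(m, w) for place, (m, w) in reported.items()}

for place, value in implied.items():
    print(f"{place}: at least {value:.1f} men per woman involved")
```

Under those assumptions, each woman involved would need about seven different male partners on average in each location.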

We’ve seen this problem before, but at least this is one problem the Herald’s story about holiday romance based on an Expedia press release avoided.

May 14, 2014

One of the things social media is good for

[Update: 538 now has an intro to the story explaining the mistakes and apologising. Good for them.]

So, at fivethirtyeight.com there’s this story on mapping kidnappings in Nigeria using data from GDELT, the sort of thing data journalism is supposed to be good at. GDELT automatically extracts information from news stories to build a huge global database.

On Twitter, Erin Simpson, whose about.me page says she is “a leading specialist in the intersection of intelligence, data analysis, irregular warfare, and illicit systems – with an emphasis on novel research designs,” — and who has worked on the GDELT parser — is Not Happy.

Thanks to Storify, here are three summaries of what she says, but a lot of it can be boiled down to one point:

In conclusion: VALIDATE YOUR FREAKING DATA. It’s not true just because it’s on a goddamn map.

(via @LewSOS)

May 13, 2014

NRL Predictions for Round 10

Team Ratings for Round 10

The basic method is described on my Department home page. I have made some changes to the methodology this year, including shrinking the ratings between seasons.

Here are the team ratings prior to this week’s games, along with the ratings at the start of the season.

Team          Current Rating  Rating at Season Start  Difference
Roosters                9.26                   12.35       -3.10
Bulldogs                8.21                    2.46        5.70
Rabbitohs               7.98                    5.82        2.20
Sea Eagles              6.93                    9.10       -2.20
Cowboys                 4.73                    6.01       -1.30
Storm                   0.74                    7.64       -6.90
Broncos                 0.39                   -4.69        5.10
Panthers               -0.21                   -2.48        2.30
Titans                 -0.28                    1.45       -1.70
Warriors               -0.61                   -0.72        0.10
Knights                -1.93                    5.23       -7.20
Sharks                 -4.72                    2.32       -7.00
Wests Tigers           -5.91                  -11.26        5.40
Dragons                -8.08                   -7.57       -0.50
Eels                   -8.63                  -18.45        9.80
Raiders                -9.66                   -8.99       -0.70

Performance So Far

So far there have been 72 matches played, 39 of which were correctly predicted, a success rate of 54.2%.

Here are the predictions for last week’s games.

    Game                       Date    Score    Prediction  Correct
1   Roosters vs. Wests Tigers  May 09  30 – 6        18.60  TRUE
2   Cowboys vs. Broncos        May 09  27 – 14        7.80  TRUE
3   Warriors vs. Raiders       May 10  54 – 12        7.80  TRUE
4   Titans vs. Rabbitohs       May 10  18 – 40        0.10  FALSE
5   Storm vs. Sea Eagles       May 10  22 – 19       -2.90  FALSE
6   Knights vs. Panthers       May 11  10 – 32        7.90  FALSE
7   Dragons vs. Bulldogs       May 11  6 – 38        -7.50  TRUE
8   Eels vs. Sharks            May 12  42 – 24       -3.20  FALSE
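
The “Correct” column can be reproduced with a short sketch. It assumes a prediction counts as correct when the predicted margin and the actual margin (home score minus away score) have the same sign. Draws are not handled, and the function name is illustrative.

```python
# Last week's games: (home, away, home score, away score, predicted margin).
games = [
    ("Roosters", "Wests Tigers", 30, 6, 18.60),
    ("Cowboys", "Broncos", 27, 14, 7.80),
    ("Warriors", "Raiders", 54, 12, 7.80),
    ("Titans", "Rabbitohs", 18, 40, 0.10),
    ("Storm", "Sea Eagles", 22, 19, -2.90),
    ("Knights", "Panthers", 10, 32, 7.90),
    ("Dragons", "Bulldogs", 6, 38, -7.50),
    ("Eels", "Sharks", 42, 24, -3.20),
]

def is_correct(home_score, away_score, prediction):
    """True when predicted and actual margins agree on the winner (no draws)."""
    actual_margin = home_score - away_score
    return (actual_margin > 0) == (prediction > 0)

correct = [is_correct(h, a, p) for _, _, h, a, p in games]
n_correct = sum(correct)

# Season tally quoted in the text: 39 correct out of 72 matches.
season_rate = round(100 * 39 / 72, 1)

print(correct, n_correct, season_rate)
```

This gives 4 correct out of 8 for the round, matching the table, and 54.2% for the season to date.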

 

Predictions for Round 10

Here are the predictions for Round 10. The prediction is my estimated expected points difference with a positive margin being a win to the home team, and a negative margin a win to the away team.

    Game                     Date    Winner      Prediction
1   Rabbitohs vs. Storm      May 16  Rabbitohs        11.70
2   Broncos vs. Titans       May 16  Broncos           5.20
3   Eels vs. Dragons         May 17  Eels              4.00
4   Sharks vs. Wests Tigers  May 17  Sharks            5.70
5   Cowboys vs. Roosters     May 17  Roosters         -0.00
6   Raiders vs. Panthers     May 18  Panthers         -4.90
7   Bulldogs vs. Warriors    May 18  Bulldogs         13.30
8   Sea Eagles vs. Knights   May 19  Sea Eagles       13.40
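
One way to read the ratings and predictions together: subtract the away team's current rating from the home team's, and compare the result with the published prediction. The sketch below does that for each Round 10 game. It is an observation about the published numbers only, not a description of the method on the author's home page, and the name `offsets` is made up for illustration.

```python
# Current ratings from the table above.
ratings = {
    "Roosters": 9.26, "Bulldogs": 8.21, "Rabbitohs": 7.98, "Sea Eagles": 6.93,
    "Cowboys": 4.73, "Storm": 0.74, "Broncos": 0.39, "Panthers": -0.21,
    "Titans": -0.28, "Warriors": -0.61, "Knights": -1.93, "Sharks": -4.72,
    "Wests Tigers": -5.91, "Dragons": -8.08, "Eels": -8.63, "Raiders": -9.66,
}

# Round 10 fixtures: (home, away, published predicted margin).
fixtures = [
    ("Rabbitohs", "Storm", 11.70),
    ("Broncos", "Titans", 5.20),
    ("Eels", "Dragons", 4.00),
    ("Sharks", "Wests Tigers", 5.70),
    ("Cowboys", "Roosters", -0.00),
    ("Raiders", "Panthers", -4.90),
    ("Bulldogs", "Warriors", 13.30),
    ("Sea Eagles", "Knights", 13.40),
]

# Gap between the published prediction and the plain rating difference.
offsets = [
    round(prediction - (ratings[home] - ratings[away]), 2)
    for home, away, prediction in fixtures
]

for (home, away, _), offset in zip(fixtures, offsets):
    print(f"{home} vs. {away}: prediction minus rating difference = {offset:.2f}")
```

For these eight games the gap stays close to 4.5 points in the home team's favour, which is consistent with a roughly constant home-ground allowance on top of the rating difference. The small variation is what you would expect from ratings and predictions rounded to two decimal places.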