June 8, 2016

Ben Goldacre interview at Public Address

Russell Brown interviews Ben Goldacre:

Have the media got any better or worse at science in the time you’ve been writing about these issues?

Ha! Well, I’m not aware of any longitudinal studies that would make a fair comparison over time to say if they’ve got better. But I think the incredibly refreshing thing is that they’ve become less relevant. When I started writing about this stuff 15 years ago, mainstream media were the only game in town. It’s incredible to think that 15 years ago, you couldn’t talk back. The internet was not like it is today.

June 7, 2016

Briefly

  • Alex Harrowell at The Yorkshire Ranter writes about two new papers combining large-scale data mining with sampling and human interpretation to study China’s ’50-cent party’ (五毛党) internet commentators
  • Y-axes, from the UK Office of National Statistics

June 6, 2016

Stat of the Week Competition: June 4 – 10 2016

Each week, we would like to invite readers of Stats Chat to submit nominations for our Stat of the Week competition and be in with the chance to win an iTunes voucher.

Here’s how it works:

  • Anyone may add a comment on this post to nominate their Stat of the Week candidate before midday Friday June 10 2016.
  • Statistics can be bad, exemplary or fascinating.
  • The statistic must be in the NZ media during the period of June 4 – 10 2016 inclusive.
  • Quote the statistic, when and where it was published and tell us why it should be our Stat of the Week.

Next Monday at midday we’ll announce the winner of this week’s Stat of the Week competition, and start a new one.

(more…)

June 4, 2016

How to make predictive models good (and accurate)

Kareem Carr, guest-posting at Mathbabe.org

All three principles have one underlying idea. Bad data science obscures and ignores the real world performance of its algorithms. It relies on little to no validation. When it does perform validation, it relies on canned approaches to validation. It doesn’t critically examine instances of bad performance with an eye towards trying to understand how and why these failures occur. It doesn’t make the nature of these failures widely known so consumers of these algorithms can deploy them with discernment and sophistication.

June 3, 2016

Value-added?

From Stuff

Kiwi researchers have come up with a solution to the global obesity epidemic – a bitter plant extract that suppresses appetite.

As you’d expect, calling it “a solution” is completely over the top at the moment. They’ve done a placebo-controlled trial, but lasting less than one day, in only 20 men. The press release is more detailed and more restrained.

What made me mention this story, though, is the numbers. From Stuff

The researchers found that the Amarasate extract stimulated significant increases in hormones that regulate appetite and reduced food intake from 911 kJ (218 calories) to 944 kJ (226 calories).

That sounds incredibly unimpressive: an 8 calorie reduction. It’s wrong, or at least the press release is different and more plausible:

.. both gastric and duodenal delivery of the Amarasate™ extract stimulated significant increases in the gut peptide hormones CCK, GLP-1 and PYY while significantly reducing total (lunch plus snack) ad libitum meal energy intake by 911 kJ (218 calories) and 944 kJ (226 calories), respectively.

They looked at two capsules to control where in the gut the stuff was released, and both types reduced calorie intake by a bit more than 200 calories, compared to placebo. The story was off by a factor of 25 or so.
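The arithmetic is easy to check. Here is a rough sketch, assuming the usual conversion of 4.184 kJ per food calorie (that factor and the variable names are mine, not from the story or the press release):

```python
KJ_PER_KCAL = 4.184  # assumed conversion: 1 food calorie (kcal) = 4.184 kJ


def kj_to_kcal(kj):
    """Convert kilojoules to food calories (kcal)."""
    return kj / KJ_PER_KCAL


# The two reductions reported in the press release, one per capsule type
gastric_kj, duodenal_kj = 911, 944

# Misreading: treat the two figures as a "from ... to ..." pair,
# so the effect is just the gap between them
misread_kcal = kj_to_kcal(duodenal_kj - gastric_kj)

# Press-release reading: each figure is a reduction compared to placebo
reductions_kcal = [round(kj_to_kcal(kj)) for kj in (gastric_kj, duodenal_kj)]

# How far off the misreading is, for each capsule type
factors = [kcal / misread_kcal for kcal in reductions_kcal]

print(f"misread effect: about {misread_kcal:.0f} calories")
print(f"reported reductions: {reductions_kcal} calories")
print(f"off by a factor of roughly {factors[0]:.0f} to {factors[1]:.0f}")
```

The gap between the two figures is 33 kJ, which is where the 8 calories comes from.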

 

 

[update: Those of you who read more carefully than either me or the journalist will have noticed that “reduced .. from 911 kJ .. to 944 kJ ” in the Stuff story is actually an increase, and even less excusable]

[Update next day: The numbers have been fixed —“reduced food intake by up to 944 kJ (226 calories).”  — but not the opening claim. ]

Briefly

  • “David Cameron should ban hedge funds from trying to cash in on the EU referendum by commissioning private exit polls to speculate on sterling before the official result, Labour’s deputy leader has said.” (Guardian) But. “If you think that this is bad — and Watson probably isn’t alone in thinking that it’s bad — then it seems to me that you have to identify which part is bad. Is it asking someone how she voted? Is it asking lots of people how they voted? Is it making a prediction about the Brexit vote? Is it trading based on your prediction? Which specific thing would you make illegal?” (Matt Levine)
  • Generation Zero likes trains, and thinks other people also like trains. Rather than just asserting this or putting up a petition, they’re trying to crowdfund a real opinion poll to find out Auckland public opinion on maintaining a train option for the proposed harbour crossing. Obviously they’re doing this because they think they know what the answer will be, but it’s still a welcome step towards evidence-based lobbying.
  • Google’s ‘Digital Ethicist’ on how software design hijacks people’s minds — changing the (implied) question to affect people’s decisions.

June 2, 2016

Headline conclusions on slavery

I didn’t see this Stuff story at the time, but it was discussed on Twitter by Tess McClure (@tessairini).

The 2016 Global Slavery Index examines practices such as forced labour, human trafficking, child exploitation and forced marriage, surveying 43,000 people in 25 countries.

The number of people living in slavery in New Zealand has increased from 600 in the 2014 Global Slavery Index.

New Zealand and Australia have the lowest level of slavery prevalence in the Asia Pacific region with an estimated 0.018 per cent of the population in modern slavery.

If you survey 43,000 people in 25 countries you won’t be surveying very many in New Zealand, so where did this number come from?  The story doesn’t give any more details, but @tessairini found a ‘detailed methodology’ paper (PDF).

They didn’t survey any people in New Zealand. Or in Australia. Nor had they in 2014.

The survey part of the research is pretty much irrelevant to the estimates for New Zealand. The methodology paper describes another approach that

…can be applied in countries where nationally representative random sample surveys will not necessarily work. This is particularly the case in more ‘developed’ countries, where low levels of vulnerability mean that there are few cases to report, where law enforcement is strong and organized crime is more hidden, and where the resulting numbers are so small, that even if they were not hidden, they would be highly unlikely to be found and selected for interview in a random sample survey.

For the UK and the Netherlands the index used data from the overlap of multiple lists. The UK estimate is described in Significance magazine, the popular-audience publication of the Royal Statistical Society. In all, 2744 victims of human trafficking were identified in the UK, from a total of six sources, so it’s possible to look at how many of these people were missed by each source, and estimate how many more might have been missed completely. The estimated total is between 10,000 and 13,000.
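To give a feel for how overlapping lists can tell you about people who are on none of them, here is a minimal sketch of the simplest two-list version (the Lincoln–Petersen capture-recapture estimator). The counts are made up for illustration and are not the UK data, and the estimator assumes the two lists are independent. With six lists, as in the UK, the model has to be more elaborate than this.

```python
def lincoln_petersen(n_first, n_second, n_both):
    """Two-list estimate of the total population size.

    If the lists are independent, the fraction of the second list that also
    appears on the first estimates the fraction of everyone that the first
    list caught, so total is roughly n_first * n_second / n_both.
    """
    if n_both == 0:
        raise ValueError("no overlap between the lists; estimate is undefined")
    return n_first * n_second / n_both


# Hypothetical counts, for illustration only (NOT the UK figures)
on_list_a = 300
on_list_b = 200
on_both = 60

observed = on_list_a + on_list_b - on_both            # people on at least one list
estimated_total = lincoln_petersen(on_list_a, on_list_b, on_both)
estimated_missed = estimated_total - observed         # people on neither list

print(f"observed: {observed}")
print(f"estimated total: {estimated_total:.0f}")
print(f"estimated missed by both lists: {estimated_missed:.0f}")
```

In this toy example 440 people are seen, but only a fifth of list B is also on list A, which suggests list A is catching about a fifth of everyone, so the estimated total is 1000.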

So, there’s survey data for 25 countries not including New Zealand or Australia, and multiple-list data for two further countries not including Australia and New Zealand.  We still haven’t found out where the New Zealand estimate comes from.

The final step is extrapolation from measured countries to unmeasured countries. The researchers measured a whole lot of variables that might be relevant, and divided the countries into groups that looked similar. They then applied the frequencies from the measured countries in each group, with a few adjustments, to the unmeasured countries.  If you look at the data in the Stuff story, Australia and New Zealand have the same estimated prevalence of slavery, 0.018% of the population. That’s also essentially the same as the UK estimate, so presumably we’re in the same group as the UK and that’s where the real data come from.
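As a rough check that the UK estimate really does line up with the 0.018% figure, here is a sketch using round population numbers. The populations are my assumptions (about 65 million for the UK and about 4.6 million for New Zealand), not figures from the index or the story.

```python
# Assumed round population figures, NOT taken from the index or the story
UK_POPULATION = 65_000_000
NZ_POPULATION = 4_600_000

INDEX_PREVALENCE_PERCENT = 0.018  # figure quoted for New Zealand and Australia


def prevalence_percent(count, population):
    """Prevalence as a percentage of the population."""
    return 100 * count / population


# The UK multiple-list estimate was between 10,000 and 13,000 people
uk_low = prevalence_percent(10_000, UK_POPULATION)
uk_high = prevalence_percent(13_000, UK_POPULATION)

# What the quoted prevalence would mean as a head count in New Zealand
nz_implied_count = NZ_POPULATION * INDEX_PREVALENCE_PERCENT / 100

print(f"UK prevalence: {uk_low:.3f}% to {uk_high:.3f}%")
print(f"NZ implied count at {INDEX_PREVALENCE_PERCENT}%: about {nz_implied_count:.0f}")
```

Under these assumptions the UK range brackets 0.018%, and the same rate applied to New Zealand gives a number in the high hundreds.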

If you want a global estimate of the number of people affected by slavery, this is a perfectly reasonable approach. It’s probably kind of ok as an estimate for the number in New Zealand. On the other hand, the index data doesn’t support claims of change from year to year in New Zealand, and it doesn’t say anything about particular risks.

It makes sense to get local experts to talk about the industries and practices that might cause problems in New Zealand, as Stuff did, and what can be done about them, but the index estimate is just that New Zealand is about the same as the UK.

 

Ross Ihaka talks about a special virus: R

How did the statistical programming language R grow from a simple help-out for undergrad students to a global sensation? Associate Professor Ross Ihaka (right) of the University of Auckland tells the story in the latest issue of alumni magazine Ingenio.

And … here is some niceness from a fan who has read the story today. Thanks, Mike!

June 1, 2016

Super 18 Predictions for Round 15

I have updated these predictions because I had not realised that the Chiefs Crusaders game was on a neutral ground.

Team Ratings for Round 15

The basic method is described on my Department home page.

Here are the team ratings prior to this week’s games, along with the ratings at the start of the season.

Team         Current Rating   Rating at Season Start   Difference
Crusaders              9.81                     9.84        -0.00
Highlanders            7.08                     6.80         0.30
Hurricanes             6.99                     7.26        -0.30
Brumbies               5.22                     3.15         2.10
Waratahs               5.13                     4.88         0.30
Lions                  4.88                    -1.80         6.70
Chiefs                 4.55                     2.68         1.90
Sharks                 2.86                    -1.64         4.50
Stormers               0.33                    -0.62         1.00
Bulls                 -3.03                    -0.74        -2.30
Blues                 -4.91                    -5.51         0.60
Rebels                -6.09                    -6.33         0.20
Cheetahs              -7.01                    -9.27         2.30
Jaguares              -9.08                   -10.00         0.90
Reds                  -9.34                    -9.81         0.50
Force                -10.81                    -8.43        -2.40
Sunwolves            -18.62                   -10.00        -8.60
Kings                -21.19                   -13.66        -7.50

 

Performance So Far

So far there have been 108 matches played, 79 of which were correctly predicted, a success rate of 73.1%.
Here are the predictions for last week’s games.

   Game                        Date    Score    Prediction  Correct
1  Hurricanes vs. Highlanders  May 27  27 – 20        2.90  TRUE
2  Waratahs vs. Chiefs         May 27  45 – 25        2.50  TRUE
3  Kings vs. Jaguares          May 27  29 – 22      -10.20  FALSE
4  Blues vs. Crusaders         May 28  21 – 26      -12.10  TRUE
5  Brumbies vs. Sunwolves      May 28  66 – 5        23.30  TRUE
6  Stormers vs. Cheetahs       May 28  31 – 24       11.40  TRUE
7  Bulls vs. Lions             May 28  20 – 56       -0.10  TRUE
8  Rebels vs. Force            May 29  27 – 22        8.70  TRUE
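For anyone wanting to check the Correct column, here is a small sketch that recomputes it from the scores and predictions in the table above. It assumes the score is listed home team first, and that a prediction counts as correct when its sign matches the sign of the actual margin; how a draw is scored is my assumption.

```python
# Last week's games from the table above:
# (home, away, home points, away points, prediction, listed "Correct" value)
last_week = [
    ("Hurricanes", "Highlanders", 27, 20, 2.90, True),
    ("Waratahs", "Chiefs", 45, 25, 2.50, True),
    ("Kings", "Jaguares", 29, 22, -10.20, False),
    ("Blues", "Crusaders", 21, 26, -12.10, True),
    ("Brumbies", "Sunwolves", 66, 5, 23.30, True),
    ("Stormers", "Cheetahs", 31, 24, 11.40, True),
    ("Bulls", "Lions", 20, 56, -0.10, True),
    ("Rebels", "Force", 27, 22, 8.70, True),
]


def predicted_correctly(home_pts, away_pts, prediction):
    """True if the predicted margin has the same sign as the actual margin.

    Assumption: a draw, or a prediction of exactly zero, counts as incorrect.
    """
    actual_margin = home_pts - away_pts
    return actual_margin * prediction > 0


recomputed = [
    predicted_correctly(home_pts, away_pts, prediction)
    for _, _, home_pts, away_pts, prediction, _ in last_week
]
mismatches = [
    (game[0], game[1])
    for game, result in zip(last_week, recomputed)
    if result != game[5]
]

# Season-to-date success rate quoted above: 79 correct out of 108
season_rate = 100 * 79 / 108

print(f"correct last week: {sum(recomputed)} of {len(recomputed)}")
print(f"mismatches with the listed column: {mismatches}")
print(f"season success rate: {season_rate:.1f}%")
```

Note that the Bulls vs. Lions game counts as correct even though the predicted margin was only -0.10 and the actual margin was 36 points to the Lions: only the predicted winner matters for this measure.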

 

Predictions for Round 15

Here are the predictions for Round 15. The prediction is my estimated expected points difference with a positive margin being a win to the home team, and a negative margin a win to the away team.

   Game                        Date    Winner       Prediction
1  Chiefs vs. Crusaders        Jul 01  Crusaders         -5.30
2  Brumbies vs. Reds           Jul 01  Brumbies          18.10
3  Sunwolves vs. Waratahs      Jul 02  Waratahs         -19.70
4  Hurricanes vs. Blues        Jul 02  Hurricanes        15.40
5  Rebels vs. Stormers         Jul 02  Stormers          -2.40
6  Cheetahs vs. Force          Jul 02  Cheetahs           7.80
7  Kings vs. Highlanders       Jul 02  Highlanders      -24.30
8  Lions vs. Sharks            Jul 02  Lions              5.50
9  Jaguares vs. Bulls          Jul 02  Bulls             -2.10
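The post doesn't spell out how the ratings turn into predictions, but the numbers above let you back out a rough picture. This sketch assumes the prediction has the additive form (home rating − away rating + a home-ground term), which is my guess at the structure rather than a description of the actual method, and computes the home-ground term that assumption would imply for each game.

```python
# Current ratings from the Round 15 ratings table
ratings = {
    "Crusaders": 9.81, "Highlanders": 7.08, "Hurricanes": 6.99,
    "Brumbies": 5.22, "Waratahs": 5.13, "Lions": 4.88,
    "Chiefs": 4.55, "Sharks": 2.86, "Stormers": 0.33,
    "Bulls": -3.03, "Blues": -4.91, "Rebels": -6.09,
    "Cheetahs": -7.01, "Jaguares": -9.08, "Reds": -9.34,
    "Force": -10.81, "Sunwolves": -18.62, "Kings": -21.19,
}

# (home, away, predicted margin) from the Round 15 predictions table
round_15 = [
    ("Chiefs", "Crusaders", -5.30),
    ("Brumbies", "Reds", 18.10),
    ("Sunwolves", "Waratahs", -19.70),
    ("Hurricanes", "Blues", 15.40),
    ("Rebels", "Stormers", -2.40),
    ("Cheetahs", "Force", 7.80),
    ("Kings", "Highlanders", -24.30),
    ("Lions", "Sharks", 5.50),
    ("Jaguares", "Bulls", -2.10),
]

# Assumed form: prediction = home rating - away rating + home-ground term,
# so the implied term is whatever is left after removing the rating gap
implied_home_term = {
    (home, away): round(prediction - (ratings[home] - ratings[away]), 2)
    for home, away, prediction in round_15
}

for (home, away), term in implied_home_term.items():
    print(f"{home} vs. {away}: implied home-ground term {term:+.2f}")
```

Under that assumption the implied term is close to zero for Chiefs vs. Crusaders, which fits that game being on a neutral ground, and between about 3.5 and 4 points for the other eight games.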

 

NRL Predictions for Round 13

Team Ratings for Round 13

The basic method is described on my Department home page.

Here are the team ratings prior to this week’s games, along with the ratings at the start of the season.

Team           Current Rating   Rating at Season Start   Difference
Cowboys                 10.74                    10.29         0.40
Broncos                 10.55                     9.81         0.70
Storm                    7.33                     4.41         2.90
Sharks                   6.19                    -1.06         7.20
Bulldogs                 2.20                     1.50         0.70
Raiders                  1.89                    -0.55         2.40
Roosters                 1.29                    11.20        -9.90
Rabbitohs               -0.34                    -1.20         0.90
Panthers                -0.43                    -3.06         2.60
Sea Eagles              -0.48                     0.36        -0.80
Eels                    -0.56                    -4.62         4.10
Dragons                 -2.63                    -0.10        -2.50
Titans                  -4.16                    -8.39         4.20
Wests Tigers            -7.23                    -4.06        -3.20
Warriors                -7.72                    -7.47        -0.30
Knights                -14.97                    -5.41        -9.60

 

Performance So Far

So far there have been 92 matches played, 53 of which were correctly predicted, a success rate of 57.6%.
Here are the predictions for last week’s games.

   Game                        Date    Score    Prediction  Correct
1  Broncos vs. Wests Tigers    May 27  18 – 19       24.20  FALSE
2  Dragons vs. Cowboys         May 28  14 – 10      -12.70  FALSE
3  Raiders vs. Bulldogs        May 29  32 – 20        1.10  TRUE
4  Knights vs. Eels            May 30  18 – 20      -13.00  TRUE

 

Predictions for Round 13

Here are the predictions for Round 13. The prediction is my estimated expected points difference with a positive margin being a win to the home team, and a negative margin a win to the away team.

   Game                        Date    Winner       Prediction
1  Raiders vs. Sea Eagles      Jun 03  Raiders            5.40
2  Warriors vs. Broncos        Jun 04  Broncos          -14.30
3  Cowboys vs. Knights         Jun 04  Cowboys           28.70
4  Storm vs. Panthers          Jun 04  Storm             10.80
5  Roosters vs. Wests Tigers   Jun 05  Roosters          11.50
6  Rabbitohs vs. Titans        Jun 05  Rabbitohs          3.80
7  Bulldogs vs. Sharks         Jun 06  Sharks            -1.00