Stats Chat

June 8, 2016

Ben Goldacre interview at Public Address

Have the media got any better or worse at science in the time you’ve been writing about these issues?

Ha! Well, I’m not aware of any longitudinal studies that would make a fair comparison over time to say if they’ve got better. But I think the incredibly refreshing thing is that they’ve become less relevant. When I started writing about this stuff 15 years ago, mainstream media were the only game in town. It’s incredible to think that 15 years ago, you couldn’t talk back. The internet was not like it is today.

June 7, 2016

Briefly

By Thomas Lumley

Animated commuting map for every county in the US. Something like this could be done with StatsNZ data….

Alex Harrowell at The Yorkshire Ranter writes about two new papers combining large-scale data mining with sampling and human interpretation to study China’s ’50-cent party’ (五毛党) internet commentators

Canada Post have given up their lawsuit against a group that crowdsourced Canadian postal code data. (<looks pointedly in direction of NZ Fire Service>)

In the US, increasing numbers of households have smartphones and don’t even have landlines for broadband internet.

Y-axes, from the UK Office of National Statistics

June 6, 2016

Stat of the Week Competition: June 4 – 10 2016

By Rachel Cunliffe

Each week, we would like to invite readers of Stats Chat to submit nominations for our Stat of the Week competition and be in with the chance to win an iTunes voucher.

Here’s how it works:

Anyone may add a comment on this post to nominate their Stat of the Week candidate before midday Friday June 10 2016.
Statistics can be bad, exemplary or fascinating.
The statistic must be in the NZ media during the period of June 4 – 10 2016 inclusive.
Quote the statistic, when and where it was published and tell us why it should be our Stat of the Week.

Next Monday at midday we’ll announce the winner of this week’s Stat of the Week competition, and start a new one.

June 4, 2016

How to make predictive models good (and accurate)

By Thomas Lumley

Kareem Carr, guest-posting at Mathbabe.org

All three principles have one underlying idea. Bad data science obscures and ignores the real world performance of its algorithms. It relies on little to no validation. When it does perform validation, it relies on canned approaches to validation. It doesn’t critically examine instances of bad performance with an eye towards trying to understand how and why these failures occur. It doesn’t make the nature of these failures widely known so consumers of these algorithms can deploy them with discernment and sophistication.

June 3, 2016

Value-added?

By Thomas Lumley

From Stuff

Kiwi researchers have come up with a solution to the global obesity epidemic – a bitter plant extract that suppresses appetite.

As you’d expect, calling it “a solution” is completely over the top at the moment. They’ve done a placebo-controlled trial, but it lasted less than one day and included only 20 men. The press release is more detailed and more restrained.

What made me mention this story, though, is the numbers. From Stuff

The researchers found that the Amarasate extract stimulated significant increases in hormones that regulate appetite and reduced food intake from 911 kJ (218 calories) to 944 kJ (226 calories).

That sounds incredibly unimpressive: an 8-calorie reduction. It’s wrong, or at least the press release is different and more plausible:

… both gastric and duodenal delivery of the Amarasate™ extract stimulated significant increases in the gut peptide hormones CCK, GLP-1 and PYY while significantly reducing total (lunch plus snack) ad libitum meal energy intake by 911 kJ (218 calories) and 944 kJ (226 calories), respectively.

They looked at two capsules to control where in the gut the stuff was released, and both types reduced calorie intake by a bit more than 200 calories, compared to placebo. The story was off by a factor of 25 or so.
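A quick Python check of the arithmetic, assuming the standard conversion of 4.184 kJ per dietary calorie (which reproduces the calorie figures quoted in both versions). The function name is ours, for illustration only.

```python
# Assumed conversion: 1 dietary calorie (kcal) = 4.184 kJ
KJ_PER_KCAL = 4.184

def kj_to_kcal(kj: float) -> float:
    """Convert kilojoules to dietary calories (kcal)."""
    return kj / KJ_PER_KCAL

# Press release: two separate reductions versus placebo, one per capsule type
press_release = [kj_to_kcal(911), kj_to_kcal(944)]

# Stuff story: "from 911 kJ to 944 kJ", i.e. a single change of 944 - 911 kJ
stuff_change = kj_to_kcal(944 - 911)

print([round(x) for x in press_release])        # [218, 226]
print(round(stuff_change, 1))                   # 7.9
print(round(press_release[0] / stuff_change))   # 28
```

So the Stuff figure is smaller than the real effect by a factor in the high twenties, and points in the wrong direction.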

[Update: Those of you who read more carefully than either me or the journalist will have noticed that “reduced … from 911 kJ … to 944 kJ” in the Stuff story is actually an increase, and even less excusable.]

[Update next day: The numbers have been fixed to read “reduced food intake by up to 944 kJ (226 calories)”, but the opening claim has not been changed.]

Briefly

By Thomas Lumley

“David Cameron should ban hedge funds from trying to cash in on the EU referendum by commissioning private exit polls to speculate on sterling before the official result, Labour’s deputy leader has said.” (Guardian) But. “If you think that this is bad — and Watson probably isn’t alone in thinking that it’s bad — then it seems to me that you have to identify which part is bad. Is it asking someone how she voted? Is it asking lots of people how they voted? Is it making a prediction about the Brexit vote? Is it trading based on your prediction? Which specific thing would you make illegal?” (Matt Levine)

Generation Zero likes trains, and thinks other people also like trains. Rather than just asserting this or putting up a petition, they’re trying to crowdfund a real opinion poll to find out Auckland public opinion on maintaining a train option for the proposed harbour crossing. Obviously they’re doing this because they think they know what the answer will be, but it’s still a welcome step towards evidence-based lobbying.

Information visualisation for concert-goers: a graph of the symphony you’re listening to, from the Toronto Symphony Orchestra

Rain pattern graphics: San Francisco Bay Area, Hong Kong

Google’s ‘Digital Ethicist’ on how software design hijacks people’s minds — changing the (implied) question to affect people’s decisions.

Looking at AirBnb vs actual housing availability and price in major cities across the US

Who marries whom (by occupational category)? The graph is pretty hard to use, but it’s still interesting. Sadly, ‘statisticians’ are lumped in with ‘mathematicians’. (via Andrew Gelman)

June 2, 2016

Headline conclusions on slavery

By Thomas Lumley

I didn’t see this Stuff story at the time, but it was discussed on Twitter by Tess McClure (@tessairini).

The 2016 Global Slavery Index examines practices such as forced labour, human trafficking, child exploitation and forced marriage, surveying 43,000 people in 25 countries.

The number of people living in slavery in New Zealand has increased from 600 in the 2014 Global Slavery Index.

New Zealand and Australia have the lowest level of slavery prevalence in the Asia Pacific region with an estimated 0.018 per cent of the population in modern slavery.

If you survey 43,000 people in 25 countries you won’t be surveying very many in New Zealand, so where did this number come from? The story doesn’t give any more details, but @tessairini found a ‘detailed methodology’ paper (PDF).

They didn’t survey any people in New Zealand. Or in Australia. Nor had they in 2014.

The survey part of the research is pretty much irrelevant to the estimates for New Zealand. The methodology paper describes another approach that

…can be applied in countries where nationally representative random sample surveys will not necessarily work. This is particularly the case in more ‘developed’ countries, where low levels of vulnerability mean that there are few cases to report, where law enforcement is strong and organized crime is more hidden, and where the resulting numbers are so small, that even if they were not hidden, they would be highly unlikely to be found and selected for interview in a random sample survey.

For the UK and the Netherlands the survey used data from the overlap of multiple lists. The UK estimate is described in Significance magazine, the popular-audience publication of the Royal Statistical Society. In all, 2744 victims of human trafficking were identified in the UK, from a total of six sources, so it’s possible to look at how many of these people were missed by each source, and estimate how many more might have been missed completely. The estimated total is between 10,000 and 13,000.
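The logic of the multiple-list approach is easiest to see with just two lists. The Python sketch below uses the classic Lincoln-Petersen estimator on made-up counts. It illustrates the idea only and is not the model behind the UK figure: a six-source analysis needs models that allow for lists overlapping more (or less) than chance would predict.

```python
def lincoln_petersen(n1: int, n2: int, both: int) -> float:
    """Estimate a closed population size from two overlapping lists.

    n1, n2: number of people on list 1 and on list 2
    both:   number of people who appear on both lists
    Assumes the two lists are independent and everyone is equally likely
    to be listed, which is rarely true for trafficking victims.
    """
    if both == 0:
        raise ValueError("need at least one person on both lists")
    # List 2 caught both/n1 of the people on list 1; assume it catches
    # the same fraction of everyone, so total = n2 / (both / n1)
    return n1 * n2 / both

# Hypothetical counts, for illustration only
n1, n2, both = 100, 80, 20
total = lincoln_petersen(n1, n2, both)
seen = n1 + n2 - both        # distinct people on at least one list
missed = total - seen        # estimated number on neither list
print(total, seen, missed)   # 400.0 160 240.0
```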

So, there’s survey data for 25 countries not including New Zealand or Australia, and multiple-list data for two further countries not including Australia and New Zealand. We still haven’t found out where the New Zealand estimate comes from.

The final step is extrapolation from measured countries to unmeasured countries. The researchers measured a whole lot of variables that might be relevant, and divided the countries into groups that looked similar. They then applied the frequencies from the measured countries in each group, with a few adjustments, to the unmeasured countries. If you look at the data in the Stuff story, Australia and New Zealand have the same estimated prevalence of slavery, 0.018% of the population. That’s also essentially the same as the UK estimate, so presumably we’re in the same group as the UK and that’s where the real data come from.
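Here’s a deliberately simplified Python sketch of that extrapolation step. The countries, groupings and prevalences are hypothetical, and the index’s “few adjustments” are left out. It just shows why unmeasured countries placed in the same group end up with identical estimates.

```python
from statistics import mean

# Hypothetical countries: name -> (similarity group, measured prevalence or None)
countries = {
    "Country A": ("group 1", 0.00018),
    "Country B": ("group 1", 0.00020),
    "Country C": ("group 1", None),   # unmeasured
    "Country F": ("group 1", None),   # unmeasured
    "Country D": ("group 2", 0.0050),
    "Country E": ("group 2", None),   # unmeasured
}

def extrapolate(countries):
    """Fill in each unmeasured country with the mean prevalence of its group."""
    measured = {}
    for group, prevalence in countries.values():
        if prevalence is not None:
            measured.setdefault(group, []).append(prevalence)
    group_mean = {group: mean(values) for group, values in measured.items()}
    return {
        name: prevalence if prevalence is not None else group_mean[group]
        for name, (group, prevalence) in countries.items()
    }

estimates = extrapolate(countries)
print(estimates["Country C"], estimates["Country F"], estimates["Country E"])
```

Countries C and F get exactly the same number, and neither says anything about conditions specific to C or F.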

If you want a global estimate of the number of people affected by slavery, this is a perfectly reasonable approach. It’s probably kind of ok as an estimate for the number in New Zealand. On the other hand, the index data doesn’t support claims of change from year to year in New Zealand, and it doesn’t say anything about particular risks.

It makes sense to get local experts to talk about the industries and practices that might cause problems in New Zealand, as Stuff did, and what can be done about them, but the index estimate is just that New Zealand is about the same as the UK.

Ross Ihaka talks about a special virus: R

By Atakohu Middleton

How did the statistical programming language R grow from a simple help-out for undergrad students to a global sensation? Associate Professor Ross Ihaka of the University of Auckland tells the story in the latest issue of alumni magazine Ingenio.

And … here is some niceness from a fan who has read the story today. Thanks, Mike!

June 1, 2016

Super 18 Predictions for Round 15

By David Scott

I have updated these predictions because I had not realised that the Chiefs vs. Crusaders game was being played on neutral ground.

Team Ratings for Round 15

The basic method is described on my Department home page.

Here are the team ratings prior to this week’s games, along with the ratings at the start of the season.

	Current Rating	Rating at Season Start	Difference
Crusaders	9.81	9.84	-0.00
Highlanders	7.08	6.80	0.30
Hurricanes	6.99	7.26	-0.30
Brumbies	5.22	3.15	2.10
Waratahs	5.13	4.88	0.30
Lions	4.88	-1.80	6.70
Chiefs	4.55	2.68	1.90
Sharks	2.86	-1.64	4.50
Stormers	0.33	-0.62	1.00
Bulls	-3.03	-0.74	-2.30
Blues	-4.91	-5.51	0.60
Rebels	-6.09	-6.33	0.20
Cheetahs	-7.01	-9.27	2.30
Jaguares	-9.08	-10.00	0.90
Reds	-9.34	-9.81	0.50
Force	-10.81	-8.43	-2.40
Sunwolves	-18.62	-10.00	-8.60
Kings	-21.19	-13.66	-7.50

Performance So Far

So far there have been 108 matches played, 79 of which were correctly predicted, a success rate of 73.1%.
Here are the predictions for last week’s games.

	Game	Date	Score	Prediction	Correct
1	Hurricanes vs. Highlanders	May 27	27 – 20	2.90	TRUE
2	Waratahs vs. Chiefs	May 27	45 – 25	2.50	TRUE
3	Kings vs. Jaguares	May 27	29 – 22	-10.20	FALSE
4	Blues vs. Crusaders	May 28	21 – 26	-12.10	TRUE
5	Brumbies vs. Sunwolves	May 28	66 – 5	23.30	TRUE
6	Stormers vs. Cheetahs	May 28	31 – 24	11.40	TRUE
7	Bulls vs. Lions	May 28	20 – 56	-0.10	TRUE
8	Rebels vs. Force	May 29	27 – 22	8.70	TRUE
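The Correct column above appears to mark a prediction as correct when it picks the right winner, that is, when the predicted and actual margins have the same sign. A minimal Python sketch of that check, using last week’s games from the table; counting a draw as incorrect is our assumption, since no draws occur here.

```python
# Last week's games from the table above:
# (home, away, home score, away score, predicted home margin)
last_week = [
    ("Hurricanes", "Highlanders", 27, 20, 2.9),
    ("Waratahs", "Chiefs", 45, 25, 2.5),
    ("Kings", "Jaguares", 29, 22, -10.2),
    ("Blues", "Crusaders", 21, 26, -12.1),
    ("Brumbies", "Sunwolves", 66, 5, 23.3),
    ("Stormers", "Cheetahs", 31, 24, 11.4),
    ("Bulls", "Lions", 20, 56, -0.1),
    ("Rebels", "Force", 27, 22, 8.7),
]

def is_correct(home_score, away_score, predicted_margin):
    """True if the predicted margin has the same sign as the actual margin.

    A draw (actual margin 0) counts as incorrect here; that is an assumption.
    """
    return (home_score - away_score) * predicted_margin > 0

results = [is_correct(hs, aws, pm) for _, _, hs, aws, pm in last_week]
print(results, sum(results))   # only Kings vs. Jaguares is False; 7 of 8 correct
print(f"{79 / 108:.1%}")       # season so far: 73.1%
```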

Predictions for Round 15

Here are the predictions for Round 15. The prediction is my estimated expected points difference with a positive margin being a win to the home team, and a negative margin a win to the away team.

	Game	Date	Winner	Prediction
1	Chiefs vs. Crusaders	Jul 01	Crusaders	-5.30
2	Brumbies vs. Reds	Jul 01	Brumbies	18.10
3	Sunwolves vs. Waratahs	Jul 02	Waratahs	-19.70
4	Hurricanes vs. Blues	Jul 02	Hurricanes	15.40
5	Rebels vs. Stormers	Jul 02	Stormers	-2.40
6	Cheetahs vs. Force	Jul 02	Cheetahs	7.80
7	Kings vs. Highlanders	Jul 02	Highlanders	-24.30
8	Lions vs. Sharks	Jul 02	Lions	5.50
9	Jaguares vs. Bulls	Jul 02	Bulls	-2.10
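The predictions line up closely with the rating differences in the table above. For example, Chiefs (4.55) minus Crusaders (9.81) gives -5.26, matching the neutral-ground prediction of -5.30, and the other games differ from the rating difference by roughly 3.5 points when both teams are from the same country and roughly 4 when they are not. That is our reading of the numbers, not the method described on the Department home page; the Python sketch below hard-codes those home-advantage values as assumptions.

```python
# Current ratings copied from the Round 15 table above (subset only)
ratings = {
    "Crusaders": 9.81, "Hurricanes": 6.99, "Brumbies": 5.22, "Chiefs": 4.55,
    "Blues": -4.91, "Cheetahs": -7.01, "Reds": -9.34, "Force": -10.81,
}

# ASSUMED home-ground advantages, inferred from the published predictions
SAME_COUNTRY_ADVANTAGE = 3.5
CROSS_COUNTRY_ADVANTAGE = 4.0

def predict_margin(home, away, neutral=False, same_country=True):
    """Expected points difference; positive favours the home team."""
    margin = ratings[home] - ratings[away]
    if neutral:
        return margin
    if same_country:
        return margin + SAME_COUNTRY_ADVANTAGE
    return margin + CROSS_COUNTRY_ADVANTAGE

def predicted_winner(home, away, **kwargs):
    # A zero margin is tipped to the away team here (an arbitrary choice)
    return home if predict_margin(home, away, **kwargs) > 0 else away

print(round(predict_margin("Chiefs", "Crusaders", neutral=True), 1))      # -5.3
print(round(predict_margin("Hurricanes", "Blues"), 1))                    # 15.4
print(round(predict_margin("Cheetahs", "Force", same_country=False), 1))  # 7.8
print(predicted_winner("Chiefs", "Crusaders", neutral=True))              # Crusaders
```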

NRL Predictions for Round 13

By David Scott

Team Ratings for Round 13

The basic method is described on my Department home page.

Here are the team ratings prior to this week’s games, along with the ratings at the start of the season.

	Current Rating	Rating at Season Start	Difference
Cowboys	10.74	10.29	0.40
Broncos	10.55	9.81	0.70
Storm	7.33	4.41	2.90
Sharks	6.19	-1.06	7.20
Bulldogs	2.20	1.50	0.70
Raiders	1.89	-0.55	2.40
Roosters	1.29	11.20	-9.90
Rabbitohs	-0.34	-1.20	0.90
Panthers	-0.43	-3.06	2.60
Sea Eagles	-0.48	0.36	-0.80
Eels	-0.56	-4.62	4.10
Dragons	-2.63	-0.10	-2.50
Titans	-4.16	-8.39	4.20
Wests Tigers	-7.23	-4.06	-3.20
Warriors	-7.72	-7.47	-0.30
Knights	-14.97	-5.41	-9.60

Performance So Far

So far there have been 92 matches played, 53 of which were correctly predicted, a success rate of 57.6%.
Here are the predictions for last week’s games.

	Game	Date	Score	Prediction	Correct
1	Broncos vs. Wests Tigers	May 27	18 – 19	24.20	FALSE
2	Dragons vs. Cowboys	May 28	14 – 10	-12.70	FALSE
3	Raiders vs. Bulldogs	May 29	32 – 20	1.10	TRUE
4	Knights vs. Eels	May 30	18 – 20	-13.00	TRUE

Predictions for Round 13

Here are the predictions for Round 13. The prediction is my estimated expected points difference with a positive margin being a win to the home team, and a negative margin a win to the away team.

	Game	Date	Winner	Prediction
1	Raiders vs. Sea Eagles	Jun 03	Raiders	5.40
2	Warriors vs. Broncos	Jun 04	Broncos	-14.30
3	Cowboys vs. Knights	Jun 04	Cowboys	28.70
4	Storm vs. Panthers	Jun 04	Storm	10.80
5	Roosters vs. Wests Tigers	Jun 05	Roosters	11.50
6	Rabbitohs vs. Titans	Jun 05	Rabbitohs	3.80
7	Bulldogs vs. Sharks	Jun 06	Sharks	-1.00
