Stats Chat

April 9, 2016

Compared to what?

Two maps via Twitter:

From the Sydney Morning Herald, via @mlle_elle and @rpy

The differences in population density swamp anything else. For the map to be useful we’d need a comparison between ‘creative professionals’ and ‘non-creative unprofessionals’. There’s an XKCD about this.

Peter Ellis has another visualisation of the last election that emphasises comparisons. Here’s a comparison of Green and Labour votes (by polling place) across Auckland.

There’s a clear division between the areas where Labour and Green polled about the same, and those where Labour did much better.


April 8, 2016

Briefly

By Thomas Lumley

A lottery in the US rigged by subverting the random number generator. That’s harder to do with the complicated balls-from-a-machine we use — and it’s also more obvious when drawing balls from a machine that betting systems based on sophisticated numerical sequences won’t work.

The (US) Transport Security Administration has a ‘fast lane’ for more-trusted travellers, who get chosen for screening randomly. They use a randomizer app to make sure it really is random, which is a good idea — people are very bad at random choices. But perhaps it shouldn’t have cost $50k.

The Panama Papers are an example of the importance of data skills to journalists.

Visualising pedestrian traffic in Melbourne: high-school students working with Di Cook and some of her students at Monash University

What happens when newsrooms get live statistics on the popularity of articles. A depressed view from Peter Preston and a more positive take from Chris Moran

Australians increasingly preferring dogs with shorter snouts and wider heads

University of Otago research on microRNA may help with Alzheimer’s Disease diagnosis, which is interesting and potentially very useful, but there have been a lot of ‘potential tests’ recently. Also the research is unpublished and they aren’t disclosing yet which microRNAs are involved, so perhaps the publicity could have waited.

April 6, 2016

Super 18 Predictions for Round 7

By David Scott

Team Ratings for Round 7

The basic method is described on my Department home page.

Here are the team ratings prior to this week’s games, along with the ratings at the start of the season.

Team	Current Rating	Rating at Season Start	Difference
Crusaders	8.97	9.84	-0.90
Highlanders	7.10	6.80	0.30
Chiefs	7.07	2.68	4.40
Hurricanes	5.64	7.26	-1.60
Brumbies	3.47	3.15	0.30
Waratahs	2.30	4.88	-2.60
Stormers	1.48	-0.62	2.10
Sharks	0.10	-1.64	1.70
Lions	-0.74	-1.80	1.10
Bulls	-2.00	-0.74	-1.30
Blues	-4.72	-5.51	0.80
Rebels	-5.31	-6.33	1.00
Jaguares	-8.69	-10.00	1.30
Cheetahs	-9.21	-9.27	0.10
Reds	-10.01	-9.81	-0.20
Force	-10.16	-8.43	-1.70
Sunwolves	-12.43	-10.00	-2.40
Kings	-16.08	-13.66	-2.40

Performance So Far

So far there have been 48 matches played, 31 of which were correctly predicted, a success rate of 64.6%.
Here are the predictions for last week’s games.

	Game	Date	Score	Prediction	Correct
1	Highlanders vs. Force	Apr 01	32 – 20	22.50	TRUE
2	Lions vs. Crusaders	Apr 01	37 – 43	-5.70	TRUE
3	Blues vs. Jaguares	Apr 02	24 – 16	8.00	TRUE
4	Brumbies vs. Chiefs	Apr 02	23 – 48	3.90	FALSE
5	Kings vs. Sunwolves	Apr 02	33 – 28	-0.30	FALSE
6	Bulls vs. Cheetahs	Apr 02	23 – 18	11.50	TRUE
7	Waratahs vs. Rebels	Apr 03	17 – 21	13.20	FALSE

Predictions for Round 7

Here are the predictions for Round 7. The prediction is my estimated expected points difference with a positive margin being a win to the home team, and a negative margin a win to the away team.

	Game	Date	Winner	Prediction
1	Chiefs vs. Blues	Apr 08	Chiefs	15.30
2	Force vs. Crusaders	Apr 08	Crusaders	-15.10
3	Stormers vs. Sunwolves	Apr 08	Stormers	17.90
4	Hurricanes vs. Jaguares	Apr 09	Hurricanes	18.30
5	Reds vs. Highlanders	Apr 09	Highlanders	-13.10
6	Sharks vs. Lions	Apr 09	Sharks	4.30
7	Kings vs. Bulls	Apr 09	Bulls	-10.60
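The sign convention can be checked against last week's results table. The following is a minimal Python sketch, not the author's code: `is_correct` is a hypothetical helper, and treating a draw as an incorrect prediction is an assumption, since the post does not say how draws are scored. The running totals before that round (41 played, 27 correct) are taken from the Round 6 post.

```python
# (home, away, home score, away score, predicted home margin),
# copied from last week's results table.
last_week = [
    ("Highlanders", "Force", 32, 20, 22.50),
    ("Lions", "Crusaders", 37, 43, -5.70),
    ("Blues", "Jaguares", 24, 16, 8.00),
    ("Brumbies", "Chiefs", 23, 48, 3.90),
    ("Kings", "Sunwolves", 33, 28, -0.30),
    ("Bulls", "Cheetahs", 23, 18, 11.50),
    ("Waratahs", "Rebels", 17, 21, 13.20),
]


def is_correct(home_score, away_score, prediction):
    """True when the predicted margin has the same sign as the actual margin.

    Assumption: a drawn game (margin 0) counts as incorrect.
    """
    margin = home_score - away_score
    return margin * prediction > 0


flags = [is_correct(hs, as_, pred) for _, _, hs, as_, pred in last_week]

# Running totals before this round, from the Round 6 post.
previous_played, previous_correct = 41, 27
played = previous_played + len(last_week)
correct = previous_correct + sum(flags)
success_rate = round(100 * correct / played, 1)

print(flags)
print(f"{correct} of {played} correct: {success_rate}%")
```

The flags reproduce the Correct column, and the totals reproduce the 31 of 48 quoted under Performance So Far.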

NRL Predictions for Round 6

By David Scott

Team Ratings for Round 6

The basic method is described on my Department home page.

Here are the team ratings prior to this week’s games, along with the ratings at the start of the season.

Team	Current Rating	Rating at Season Start	Difference
Cowboys	12.42	10.29	2.10
Broncos	8.47	9.81	-1.30
Roosters	3.11	11.20	-8.10
Storm	2.78	4.41	-1.60
Bulldogs	2.25	1.50	0.80
Rabbitohs	1.67	-1.20	2.90
Sharks	1.59	-1.06	2.60
Raiders	0.23	-0.55	0.80
Sea Eagles	-1.06	0.36	-1.40
Panthers	-1.74	-3.06	1.30
Eels	-1.87	-4.62	2.80
Dragons	-2.67	-0.10	-2.60
Warriors	-4.59	-7.47	2.90
Wests Tigers	-5.05	-4.06	-1.00
Titans	-5.32	-8.39	3.10
Knights	-8.56	-5.41	-3.10

Performance So Far

So far there have been 40 matches played, 21 of which were correctly predicted, a success rate of 52.5%.
Here are the predictions for last week’s games.

	Game	Date	Score	Prediction	Correct
1	Sea Eagles vs. Rabbitohs	Mar 31	12 – 16	1.10	FALSE
2	Titans vs. Broncos	Apr 01	16 – 24	-11.30	TRUE
3	Storm vs. Knights	Apr 02	18 – 14	16.00	TRUE
4	Wests Tigers vs. Sharks	Apr 02	26 – 34	-2.80	TRUE
5	Cowboys vs. Dragons	Apr 02	36 – 0	15.20	TRUE
6	Roosters vs. Warriors	Apr 03	28 – 32	14.20	FALSE
7	Eels vs. Panthers	Apr 03	18 – 20	3.80	FALSE
8	Bulldogs vs. Raiders	Apr 04	8 – 22	8.00	FALSE

Predictions for Round 6

Here are the predictions for Round 6. The prediction is my estimated expected points difference with a positive margin being a win to the home team, and a negative margin a win to the away team.

	Game	Date	Winner	Prediction
1	Broncos vs. Dragons	Apr 07	Broncos	14.10
2	Rabbitohs vs. Roosters	Apr 08	Rabbitohs	1.60
3	Eels vs. Raiders	Apr 09	Eels	0.90
4	Warriors vs. Sea Eagles	Apr 09	Warriors	0.50
5	Panthers vs. Cowboys	Apr 09	Cowboys	-11.20
6	Sharks vs. Titans	Apr 10	Sharks	9.90
7	Knights vs. Wests Tigers	Apr 10	Wests Tigers	-0.50
8	Storm vs. Bulldogs	Apr 11	Storm	3.50

April 4, 2016

Stat of the Week Competition: April 2 – 8 2016

By Rachel Cunliffe

Each week, we would like to invite readers of Stats Chat to submit nominations for our Stat of the Week competition and be in with a chance to win an iTunes voucher.

Here’s how it works:

Anyone may add a comment on this post to nominate their Stat of the Week candidate before midday Friday April 8 2016.
Statistics can be bad, exemplary or fascinating.
The statistic must be in the NZ media during the period of April 2 – 8 2016 inclusive.
Quote the statistic, when and where it was published and tell us why it should be our Stat of the Week.

Next Monday at midday we’ll announce the winner of this week’s Stat of the Week competition, and start a new one.


April 2, 2016

One weird trick increases donating tenfold?

By Thomas Lumley

From the Herald:

US researchers have confirmed a strange link between touching rough surfaces and feeling for others, which could help charities raise more money.

Based on my usual complaints about this sort of claim, you might expect that the research didn’t look at donating money or that it saw only a tiny difference. No.

There were five experiments, but only one that involved actual money. People were approached on the street and given a description of a health-related charity, and asked to donate. One charity was real, working in a familiar disease; the other was fake, working in a real but obscure disease (all the money actually ended up with the real charity). Half the participants were given the information and donation envelope on a clipboard with rough sandpaper on the back; the other half weren’t.

When asked to donate to the National Breast Cancer Foundation there was no difference between the rough and smooth clipboards (as you’d expect). When asked to donate to the National Sjögren’s Foundation, 10/34 with sandpaper-backed clipboards said yes compared to only 1/32 with smooth clipboards.

I’m going to go very slightly out on a limb here to say there is no way this ten-fold increase is a real and generalisable phenomenon. So, what went wrong? Part of the problem is what Andrew Gelman calls the ‘garden of forking paths’, after the Jorge Luis Borges story — there are many, many possible analyses and they don’t all show this dramatic difference.

For example, there wasn’t a difference in donation probability with the familiar charity. This was consistent with the researchers’ theory, but I’m pretty sure if there had been a difference the researchers wouldn’t have considered it as evidence refuting the theory. Also, the researchers note that they didn’t see a difference in donation amount with the sandpaper, just in donation probability.

Also, if you assume the ten-fold increase was overestimated even a bit, you then get into the problem of sample size. Suppose that the effect was only a two-fold increase rather than ten-fold. That still seems implausibly large to me, but the comparison would then be something like 2/34 vs 1/32 and would be completely unimpressive. You’d need a sample size something like ten times larger. And that’s if a bit of sandpaper on the back of a clipboard doubled the number of people who donated.
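To see how much the result depends on those few extra donors, the two comparisons can be put through a standard test. This is a minimal sketch using a two-sided Fisher's exact test written with the standard library only; the choice of test is an assumption made for illustration, not the researchers' analysis, and `fisher_two_sided` is a hypothetical helper name.

```python
from math import comb


def fisher_two_sided(yes_a, n_a, yes_b, n_b):
    """Two-sided Fisher exact p-value for a 2x2 table.

    Conditions on the total number of donors and sums the hypergeometric
    probabilities of every table that is no more likely than the observed one.
    """
    total_yes = yes_a + yes_b
    denom = comb(n_a + n_b, total_yes)

    def prob(x):
        # Probability that x of the total_yes donors fall in group A.
        return comb(n_a, x) * comb(n_b, total_yes - x) / denom

    lowest = max(0, total_yes - n_b)
    highest = min(n_a, total_yes)
    p_observed = prob(yes_a)
    # Small relative tolerance so ties survive floating-point rounding.
    return sum(
        p for p in map(prob, range(lowest, highest + 1))
        if p <= p_observed * (1 + 1e-9)
    )


# Reported result: 10/34 with sandpaper vs 1/32 without.
p_reported = fisher_two_sided(10, 34, 1, 32)
# Hypothetical two-fold effect with the same sample sizes: 2/34 vs 1/32.
p_twofold = fisher_two_sided(2, 34, 1, 32)

print(f"10/34 vs 1/32: p = {p_reported:.4f}")
print(f" 2/34 vs 1/32: p = {p_twofold:.4f}")
```

The reported comparison gives a p-value comfortably below 0.05, while the two-fold version gives a p-value of 1: with only three donors in total, 2 vs 1 is the single most likely split under no effect at all.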

Still, these findings could have “significant implications for less well-known charities”, as the researchers suggest. If I got approached by a charity using sandpaper on the back of their clipboards, I would tend to think they were (a) poor at evaluating evidence, and (b) not all that honest. I could see that having an impact.


March 30, 2016

Hold the lettuce

By Thomas Lumley

Q: Did you see vegetarian diets cause cancer now?

A: No.

Q: The Herald site front page: headline Vegetarianism can lead to cancer?

A: No.

Q: The teaser: “Scientists have found there can be long-term health risks associated with a vegetarian diet, that could outweigh the benefits.”?

A: Well, it depends on what you mean by ‘long-term’, for a start.

Q: How long-term?

A: Centuries, perhaps thousands of years.

Q: How did they find people who were thousands of years old? And why isn’t that the headline?

A: Not people.

Q: I refuse to believe in century-old lab mice.

A: Human populations.

Q: Ok, so if we click through to the story (from the Telegraph) it seems they’re saying your great-grandparents eating lettuce gives you harmful mutations?

A: That’s what the story says, but it’s not what the research says. The research suggests that a mutation that might increase cancer risk with a modern diet arose randomly a long time in the past, and became common in a South Asian population where vegetarian diets have been common.

Q: How did the mutation become common?

A: Because it wasn’t true that the long-term health risks outweighed the benefits — there’s genetic evidence of ‘selection’ in the evolutionary sense, meaning that people with the mutation had more descendants on average.

Q: How much health risk did they find?

A: They weren’t looking at health risks.

Q: But “long-term health risks” and “can lead to cancer”?

A: Sadly, yes.

Q: Ok, what were they looking at?

A: They were looking at enzymes that turn one type of fatty acid into another. The mutation makes it easier for the body to synthesise long-chain polyunsaturated fatty acids.

Q: Aren’t they good?

A: Some of them, like the DHA and EPA also found in fish, are thought to reduce inflammation and heart disease. But arachidonic acid is thought to increase inflammation, though the American Heart Association isn’t convinced.

Q: That’s heart disease. What about cancer?

A: The only links to cancer are pretty speculative — that the mutation could reinforce effects of modern diet in increasing cancer risk. The contribution of arachidonic acid to that is controversial. But it could be real.

Q: Is there actually a higher cancer rate where they got their vegetarian population from, compared to the control population?

A: No.

Q: That ‘arachidonic acid’ thing. Why does that make me think of spiders?

A: Yes, me too. It’s a false cognate: Latin ‘arachis’, ‘peanut’, not the mythic Greek technologist Aράχνη that arachnids were named for.


Super 18 Predictions for Round 6

By David Scott

Team Ratings for Round 6

The basic method is described on my Department home page.

Here are the team ratings prior to this week’s games, along with the ratings at the start of the season.

Team	Current Rating	Rating at Season Start	Difference
Crusaders	8.95	9.84	-0.90
Highlanders	7.73	6.80	0.90
Hurricanes	5.64	7.26	-1.60
Chiefs	5.34	2.68	2.70
Brumbies	5.21	3.15	2.10
Waratahs	3.34	4.88	-1.50
Stormers	1.48	-0.62	2.10
Sharks	0.10	-1.64	1.70
Lions	-0.72	-1.80	1.10
Bulls	-1.61	-0.74	-0.90
Blues	-4.72	-5.51	0.80
Rebels	-6.35	-6.33	-0.00
Jaguares	-8.69	-10.00	1.30
Cheetahs	-9.60	-9.27	-0.30
Reds	-10.01	-9.81	-0.20
Force	-10.79	-8.43	-2.40
Sunwolves	-12.12	-10.00	-2.10
Kings	-16.40	-13.66	-2.70

Performance So Far

So far there have been 41 matches played, 27 of which were correctly predicted, a success rate of 65.9%.
Here are the predictions for last week’s games.

	Game	Date	Score	Prediction	Correct
1	Hurricanes vs. Kings	Mar 25	42 – 20	26.60	TRUE
2	Chiefs vs. Force	Mar 26	53 – 10	17.00	TRUE
3	Rebels vs. Highlanders	Mar 26	3 – 27	-8.20	TRUE
4	Sunwolves vs. Bulls	Mar 26	27 – 30	-7.00	TRUE
5	Cheetahs vs. Brumbies	Mar 26	18 – 25	-11.30	TRUE
6	Sharks vs. Crusaders	Mar 26	14 – 19	-4.80	TRUE
7	Jaguares vs. Stormers	Mar 26	8 – 13	-6.30	TRUE
8	Reds vs. Waratahs	Mar 27	13 – 15	-10.90	TRUE

Predictions for Round 6

Here are the predictions for Round 6. The prediction is my estimated expected points difference with a positive margin being a win to the home team, and a negative margin a win to the away team.

	Game	Date	Winner	Prediction
1	Highlanders vs. Force	Apr 01	Highlanders	22.50
2	Lions vs. Crusaders	Apr 01	Crusaders	-5.70
3	Blues vs. Jaguares	Apr 02	Blues	8.00
4	Brumbies vs. Chiefs	Apr 02	Brumbies	3.90
5	Kings vs. Sunwolves	Apr 02	Sunwolves	-0.30
6	Bulls vs. Cheetahs	Apr 02	Bulls	11.50
7	Waratahs vs. Rebels	Apr 03	Waratahs	13.20

NRL Predictions for Round 5

By David Scott

Team Ratings for Round 5

The basic method is described on my Department home page.

Here are the team ratings prior to this week’s games, along with the ratings at the start of the season.

Team	Current Rating	Rating at Season Start	Difference
Cowboys	11.00	10.29	0.70
Broncos	8.74	9.81	-1.10
Roosters	4.37	11.20	-6.80
Bulldogs	3.76	1.50	2.30
Storm	3.63	4.41	-0.80
Rabbitohs	1.27	-1.20	2.50
Sharks	1.17	-1.06	2.20
Sea Eagles	-0.65	0.36	-1.00
Dragons	-1.25	-0.10	-1.20
Raiders	-1.28	-0.55	-0.70
Eels	-1.40	-4.62	3.20
Panthers	-2.21	-3.06	0.80
Wests Tigers	-4.64	-4.06	-0.60
Titans	-5.59	-8.39	2.80
Warriors	-5.85	-7.47	1.60
Knights	-9.41	-5.41	-4.00

Performance So Far

So far there have been 32 matches played, 17 of which were correctly predicted, a success rate of 53.1%.
Here are the predictions for last week’s games.

	Game	Date	Score	Prediction	Correct
1	Rabbitohs vs. Bulldogs	Mar 25	12 – 42	5.20	FALSE
2	Broncos vs. Cowboys	Mar 25	21 – 20	0.70	TRUE
3	Raiders vs. Titans	Mar 26	20 – 24	9.20	FALSE
4	Roosters vs. Sea Eagles	Mar 26	20 – 22	9.70	FALSE
5	Dragons vs. Panthers	Mar 27	14 – 12	4.30	TRUE
6	Warriors vs. Knights	Mar 28	40 – 18	5.20	TRUE
7	Wests Tigers vs. Eels	Mar 28	0 – 8	1.10	FALSE
8	Sharks vs. Storm	Mar 28	14 – 6	-0.70	FALSE

Predictions for Round 5

Here are the predictions for Round 5. The prediction is my estimated expected points difference with a positive margin being a win to the home team, and a negative margin a win to the away team.

	Game	Date	Winner	Prediction
1	Sea Eagles vs. Rabbitohs	Mar 31	Sea Eagles	1.10
2	Titans vs. Broncos	Apr 01	Broncos	-11.30
3	Storm vs. Knights	Apr 02	Storm	16.00
4	Wests Tigers vs. Sharks	Apr 02	Sharks	-2.80
5	Cowboys vs. Dragons	Apr 02	Cowboys	15.20
6	Roosters vs. Warriors	Apr 03	Roosters	14.20
7	Eels vs. Panthers	Apr 03	Eels	3.80
8	Bulldogs vs. Raiders	Apr 04	Bulldogs	8.00

March 29, 2016

Chocolate probabilities

By Thomas Lumley

For those of you from other parts of the world, there has been a small sensation over the weekend here about Cadburys chocolate randomisation. One of their products was a large chocolate egg accompanied by eight miniature chocolate bars, chosen randomly from five varieties. Public opinion on the desirability of some of these varieties is more polarised than for others.

Stuff reports:

But one family found seven Cherry Ripes out of eight bars and most of those complaining to Cadbury say they found at least six Cherry Ripes out of eight.

Cadbury claimed that it was just bad luck saying the chocolates are processed randomly and the Cherry Ripe overdose was not intentional.

Both Stuff and The Guardian got advice on the probabilities. They get different answers: Martin Hazelton says seven out of eight being the same (of any variety) is about 1 in 10,000 and the Guardian’s two advisers say there’s nearly a 1 in 100 chance of getting seven Cherry Ripes out of eight (which is obviously less likely than getting seven of eight the same).

With a hundred-fold difference in the estimates, I think a tie-breaker is in order. Also, I’m going to do this the modern way: by simulation rather than by being clever. It’s much more reliable.

I’m going to trust the Guardian on what the five flavours were (since it doesn’t actually matter, I think this is safe). I’ve put the code and results for 100,000 simulated packages up here. The number of packs with seven or more bars the same was 44 out of 100,000. There’s obviously some random uncertainty here, but a 95% confidence interval for the proportion goes from 3 in 10,000 to 6 in 10,000, and so excludes both of the published estimates. Since computing time is nearly free, and the previous run took only 13 seconds, I tried it on a million simulated packs just to be sure, and also separated out ‘seven or more of anything’ from ‘seven or more Cherry Ripes’.

Out of a million simulated packs, 442 had seven or more of some type of bar, and 83 had seven or more Cherry Ripes. The probability of seven or more of something is between 4 and 5 out of 10,000 and the probability of seven or more Cherry Ripes is between 0.6 and 1 out of 10,000. It looks as though Professor Hazelton’s estimate of ‘a little less than one in 10,000’ is correct for Cherry Ripes specifically. The Guardian figures seem clearly wrong. The Guardian is also wrong about the probability of getting at least one of each type, which this code shows to be about 30%, not the 7% they give.
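A simulation of this kind can be sketched in a few lines. The block below is not the code linked from the post; it is a minimal Python sketch that assumes each of the eight bars is drawn independently and with equal probability from five varieties. Only Cherry Ripe is named in the post, so the other four labels are placeholders, and the interval uses the simple normal approximation, which is also an assumption.

```python
import random
from collections import Counter

# Only "Cherry Ripe" comes from the post; the other names are placeholders.
VARIETIES = ["Cherry Ripe", "Variety B", "Variety C", "Variety D", "Variety E"]


def simulate_packs(n_packs, n_bars=8, seed=2016):
    """Count packs with >=7 bars of any one type, >=7 Cherry Ripes,
    and at least one bar of every type."""
    rng = random.Random(seed)
    seven_any = seven_cherry = all_types = 0
    for _ in range(n_packs):
        counts = Counter(rng.choices(VARIETIES, k=n_bars))
        if max(counts.values()) >= 7:
            seven_any += 1
        if counts["Cherry Ripe"] >= 7:  # Counter returns 0 for a missing key
            seven_cherry += 1
        if len(counts) == len(VARIETIES):
            all_types += 1
    return seven_any, seven_cherry, all_types


def normal_interval(successes, n, z=1.96):
    """Approximate 95% interval for a proportion (normal approximation)."""
    p = successes / n
    half_width = z * (p * (1 - p) / n) ** 0.5
    return p - half_width, p + half_width


n_packs = 100_000
seven_any, seven_cherry, all_types = simulate_packs(n_packs)
low, high = normal_interval(seven_any, n_packs)

print(f"seven or more of anything: {seven_any} in {n_packs}")
print(f"seven or more Cherry Ripes: {seven_cherry} in {n_packs}")
print(f"at least one of each type: {all_types / n_packs:.1%}")
print(f"interval for 'anything': {low * 1e4:.1f} to {high * 1e4:.1f} per 10,000")
```

Applied to the 44 in 100,000 reported above, `normal_interval(44, 100_000)` gives roughly 3.1 to 5.7 per 10,000, in line with the interval quoted in the post. Exact counts from the sketch depend on the seed.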

I said I wasn’t going to do this by maths, but now I know the answer I’m going to go out on a limb here and guess that Martin Hazelton’s probability was, in maths terms, P(Binom(8, 0.2)≥7), which is the answer I would have given for Cherry Ripes specifically. With Jack and Andrew in the Guardian I think the issue is that they have counted all 495 possible aggregate outcomes as being equally likely, when it’s actually the 390625 underlying ordered outcomes that are equally likely.
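Both guesses can be checked with exact arithmetic. The sketch below assumes independent, equally likely draws for the correct calculation, and then repeats the count as if the 495 aggregate outcomes were equally likely. It is a reconstruction of how the published figures could have arisen, not a statement of what either set of advisers actually did.

```python
from math import comb


def binom_tail(n, p, k):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, j) * p**j * (1 - p) ** (n - j) for j in range(k, n + 1))


n_bars, n_types = 8, 5

# Correct model: each of the 5**8 ordered outcomes is equally likely.
p_cherry = binom_tail(n_bars, 1 / n_types, 7)
# Two varieties cannot both reach 7 of 8, so the five events are disjoint.
p_any = n_types * p_cherry
# At least one of each type, by inclusion-exclusion over missing types.
p_all = sum(
    (-1) ** k * comb(n_types, k) * ((n_types - k) / n_types) ** n_bars
    for k in range(n_types + 1)
)

# Naive model: every aggregate outcome (multiset of counts) equally likely.
n_ordered = n_types**n_bars                            # 390625
n_aggregate = comb(n_bars + n_types - 1, n_types - 1)  # 495
# Eight Cherry Ripes, or seven plus one bar of any of the other four types.
naive_cherry = (1 + (n_types - 1)) / n_aggregate
# Counts that are all at least 1: distribute the 3 spare bars over 5 types.
naive_all = comb(n_bars - 1, n_types - 1) / n_aggregate

print(f"P(>=7 Cherry Ripes) = {p_cherry:.6f}  (naive: {naive_cherry:.4f})")
print(f"P(>=7 of anything)  = {p_any:.6f}")
print(f"P(one of each type) = {p_all:.4f}  (naive: {naive_all:.4f})")
```

Under the correct model this gives about 0.84 in 10,000 for Cherry Ripes, about 4.2 in 10,000 for any variety, and about 32% for one of each type. The naive count gives about 1 in 100 and about 7%, which matches the Guardian figures.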

The other aspect of this computation is the alternative hypothesis. It makes no sense that Cadbury would just load up the bags with Cherry Ripes and pretend they hadn’t — especially as the Guardian reports other sorts of complaints as well. We need to ask not just whether the reports would be surprising if the bags were randomised, but whether there’s another explanation that fits the data better.

The Guardian story hints at a possibility: clumping together of similar chocolates. It also would be conceivable that the randomisation wasn’t quite even — that, say, Cherry Ripes were 25% instead of the intended 20%. It’s easy to modify the code for unequal probabilities. Having one chocolate type at 25% doubles the number of seven-or-more coincidences, and more than half of them are now with Cherry Ripes. But that’s quite a big imbalance to go unnoticed at Cadburys, and it doesn’t push the probability a lot.
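The unequal-probability case can also be computed exactly rather than simulated. This is a minimal sketch under one particular assumption that the post does not spell out: Cherry Ripes at 25% and the remaining 75% split equally across the other four varieties. `seven_or_more` is a hypothetical helper, and a different split would give different numbers.

```python
from math import comb


def binom_tail(n, p, k):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, j) * p**j * (1 - p) ** (n - j) for j in range(k, n + 1))


def seven_or_more(probs, n_bars=8, k=7):
    """Per-variety and total probability of at least k bars of one variety.

    Valid because k > n_bars / 2, so no two varieties can both reach k
    and the per-variety events are disjoint.
    """
    assert abs(sum(probs) - 1.0) < 1e-9
    assert 2 * k > n_bars
    per_type = [binom_tail(n_bars, p, k) for p in probs]
    return per_type, sum(per_type)


even = [0.20] * 5
# Assumption: Cherry Ripe (first entry) at 25%, the rest sharing 75% equally.
skewed = [0.25] + [0.1875] * 4

even_per_type, even_total = seven_or_more(even)
skewed_per_type, skewed_total = seven_or_more(skewed)
cherry_share = skewed_per_type[0] / skewed_total

print(f"even mix:   {even_total * 1e4:.2f} per 10,000")
print(f"skewed mix: {skewed_total * 1e4:.2f} per 10,000")
print(f"share of skewed cases that are Cherry Ripes: {cherry_share:.0%}")
```

Under this assumption the total rises from about 4.2 to about 6 per 10,000, with a bit under two thirds of the cases being Cherry Ripes.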

So, I’d say bad luck is a feasible explanation, but it could easily have been aggravated by imperfect randomisation at Cadburys.

Many lessons could be drawn from this story: that simulation is a good way to do slightly complicated probability questions; that people see departures from randomness far too easily; that Cadburys should have done systematic sampling rather than random sampling; maybe even that innovative maths teachers may have gone too far in rejecting contrived ball-out-of-urn problems as having no Real World use.

