April 9, 2016

Compared to what?

Two maps via Twitter:

From the Sydney Morning Herald, via @mlle_elle and @rpy

[Image: creativemap]

The differences in population density swamp everything else. For the map to be useful we’d need a comparison between ‘creative professionals’ and ‘non-creative unprofessionals’. There’s an XKCD about this.

Peter Ellis has another visualisation of the last election that emphasises comparisons. Here’s a comparison of Green and Labour votes (by polling place) across Auckland.

[Image: votemap]

There’s a clear division between the areas where Labour and Green polled about the same, and those where Labour did much better.

 

April 8, 2016

Briefly

  • A lottery in the US was rigged by subverting the random number generator. That’s harder to do with the complicated balls-from-a-machine we use, and it’s also more obvious when drawing balls from a machine that betting systems based on sophisticated numerical sequences won’t work.
  • The (US) Transportation Security Administration has a ‘fast lane’ for more-trusted travellers, who get chosen for screening randomly. They use a randomizer app to make sure it really is random, which is a good idea: people are very bad at random choices. But perhaps it shouldn’t have cost $50k.
  • The Panama Papers are an example of the importance of data skills to journalists.
  • University of Otago research on microRNA may help with Alzheimer’s Disease diagnosis, which is interesting and potentially very useful, but there have been a lot of ‘potential tests’ recently. Also the research is unpublished and they aren’t disclosing yet which microRNAs are involved, so perhaps the publicity could have waited.
April 6, 2016

Super 18 Predictions for Round 7

Team Ratings for Round 7

The basic method is described on my Department home page.

Here are the team ratings prior to this week’s games, along with the ratings at the start of the season.

Team | Current Rating | Rating at Season Start | Difference
Crusaders | 8.97 | 9.84 | -0.90
Highlanders | 7.10 | 6.80 | 0.30
Chiefs | 7.07 | 2.68 | 4.40
Hurricanes | 5.64 | 7.26 | -1.60
Brumbies | 3.47 | 3.15 | 0.30
Waratahs | 2.30 | 4.88 | -2.60
Stormers | 1.48 | -0.62 | 2.10
Sharks | 0.10 | -1.64 | 1.70
Lions | -0.74 | -1.80 | 1.10
Bulls | -2.00 | -0.74 | -1.30
Blues | -4.72 | -5.51 | 0.80
Rebels | -5.31 | -6.33 | 1.00
Jaguares | -8.69 | -10.00 | 1.30
Cheetahs | -9.21 | -9.27 | 0.10
Reds | -10.01 | -9.81 | -0.20
Force | -10.16 | -8.43 | -1.70
Sunwolves | -12.43 | -10.00 | -2.40
Kings | -16.08 | -13.66 | -2.40

 

Performance So Far

So far there have been 48 matches played, 31 of which were correctly predicted, a success rate of 64.6%.
Here are the predictions for last week’s games.

No. | Game | Date | Score | Prediction | Correct
1 | Highlanders vs. Force | Apr 01 | 32 – 20 | 22.50 | TRUE
2 | Lions vs. Crusaders | Apr 01 | 37 – 43 | -5.70 | TRUE
3 | Blues vs. Jaguares | Apr 02 | 24 – 16 | 8.00 | TRUE
4 | Brumbies vs. Chiefs | Apr 02 | 23 – 48 | 3.90 | FALSE
5 | Kings vs. Sunwolves | Apr 02 | 33 – 28 | -0.30 | FALSE
6 | Bulls vs. Cheetahs | Apr 02 | 23 – 18 | 11.50 | TRUE
7 | Waratahs vs. Rebels | Apr 03 | 17 – 21 | 13.20 | FALSE
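
The Correct column can be reproduced by checking whether the predicted margin and the actual margin (home score minus away score) have the same sign. The following is a minimal sketch in Python, not the code behind these posts; the function name `prediction_correct` is hypothetical, and the treatment of a drawn game (counted as not correct) is an assumption, since the posts do not say how draws are scored.

```python
# Scores and predictions copied from the table of last week's games:
# (home team, away team, home score, away score, predicted home margin)
last_week = [
    ("Highlanders", "Force", 32, 20, 22.50),
    ("Lions", "Crusaders", 37, 43, -5.70),
    ("Blues", "Jaguares", 24, 16, 8.00),
    ("Brumbies", "Chiefs", 23, 48, 3.90),
    ("Kings", "Sunwolves", 33, 28, -0.30),
    ("Bulls", "Cheetahs", 23, 18, 11.50),
    ("Waratahs", "Rebels", 17, 21, 13.20),
]


def prediction_correct(home_score, away_score, predicted_margin):
    """True when the predicted winner matches the actual winner."""
    actual_margin = home_score - away_score
    # Assumption: a draw, or a predicted margin of exactly zero, is not correct.
    return actual_margin * predicted_margin > 0


correct = [prediction_correct(h, a, m) for _, _, h, a, m in last_week]

for (home, away, h, a, m), ok in zip(last_week, correct):
    print(f"{home} vs. {away}: {h} - {a}, predicted {m:+.2f}, correct: {ok}")

print(f"{sum(correct)} of {len(correct)} correct this round")
# Season totals quoted in the text: 31 correct out of 48 matches.
print(f"Season success rate: {31 / 48:.1%}")
```

Only the sign of the prediction matters for this tally; the size of the predicted margin is not scored.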

 

Predictions for Round 7

Here are the predictions for Round 7. The prediction is my estimated expected points difference with a positive margin being a win to the home team, and a negative margin a win to the away team.

No. | Game | Date | Winner | Prediction
1 | Chiefs vs. Blues | Apr 08 | Chiefs | 15.30
2 | Force vs. Crusaders | Apr 08 | Crusaders | -15.10
3 | Stormers vs. Sunwolves | Apr 08 | Stormers | 17.90
4 | Hurricanes vs. Jaguares | Apr 09 | Hurricanes | 18.30
5 | Reds vs. Highlanders | Apr 09 | Highlanders | -13.10
6 | Sharks vs. Lions | Apr 09 | Sharks | 4.30
7 | Kings vs. Bulls | Apr 09 | Bulls | -10.60

 

NRL Predictions for Round 6

Team Ratings for Round 6

The basic method is described on my Department home page.

Here are the team ratings prior to this week’s games, along with the ratings at the start of the season.

Team | Current Rating | Rating at Season Start | Difference
Cowboys | 12.42 | 10.29 | 2.10
Broncos | 8.47 | 9.81 | -1.30
Roosters | 3.11 | 11.20 | -8.10
Storm | 2.78 | 4.41 | -1.60
Bulldogs | 2.25 | 1.50 | 0.80
Rabbitohs | 1.67 | -1.20 | 2.90
Sharks | 1.59 | -1.06 | 2.60
Raiders | 0.23 | -0.55 | 0.80
Sea Eagles | -1.06 | 0.36 | -1.40
Panthers | -1.74 | -3.06 | 1.30
Eels | -1.87 | -4.62 | 2.80
Dragons | -2.67 | -0.10 | -2.60
Warriors | -4.59 | -7.47 | 2.90
Wests Tigers | -5.05 | -4.06 | -1.00
Titans | -5.32 | -8.39 | 3.10
Knights | -8.56 | -5.41 | -3.10

 

Performance So Far

So far there have been 40 matches played, 21 of which were correctly predicted, a success rate of 52.5%.
Here are the predictions for last week’s games.

No. | Game | Date | Score | Prediction | Correct
1 | Sea Eagles vs. Rabbitohs | Mar 31 | 12 – 16 | 1.10 | FALSE
2 | Titans vs. Broncos | Apr 01 | 16 – 24 | -11.30 | TRUE
3 | Storm vs. Knights | Apr 02 | 18 – 14 | 16.00 | TRUE
4 | Wests Tigers vs. Sharks | Apr 02 | 26 – 34 | -2.80 | TRUE
5 | Cowboys vs. Dragons | Apr 02 | 36 – 0 | 15.20 | TRUE
6 | Roosters vs. Warriors | Apr 03 | 28 – 32 | 14.20 | FALSE
7 | Eels vs. Panthers | Apr 03 | 18 – 20 | 3.80 | FALSE
8 | Bulldogs vs. Raiders | Apr 04 | 8 – 22 | 8.00 | FALSE

 

Predictions for Round 6

Here are the predictions for Round 6. The prediction is my estimated expected points difference with a positive margin being a win to the home team, and a negative margin a win to the away team.

No. | Game | Date | Winner | Prediction
1 | Broncos vs. Dragons | Apr 07 | Broncos | 14.10
2 | Rabbitohs vs. Roosters | Apr 08 | Rabbitohs | 1.60
3 | Eels vs. Raiders | Apr 09 | Eels | 0.90
4 | Warriors vs. Sea Eagles | Apr 09 | Warriors | 0.50
5 | Panthers vs. Cowboys | Apr 09 | Cowboys | -11.20
6 | Sharks vs. Titans | Apr 10 | Sharks | 9.90
7 | Knights vs. Wests Tigers | Apr 10 | Wests Tigers | -0.50
8 | Storm vs. Bulldogs | Apr 11 | Storm | 3.50

 

April 4, 2016

Stat of the Week Competition: April 2 – 8 2016

Each week, we would like to invite readers of Stats Chat to submit nominations for our Stat of the Week competition and be in with a chance to win an iTunes voucher.

Here’s how it works:

  • Anyone may add a comment on this post to nominate their Stat of the Week candidate before midday Friday April 8 2016.
  • Statistics can be bad, exemplary or fascinating.
  • The statistic must be in the NZ media during the period of April 2 – 8 2016 inclusive.
  • Quote the statistic, when and where it was published and tell us why it should be our Stat of the Week.

Next Monday at midday we’ll announce the winner of this week’s Stat of the Week competition, and start a new one.


April 2, 2016

One weird trick increases donating tenfold?

From the Herald:

US researchers have confirmed a strange link between touching rough surfaces and feeling for others, which could help charities raise more money.

Based on my usual complaints about this sort of claim, you might expect that the research didn’t look at donating money or that it saw only a tiny difference. No.

There were five experiments, but only one that involved actual money. People were approached on the street, given a description of a health-related charity, and asked to donate. One charity was real, working on a familiar disease; the other was fake, working on a real but obscure disease (all the money actually ended up with the real charity). Half the participants were given the information and donation envelope on a clipboard with rough sandpaper on the back; the other half weren’t.

[Figure: 1-s2.0-S1057740815001035-gr4]

When asked to donate to the National Breast Cancer Foundation there was no difference between the rough and smooth clipboards (as you’d expect). When asked to donate to the National Sjögren’s Foundation, 10/34 with sandpaper-backed clipboards said yes compared to only 1/32 with smooth clipboards.

I’m going to go very slightly out on a limb here to say there is no way this ten-fold increase is a real and generalisable phenomenon.  So, what went wrong?  Part of the problem is what Andrew Gelman calls the ‘garden of forking paths’, after the Jorge Luis Borges story — there are many, many possible analyses and they don’t all show this dramatic difference.

For example, there wasn’t a difference in donation probability with the familiar charity. This was consistent with the researchers’ theory, but I’m pretty sure if there had been a difference the researchers wouldn’t have considered it as evidence refuting the theory. Also, the researchers note that they didn’t see a difference in donation amount with the sandpaper, just in donation probability.

Also, if you assume the ten-fold increase was overestimated even a bit, you then get into the problem of sample size. Suppose that the effect was only a two-fold increase rather than ten-fold. That still seems implausibly large to me, but the comparison would then be something like 2/34 vs 1/32 and would be completely unimpressive.  You’d need a sample size something like ten times larger.  And that’s if a bit of sandpaper on the back of a clipboard doubled the number of people who donated.
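
To make the sample-size point concrete, here is a minimal sketch comparing the reported 10/34 vs 1/32 with the hypothetical 2/34 vs 1/32. It uses a two-sided Fisher exact test, which is my choice for illustration and not necessarily the analysis the researchers ran; the function name `fisher_exact_two_sided` is hypothetical and only the Python standard library is assumed.

```python
from math import comb


def fisher_exact_two_sided(yes_a, n_a, yes_b, n_b):
    """Two-sided Fisher exact p-value for yes_a/n_a versus yes_b/n_b."""
    total_yes = yes_a + yes_b
    denom = comb(n_a + n_b, total_yes)

    def prob(k):
        # Hypergeometric probability that k of the 'yes' answers fall in group A
        return comb(n_a, k) * comb(n_b, total_yes - k) / denom

    observed = prob(yes_a)
    lowest = max(0, total_yes - n_b)
    highest = min(n_a, total_yes)
    # Add up every possible table that is no more likely than the observed one
    # (small relative tolerance guards against floating-point ties).
    return sum(
        p
        for p in (prob(k) for k in range(lowest, highest + 1))
        if p <= observed * (1 + 1e-9)
    )


p_reported = fisher_exact_two_sided(10, 34, 1, 32)  # sandpaper vs smooth, as reported
p_twofold = fisher_exact_two_sided(2, 34, 1, 32)    # hypothetical two-fold effect

print(f"10/34 vs 1/32: p = {p_reported:.4f}")
print(f" 2/34 vs 1/32: p = {p_twofold:.4f}")
```

With the same group sizes, the hypothetical two-fold comparison gives no evidence of a difference at all, which is the sense in which it would be completely unimpressive.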

Still, these findings could have “significant implications for less well-known charities”, as the researchers suggest. If I got approached by a charity using sandpaper on the back of their clipboards, I would tend to think they were (a) poor at evaluating evidence, and (b) not all that honest. I could see that having an impact.

March 30, 2016

Hold the lettuce

Q: Did you see vegetarian diets cause cancer now?

A: No.

Q: The Herald site front page: headline Vegetarianism can lead to cancer?

[Image: vege]

A: No.

Q: The teaser: “Scientists have found there can be long-term health risks associated with a vegetarian diet, that could outweigh the benefits.”?

A: Well, it depends on what you mean by ‘long-term’, for a start.

Q: How long-term?

A: Centuries, perhaps thousands of years.

Q: How did they find people who were thousands of years old? And why isn’t that the headline?

A: Not people.

Q: I refuse to believe in century-old lab mice.

A: Human populations.

Q: Ok, so if we click through to the story (from the Telegraph) it seems they’re saying your great-grandparents eating lettuce gives you harmful mutations?

A: That’s what the story says, but it’s not what the research says. The research suggests that a mutation that, with a modern diet, might increase cancer risk arose randomly a long time in the past and became common in a South Asian population where vegetarian diets have been common.

Q: How did the mutation become common?

A: Because it wasn’t true that the long-term health risks outweighed the benefits: there’s genetic evidence of ‘selection’ in the evolutionary sense, meaning that people with the mutation had more descendants on average.

Q: How much health risk did they find?

A: They weren’t looking at health risks.

Q: But “long-term health risks” and “can lead to cancer”?

A: Sadly, yes.

Q: Ok, what were they looking at?

A: They were looking at enzymes that turn one type of fatty acid into another. The mutation makes it easier for the body to synthesise long polyunsaturated fatty acids.

Q: Aren’t they good?

A: Some of them, like the DHA and EPA also found in fish, are thought to reduce inflammation and heart disease. But arachidonic acid is thought to increase inflammation, though the American Heart Association isn’t convinced.

Q: That’s heart disease. What about cancer?

A: The only links to cancer are pretty speculative: that the mutation could reinforce the effects of a modern diet in increasing cancer risk. The contribution of arachidonic acid to that is controversial. But it could be real.

Q: Is there actually a higher cancer rate where they got their vegetarian population from, compared to the control population?

A: No.

Q: That ‘arachidonic acid’ thing. Why does that make me think of spiders?

A: Yes, me too. It’s a false cognate: Latin ‘arachis’, ‘peanut’, not the mythic Greek technologist Aράχνη that arachnids were named for.

Super 18 Predictions for Round 6

Team Ratings for Round 6

The basic method is described on my Department home page.

Here are the team ratings prior to this week’s games, along with the ratings at the start of the season.

Team | Current Rating | Rating at Season Start | Difference
Crusaders | 8.95 | 9.84 | -0.90
Highlanders | 7.73 | 6.80 | 0.90
Hurricanes | 5.64 | 7.26 | -1.60
Chiefs | 5.34 | 2.68 | 2.70
Brumbies | 5.21 | 3.15 | 2.10
Waratahs | 3.34 | 4.88 | -1.50
Stormers | 1.48 | -0.62 | 2.10
Sharks | 0.10 | -1.64 | 1.70
Lions | -0.72 | -1.80 | 1.10
Bulls | -1.61 | -0.74 | -0.90
Blues | -4.72 | -5.51 | 0.80
Rebels | -6.35 | -6.33 | -0.00
Jaguares | -8.69 | -10.00 | 1.30
Cheetahs | -9.60 | -9.27 | -0.30
Reds | -10.01 | -9.81 | -0.20
Force | -10.79 | -8.43 | -2.40
Sunwolves | -12.12 | -10.00 | -2.10
Kings | -16.40 | -13.66 | -2.70

 

Performance So Far

So far there have been 41 matches played, 27 of which were correctly predicted, a success rate of 65.9%.
Here are the predictions for last week’s games.

No. | Game | Date | Score | Prediction | Correct
1 | Hurricanes vs. Kings | Mar 25 | 42 – 20 | 26.60 | TRUE
2 | Chiefs vs. Force | Mar 26 | 53 – 10 | 17.00 | TRUE
3 | Rebels vs. Highlanders | Mar 26 | 3 – 27 | -8.20 | TRUE
4 | Sunwolves vs. Bulls | Mar 26 | 27 – 30 | -7.00 | TRUE
5 | Cheetahs vs. Brumbies | Mar 26 | 18 – 25 | -11.30 | TRUE
6 | Sharks vs. Crusaders | Mar 26 | 14 – 19 | -4.80 | TRUE
7 | Jaguares vs. Stormers | Mar 26 | 8 – 13 | -6.30 | TRUE
8 | Reds vs. Waratahs | Mar 27 | 13 – 15 | -10.90 | TRUE

 

Predictions for Round 6

Here are the predictions for Round 6. The prediction is my estimated expected points difference with a positive margin being a win to the home team, and a negative margin a win to the away team.

No. | Game | Date | Winner | Prediction
1 | Highlanders vs. Force | Apr 01 | Highlanders | 22.50
2 | Lions vs. Crusaders | Apr 01 | Crusaders | -5.70
3 | Blues vs. Jaguares | Apr 02 | Blues | 8.00
4 | Brumbies vs. Chiefs | Apr 02 | Brumbies | 3.90
5 | Kings vs. Sunwolves | Apr 02 | Sunwolves | -0.30
6 | Bulls vs. Cheetahs | Apr 02 | Bulls | 11.50
7 | Waratahs vs. Rebels | Apr 03 | Waratahs | 13.20

 

NRL Predictions for Round 5

Team Ratings for Round 5

The basic method is described on my Department home page.

Here are the team ratings prior to this week’s games, along with the ratings at the start of the season.

Team | Current Rating | Rating at Season Start | Difference
Cowboys | 11.00 | 10.29 | 0.70
Broncos | 8.74 | 9.81 | -1.10
Roosters | 4.37 | 11.20 | -6.80
Bulldogs | 3.76 | 1.50 | 2.30
Storm | 3.63 | 4.41 | -0.80
Rabbitohs | 1.27 | -1.20 | 2.50
Sharks | 1.17 | -1.06 | 2.20
Sea Eagles | -0.65 | 0.36 | -1.00
Dragons | -1.25 | -0.10 | -1.20
Raiders | -1.28 | -0.55 | -0.70
Eels | -1.40 | -4.62 | 3.20
Panthers | -2.21 | -3.06 | 0.80
Wests Tigers | -4.64 | -4.06 | -0.60
Titans | -5.59 | -8.39 | 2.80
Warriors | -5.85 | -7.47 | 1.60
Knights | -9.41 | -5.41 | -4.00

 

Performance So Far

So far there have been 32 matches played, 17 of which were correctly predicted, a success rate of 53.1%.
Here are the predictions for last week’s games.

No. | Game | Date | Score | Prediction | Correct
1 | Rabbitohs vs. Bulldogs | Mar 25 | 12 – 42 | 5.20 | FALSE
2 | Broncos vs. Cowboys | Mar 25 | 21 – 20 | 0.70 | TRUE
3 | Raiders vs. Titans | Mar 26 | 20 – 24 | 9.20 | FALSE
4 | Roosters vs. Sea Eagles | Mar 26 | 20 – 22 | 9.70 | FALSE
5 | Dragons vs. Panthers | Mar 27 | 14 – 12 | 4.30 | TRUE
6 | Warriors vs. Knights | Mar 28 | 40 – 18 | 5.20 | TRUE
7 | Wests Tigers vs. Eels | Mar 28 | 0 – 8 | 1.10 | FALSE
8 | Sharks vs. Storm | Mar 28 | 14 – 6 | -0.70 | FALSE

 

Predictions for Round 5

Here are the predictions for Round 5. The prediction is my estimated expected points difference with a positive margin being a win to the home team, and a negative margin a win to the away team.

No. | Game | Date | Winner | Prediction
1 | Sea Eagles vs. Rabbitohs | Mar 31 | Sea Eagles | 1.10
2 | Titans vs. Broncos | Apr 01 | Broncos | -11.30
3 | Storm vs. Knights | Apr 02 | Storm | 16.00
4 | Wests Tigers vs. Sharks | Apr 02 | Sharks | -2.80
5 | Cowboys vs. Dragons | Apr 02 | Cowboys | 15.20
6 | Roosters vs. Warriors | Apr 03 | Roosters | 14.20
7 | Eels vs. Panthers | Apr 03 | Eels | 3.80
8 | Bulldogs vs. Raiders | Apr 04 | Bulldogs | 8.00

 

March 29, 2016

Chocolate probabilities

For those of you from other parts of the world, there has been a small sensation over the weekend here about Cadburys chocolate randomisation. One of their products was a large chocolate egg accompanied by eight miniature chocolate bars, chosen randomly from five varieties. Public opinion on the desirability of some of these varieties is more polarised than for others.

Stuff reports:

But one family found seven Cherry Ripes out of eight bars and most of those complaining to Cadbury say they found at least six Cherry Ripes out of eight. 

Cadbury claimed that it was just bad luck saying the chocolates are processed randomly and the Cherry Ripe overdose was not intentional. 

Both Stuff and The Guardian got advice on the probabilities. They get different answers: Martin Hazelton says seven out of eight being the same (of any variety) is about 1 in 10,000 and the Guardian’s two advisers say there’s nearly a 1 in 100 chance of getting seven Cherry Ripes out of eight (which is obviously less likely than getting seven of eight the same).

With a hundred-fold difference in the estimates, I think a tie-breaker is in order. Also, I’m going to do this the modern way: by simulation rather than by being clever. It’s much more reliable.

I’m going to trust the Guardian on what the five flavours were (since it doesn’t actually matter, I think this is safe). I’ve put the code and results for 100,000 simulated packages up here. The number of packs with seven or more bars the same was 44 out of 100,000. There’s obviously some random uncertainty here, but a 95% confidence interval for the proportion goes from 3 in 10,000 to 6 in 10,000, and so excludes both of the published estimates. Since computing time is nearly free, and the previous run took only 13 seconds, I tried it on a million simulated packs just to be sure, and also separated out ‘seven or more of anything’ from ‘seven or more Cherry Ripes’.
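
The interval for 44 out of 100,000 can be checked with a few lines of Python. This is a minimal sketch that assumes a normal-approximation (Wald) interval; the post does not say which interval method was used, and the function name `proportion_ci` is hypothetical.

```python
from math import sqrt


def proportion_ci(successes, n, z=1.96):
    """Approximate 95% confidence interval for a proportion (normal approximation)."""
    p = successes / n
    half_width = z * sqrt(p * (1 - p) / n)
    return p - half_width, p + half_width


# 44 of 100,000 simulated packs had seven or more bars the same
lo, hi = proportion_ci(44, 100_000)
print(f"{lo * 10_000:.1f} to {hi * 10_000:.1f} per 10,000")
```

Rounded to whole numbers per 10,000, this agrees with the 3 to 6 in 10,000 quoted above, and it sits well away from both 1 in 10,000 and 1 in 100.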

Out of a million simulated packs, 442 had seven or more of some type of bar, and 83 had seven or more Cherry Ripes. The probability of seven or more of something is between 4 and 5 out of 10,000 and the probability of seven or more Cherry Ripes is between 0.6 and 1 out of 10,000. It looks as though Professor Hazelton’s estimate of ‘a little less than one in 10,000’ is correct for Cherry Ripes specifically. The Guardian figures seem clearly wrong. The Guardian is also wrong about the probability of getting at least one of each type, which this code shows to be about 30%, not the 7% they give.
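
For readers who want to try this themselves, here is a minimal sketch of such a simulation in Python. It is not the code linked above: the function name `simulate`, the seed, and the placeholder names for the four varieties other than Cherry Ripe are all assumptions made for illustration.

```python
import random
from collections import Counter

# Only Cherry Ripe is named in the text; the other labels are placeholders.
VARIETIES = ["Cherry Ripe", "Variety 2", "Variety 3", "Variety 4", "Variety 5"]
BARS_PER_PACK = 8


def simulate(n_packs, weights=None, seed=2016):
    """Simulate packs of independently chosen bars and tally three events.

    weights=None means all five varieties are equally likely.
    Returns the proportions of packs with (a) seven or more of any one variety,
    (b) seven or more Cherry Ripes, (c) at least one bar of every variety.
    """
    rng = random.Random(seed)
    seven_any = seven_cherry = all_varieties = 0
    for _ in range(n_packs):
        counts = Counter(rng.choices(VARIETIES, weights=weights, k=BARS_PER_PACK))
        if max(counts.values()) >= 7:
            seven_any += 1
        if counts["Cherry Ripe"] >= 7:
            seven_cherry += 1
        if len(counts) == len(VARIETIES):
            all_varieties += 1
    return seven_any / n_packs, seven_cherry / n_packs, all_varieties / n_packs


results = simulate(100_000)
print("seven or more of anything:  ", results[0])
print("seven or more Cherry Ripes: ", results[1])
print("at least one of each type:  ", results[2])
```

Passing a list of five `weights` lets the same function explore uneven randomisation; the exact counts will vary with the seed and the number of packs.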

I said I wasn’t going to do this by maths, but now I know the answer I’m going to go out on a limb here and guess that Martin Hazelton’s probability was, in maths terms, P(Binom(8, 0.2) ≥ 7), which is the answer I would have given for Cherry Ripes specifically. With Jack and Andrew in the Guardian I think the issue is that they have counted all 495 possible aggregate outcomes as being equally likely, when it’s actually the 390625 underlying ordered outcomes that are equally likely.
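
The exact values behind these numbers are easy to compute. The sketch below assumes eight bars chosen independently and uniformly from five varieties, as in the simulation; the variable names are mine, and the inclusion-exclusion formula for ‘at least one of each type’ is a standard technique rather than anything taken from the linked code.

```python
from math import comb

N_BARS = 8
N_VARIETIES = 5
p = 1 / N_VARIETIES  # chance that any one bar is a Cherry Ripe

# P(Binom(8, 0.2) >= 7): seven or eight Cherry Ripes in a pack
p_cherry = sum(
    comb(N_BARS, k) * p**k * (1 - p) ** (N_BARS - k) for k in (7, 8)
)

# Two varieties cannot both reach seven of eight bars, so the five
# 'seven or more of variety X' events are mutually exclusive.
p_any = N_VARIETIES * p_cherry

# At least one bar of every variety, by inclusion-exclusion over
# the number j of varieties that are missing from the pack.
p_all = sum(
    (-1) ** j * comb(N_VARIETIES, j) * ((N_VARIETIES - j) / N_VARIETIES) ** N_BARS
    for j in range(N_VARIETIES + 1)
)

# Aggregate outcomes (how many of each variety) versus ordered outcomes
aggregate_outcomes = comb(N_BARS + N_VARIETIES - 1, N_VARIETIES - 1)
ordered_outcomes = N_VARIETIES**N_BARS

print(f"seven or more Cherry Ripes: {p_cherry * 10_000:.2f} per 10,000")
print(f"seven or more of anything:  {p_any * 10_000:.2f} per 10,000")
print(f"at least one of each type:  {p_all:.1%}")
print(aggregate_outcomes, "aggregate outcomes;", ordered_outcomes, "ordered outcomes")
```

The aggregate outcomes are not equally likely because each one corresponds to a different number of ordered outcomes: ‘eight Cherry Ripes’ can happen in only one order, while a mixed pack can be arranged in many.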

The other aspect of this computation is the alternative hypothesis. It makes no sense that Cadbury would just load up the bags with Cherry Ripes and pretend they hadn’t — especially as the Guardian reports other sorts of complaints as well. We need to ask not just whether the reports would be surprising if the bags were randomised, but whether there’s another explanation that fits the data better.

The Guardian story hints at a possibility: clumping together of similar chocolates. It is also conceivable that the randomisation wasn’t quite even: that, say, Cherry Ripes were 25% instead of the intended 20%. It’s easy to modify the code for unequal probabilities. Having one chocolate type at 25% doubles the number of seven-or-more coincidences, and more than half of them are now with Cherry Ripes. But that’s quite a big imbalance to go unnoticed at Cadburys, and it doesn’t push the probability a lot.

So, I’d say bad luck is a feasible explanation, but it could easily have been aggravated by imperfect randomisation at Cadburys.

Many lessons could be drawn from this story: that simulation is a good way to do slightly complicated probability questions; that people see departures from randomness far too easily; that Cadburys should have done systematic sampling rather than random sampling; maybe even that innovative maths teachers may have gone too far in rejecting contrived ball-out-of-urn problems as having no Real World use.