Posts from March 2016 (44)

March 30, 2016

Hold the lettuce

Q: Did you see vegetarian diets cause cancer now?

A: No.

Q: The Herald site front page headline: “Vegetarianism can lead to cancer”?

A: No.

Q: The teaser: “Scientists have found there can be long-term health risks associated with a vegetarian diet, that could outweigh the benefits.”?

A: Well, it depends on what you mean by ‘long-term’, for a start.

Q: How long-term?

A: Centuries, perhaps thousands of years.

Q: How did they find people who were thousands of years old? And why isn’t that the headline?

A: Not people.

Q: I refuse to believe in century-old lab mice.

A: Human populations.

Q: Ok, so if we click through to the story (from the Telegraph) it seems they’re saying your great-grandparents eating lettuce gives you harmful mutations?

A: That’s what the story says, but it’s not what the research says. The research suggests that a mutation that, with a modern diet, might increase cancer risk arose randomly a long time in the past and became common in a South Asian population where vegetarian diets have been common.

Q: How did the mutation become common?

A: Because it wasn’t true that the long-term health risks outweighed the benefits — there’s genetic evidence of  ‘selection’ in the evolutionary sense, meaning that people with the mutation had more descendants on average.

Q: How much health risk did they find?

A: They weren’t looking at health risks.

Q: But “long-term health risks” and “can lead to cancer”?

A: Sadly, yes.

Q: Ok, what were they looking at?

A: They were looking at enzymes that turn one type of fatty acid into another. The mutation makes it easier for the body to synthesise long-chain polyunsaturated fatty acids.

Q: Aren’t they good?

A: Some of them, like the DHA and EPA also found in fish, are thought to reduce inflammation and heart disease. But arachidonic acid is thought to increase inflammation, though the American Heart Association isn’t convinced.

Q: That’s heart disease. What about cancer?

A: The only links to cancer are pretty speculative: that the mutation could reinforce effects of a modern diet in increasing cancer risk. The contribution of arachidonic acid to that is controversial. But it could be real.

Q: Is there actually a higher cancer rate where they got their vegetarian population from, compared to the control population?

A: No.

Q: That ‘arachidonic acid’ thing. Why does that make me think of spiders?

A: Yes, me too. It’s a false cognate: Latin ‘arachis’, ‘peanut’, not the mythic Greek technologist Ἀράχνη that arachnids were named for.

Super 18 Predictions for Round 6

Team Ratings for Round 6

The basic method is described on my Department home page.

Here are the team ratings prior to this week’s games, along with the ratings at the start of the season.

Team          Current Rating  Rating at Season Start  Difference
Crusaders               8.95                    9.84       -0.90
Highlanders             7.73                    6.80        0.90
Hurricanes              5.64                    7.26       -1.60
Chiefs                  5.34                    2.68        2.70
Brumbies                5.21                    3.15        2.10
Waratahs                3.34                    4.88       -1.50
Stormers                1.48                   -0.62        2.10
Sharks                  0.10                   -1.64        1.70
Lions                  -0.72                   -1.80        1.10
Bulls                  -1.61                   -0.74       -0.90
Blues                  -4.72                   -5.51        0.80
Rebels                 -6.35                   -6.33       -0.00
Jaguares               -8.69                  -10.00        1.30
Cheetahs               -9.60                   -9.27       -0.30
Reds                  -10.01                   -9.81       -0.20
Force                 -10.79                   -8.43       -2.40
Sunwolves             -12.12                  -10.00       -2.10
Kings                 -16.40                  -13.66       -2.70

 

Performance So Far

So far there have been 41 matches played, 27 of which were correctly predicted, a success rate of 65.9%.
Here are the predictions for last week’s games.

Game                         Date    Score    Prediction  Correct
1  Hurricanes vs. Kings      Mar 25  42 – 20       26.60  TRUE
2  Chiefs vs. Force          Mar 26  53 – 10       17.00  TRUE
3  Rebels vs. Highlanders    Mar 26  3 – 27        -8.20  TRUE
4  Sunwolves vs. Bulls       Mar 26  27 – 30       -7.00  TRUE
5  Cheetahs vs. Brumbies     Mar 26  18 – 25      -11.30  TRUE
6  Sharks vs. Crusaders      Mar 26  14 – 19       -4.80  TRUE
7  Jaguares vs. Stormers     Mar 26  8 – 13        -6.30  TRUE
8  Reds vs. Waratahs         Mar 27  13 – 15      -10.90  TRUE

 

Predictions for Round 6

Here are the predictions for Round 6. The prediction is my estimated expected points difference with a positive margin being a win to the home team, and a negative margin a win to the away team.

Game                         Date    Winner       Prediction
1  Highlanders vs. Force     Apr 01  Highlanders       22.50
2  Lions vs. Crusaders       Apr 01  Crusaders         -5.70
3  Blues vs. Jaguares        Apr 02  Blues              8.00
4  Brumbies vs. Chiefs       Apr 02  Brumbies           3.90
5  Kings vs. Sunwolves       Apr 02  Sunwolves         -0.30
6  Bulls vs. Cheetahs        Apr 02  Bulls             11.50
7  Waratahs vs. Rebels       Apr 03  Waratahs          13.20
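The sign convention used in these tables can be sketched in a few lines of Python. This is only an illustration of how to read the Prediction column, not the rating method itself; the function name `predicted_winner` is made up for the example, and treating a margin of exactly zero as favouring neither side is an assumption.

```python
def predicted_winner(home, away, margin):
    """Return the team favoured by a predicted home-minus-away margin.

    A positive margin favours the home team and a negative margin the
    away team. Returning None for a zero margin is an assumption made
    for this sketch.
    """
    if margin > 0:
        return home
    if margin < 0:
        return away
    return None


# Two rows from the Round 6 table.
first = predicted_winner("Highlanders", "Force", 22.50)
second = predicted_winner("Lions", "Crusaders", -5.70)
print(first)   # Highlanders
print(second)  # Crusaders
```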

 

NRL Predictions for Round 5

Team Ratings for Round 5

The basic method is described on my Department home page.

Here are the team ratings prior to this week’s games, along with the ratings at the start of the season.

Team          Current Rating  Rating at Season Start  Difference
Cowboys                11.00                   10.29        0.70
Broncos                 8.74                    9.81       -1.10
Roosters                4.37                   11.20       -6.80
Bulldogs                3.76                    1.50        2.30
Storm                   3.63                    4.41       -0.80
Rabbitohs               1.27                   -1.20        2.50
Sharks                  1.17                   -1.06        2.20
Sea Eagles             -0.65                    0.36       -1.00
Dragons                -1.25                   -0.10       -1.20
Raiders                -1.28                   -0.55       -0.70
Eels                   -1.40                   -4.62        3.20
Panthers               -2.21                   -3.06        0.80
Wests Tigers           -4.64                   -4.06       -0.60
Titans                 -5.59                   -8.39        2.80
Warriors               -5.85                   -7.47        1.60
Knights                -9.41                   -5.41       -4.00

 

Performance So Far

So far there have been 32 matches played, 17 of which were correctly predicted, a success rate of 53.1%.
Here are the predictions for last week’s games.

Game                         Date    Score    Prediction  Correct
1  Rabbitohs vs. Bulldogs    Mar 25  12 – 42        5.20  FALSE
2  Broncos vs. Cowboys       Mar 25  21 – 20        0.70  TRUE
3  Raiders vs. Titans        Mar 26  20 – 24        9.20  FALSE
4  Roosters vs. Sea Eagles   Mar 26  20 – 22        9.70  FALSE
5  Dragons vs. Panthers      Mar 27  14 – 12        4.30  TRUE
6  Warriors vs. Knights      Mar 28  40 – 18        5.20  TRUE
7  Wests Tigers vs. Eels     Mar 28  0 – 8          1.10  FALSE
8  Sharks vs. Storm          Mar 28  14 – 6        -0.70  FALSE
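As a rough check on the Correct column and the success rate quoted above, the tally can be reproduced from the scores and predictions. This is a sketch: a prediction is counted as correct when the predicted margin and the actual home-minus-away margin have the same sign, and counting a draw as incorrect is an assumption. The name `count_correct` is made up for the example.

```python
# (home score, away score, predicted margin) for last week's eight games
last_week = [
    (12, 42, 5.20),   # Rabbitohs vs. Bulldogs
    (21, 20, 0.70),   # Broncos vs. Cowboys
    (20, 24, 9.20),   # Raiders vs. Titans
    (20, 22, 9.70),   # Roosters vs. Sea Eagles
    (14, 12, 4.30),   # Dragons vs. Panthers
    (40, 18, 5.20),   # Warriors vs. Knights
    (0, 8, 1.10),     # Wests Tigers vs. Eels
    (14, 6, -0.70),   # Sharks vs. Storm
]


def count_correct(games):
    """Count games where predicted and actual margins share a sign."""
    return sum(1 for home, away, margin in games if (home - away) * margin > 0)


week_correct = count_correct(last_week)
print(f"{week_correct} of {len(last_week)} correct last week")

# Season totals as quoted in the text: 17 correct out of 32 matches.
season_rate = 17 / 32
print(f"season success rate: {season_rate:.1%}")
```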

 

Predictions for Round 5

Here are the predictions for Round 5. The prediction is my estimated expected points difference with a positive margin being a win to the home team, and a negative margin a win to the away team.

Game Date Winner Prediction
1 Sea Eagles vs. Rabbitohs Mar 31 Sea Eagles 1.10
2 Titans vs. Broncos Apr 01 Broncos -11.30
3 Storm vs. Knights Apr 02 Storm 16.00
4 Wests Tigers vs. Sharks Apr 02 Sharks -2.80
5 Cowboys vs. Dragons Apr 02 Cowboys 15.20
6 Roosters vs. Warriors Apr 03 Roosters 14.20
7 Eels vs. Panthers Apr 03 Eels 3.80
8 Bulldogs vs. Raiders Apr 04 Bulldogs 8.00

 

March 29, 2016

Chocolate probabilities

For those of you from other parts of the world, there has been a small sensation over the weekend here about Cadburys chocolate randomisation. One of their products was a large chocolate egg accompanied by eight miniature chocolate bars, chosen randomly from five varieties. Public opinion on the desirability of some of these varieties is more polarised than for others.

Stuff reports:

But one family found seven Cherry Ripes out of eight bars and most of those complaining to Cadbury say they found at least six Cherry Ripes out of eight. 

Cadbury claimed that it was just bad luck saying the chocolates are processed randomly and the Cherry Ripe overdose was not intentional. 

Both Stuff and The Guardian got advice on the probabilities. They get different answers: Martin Hazelton says seven out of eight being the same (of any variety) is about 1 in 10,000 and the Guardian’s two advisers say there’s nearly a 1 in 100 chance of getting seven Cherry Ripes out of eight (which is obviously less likely than getting seven of eight the same).

With a hundred-fold difference in the estimates, I think a tie-breaker is in order. Also, I’m going to do this the modern way: by simulation rather than by being clever. It’s much more reliable.

I’m going to trust the Guardian on what the five flavours were (since it doesn’t actually matter, I think this is safe). I’ve put the code and results for 100,000 simulated packages up here. The number of packs with seven or more bars the same was 44 out of 100,000. There’s obviously some random uncertainty here, but a 95% confidence interval for the proportion goes from 3 in 10,000 to 6 in 10,000, and so excludes both of the published estimates. Since computing time is nearly free, and the previous run took only 13 seconds, I tried it on a million simulated packs just to be sure, and also separated out ‘seven or more of anything’ from ‘seven or more Cherry Ripes’.
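The interval quoted above can be approximated with a simple normal-approximation (Wald) interval. This is a sketch only: the post does not say which interval method was used, so the choice of a Wald interval and the function name `wald_interval` are assumptions.

```python
import math


def wald_interval(successes, trials, z=1.96):
    """Normal-approximation 95% interval for a binomial proportion."""
    p = successes / trials
    half_width = z * math.sqrt(p * (1 - p) / trials)
    return p - half_width, p + half_width


# 44 packs with seven or more bars the same, out of 100,000 simulated packs.
low, high = wald_interval(44, 100_000)
print(f"{low * 10_000:.1f} to {high * 10_000:.1f} per 10,000")  # 3.1 to 5.7 per 10,000
```

Both published figures, about 1 in 10,000 and about 1 in 100, fall outside this range.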

Out of a million simulated packs, 442 had seven or more of some type of bar, and 83 had seven or more Cherry Ripes. The probability of seven or more of something is between 4 and 5 out of 10,000 and the probability of seven or more Cherry Ripes is between 0.6 and 1 out of 10,000. It looks as though Professor Hazelton’s estimate of ‘a little less than one in 10,000’ is correct for Cherry Ripes specifically. The Guardian figures seem clearly wrong. The Guardian is also wrong about the probability of getting at least one of each type, which this code shows to be about 30%, not the 7% they give.
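For readers who want to try a simulation of this kind, here is a minimal Python sketch. It is not the code linked above. It assumes eight bars per pack drawn independently and with equal probability from five flavours; the flavour labels other than Cherry Ripe are placeholders, and the function name `simulate` and the seed are arbitrary choices.

```python
import random
from collections import Counter

# Only "Cherry Ripe" is named in the post; the other labels are placeholders.
FLAVOURS = ["Cherry Ripe", "B", "C", "D", "E"]


def simulate(n_packs, bars_per_pack=8, weights=None, seed=2016):
    """Simulate packs and return the proportion showing each pattern."""
    rng = random.Random(seed)
    seven_any = seven_cherry = all_five = 0
    for _ in range(n_packs):
        counts = Counter(rng.choices(FLAVOURS, weights=weights, k=bars_per_pack))
        if max(counts.values()) >= 7:
            seven_any += 1      # seven or more of some flavour
        if counts["Cherry Ripe"] >= 7:
            seven_cherry += 1   # seven or more Cherry Ripes
        if len(counts) == len(FLAVOURS):
            all_five += 1       # at least one of every flavour
    return {
        "seven_any": seven_any / n_packs,
        "seven_cherry": seven_cherry / n_packs,
        "all_five": all_five / n_packs,
    }


result = simulate(100_000)
for name, proportion in result.items():
    print(f"{name}: {proportion:.5f}")
```

The exact counts depend on the seed, so the output will not match the figures in the post digit for digit, but the proportions should land in the same region.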

I said I wasn’t going to do this by maths, but now I know the answer I’m going to go out on a limb here and guess that Martin Hazelton’s probability was, in maths terms, P(Binom(8, 0.2) ≥ 7), which is the answer I would have given for Cherry Ripes specifically. With Jack and Andrew in the Guardian I think the issue is that they have counted all 495 possible aggregate outcomes as being equally likely, when it’s actually the 390625 underlying ordered outcomes that are equally likely.
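The two ways of counting can be compared exactly. The sketch below assumes eight independent bars and five equally likely flavours. The "aggregate" calculation is a reconstruction of the guess described above, not a method confirmed by the Guardian’s advisers, and the variable names are made up for the example.

```python
from fractions import Fraction
from math import comb

BARS, TYPES = 8, 5


def binom_tail(n, p, k):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, j) * p**j * (1 - p) ** (n - j) for j in range(k, n + 1))


# Ordered outcomes are equally likely: 5^8 of them.
ordered = TYPES**BARS
p_cherry = binom_tail(BARS, Fraction(1, TYPES), 7)   # seven or more Cherry Ripes
p_any = TYPES * p_cherry   # the five events cannot overlap with only 8 bars

# At least one of each flavour, by inclusion-exclusion over missing flavours.
onto = sum((-1) ** k * comb(TYPES, k) * (TYPES - k) ** BARS for k in range(TYPES + 1))
p_all_five = Fraction(onto, ordered)

# Treating the aggregate outcomes (counts per flavour) as equally likely instead.
aggregate = comb(BARS + TYPES - 1, TYPES - 1)
agg_cherry = Fraction(TYPES, aggregate)   # 7 Cherry Ripes plus 1 of 4 others, or 8
agg_all_five = Fraction(comb(BARS - 1, TYPES - 1), aggregate)

print(ordered, aggregate)
print(float(p_cherry), float(p_any), float(p_all_five))
print(float(agg_cherry), float(agg_all_five))
```

Under these assumptions the ordered-outcome figures agree with the simulation, while the aggregate-outcome figures come out at about 1 in 100 and about 7%, in line with the Guardian numbers quoted above.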

The other aspect of this computation is the alternative hypothesis. It makes no sense that Cadbury would just load up the bags with Cherry Ripes and pretend they hadn’t — especially as the Guardian reports other sorts of complaints as well. We need to ask not just whether the reports would be surprising if the bags were randomised, but whether there’s another explanation that fits the data better.

The Guardian story hints at a possibility: clumping together of similar chocolates. It also would be conceivable that the randomisation wasn’t quite even — that, say,  Cherry Ripes were 25% instead of the intended 20%. It’s easy to modify the code for unequal probabilities. Having one chocolate type at 25% doubles the number of seven-or-more coincidences, and more than half of them are now with Cherry Ripes. But that’s quite a big imbalance to go unnoticed at Cadburys, and it doesn’t push the probability a lot.
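The effect of uneven probabilities can also be worked out exactly. In this sketch Cherry Ripes are given a probability of 25% and the remaining 75% is split evenly over the other four flavours; that split is an assumption, since the post does not say how the other probabilities were set, and the function names are made up for the example.

```python
from math import comb


def binom_tail(n, p, k):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, j) * p**j * (1 - p) ** (n - j) for j in range(k, n + 1))


def seven_or_more(probs, bars=8, threshold=7):
    """Return (total probability, per-flavour probabilities) of a big run.

    Summing over flavours is valid because two flavours cannot both
    reach the threshold when it is more than half the bars in the pack.
    """
    per_type = [binom_tail(bars, p, threshold) for p in probs]
    return sum(per_type), per_type


even_total, _ = seven_or_more([0.2] * 5)
skew_total, skew_parts = seven_or_more([0.25] + [0.1875] * 4)  # Cherry Ripe first

ratio = skew_total / even_total
cherry_share = skew_parts[0] / skew_total
print(f"ratio to the even case: {ratio:.2f}")
print(f"share that are Cherry Ripes: {cherry_share:.0%}")
```

With these assumed probabilities the exact ratio comes out at about 1.4 and Cherry Ripes account for roughly 64% of the seven-or-more packs; a different choice for the other four probabilities would give different figures.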

So, I’d say bad luck is a feasible explanation, but it could easily have been aggravated by imperfect randomisation at Cadburys.

Many lessons could be drawn from this story: that simulation is a good way to do slightly complicated probability questions; that people see departures from randomness far too easily; that Cadburys should have done systematic sampling rather than random sampling; maybe even that innovative maths teachers may have gone too far in rejecting contrived ball-out-of-urn problems as having no Real World use.

March 28, 2016

Briefly

  • “Unlike other projects that map cities by sound, Chatty Maps isn’t measuring volume or ranking neighborhoods as noisy or quiet. Instead, it shows the city across a spectrum of different sounds—as well as the emotions we associate them with.” I’m not convinced, but it’s interesting to look at. (via @teh_aimee)
  • “World Cup fans not responsible for the Zika outbreak”. (Scientific American blog, open-access research paper) I think ‘responsible’ is the wrong word, but in any case, looking at the genomes of Zika virus specimens suggests that the current virus has been circulating in the Americas since 2013 at least. Also, the three samples from microcephaly cases don’t share any relevant mutation, so the more-severe disease in the current outbreak probably isn’t due to a change in the virus. You can do a lot with genetics.
  • “Can an algorithm be wrong?” from limn.it
  • “Exposing algorithms” from the Tow Center for Digital Journalism: a summary from a session at the National Institute for Computer-Assisted Reporting conference

Stat of the Week Competition: March 26 – April 1 2016

Each week, we would like to invite readers of Stats Chat to submit nominations for our Stat of the Week competition and be in with the chance to win an iTunes voucher.

Here’s how it works:

  • Anyone may add a comment on this post to nominate their Stat of the Week candidate before midday Friday April 1 2016.
  • Statistics can be bad, exemplary or fascinating.
  • The statistic must be in the NZ media during the period of March 26 – April 1 2016 inclusive.
  • Quote the statistic, when and where it was published and tell us why it should be our Stat of the Week.

Next Monday at midday we’ll announce the winner of this week’s Stat of the Week competition, and start a new one.


Stat of the Week Competition Discussion: March 26 – April 1 2016

If you’d like to comment on or debate any of this week’s Stat of the Week nominations, please do so below!

March 24, 2016

The fleg

Two StatsChat-relevant points to be made.

First, the opinion polls underestimated the ‘change’ vote — not disastrously, but enough that they likely won’t be putting this referendum at the top of their portfolios.  In the four polls for the second phase of the referendum after the first phase was over, the lowest support for the current flag (out of those expressing an opinion) was 62%. The result was 56.6%.  The data are consistent with support for the fern increasing over time, but I wouldn’t call the evidence compelling.

Second, the relationship with party vote. The Herald, as is their wont, have a nice interactive thingy up on the Insights blog giving results by electorate, but they don’t do party vote (yet — it’s only been an hour).  Here are scatterplots for the referendum vote and main(ish) party votes (the open circles are the Māori electorates, and I have ignored the Northland byelection). The data are from here and here.

[Figure: scatterplots of referendum vote against party vote, by electorate]

The strongest relationship is with National vote, whether because John Key’s endorsement swayed National voters or because it did whatever the opposite of swayed is for anti-National voters.

Interestingly, given Winston Peters’s expressed views, electorates with higher NZ First vote and the same National vote were more likely to go for the fern.  This graph shows the fern vote vs NZ First vote for electorates divided into six groups based on their National vote. Those with low National vote are on the left; those with high National vote are on the right. (click to embiggen).
[Figure: fern vote against NZ First vote, in six panels grouped by National vote]

There’s an increasing trend across panels because electorates with higher National vote were more fern-friendly. There’s also an increasing trend within each panel, because electorates with similar National vote but higher NZ First vote were more fern-friendly.  For people who care, yes, this is backed up by the regression models.

 

Two cheers for evidence-based policy

Daniel Davies has a post at the Long and Short and a follow-up post at Crooked Timber about the implications for evidence-based policy of non-replicability in science.

Two quotes:

 So the real ‘reproducibility crisis’ for evidence-based policy making would be: if you’re serious about basing policy on evidence, how much are you prepared to spend on research, and how long are you prepared to wait for the answers?

and

“We’ve got to do something”. Well, do we? And equally importantly, do we have to do something right now, rather than waiting quite a long time to get some reproducible evidence? I’ve written at length, several times, in the past, about the regrettable tendency of policymakers and their advisors to underestimate a number of costs; the physical deadweight cost of reorganisation, the stress placed on any organisation by radical change, and the option value of waiting.

Graphics: what are they good for?

From Lucas Estevem, an interactive text-sentiment visualiser (click to embiggen, as usual)

[Figure: screenshot of the interactive text-sentiment visualiser]

Andrew Gelman, whose class this was a project for, asks what the visualiser is useful for:

An interactive display is particularly valuable because we can try out different texts, or even alter the existing document word by word, in order to reverse-engineer the sentiment analyzer and see how it works. The sentiment analyzer is far from perfect, and being able to look inside in this way can give us insight into where it will be useful, where it might mislead, and how it might be improved.

Visualization. It’s not just about showing off. It’s a tool for discovering and learning about anomalies.