Posts from July 2017 (30)

July 31, 2017

Stat of the Week Competition: July 29 – August 4 2017

Each week, we would like to invite readers of Stats Chat to submit nominations for our Stat of the Week competition and be in with the chance to win an iTunes voucher.

Here’s how it works:

  • Anyone may add a comment on this post to nominate their Stat of the Week candidate before midday Friday August 4 2017.
  • Statistics can be bad, exemplary or fascinating.
  • The statistic must be in the NZ media during the period of July 29 – August 4 2017 inclusive.
  • Quote the statistic, when and where it was published and tell us why it should be our Stat of the Week.

Next Monday at midday we’ll announce the winner of this week’s Stat of the Week competition, and start a new one.


Stat of the Week Competition Discussion: July 29 – August 4 2017

If you’d like to comment on or debate any of this week’s Stat of the Week nominations, please do so below!

July 30, 2017

Coffee news?

In 2015, the Herald said

Drinking the caffeine equivalent of more than four espressos a day is harmful to health, especially for minors and pregnant women, the European Union food safety agency has said.

“It is the first time that the risks from caffeine from all dietary sources have been assessed at EU level,” the EFSA said, recommending that an adult’s daily caffeine intake remain below 400mg a day.

(I quoted it at the time: the link seems to be dead now).

Now we have, under the headline Good news for coffee lovers: Caffeine is harmless, says research

A review of 44 trials dispelled the widespread myth that caffeine, found in tea, coffee and fizzy drinks, is bad for the body.

It found that sticking to the recommended daily amount of 400mg – the equivalent four cups of coffee or eight cups of tea – has no lasting damage on the body.

The recommendation that 400mg/day is generally safe was described as ‘caffeine is dangerous’ in 2015 and ‘caffeine is harmless’ now.

Other not-news about this is the not-new research. Obviously the Daily Mail (the only link) isn’t a research source. The research was published in Complete Nutrition, a professional magazine for UK dieticians. As their website says

Each issue of CN is packed with articles which are practical, educational and topical, and all are written by independent, well-respected authors from across the profession.

That’s a valuable mission for a journal, but it would be surprising if an expert opinion article in a journal like that contained new research worth international headlines.

What are election polls trying to estimate? And is Stuff different?

Stuff has a new election ‘poll of polls’.

The Stuff poll of polls is an average of the most recent of each of the public political polls in New Zealand. Currently, there are only three: Roy Morgan, Colmar Brunton and Reid Research. 

When these companies release a new poll it replaces their previous one in the average.

The Stuff poll of polls differs from others by giving weight to each poll based on how recent it is.

All polls less than 36 days old get equal weight. Any poll 36-70 days old carries a weight of 0.67, 70-105 days old a weight 0.33 and polls greater than 105 days old carry no weight in the average.
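
As a rough sketch of how the weighting quoted above could be computed (one reading of the description, not Stuff's actual code), here is a short Python example. The function names and the support figures are made up for illustration. The quoted bands overlap at exactly 70 days, so the handling of that boundary is an assumption.

```python
def recency_weight(age_days):
    """Weight for a poll that is age_days old, following the quoted bands.

    Assumption: the description is ambiguous at exactly 70 days;
    here day 70 is placed in the 0.67 band.
    """
    if age_days < 36:
        return 1.0
    if age_days <= 70:
        return 0.67
    if age_days <= 105:
        return 0.33
    return 0.0


def poll_of_polls(polls):
    """Weighted average of the most recent poll from each company.

    polls: iterable of (company, age_days, support_percent) tuples.
    Returns None if every remaining poll has zero weight.
    """
    latest = {}
    for company, age_days, support in polls:
        # Keep only the newest poll from each company.
        if company not in latest or age_days < latest[company][0]:
            latest[company] = (age_days, support)

    weighted = [(recency_weight(age), support) for age, support in latest.values()]
    total_weight = sum(w for w, _ in weighted)
    if total_weight == 0:
        return None
    return sum(w * s for w, s in weighted) / total_weight


# Hypothetical support figures for one party, not real poll results.
example_polls = [
    ("Roy Morgan", 10, 43.0),
    ("Roy Morgan", 40, 41.0),  # superseded by the newer Roy Morgan poll
    ("Colmar Brunton", 50, 47.0),
    ("Reid Research", 90, 45.0),
]
average = poll_of_polls(example_polls)
print(round(average, 2))
```

With these invented numbers the 10-day-old poll gets as much weight as the other two combined, which is the behaviour discussed below.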

In thinking about whether this is a good idea, we’d need to first think about what the poll is trying to estimate and about the reasons it doesn’t get that target quantity exactly right.

Officially, polls are trying to estimate what would happen “if an election were held tomorrow”, and there’s no interest in prediction for dates further forward in time than that. If that were strictly true, no-one would care about polls, since the results would refer only to the past two weeks when the surveys were done.

A poll taken over a two-week period is potentially relevant because there’s an underlying truth that, most of the time, changes more slowly than this. It will occasionally change faster (e.g., Donald Trump’s support in the US polls seems to have increased after James Comey’s claims about Clinton’s emails, and Labour’s support in the UK polls increased after the election was called), but it will mostly change more slowly. In my view, that’s the thing people are trying to estimate, and they’re trying to estimate it because it has some medium-term predictive value.

In addition to changes in the underlying truth, there is the idealised sampling variability that pollsters quote as the ‘margin of error’. There’s also larger sampling variability that comes because polling isn’t mathematically perfect. And there are ‘house effects’, where polls from different companies have consistent differences in the medium to long term, and none of them perfectly match voting intentions as expressed at actual elections.
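
For reference, the idealised ‘margin of error’ is usually the half-width of an approximate 95% interval for a proportion under simple random sampling, quoted by convention for a party at 50% support. A minimal sketch follows; the function name and the sample size of 1000 are illustrative assumptions, not figures from any particular poll.

```python
import math


def margin_of_error(n, p=0.5, z=1.96):
    """Half-width of an approximate 95% interval for a proportion p
    estimated from a simple random sample of size n."""
    return z * math.sqrt(p * (1 - p) / n)


# Illustrative sample size only.
moe_major = margin_of_error(1000)          # party near 50% support
moe_minor = margin_of_error(1000, p=0.05)  # party near 5% support
print(f"{100 * moe_major:.1f} points, {100 * moe_minor:.1f} points")
```

With 1000 respondents this gives roughly 3.1 percentage points at 50% support and 1.4 at 5%. It covers only the idealised sampling variability, not the extra variability or house effects described above.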

Most of the time in New Zealand, when we’re not about to have an election, the only recent poll is a Roy Morgan poll, because Roy Morgan polls much more often than anyone else. That means the Stuff poll of polls will be dominated by the most recent Roy Morgan poll. This would be a good idea if you thought that changes in underlying voting intention were large compared to sampling variability and house effects. If you thought sampling variability was larger, you’d want multiple polls from a single company (perhaps downweighted by time). If you thought house effects were non-negligible, you wouldn’t want to downweight other companies’ older polls as aggressively.

Near an election, there are lots more polls, so the most recent poll from each company is likely to be recent enough to get reasonably high weight. The Stuff poll is then distinctive in that it completely drops all but the most recent poll from each company.

Recency weighting, however, isn’t at all unique to the Stuff poll of polls. For example, the pundit.co.nz poll of polls downweights older polls, but doesn’t drop the weight to zero once another poll comes out. Peter Ellis’s two summaries both downweight older polls in a more complicated and less arbitrary way; the same was true of Peter Green’s poll aggregation when he was doing it. Curia’s average downweights even more aggressively than Stuff’s, but does not otherwise discard older polls by the same company. RadioNZ averages only the four most recent available results (regardless of company); they don’t do any other weighting for recency, but that’s plenty.

However, another thing recent elections have shown us is that uncertainty estimates are important: that’s what Nate Silver and almost no-one else got right in the US. The big limitation of simple, transparent poll of poll aggregators is that they say nothing useful about uncertainty.

July 29, 2017

Anything goes

According to a story in the Herald, based on what looks like it might be a bogus poll (press release), you need $5.3 million in Australia now to be considered rich.  If we assumed the number did actually measure something, how surprising would it be?

Before “Who wants to be a millionaire?” was a quiz show franchise, it was a Cole Porter song, from the  1956 movie “High Society”, so that seems a reasonable comparison period. The Australian CPI has gone up by a factor of 15.6 since 1956 (and while Australia didn’t have dollars until 1966, US and Australian dollars were roughly comparable then).

On top of pure currency conversion, though, Australia is richer now than in 1956.  Australia’s GDP in current purchasing-power adjusted dollars is nearly 8 times what it was in 1956. The population has gone from 9.4 million to 24.1 million, so real GDP per capita is up by a factor of about 3.5.

So, a 1956 million would be 15.6 current millions just from inflation, and over $50 million as a share of Australia’s economy: a millionaire in those days was not just rich, but Big Rich — as the song says: “flashy flunkies everywhere… a gigantic yacht… liveried chauffeur.”
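
The arithmetic can be checked in a few lines. This sketch uses only the factors quoted above and assumes the ‘over $50 million’ figure comes from multiplying the inflation factor by the growth in real GDP per capita; the variable names are just for illustration.

```python
# Factors as quoted in the post.
CPI_FACTOR = 15.6                 # Australian CPI, 1956 to now
REAL_GDP_PER_CAPITA_FACTOR = 3.5  # growth in real GDP per capita

million_1956 = 1.0  # in millions of dollars

# A 1956 million in today's dollars, adjusting for inflation only.
inflation_adjusted = million_1956 * CPI_FACTOR

# The same million as a comparable share of today's economy (assumed method).
share_adjusted = inflation_adjusted * REAL_GDP_PER_CAPITA_FACTOR

# Going the other way: today's claimed 'rich' threshold in 1956 dollars.
rich_now = 5.3
rich_in_1956_dollars = rich_now / CPI_FACTOR

print(f"{inflation_adjusted:.1f} million, {share_adjusted:.1f} million")
print(f"{rich_in_1956_dollars:.2f} million in 1956 dollars")
```

On these figures the $5.3 million threshold is only about a third of a 1956 million, even before allowing for the growth in the economy.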

We’re not given any real reason to believe the $5.3 million figure — there’s no reason you should rely on it more than your own guess. And ‘millionaire’ isn’t a useful comparison without a lot of additional qualification.

July 27, 2017

Will we ever use this in real life?

From deep in the archives at Language Log

The Pirahã language and culture seem to lack not only the words but also the concepts for numbers, using instead less precise terms like “small size”, “large size” and “collection”. And the Pirahã people themselves seem to be suprisingly uninterested in learning about numbers, and even actively resistant to doing so, despite the fact that in their frequent dealings with traders they have a practical need to evaluate and compare numerical expressions. A similar situation seems to obtain among some other groups in Amazonia, and a lack of indigenous words for numbers has been reported elsewhere in the world.

Many people find this hard to believe. These are simple and natural concepts, of great practical importance: how could rational people resist learning to understand and use them? I don’t know the answer. But I do know that we can investigate a strictly comparable case, equally puzzling to me, right here in the U.S. of A.

From context, you can probably guess where he’s heading.

July 25, 2017

Tell them to buy an ad

From the editing blog “Heads Up”

… you don’t need a course in statistics to ask what a writer means by “incident count,” “city” and “occurrence percentage,” not to mention why and how the means are weighted, or even why users of an insurance comparison website would be a good representation of a city where a huge proportion of drivers are uninsured. 

And

This isn’t “fake news” in the 2016 sense; it’s the old-school kind that has always gotten past enough gatekeepers to do its work. The traditional response is “tell them to buy an ad.”

Briefly

  • “Algorithms can dictate whether you get a mortgage or how much you pay for insurance. But sometimes they’re wrong – and sometimes they are designed to deceive” Cathy O’Neil, for Observer.
  • A talk about human factors research and what it says about data visualisation
  • “Point your phone at any mushroom and take a pic, our tech will instantly identify any mushrooms while giving you an article you can read or listen to.” This app seems to be intended as educational ‘augmented reality’, but one reason people want to identify mushrooms is to decide whether it’s safe to eat them. That’s not possible from just a photo, and the costs of some of the possible classification errors are very, very high.
  • A new trend in graphics: ‘joyplots’, named for the famous cover art of a Joy Division album. Here’s a history of the album cover, from Jen Christiansen. And now some examples:

 

Super 18 Predictions for the Semi-finals

Team Ratings for the Semi-finals

The basic method is described on my Department home page.

Here are the team ratings prior to this week’s games, along with the ratings at the start of the season.

Team           Current Rating   Rating at Season Start   Difference
Hurricanes     17.61            13.22                    4.40
Crusaders      14.49            8.75                     5.70
Lions          13.59            7.64                     6.00
Highlanders    10.62            9.17                     1.50
Chiefs         9.98             9.75                     0.20
Brumbies       1.81             3.83                     -2.00
Stormers       1.38             1.51                     -0.10
Sharks         0.72             0.42                     0.30
Blues          -0.22            -1.07                    0.90
Waratahs       -3.81            5.81                     -9.60
Bulls          -4.96            0.29                     -5.20
Jaguares       -5.03            -4.36                    -0.70
Force          -6.97            -9.45                    2.50
Cheetahs       -9.63            -7.36                    -2.30
Reds           -9.92            -10.28                   0.40
Kings          -12.08           -19.02                   6.90
Rebels         -15.29           -8.17                    -7.10
Sunwolves      -19.38           -17.76                   -1.60

 

Performance So Far

So far there have been 139 matches played, 105 of which were correctly predicted, a success rate of 75.5%.
Here are the predictions for last week’s games.

No.  Game                        Date     Score     Prediction   Correct
1    Brumbies vs. Hurricanes     Jul 21   16 – 35   -10.80       TRUE
2    Crusaders vs. Highlanders   Jul 22   17 – 0    6.10         TRUE
3    Lions vs. Sharks            Jul 22   23 – 21   18.30        TRUE
4    Stormers vs. Chiefs         Jul 22   11 – 17   -4.40        TRUE

 

Predictions for the Semi-finals

Here are the predictions for the Semi-finals. The prediction is my estimated expected points difference with a positive margin being a win to the home team, and a negative margin a win to the away team.

No.  Game                        Date     Winner       Prediction
1    Crusaders vs. Chiefs        Jul 29   Crusaders    8.00
2    Lions vs. Hurricanes        Jul 29   Hurricanes   -0.00
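
The sign convention for the predictions, and the ‘Correct’ column in the results table, can be sketched as follows. This illustrates only how a predicted margin is read, not how the ratings or margins are calculated. The function names are made up, the scores are assumed to be listed home team first, and the handling of draws and zero margins is an assumption.

```python
def predicted_winner(home, away, margin):
    """A positive margin predicts the home team, otherwise the away team.

    Assumption: a margin of zero (including -0.00) goes to the away team.
    """
    return home if margin > 0 else away


def is_correct(home_score, away_score, margin):
    """True if the predicted winner matches the actual winner.

    Assumption: a drawn match counts as an incorrect prediction.
    """
    actual_margin = home_score - away_score
    if actual_margin == 0:
        return False
    return (actual_margin > 0) == (margin > 0)


# Last week's Super Rugby games from the results table above:
# (home, away, home score, away score, predicted margin)
last_week = [
    ("Brumbies", "Hurricanes", 16, 35, -10.80),
    ("Crusaders", "Highlanders", 17, 0, 6.10),
    ("Lions", "Sharks", 23, 21, 18.30),
    ("Stormers", "Chiefs", 11, 17, -4.40),
]
correct = [is_correct(hs, as_, m) for _, _, hs, as_, m in last_week]

# Season success rate from the counts quoted above.
success_rate = 100 * 105 / 139

print(correct, round(success_rate, 1))
print(predicted_winner("Lions", "Hurricanes", -0.00))
```

The last line shows why a prediction of -0.00 still names the Hurricanes: the margin is negative before rounding, so the away team is the predicted winner.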

 

NRL Predictions for Round 21

Team Ratings for Round 21

The basic method is described on my Department home page.

Here are the team ratings prior to this week’s games, along with the ratings at the start of the season.

Team           Current Rating   Rating at Season Start   Difference
Storm          7.79             8.49                     -0.70
Cowboys        6.37             6.90                     -0.50
Broncos        5.02             4.36                     0.70
Sharks         3.29             5.84                     -2.60
Panthers       2.96             6.08                     -3.10
Raiders        1.82             9.94                     -8.10
Roosters       1.68             -1.17                    2.80
Sea Eagles     0.45             -2.98                    3.40
Dragons        -0.21            -7.74                    7.50
Eels           -0.58            -0.81                    0.20
Titans         -1.59            -0.98                    -0.60
Rabbitohs      -1.76            -1.82                    0.10
Warriors       -2.34            -6.02                    3.70
Bulldogs       -5.80            -1.34                    -4.50
Wests Tigers   -6.61            -3.89                    -2.70
Knights        -12.54           -16.94                   4.40

 

Performance So Far

So far there have been 144 matches played, 88 of which were correctly predicted, a success rate of 61.1%.
Here are the predictions for last week’s games.

No.  Game                        Date     Score     Prediction   Correct
1    Broncos vs. Bulldogs        Jul 20   42 – 12   11.40        TRUE
2    Roosters vs. Knights        Jul 21   28 – 4    16.50        TRUE
3    Sharks vs. Rabbitohs        Jul 21   26 – 12   7.50         TRUE
4    Panthers vs. Titans         Jul 22   24 – 16   8.10         TRUE
5    Raiders vs. Storm           Jul 22   14 – 20   -1.70        TRUE
6    Cowboys vs. Warriors        Jul 22   24 – 12   12.90        TRUE
7    Dragons vs. Sea Eagles      Jul 23   52 – 22   -2.00        FALSE
8    Wests Tigers vs. Eels       Jul 23   16 – 17   -2.90        TRUE

 

Predictions for Round 21

Here are the predictions for Round 21. The prediction is my estimated expected points difference with a positive margin being a win to the home team, and a negative margin a win to the away team.

No.  Game                        Date     Winner       Prediction
1    Warriors vs. Sharks         Jul 23   Sharks       -1.60
2    Knights vs. Dragons         Jul 23   Dragons      -8.80
3    Rabbitohs vs. Raiders       Jul 23   Raiders      -0.10
4    Roosters vs. Cowboys        Jul 23   Cowboys      -1.20
5    Storm vs. Sea Eagles        Jul 23   Storm        10.80
6    Panthers vs. Bulldogs       Jul 23   Panthers     12.30
7    Eels vs. Broncos            Jul 23   Broncos      -2.10
8    Titans vs. Wests Tigers     Jul 23   Titans       8.50