Stats Chat

August 22, 2016

Stat of the Week Competition: August 20 – 26 2016

Each week, we invite readers of Stats Chat to submit nominations for our Stat of the Week competition and be in with a chance to win an iTunes voucher.

Here’s how it works:

Anyone may add a comment on this post to nominate their Stat of the Week candidate before midday Friday August 26 2016.
Statistics can be bad, exemplary or fascinating.
The statistic must be in the NZ media during the period of August 20 – 26 2016 inclusive.
Quote the statistic, when and where it was published and tell us why it should be our Stat of the Week.

Next Monday at midday we’ll announce the winner of this week’s Stat of the Week competition, and start a new one.

August 20, 2016

Briefly

By Thomas Lumley

Mining data from Lending Club. And Matt Levine’s comments:

Here are 50 data points about this loan. Do what you want….. And if there’s no field for “does this person have another LendingClub loan,” and if that data point would have been helpful, well, sometimes that happens.

It’s just gone Saturday in the US, so it is no longer National Potato Day, and it won’t be National Spumoni Day until Sunday. Nathan Yau has a graphic of the 214 days that are National <some food> Day.

A flowing map of the migration patterns that will be needed by species in the Americas with global warming. The Appalachians are important.

Because genetic association studies are (or were) largely done in people of European ancestry, they can overpredict risks in everyone else (NY Times). (The implication that this is also true of non-genetic research is, at least, exaggerated.)

Social media and its effect on blinded randomised trials. As the story doesn’t say, some of the same things happened in the early years of HIV research.

By Sam Warburton on Twitter: the Olympic Rings as a Venn diagram

Why those stories about the number of deaths from medical errors are more complicated than they sound

The statistical significance filter

By Thomas Lumley

Attention conservation notice: long and nerdy, but does have pictures.

You may have noticed that I often say about newsy research studies that they are barely statistically significant or that they found only weak evidence, but that I don’t say that about large-scale clinical trials. This isn’t (just) personal prejudice. There are two good reasons why any given evidence threshold is more likely to be met in lower-quality research. While I’ll be talking in terms of p-values here, getting rid of them doesn’t solve this problem (it might solve other problems). I’ll also be talking in terms of an effect being “real” or not, which is again an oversimplification but one that I don’t think affects the point I’m making. Think of a “real” effect as one big enough to write a news story about.

This graph shows possible results in statistical tests, for research where the effect of the thing you’re studying is real (orange) or not real (blue). The solid circles are results that pass your statistical evidence threshold, in the direction you wanted to see — they’re press-releasable as well as publishable.

Only about half the ‘statistically significant’ results are real; the rest are false positives.

I’ve assumed the proportion of “real” effects is about 10%. That makes sense in a lot of medical and psychological research; arguably, it’s too optimistic. I’ve also assumed the sample size is too small to reliably pick up plausible differences between blue and orange; sadly, this is also realistic.

In the second graph, we’re looking at a setting where half the effects are real and half aren’t. Now, of the effects that pass the threshold, most are real. On the other hand, there are a lot of real effects that get missed. This was the setting for a lot of clinical trials in the old days, when they were done in single hospitals or small groups.

The third case is relatively implausible hypotheses — 10% true — but well-designed studies. There are still the same number of false positives, but many more true positives. A better-designed study means that positive results are more likely to be correct.

Finally, the setting of well-conducted clinical trials intended to be definitive, the sort of studies done to get new drugs approved. About half the candidate treatments work as intended, and when they do, the results are likely to be positive. For a well-designed test such as this, statistical significance is a reasonable guide to whether the effect is real.

The problem is that the media only show a subset of the (exciting) solid circles, and typically don’t show the (boring) empty circles. So, what you see is

where the columns are for 10% and 50% of studies having a true effect, and the top and bottom rows are for under-sized and well-designed studies.

Knowing the threshold for evidence isn’t enough: the prior plausibility matters, and the ability of the study to demonstrate effects matters. Apparent effects seen in small or poorly-designed studies are less likely to be true.
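The arithmetic behind the four scenarios can be sketched in a few lines of Python. This is an illustration, not the calculation used to draw the graphs: the function name `share_real`, the one-sided threshold of 0.025, and the power values of 0.2 and 0.9 are assumptions chosen to resemble an under-sized and a well-designed study. Only the 10% and 50% proportions of real effects come from the post.

```python
def share_real(prior, power, alpha=0.025):
    """Fraction of threshold-passing results that reflect a real effect.

    prior: proportion of studied effects that are real
    power: chance that a real effect passes the threshold (assumed value)
    alpha: chance that a null effect passes in the wanted direction (assumed value)
    """
    true_pos = prior * power
    false_pos = (1 - prior) * alpha
    return true_pos / (true_pos + false_pos)


# (proportion of real effects, assumed power) for each panel
scenarios = {
    "10% real, under-sized": (0.10, 0.2),
    "50% real, under-sized": (0.50, 0.2),
    "10% real, well-designed": (0.10, 0.9),
    "50% real, well-designed": (0.50, 0.9),
}
for label, (prior, power) in scenarios.items():
    print(f"{label}: {share_real(prior, power):.0%} of significant results are real")
```

Under these assumed numbers, a little under half of the significant results are real in the first scenario and nearly all of them are in the last. The false-positive term depends only on the prior and the threshold, so raising the power adds true positives without changing the number of false positives.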

August 19, 2016

Has your life improved since 1966?

By Thomas Lumley

From Pew Research, is life better than 50 years ago for people like you?

The answers aren’t going to mean much about reality, more about the sort of people we are or want to think we are. As Fred Clark puts it

If you ask those of us who are 18-53 years old for our opinions about what life was like before we either existed or have any memory, we’ll give you an answer. And that speculative, possibly even informed, opinion may mean something or other in the aggregate. Maybe it tells us something fuzzy about general optimism or pessimism. Or maybe something about the dismal state of history, social studies, civics and science education.

Or, for the people who do have memories of the mid-sixties…

Age 65-70: I peaked in high school. Go away, nerd, or I’ll give you a swirlie.

August 18, 2016

Post-truth data maps

By Thomas Lumley

The Herald has a story “New map compares breast sizes around the world”. They blame news.com.au as the immediate cause, but a very similar story at the Daily Mail actually links to where it got the map. You might wonder how the data were collected (you might wonder why, too). The journalist did get as far as that:

The breast map doesn’t reveal how the cup sizes were measured, it’s fair to say tracking bra purchases per country would be an ideal – and maybe a little weird – approach.

Rigorously deidentified pie

By Thomas Lumley

Via Dale Warburton on Twitter, this graph comes from page 7 of the 2016 A-League Injury Report (PDF) produced by Professional Footballers Australia — the players’ association for the round-ball game. It seems to be a sensible and worthwhile document, except for this pie chart. They’ve replaced the club names with letters, presumably for confidentiality reasons. Which is fine. But the numbers written on the graph bear no obvious relationship to the sizes of the pie wedges.

It’s been a bad week for this sort of thing: a TV barchart that went viral this week had the same sort of problem.

August 17, 2016

Official statistics

By Thomas Lumley

There has been some controversy about changes to how unemployment is computed in the Household Labour Force Survey. As StatsNZ had explained, the changes would be back-dated to March 2007, to allow for comparisons. However, from Stuff earlier this week:

In a media release Robertson, Labour’s finance spokesman, said National was “actively massaging official unemployment statistics” by changing the measure for joblessness to exclude those using websites, such as Seek or TradeMe.

Robertson was referring to the Household Labour Force Survey, due to be released on Wednesday, which he says would “almost certainly show a decrease in unemployment” as a result of the Government “manipulating official data to suit its own needs”.

Mr Robertson has since withdrawn this claim, and is now saying

“I accept the Chief Statistician’s assurances on the reason for the change in criteria but New Zealanders need to be aware that National Ministers have a track record of misusing and misrepresenting statistics.”

That’s a reasonable position — and some of the examples have appeared on StatsChat — but I don’t think the stories in the media have made it clear how serious the original accusation was (even if perhaps unintentionally).

Official statistics such as the unemployment estimates are politically sensitive, and it’s obvious why governments would want to change them. Argentina, famously, did this to their inflation estimates. As a result, no-one believed Argentinian economic data, which gets expensive when you’re trying to borrow money. For that reason, sensible countries structure their official statistics agencies to minimise political influence, and maximise independence. New Zealand does have a first-world official statistics system — unlike many countries with similar economic resources — and it’s a valuable asset that can’t be taken for granted.

The system is set up so the Government shouldn’t have the ability to “actively massage” official unemployment statistics for minor political gain. If they did, well, ok, it was hyperbole when I said on Twitter ‘we’d need to go through StatsNZ with fire and the sword’, but the Government Statistician wouldn’t be the only one who’d need replacing.

NRL Predictions for Round 24

By David Scott

Team Ratings for Round 24

The basic method is described on my Department home page.

Here are the team ratings prior to this week’s games, along with the ratings at the start of the season.

Team	Current Rating	Rating at Season Start	Difference
Storm	8.53	4.41	4.10
Raiders	7.90	-0.55	8.50
Cowboys	7.28	10.29	-3.00
Sharks	4.36	-1.06	5.40
Panthers	2.75	-3.06	5.80
Broncos	2.55	9.81	-7.30
Bulldogs	2.28	1.50	0.80
Sea Eagles	0.30	0.36	-0.10
Roosters	-0.46	11.20	-11.70
Titans	-1.04	-8.39	7.30
Wests Tigers	-1.18	-4.06	2.90
Warriors	-2.59	-7.47	4.90
Eels	-3.24	-4.62	1.40
Dragons	-4.12	-0.10	-4.00
Rabbitohs	-5.01	-1.20	-3.80
Knights	-16.62	-5.41	-11.20

Performance So Far

So far there have been 168 matches played, 104 of which were correctly predicted, a success rate of 61.9%.
Here are the predictions for last week’s games.

	Game	Date	Score	Prediction	Correct
1	Bulldogs vs. Sea Eagles	Aug 11	20 – 16	5.20	TRUE
2	Broncos vs. Eels	Aug 12	38 – 16	6.60	TRUE
3	Wests Tigers vs. Titans	Aug 13	18 – 19	3.60	FALSE
4	Warriors vs. Rabbitohs	Aug 13	22 – 41	10.40	FALSE
5	Dragons vs. Sharks	Aug 13	32 – 18	-8.60	FALSE
6	Knights vs. Panthers	Aug 14	6 – 42	-13.30	TRUE
7	Roosters vs. Cowboys	Aug 14	22 – 10	-7.40	FALSE
8	Raiders vs. Storm	Aug 15	22 – 8	0.50	TRUE
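
The Correct column can be reproduced from the score and the prediction. The sketch below assumes the home team is listed first in both the game and the score, that a positive prediction favours the home team, and that a prediction counts as correct when its sign matches the sign of the actual margin. The helper name `is_correct` is hypothetical, and the post does not say how a drawn match would be scored.

```python
def is_correct(home_score, away_score, prediction):
    """True when the predicted and actual margins favour the same team."""
    actual_margin = home_score - away_score
    # A draw (actual_margin == 0) is treated as incorrect; that is an assumption.
    return actual_margin * prediction > 0


# Rows copied from the table above: (home score, away score, prediction)
last_week = [
    (20, 16, 5.20), (38, 16, 6.60), (18, 19, 3.60), (22, 41, 10.40),
    (32, 18, -8.60), (6, 42, -13.30), (22, 10, -7.40), (22, 8, 0.50),
]
results = [is_correct(h, a, p) for h, a, p in last_week]
print(results)

# Season-to-date figures quoted in the post: 104 correct out of 168 matches.
success_rate = f"{104 / 168:.1%}"
print(success_rate)
```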

Predictions for Round 24

Here are the predictions for Round 24. The prediction is my estimated expected points difference with a positive margin being a win to the home team, and a negative margin a win to the away team.

	Game	Date	Winner	Prediction
1	Broncos vs. Bulldogs	Aug 18	Broncos	3.30
2	Panthers vs. Wests Tigers	Aug 19	Panthers	6.90
3	Knights vs. Titans	Aug 20	Titans	-12.60
4	Sea Eagles vs. Storm	Aug 20	Storm	-5.20
5	Cowboys vs. Warriors	Aug 20	Cowboys	13.90
6	Raiders vs. Eels	Aug 21	Raiders	14.10
7	Roosters vs. Dragons	Aug 21	Roosters	6.70
8	Rabbitohs vs. Sharks	Aug 22	Sharks	-6.40
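
Reading the Winner column off the signed margin can be sketched as follows. The function name `predicted_winner` is hypothetical, the game string is assumed to follow the "Home vs. Away" layout used in the table, and returning `None` for a margin of exactly zero is my own choice, since the post does not cover that case.

```python
def predicted_winner(game, margin):
    """Return the team favoured by a signed margin for a 'Home vs. Away' game."""
    home, away = game.split(" vs. ")
    if margin > 0:
        return home   # positive margin: win to the home team
    if margin < 0:
        return away   # negative margin: win to the away team
    return None       # zero margin: no team favoured (assumption)


print(predicted_winner("Broncos vs. Bulldogs", 3.30))
print(predicted_winner("Knights vs. Titans", -12.60))
```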

Mitre 10 Cup Predictions for Round 1

By David Scott

Team Ratings for Round 1

The basic method is described on my Department home page.

Here are the team ratings prior to this week’s games, along with the ratings at the start of the season.

Team	Current Rating	Rating at Season Start
Canterbury	12.85	12.85
Auckland	11.34	11.34
Tasman	8.71	8.71
Taranaki	8.25	8.25
Wellington	4.32	4.32
Counties Manukau	2.45	2.45
Hawke’s Bay	1.85	1.85
Otago	0.54	0.54
Waikato	-4.31	-4.31
Bay of Plenty	-5.54	-5.54
Manawatu	-6.71	-6.71
North Harbour	-8.15	-8.15
Southland	-9.71	-9.71
Northland	-19.37	-19.37

Predictions for Round 1

Here are the predictions for Round 1. The prediction is my estimated expected points difference with a positive margin being a win to the home team, and a negative margin a win to the away team.

	Game	Date	Winner	Prediction
1	North Harbour vs. Counties Manukau	Aug 18	Counties Manukau	-6.60
2	Northland vs. Manawatu	Aug 19	Manawatu	-8.70
3	Bay of Plenty vs. Taranaki	Aug 20	Taranaki	-9.80
4	Hawke’s Bay vs. Wellington	Aug 20	Hawke’s Bay	1.50
5	Canterbury vs. Auckland	Aug 20	Canterbury	5.50
6	Southland vs. Otago	Aug 21	Otago	-6.20
7	Tasman vs. Waikato	Aug 21	Tasman	17.00

Currie Cup Predictions for Round 3

By David Scott

Team Ratings for Round 3

The basic method is described on my Department home page.

Here are the team ratings prior to this week’s games, along with the ratings at the start of the season.

Team	Current Rating	Rating at Season Start	Difference
Lions	10.77	9.69	1.10
Western Province	5.35	6.46	-1.10
Blue Bulls	1.61	1.80	-0.20
Sharks	0.70	-0.60	1.30
Cheetahs	-0.82	-3.42	2.60
Cavaliers	-10.28	-10.00	-0.30
Pumas	-10.40	-8.62	-1.80
Griquas	-13.05	-12.45	-0.60
Kings	-15.32	-14.29	-1.00

Performance So Far

So far there have been 7 matches played, 4 of which were correctly predicted, a success rate of 57%.
Here are the predictions for last week’s games.

	Game	Date	Score	Prediction	Correct
1	Lions vs. Pumas	Aug 12	68 – 26	22.50	TRUE
2	Sharks vs. Griquas	Aug 12	46 – 24	16.10	TRUE
3	Kings vs. Cavaliers	Aug 13	10 – 28	0.50	FALSE
4	Cheetahs vs. Blue Bulls	Aug 13	43 – 20	-1.50	FALSE

Predictions for Round 3

Here are the predictions for Round 3. The prediction is my estimated expected points difference with a positive margin being a win to the home team, and a negative margin a win to the away team.

	Game	Date	Winner	Prediction
1	Western Province vs. Cheetahs	Aug 19	Western Province	9.70
2	Griquas vs. Lions	Aug 19	Lions	-20.30
3	Blue Bulls vs. Kings	Aug 20	Blue Bulls	20.40
4	Cavaliers vs. Sharks	Aug 20	Sharks	-7.50
