Posts from March 2015 (48)

March 14, 2015

Ok, but it matters in theory

Some discussion on Twitter about political polling and whether political journalists understood the numbers led to the question:

If you poll 500 people, and candidate 1 is on 35% and candidate 2 is on 30%, what is the chance candidate 2 is really ahead?

That’s the wrong question. Well, no, actually it’s the right question, but it is underdetermined.

The difficulty is related to the ‘base-rate’ problem in testing for rare diseases: it’s easy to work out the probability of the data given the way the world is, but you want the probability the world is a certain way given the data. These aren’t the same.

If you want to know how much variability there is in a poll, the usual ‘maximum margin of error’ is helpful. In theory, over a fairly wide range of true support, one poll in 20 will be off by more than this, half being too high and half being too low. In theory it’s 3% for 1000 people, 4.5% for 500. For minor parties, I’ve got a table here. In practice, the variability in NZ polls is larger than in theoretically perfect polls, but we’ll ignore that here.
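As a rough check on those figures, here is a minimal Python sketch. It assumes the 1/sqrt(N) rule of thumb for the maximum margin of error, which is the approximation the formulas in this post are built on; the function name `max_margin_of_error` is only for illustration.

```python
import math

def max_margin_of_error(n):
    """Rule-of-thumb maximum margin of error (as a proportion) for a poll of n people."""
    return 1 / math.sqrt(n)

for n in (1000, 500):
    # Print the sample size and the margin of error as a percentage
    print(n, round(100 * max_margin_of_error(n), 1))
```

This gives a little over 3% for 1000 people and about 4.5% for 500.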

If you want to know about change between two polls, the margin of error is about 1.4 times higher. If you want to know about difference between two candidates, the computations are trickier. When you can ignore other candidates and undecided voters, the margin of error is about twice the standard value, because a vote added to one side must be taken away from the other side, and so counts twice.
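The two multipliers can be sketched in a few lines of Python. This assumes independent polls and the same 1/sqrt(N) rule of thumb for a single poll; the function names are hypothetical.

```python
import math

def max_moe(n):
    """Rule-of-thumb maximum margin of error for one poll of n people."""
    return 1 / math.sqrt(n)

def moe_change_between_polls(n1, n2):
    """Margin of error for the change between two independent polls.
    Independent errors add in quadrature, giving sqrt(2) ~ 1.4 times
    the single-poll value when the polls are the same size."""
    return math.sqrt(max_moe(n1) ** 2 + max_moe(n2) ** 2)

def moe_two_way_lead(n):
    """Margin of error for the lead in a strict two-candidate race.
    The lead is p - (1 - p) = 2p - 1, so every error in p counts twice."""
    return 2 * max_moe(n)

print(round(moe_change_between_polls(500, 500) / max_moe(500), 2))
print(round(100 * moe_two_way_lead(500), 1))
```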

When you can’t ignore other candidates, the question isn’t exactly answerable without more information, but Jonathan Marshall has a nice app with results for one set of assumptions. Approximately, instead of the margin of error for the difference being 2*square root(1/N) as in the simple case, you replace the 1 by the sum of the two candidate estimates, giving 2*square root((0.35+0.30)/N). With N=500, the margin of error is about 7%. If the support for the two candidates were equal, there would be about a 9% chance of seeing candidate 1 ahead of candidate 2 by at least 5%.
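A minimal sketch of that calculation, assuming a Normal approximation and the approximate variance (p1+p2)/N for the lead. This is not the method behind Jonathan Marshall’s app, and the function names are hypothetical.

```python
import math
from statistics import NormalDist

def lead_margin_of_error(p1, p2, n):
    """Approximate margin of error for the lead p1 - p2 when other options exist."""
    return 2 * math.sqrt((p1 + p2) / n)

def chance_of_apparent_lead(p1, p2, n, lead):
    """Chance of polling at least `lead` ahead when true support is actually equal.
    The two candidates' combined share is taken to be p1 + p2."""
    se = math.sqrt((p1 + p2) / n)
    return 1 - NormalDist().cdf(lead / se)

moe = lead_margin_of_error(0.35, 0.30, 500)
tail = chance_of_apparent_lead(0.35, 0.30, 500, 0.05)
print(round(100 * moe, 1), round(100 * tail, 1))
```

Under these assumptions the margin of error is about 7%, and the tail probability comes out a bit over 8%, in the same region as the roughly 9% quoted above; different assumptions shift it slightly.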

All this, though, doesn’t get you an answer to the question as originally posed.

If you poll 500 people, and candidate 1 is on 35% and candidate 2 is on 30%, what is the chance candidate 2 is really ahead?

This depends on what you knew in advance. If you had been reasonably confident that candidate 1 was behind candidate 2 in support, you would be justified in believing that candidate 1 had been lucky, and in assigning a relatively high probability that candidate 2 is really ahead. If you’d thought it was basically impossible for candidate 2 to even be close to candidate 1, you probably need to sit down quietly and re-evaluate your beliefs and the evidence they were based on.

The question is obviously looking for an answer in the setting where you don’t know anything else. In the general case this turns out to be, depending on your philosophy, either difficult to agree on or intrinsically meaningless. In special cases, we may be able to agree. If

for values within the margin of error, you had no strong belief that any value was more likely than any other, and
there aren’t values outside the margin of error that you thought were much more likely than those inside,

then we can roughly approximate your prior beliefs by a flat distribution, and your posterior beliefs by a Normal distribution with mean at the observed data value and with standard deviation equal to the standard error of the estimate (half the margin of error).

In that case, the probability of candidate 2 being ahead is 9%, the same answer as the reverse question. You could make a case that this was a reasonable way to report the result, at least if there weren’t any other polls and if the model was explicitly or implicitly agreed. When there are other polls, though, this becomes a less convincing argument.
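To make the dependence on prior beliefs concrete, here is a hedged sketch using a Normal approximation. It takes the posterior standard deviation under a flat prior to be the standard error of the lead, sqrt(0.65/500), which is what reproduces a figure of roughly 9%. The sceptical prior (mean -0.05, standard deviation 0.03) is entirely hypothetical, chosen only to show the direction of the effect.

```python
import math
from statistics import NormalDist

def prob_candidate2_ahead(observed_lead, se, prior_mean=None, prior_sd=None):
    """Posterior probability that the true lead of candidate 1 is negative.
    With no prior given, a flat prior is assumed; otherwise a Normal prior
    is combined with the data by the usual precision-weighted update."""
    if prior_sd is None:
        post_mean, post_sd = observed_lead, se
    else:
        precision = 1 / prior_sd ** 2 + 1 / se ** 2
        post_mean = (prior_mean / prior_sd ** 2 + observed_lead / se ** 2) / precision
        post_sd = math.sqrt(1 / precision)
    return NormalDist(post_mean, post_sd).cdf(0.0)

se = math.sqrt((0.35 + 0.30) / 500)   # standard error of the observed lead
flat = prob_candidate2_ahead(0.05, se)
sceptical = prob_candidate2_ahead(0.05, se, prior_mean=-0.05, prior_sd=0.03)
print(round(flat, 3), round(sceptical, 3))
```

With the flat prior the answer matches the reverse-question tail probability; with a prior that already favoured candidate 2, the same poll leaves candidate 2 more likely than not to be ahead.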

TL;DR: The probability Winston is behind given that he polls 5% higher isn’t conceptually the same as the probability that he polls 5% higher given that he is behind. But, if we pretend to be in exactly the right state of quasi-ignorance, they come out to be the same number, and it’s roughly 1 in 10.

March 13, 2015

Clinical trial reporting still not happening

By Thomas Lumley

According to a paper in the New England Journal of Medicine, about 20% of industry-funded clinical trials registered in the United States failed to report their summary results and had no legally acceptable reason for delay. That’s obviously not good enough, and this sort of thing is why people don’t like drug companies.

As the paper says

On the basis of this review, we estimated that during the 5-year period, approximately 79 to 80% of industry-funded trials reported summary results or had a legally acceptable reason for delay. In contrast, only 49 to 50% of NIH-funded trials and 42 to 45% of those funded by other government or academic institutions reported results or had legally acceptable reasons for delay.

Um. Yes. <coughs nervously> <shuffles feet>

via Derek Lowe

Feel-good gene?

By Thomas Lumley

From Stuff

Suffering anxiety, is not a mark of character, but at least in part to do with the genetic lottery, he says.

“About 20 per cent of adult Americans have this mutation,” Professor Friedman says of those who produce more anandamide, whose name is taken from the Sanskrit word for bliss.

There’s good biological research behind this story, on how the gene works in both mice and people, but the impact is being oversold. The human data on anxiety in the paper look like

Combining this small difference with the claim that 20% of people in the US carry the variant, the gene would explain about 1% of the population variation in the anxiety questionnaire score. It probably explains less of the variation in having/not having clinically diagnosable anxiety.
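The arithmetic linking a group difference to ‘variation explained’ can be sketched as follows. This assumes a simple two-group model (carriers versus non-carriers) with equal spread in each group; the standardised difference is back-calculated from the 20% and 1% figures above, not taken from the paper, and the function names are hypothetical.

```python
import math

def variance_explained(carrier_freq, std_diff):
    """Fraction of total variance explained by a binary variant, where
    std_diff is the group difference in within-group standard deviations."""
    between = carrier_freq * (1 - carrier_freq) * std_diff ** 2
    return between / (between + 1.0)

def std_diff_for(carrier_freq, r2):
    """Standardised group difference implied by a given variance explained."""
    return math.sqrt(r2 / ((1 - r2) * carrier_freq * (1 - carrier_freq)))

d = std_diff_for(0.20, 0.01)
print(round(d, 2), round(variance_explained(0.20, d), 3))
```

Under this model, 1% of the variation with 20% carriers corresponds to a difference of about a quarter of a standard deviation between the groups.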

The story continues

“Those who do [have this mutation] may also be less likely to become addicted to marijuana and, possibly, other drugs – presumably because they don’t need the calming effects that marijuana provides.”

The New York Times version mentioned a study of marijuana dependence, which found people with the low-anxiety mutation were less likely to be dependent. However, for other drugs the opposite has been found:

Here, we report a naturally occurring single nucleotide polymorphism in the human FAAH gene, 385A, that is strongly associated with street drug use and problem drug/alcohol use.

People with the mutant, A, version of the gene, the low-anxiety variant, were more likely to have drug problems. In fact, even the study that found (weak) evidence for lower rates of marijuana dependence found much stronger evidence of higher rates of sedative dependence.

Simple, binary, genetic explanations for complex human conditions are always tempting, but usually wrong.

March 12, 2015

Briefly

By Thomas Lumley

There will be SCIENCE at the Auckland Festival on Saturday: Dr Michelle ‘Nanogirl’ Dickinson blowing things up, Dr Siouxsie Wiles (and artists, and you) lighting things up, and panel discussions.

‘In the 17th century, another genre of paintings emerged, showing public administrators holding their books open for all to see. More than 100 of these paintings were produced between 1600 and 1800. Transparency became a cultural ideal worthy of art.’ Jacob Soll writing in the Boston Globe about the financial data revolution of the 16th century.

“The next big milestone for the project is to get a judge to rule in favor of a tenant based on Heat Seek data. That would set a precedent that the courts see these devices as reliable and unbiased evidence.” New York, like many US cities, has temperature standards for apartments where the landlord controls the heating system. Heat Seek wants to provide independent data using internet-connected thermometers.

Does the popularity of party leaders affect voting? In the UK it seems the answer is “sometimes, a bit”. (via Alex Harrowell)

Election donation maps

By Thomas Lumley

There are probably some StatChat readers who don’t read the NZ Herald, so I’ll point out that I have a post on the data blog about election donations.

Variation and mean

By Thomas Lumley

A lot of statistical reporting focuses on means, or other summaries of where a distribution lies. Often, though, variation is important. Vox.com has a story about variation in costs of lab tests at California hospitals, based on a paper in BMJ Open. Vox says

The charge for a lipid panel ranged from $10 to $10,169. Hospital prices for a basic metabolic panel (which doctors use to measure the body’s metabolism) were $35 at one facility — and $7,303 at another

These are basically standard lab tests, so there’s no sane reason for this sort of huge variation. You’d expect some variation with volume of tests and with location, but nothing like what is seen.

What’s not clear is how much this is really just variation in how costs are attributed. A hospital needs a blood lab, which has a lot of fixed costs. Somehow these costs have to be spread over individual tests, but there’s no unique way to do this. It would be interesting to know if the labs with high charges for one test tend to have high charges for others, but the research paper doesn’t look at relationships between costs.

The Vox story also illustrates a point about reporting, with this graph

If you look carefully, there’s something strange about the graph. The brown box second from the right is ‘lipid panel’, and it goes up to a bit short of $600, not to $10,169. Similarly, the ‘metabolic panel’, the right-most box, goes up to $1,000 on the graph and $7,303 in the story.

The graph is taken from the research paper. In the research paper it had a caption explaining that the ‘whiskers’ in the box plot go to the 5th and 95th percentiles (a non-standard but reasonable choice). This caption fell off on the way to Vox.com, and no-one seems to have noticed.
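A small illustration of why the whisker definition matters. The charges below are made up for the example (a steady run of modest prices plus one extreme value equal to the maximum quoted in the story); they are not the data from the paper.

```python
from statistics import quantiles

# Hypothetical charges: 99 modest prices and a single extreme one
charges = [10 + 5 * i for i in range(99)] + [10169]

# quantiles(..., n=20) returns 19 cut points; the first and last are
# the 5th and 95th percentiles
cuts = quantiles(charges, n=20)
p5, p95 = cuts[0], cuts[-1]

print("whiskers at 5th/95th percentile:", p5, p95)
print("actual range:", min(charges), max(charges))
```

A box plot with whiskers at the 5th and 95th percentiles never shows the most extreme values, so without the caption a reader can’t reconcile the graph with the range quoted in the text.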

March 10, 2015

NRL Predictions for Round 2

By David Scott

Team Ratings for Round 2

The basic method is described on my Department home page.

Here are the team ratings prior to this week’s games, along with the ratings at the start of the season.

Team	Current Rating	Rating at Season Start	Difference
Rabbitohs	14.70	13.06	1.60
Roosters	10.95	9.09	1.90
Cowboys	7.66	9.52	-1.90
Storm	4.72	4.36	0.40
Panthers	3.65	3.69	-0.00
Broncos	2.40	4.03	-1.60
Warriors	2.40	3.07	-0.70
Knights	0.39	-0.28	0.70
Bulldogs	0.25	0.21	0.00
Sea Eagles	0.21	2.68	-2.50
Dragons	-2.10	-1.74	-0.40
Eels	-4.73	-7.19	2.50
Raiders	-6.84	-7.09	0.20
Titans	-8.84	-8.20	-0.60
Sharks	-11.01	-10.76	-0.30
Wests Tigers	-12.49	-13.13	0.60

Performance So Far

So far there have been 8 matches played, 5 of which were correctly predicted, a success rate of 62.5%.

Here are the predictions for last week’s games.

	Game	Date	Score	Prediction	Correct
1	Broncos vs. Rabbitohs	Mar 05	6 – 36	-6.00	TRUE
2	Eels vs. Sea Eagles	Mar 06	42 – 12	-6.90	FALSE
3	Cowboys vs. Roosters	Mar 07	4 – 28	3.40	FALSE
4	Knights vs. Warriors	Mar 07	24 – 14	0.70	TRUE
5	Titans vs. Wests Tigers	Mar 07	18 – 19	7.90	FALSE
6	Panthers vs. Bulldogs	Mar 08	24 – 18	6.50	TRUE
7	Sharks vs. Raiders	Mar 08	20 – 24	-0.70	TRUE
8	Dragons vs. Storm	Mar 09	4 – 12	-3.10	TRUE

Predictions for Round 2

Here are the predictions for Round 2. The prediction is my estimated expected points difference with a positive margin being a win to the home team, and a negative margin a win to the away team.

	Game	Date	Winner	Prediction
1	Bulldogs vs. Eels	Mar 13	Bulldogs	8.00
2	Sharks vs. Broncos	Mar 13	Broncos	-10.40
3	Cowboys vs. Knights	Mar 14	Cowboys	10.30
4	Panthers vs. Titans	Mar 14	Panthers	15.50
5	Sea Eagles vs. Storm	Mar 14	Storm	-1.50
6	Rabbitohs vs. Roosters	Mar 15	Rabbitohs	6.70
7	Raiders vs. Warriors	Mar 15	Warriors	-5.20
8	Wests Tigers vs. Dragons	Mar 16	Dragons	-7.40
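The sign convention and the success rate can be checked with a short Python sketch using the Round 1 results listed above. This only re-derives the ‘Correct’ column from the scores and predicted margins; it says nothing about how the ratings themselves are computed, and the function names are hypothetical.

```python
# (home, away, home score, away score, predicted margin) from last week's table
games = [
    ("Broncos", "Rabbitohs", 6, 36, -6.00),
    ("Eels", "Sea Eagles", 42, 12, -6.90),
    ("Cowboys", "Roosters", 4, 28, 3.40),
    ("Knights", "Warriors", 24, 14, 0.70),
    ("Titans", "Wests Tigers", 18, 19, 7.90),
    ("Panthers", "Bulldogs", 24, 18, 6.50),
    ("Sharks", "Raiders", 20, 24, -0.70),
    ("Dragons", "Storm", 4, 12, -3.10),
]

def predicted_winner(home, away, margin):
    """Positive margin means a home win, negative an away win."""
    return home if margin > 0 else away

def actual_winner(home, away, home_score, away_score):
    return home if home_score > away_score else away

correct = sum(
    predicted_winner(h, a, m) == actual_winner(h, a, hs, as_)
    for h, a, hs, as_, m in games
)
print(correct, len(games), 100 * correct / len(games))
```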

Super 15 Predictions for Round 5

By David Scott

Team Ratings for Round 5

The basic method is described on my Department home page.

Here are the team ratings prior to this week’s games, along with the ratings at the start of the season.

Team	Current Rating	Rating at Season Start	Difference
Waratahs	8.68	10.00	-1.30
Crusaders	7.07	10.42	-3.30
Hurricanes	5.52	2.89	2.60
Brumbies	3.75	2.20	1.50
Stormers	3.54	1.68	1.90
Chiefs	3.25	2.23	1.00
Bulls	2.81	2.88	-0.10
Sharks	2.06	3.91	-1.90
Blues	-0.33	1.44	-1.80
Highlanders	-1.70	-2.54	0.80
Lions	-3.57	-3.39	-0.20
Force	-5.08	-4.67	-0.40
Cheetahs	-5.29	-5.55	0.30
Reds	-6.43	-4.98	-1.40
Rebels	-7.29	-9.53	2.20

Performance So Far

So far there have been 27 matches played, 17 of which were correctly predicted, a success rate of 63%.

Here are the predictions for last week’s games.

	Game	Date	Score	Prediction	Correct
1	Chiefs vs. Highlanders	Mar 06	17 – 20	10.60	FALSE
2	Brumbies vs. Force	Mar 06	27 – 15	13.00	TRUE
3	Blues vs. Lions	Mar 07	10 – 13	9.30	FALSE
4	Reds vs. Waratahs	Mar 07	5 – 23	-10.10	TRUE
5	Cheetahs vs. Bulls	Mar 07	20 – 39	-2.10	TRUE
6	Stormers vs. Sharks	Mar 07	29 – 13	4.00	TRUE

Predictions for Round 5

Here are the predictions for Round 5. The prediction is my estimated expected points difference with a positive margin being a win to the home team, and a negative margin a win to the away team.

	Game	Date	Winner	Prediction
1	Hurricanes vs. Blues	Mar 13	Hurricanes	9.80
2	Force vs. Rebels	Mar 13	Force	6.20
3	Crusaders vs. Lions	Mar 14	Crusaders	15.10
4	Highlanders vs. Waratahs	Mar 14	Waratahs	-5.90
5	Reds vs. Brumbies	Mar 14	Brumbies	-6.20
6	Stormers vs. Chiefs	Mar 14	Stormers	4.80
7	Cheetahs vs. Sharks	Mar 14	Sharks	-3.30

March 9, 2015

Stat of the Week Competition: March 7 – 13 2015

By Rachel Cunliffe

Each week, we would like to invite readers of Stats Chat to submit nominations for our Stat of the Week competition and be in with the chance to win an iTunes voucher.

Here’s how it works:

Anyone may add a comment on this post to nominate their Stat of the Week candidate before midday Friday March 13 2015.
Statistics can be bad, exemplary or fascinating.
The statistic must be in the NZ media during the period of March 7 – 13 2015 inclusive.
Quote the statistic, when and where it was published and tell us why it should be our Stat of the Week.

Next Monday at midday we’ll announce the winner of this week’s Stat of the Week competition, and start a new one.
