Stats Chat

September 8, 2016

For small values of ‘infinitely’

By Thomas Lumley

Q: Did you see “women are infinitely more likely than men to suffer from virtually every form of headache there is”?

A: What would that even mean?

Q: Well, it says “statistically’, so you should know.

A: No.

Q: Could it be like ‘literally’ and it just means ‘a lot’. Like ‘five times as much’ or something?

A: I suppose, but it still isn’t true.

Q: What does the story say?

A: “a review of 24 worldwide studies published in the journal Headache in 2011 found that while more than half of women polled (52 per cent) reported having a problem with headaches at the time of the research, only 37 per cent of men did.”

Q: And is that right?

A: My English teachers would say they’re missing commas after “studies” and “2011”, but I might not have the high ground in that kind of debate.

Q: Restricting to statistical pedantry, is it right?

A: Well, the first step would be to find the paper, which they don’t make all that easy.

Q: You mean they don’t link?

A: That’s part of it, as Mark Hanna eloquently complained on Twitter. But they also don’t give the author names or the month.

Q: Did you find it?

A: Yes.

Q: And?

A: Here’s the relevant graph
crzs68jvuaaoufl

Q: Wait, what? More than 80% of men and women in North America have headaches right now? But only about 50% of people in total?

A: Looking up the reference that little ¹ points to, it seems ‘current’ headache really means ‘in the past three months’, or even ‘in the past year’, depending on the study.

Q: And the total being much less than either men or women separately?

A: ¯\_(ツ)_/¯ If I had to guess, maybe the studies that separated out men and women used a longer time period

Q: And is the time period why there are more headaches in North America than Europe?

A: Could be. Or the quality of the cheese. Or the fact that they’re in an election campaign, like, half the time.

Q: I’m sensing you don’t like this graph.

A: There are others
headache

Q: Right. 7.4% in women is less than 6.4% in men. ಠ_ಠ

A: Actually, the lines are ok, it’s just that the numbers are in the wrong places. If they’d written the numbers on the y-axis like we’ve been doing for centuries, they’d be ok.

Q: Ok. But we nearly digress. You’re saying ‘infinitely more” means something in the range 1.5 times to three or four times more?

A: And that some of the comparisons are a bit dodgy.

Q: It seems to have taken more work than necessary to establish that.

A: It sure has.

Q: Would you like to quote some of Mark Hanna’s tweets on linking?

A: “Listen up, reporters & editors. Every day I see you publish articles using “studies say” that make it hard or impossible to find the studies. You should be improving public understanding of science, but instead you are training your audience to believe in “studies say”.

Yes, do write about research. But let your readers read it too. Being able to criticise research is, erm, critical to scientific literacy. You are reinforcing the perception that science is opaque, impenetrable, and not for the eyes of laypeople. But that’s not true at all.

Worse than that, by training your readers to believe in “studies say” you are priming them to be fooled by pseudoscience.”

Q: Yes, that’ll preach.

Theory and data

By Thomas Lumley

From the Herald (from the Daily Telegraph)

A revolutionary blood test, which acts like a smoke detector to spot cancer up to 10 years before symptoms appear, could be available within five years.

It looks like this is genuinely impressive research, and deserves its spot at the British Science Festival, but it’s harder to assess the realism of the claims. What do we actually know now? Well, less than we should, because the claim is based on a press release and interviews about unpublished research. However, earlier research by the same group is available, with a bit of detective work.

In a conference abstract published in February, they report what they were trying to do: measure mutations in a specific gene in red blood cells. As the Herald story says:

Scientists at Swansea University have discovered that mutations occur in red blood cells way before any signs of cancer are evident.

But it’s more than that. Mutations in red blood cells occur before cancer even exists — another reason this test is potentially useful is for studying low levels of mutations that would have a very low chance of leading to cancer, so that the risk of realistic doses of potential carcinogens can be assessed. Since the test picks up mutations in the absence of cancer, there’s justification for worrying about false positives.

In the February abstract they had used the test on 121 people, and were claiming five-times-higher mutation rates in people with cancer than healthy people. Now they have 300 people and are claiming ten-times-higher rates — one possible explanation is that they’ve made the test more selective somehow and so are picking up fewer uninteresting mutations. In any case, progress. The earlier data didn’t look as if it could support a useful test; the new data might be able to.

We still don’t know about the false-positive rate — with 300 people tested, it’s too early to say. The false-positive rate is important for another reason, though. The Independent has another story, quoting the lead researcher

Professor Jenkins said they needed to find evidence that it would work for other cancers, but added it would be hard to imagine that it would not.

“It would be really difficult to think why it would only affect oesophageal cancer,” he said.

As he says, it’s hard to think why oesophageal cancer would be unique — though you might expect some cancers to be different. For example, in cervical cancer, the mutations are caused by a virus that only infects certain cell types, so it might not cause mutations that show up in red blood cells. But if we assume many cancers show the same pattern of red blood cell mutations, assessing the usefulness of the test gets more difficult. Suppose a positive result means you’re going to get some type of cancer over the next ten years, but it could be almost any type. What would the next step be?

There’s another important point in the first sentence of the Herald story. It contains two numbers. One is bigger than the other. As far as I can tell, this test is done on freshly-collected blood, and hasn’t been done on large numbers of healthy people yet. If the test is available within five years, it will, at best only come with reliable information for five years after testing.

View comments (5)

September 7, 2016

NRL Predictions for Finals Week 1

By David Scott

Team Ratings for Finals Week 1

The basic method is described on my Department home page.

Here are the team ratings prior to this week’s games, along with the ratings at the start of the season.

	Current Rating	Rating at Season Start	Difference
Raiders	10.54	-0.55	11.10
Storm	9.17	4.41	4.80
Cowboys	8.95	10.29	-1.30
Panthers	5.73	-3.06	8.80
Broncos	4.60	9.81	-5.20
Sharks	3.91	-1.06	5.00
Roosters	-0.08	11.20	-11.30
Bulldogs	-0.32	1.50	-1.80
Titans	-0.75	-8.39	7.60
Eels	-0.82	-4.62	3.80
Rabbitohs	-1.55	-1.20	-0.30
Sea Eagles	-2.83	0.36	-3.20
Wests Tigers	-4.05	-4.06	0.00
Warriors	-6.26	-7.47	1.20
Dragons	-7.44	-0.10	-7.30
Knights	-17.13	-5.41	-11.70

Performance So Far

So far there have been 192 matches played, 123 of which were correctly predicted, a success rate of 64.1%.
Here are the predictions for last week’s games.

	Game	Date	Score	Prediction	Correct
1	Broncos vs. Roosters	Sep 01	24 – 14	7.20	TRUE
2	Bulldogs vs. Rabbitohs	Sep 02	10 – 28	7.70	FALSE
3	Dragons vs. Knights	Sep 03	28 – 26	14.40	TRUE
4	Cowboys vs. Titans	Sep 03	32 – 16	12.10	TRUE
5	Storm vs. Sharks	Sep 03	26 – 6	6.30	TRUE
6	Warriors vs. Eels	Sep 04	18 – 40	1.80	FALSE
7	Wests Tigers vs. Raiders	Sep 04	10 – 52	-6.90	TRUE
8	Panthers vs. Sea Eagles	Sep 04	36 – 6	8.60	TRUE

Predictions for Finals Week 1

Here are the predictions for Finals Week 1. The prediction is my estimated expected points difference with a positive margin being a win to the home team, and a negative margin a win to the away team.

	Game	Date	Winner	Prediction
1	Broncos vs. Titans	Sep 09	Broncos	8.40
2	Raiders vs. Sharks	Sep 10	Raiders	9.60
3	Storm vs. Cowboys	Sep 10	Storm	3.20
4	Panthers vs. Bulldogs	Sep 11	Panthers	6.10

Mitre 10 Cup Predictions for Round 4

By David Scott

Team Ratings for Round 4

The basic method is described on my Department home page.

Here are the team ratings prior to this week’s games, along with the ratings at the start of the season.

	Current Rating	Rating at Season Start	Difference
Canterbury	16.55	12.85	3.70
Auckland	9.26	11.34	-2.10
Taranaki	9.09	8.25	0.80
Tasman	6.35	8.71	-2.40
Counties Manukau	3.78	2.45	1.30
Wellington	2.37	4.32	-2.00
Otago	1.95	0.54	1.40
Waikato	-2.91	-4.31	1.40
Hawke’s Bay	-3.34	1.85	-5.20
Bay of Plenty	-4.67	-5.54	0.90
North Harbour	-6.57	-8.15	1.60
Manawatu	-7.40	-6.71	-0.70
Southland	-12.61	-9.71	-2.90
Northland	-15.36	-19.37	4.00

Performance So Far

So far there have been 22 matches played, 19 of which were correctly predicted, a success rate of 86.4%.
Here are the predictions for last week’s games.

	Game	Date	Score	Prediction	Correct
1	Otago vs. Northland	Aug 31	33 – 28	26.40	TRUE
2	Hawke’s Bay vs. Counties Manukau	Sep 01	20 – 48	2.30	FALSE
3	Southland vs. Auckland	Sep 02	16 – 51	-14.10	TRUE
4	Tasman vs. Taranaki	Sep 03	25 – 20	0.40	TRUE
5	Wellington vs. North Harbour	Sep 03	21 – 17	14.90	TRUE
6	Northland vs. Canterbury	Sep 03	34 – 52	-32.00	TRUE
7	Bay of Plenty vs. Otago	Sep 04	32 – 33	-4.90	TRUE
8	Waikato vs. Manawatu	Sep 04	19 – 10	8.40	TRUE

Predictions for Round 4

Here are the predictions for Round 4. The prediction is my estimated expected points difference with a positive margin being a win to the home team, and a negative margin a win to the away team.

	Game	Date	Winner	Prediction
1	Hawke’s Bay vs. Auckland	Sep 07	Auckland	-8.60
2	Taranaki vs. Southland	Sep 08	Taranaki	25.70
3	Bay of Plenty vs. Northland	Sep 09	Bay of Plenty	14.70
4	Counties Manukau vs. Wellington	Sep 09	Counties Manukau	5.40
5	North Harbour vs. Manawatu	Sep 10	North Harbour	4.80
6	Otago vs. Tasman	Sep 10	Tasman	-0.40
7	Canterbury vs. Hawke’s Bay	Sep 11	Canterbury	23.90
8	Auckland vs. Waikato	Sep 11	Auckland	16.20

View comments (2)

Currie Cup Predictions for Round 6

By David Scott

Team Ratings for Round 6

The basic method is described on my Department home page.

Here are the team ratings prior to this week’s games, along with the ratings at the start of the season.

	Current Rating	Rating at Season Start	Difference
Lions	9.55	9.69	-0.10
Western Province	5.25	6.46	-1.20
Blue Bulls	1.79	1.80	-0.00
Sharks	1.79	-0.60	2.40
Cheetahs	0.90	-3.42	4.30
Griquas	-11.01	-12.45	1.40
Pumas	-11.15	-8.62	-2.50
Cavaliers	-12.27	-10.00	-2.30
Kings	-16.30	-14.29	-2.00

Performance So Far

So far there have been 19 matches played, 12 of which were correctly predicted, a success rate of 63.2%.
Here are the predictions for last week’s games.

	Game	Date	Score	Prediction	Correct
1	Cavaliers vs. Pumas	Sep 02	25 – 22	2.20	TRUE
2	Cheetahs vs. Kings	Sep 02	57 – 25	19.20	TRUE
3	Blue Bulls vs. Lions	Sep 03	31 – 17	-6.50	FALSE
4	Western Province vs. Sharks	Sep 03	34 – 27	7.00	TRUE

Predictions for Round 6

Here are the predictions for Round 6. The prediction is my estimated expected points difference with a positive margin being a win to the home team, and a negative margin a win to the away team.

	Game	Date	Winner	Prediction
1	Lions vs. Western Province	Sep 09	Lions	7.80
2	Griquas vs. Cavaliers	Sep 10	Griquas	4.80
3	Sharks vs. Cheetahs	Sep 10	Sharks	4.40
4	Pumas vs. Blue Bulls	Sep 10	Blue Bulls	-9.40

September 6, 2016

Briefly

By Thomas Lumley

Facebook’s step towards predictive models instead of people for its news editing didn’t work very well last week

Why more interesting local restaurants tend not to show up on internet maps.

A preview of Cathy O’Neil’s book about data science and its potential dangers, coming out tomorrow.

From Flowing Data, how the frequency of different household structures has changed in the US since 1970

A map of the world’s languages — showing the difficulties in definition, since all the Chinese languages are lumped in together when probably equally distinctive languages from different countries are given separately.

View comments (2)

September 5, 2016

Stat of the Week Competition: September 3 – 9 2016

By Rachel Cunliffe

Each week, we would like to invite readers of Stats Chat to submit nominations for our Stat of the Week competition and be in with the chance to win an iTunes voucher.

Here’s how it works:

Anyone may add a comment on this post to nominate their Stat of the Week candidate before midday Friday September 9 2016.
Statistics can be bad, exemplary or fascinating.
The statistic must be in the NZ media during the period of September 3 – 9 2016 inclusive.
Quote the statistic, when and where it was published and tell us why it should be our Stat of the Week.

Next Monday at midday we’ll announce the winner of this week’s Stat of the Week competition, and start a new one.

(more…)

September 2, 2016

A game changer?

By Thomas Lumley

There are stories on Stuff and the Herald about early studies of a potential Alzheimer’s drug. There was also a story on One News last night, but the video doesn’t seem to be up, and there’s one on Newshub.

The drug, adacanumab, reduced amyloid plaque buildup in people with early-stage disease. According to the most widely believed theory about Alzheimer’s, that could slow or even stop the progression of disease. And, as the stories say, if the treatment turns out to be successful in future trials, it will be a game changer.

We’ve never had an successful treatment that modifies the disease process in Alzheimer’s, but we’ve had a range of promising candidates that failed as soon the test went beyond biochemistry to improvements in memory or the ability to handle daily life. Adacanumab might be different. Let’s hope so.

September 1, 2016

Transport numbers

By Thomas Lumley

Auckland Transport released new patronage data, and FigureNZ tidied it up to make it easily computer-readable, so I thought I’d look at some of it. What I’m going to show is a decomposition of the data into overall trends, seasonal variation, and random stuff just happening. As usual, click to embiggen the pictures.

First, the trends: rides are up.

It’s hard to see the trend in ferry use, so here’s a version on a log scale — meaning that the same proportional trend would look the same for all three modes of transport

Train use is increasing (relatively) faster than bus or ferry use. There’s also an interesting bump in the middle that we’ll get back to.

Now, the seasonal patterns. Again, these are on a logarithmic scale, so they show relative variation

The clearest signal is that ferry use peaks in summer, when the other modes are at their minimum. Also, the Christmas minimum is a bit lower for trains: to see this, we can combine the two graphs:

It’s not surprising that train use falls by more: they turn the trains off for a lot of the holiday period.

Finally, what’s left when you subtract the seasonal and trend components:

The highest extra variation in both train and ferry rides was in September and October 2011: the Rugby World Cup.

View comments (1)

August 31, 2016

Be afraid

By Thomas Lumley

From the Herald (from the Daily Mail)

Patients should be warned about the dangers of chemotherapy after research showed that cancer drugs are killing up to 50 per cent of patients in some UK hospitals.

That’s almost completely untrue.

Firstly, the research looked at deaths from any cause within 30 days of starting treatment, and did not claim these were all due to chemotherapy. Secondly, the 50% figure was in one hospital. Thirdly, it was for a subset of one particular type of cancer. And, the conclusion from the news story cannot be found anywhere in the research paper.

The researchers do think that chemotherapy is probably being used suboptimally in some of the hospitals, including the one where about 5 out of 10 of the patients being treated ‘with curative intent’ for lung cancer died within 30 days. That hospital stood out, despite the tiny numbers, because the average death rate across all hospitals for similar patients was about 3%.

As the researchers say

The identification of hospitals with significantly higher 30-day mortality rates will promote review of clinical decision making in these hospitals.

It probably will, but that doesn’t tell us much about risks here on the other side of the world

Stats Chat

For small values of ‘infinitely’

Theory and data

NRL Predictions for Finals Week 1

Team Ratings for Finals Week 1

Performance So Far

Predictions for Finals Week 1

Mitre 10 Cup Predictions for Round 4

Team Ratings for Round 4

Performance So Far

Predictions for Round 4

Currie Cup Predictions for Round 6

Team Ratings for Round 6

Performance So Far

Predictions for Round 6

Briefly

Stat of the Week Competition: September 3 – 9 2016

A game changer?

Transport numbers

Be afraid

Recent comments

Popular posts

Latest posts

All topics

Recommended sites

Subscribe:

Receive our posts via email:

Team Ratings for Finals Week 1

Performance So Far

Predictions for Finals Week 1

Team Ratings for Round 4

Performance So Far

Predictions for Round 4

Team Ratings for Round 6

Performance So Far

Predictions for Round 6

Recent comments

Popular posts

Latest posts

All topics

Recommended sites