September 8, 2016

For small values of ‘infinitely’

Q: Did you see “women are infinitely more likely than men to suffer from virtually every form of headache there is”?

A: What would that even mean?

Q: Well, it says “statistically”, so you should know.

A: No.

Q: Could it be like ‘literally’, where it just means ‘a lot’? Like ‘five times as much’ or something?

A: I suppose, but it still isn’t true.

Q: What does the story say?

A: “a review of 24 worldwide studies published in the journal Headache in 2011 found that while more than half of women polled (52 per cent) reported having a problem with headaches at the time of the research, only 37 per cent of men did.”

Q: And is that right?

A: My English teachers would say they’re missing commas after “studies” and “2011”, but I might not have the high ground in that kind of debate.

Q: Restricting to statistical pedantry, is it right?

A: Well, the first step would be to find the paper, which they don’t make all that easy.

Q: You mean they don’t link?

A: That’s part of it, as Mark Hanna eloquently complained on Twitter. But they also don’t give the author names or the month.

Q: Did you find it?

A: Yes.

Q: And?

A: Here’s the relevant graph:
[Figure: graph from the review paper]

Q: Wait, what? More than 80% of men and women in North America have headaches right now? But only about 50% of people in total?

A: Looking up the reference that little 1 points to, it seems ‘current’ headache really means ‘in the past three months’, or even ‘in the past year’, depending on the study.

Q: And the total being much less than either men or women separately?

A: ¯\_(ツ)_/¯   If I had to guess, maybe the studies that separated out men and women used a longer time period.

Q: And is the time period why there are more headaches in North America than Europe?

A: Could be. Or the quality of the cheese. Or the fact that they’re in an election campaign, like, half the time.

Q: I’m sensing you don’t like this graph.

A: There are others:
[Figure: another graph from the paper]

Q:  Right. 7.4% in women is less than 6.4% in men. ಠ_ಠ

A: Actually, the lines are ok, it’s just that the numbers are in the wrong places. If they’d written the numbers on the y-axis like we’ve been doing for centuries, they’d be ok.

Q: Ok. But we nearly digress. You’re saying ‘infinitely more’ means something in the range 1.5 times to three or four times more?

A: And that some of the comparisons are a bit dodgy.

Q: It seems to have taken more work than necessary to establish that.

A: It sure has.

Q: Would you like to quote some of Mark Hanna’s tweets on linking?

A: “Listen up, reporters & editors. Every day I see you publish articles using “studies say” that make it hard or impossible to find the studies. You should be improving public understanding of science, but instead you are training your audience to believe in “studies say”. 

Yes, do write about research. But let your readers read it too. Being able to criticise research is, erm, critical to scientific literacy. You are reinforcing the perception that science is opaque, impenetrable, and not for the eyes of laypeople. But that’s not true at all.

Worse than that, by training your readers to believe in “studies say” you are priming them to be fooled by pseudoscience.”

Q: Yes, that’ll preach.
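
For anyone who wants to check the arithmetic: a minimal sketch using only the two percentages quoted in the story (52 per cent of women, 37 per cent of men). The variable names are mine, and this says nothing about the other comparisons in the paper. On these overall figures the ratio comes out a little under the low end of the range mentioned above.

```python
# Percentages quoted in the news story: people reporting a current headache problem.
women_pct = 52.0
men_pct = 37.0

# Relative comparison: how many times as likely women were to report headaches.
ratio = women_pct / men_pct

# Absolute comparison: the gap in percentage points.
difference = women_pct - men_pct

print(f"ratio: {ratio:.2f} times as likely")
print(f"difference: {difference:.0f} percentage points")
```

Either way of summarising it is some distance from ‘infinitely’.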

Theory and data

From the Herald (from the Daily Telegraph)

A revolutionary blood test, which acts like a smoke detector to spot cancer up to 10 years before symptoms appear, could be available within five years.

It looks like this is genuinely impressive research, and deserves its spot at the British Science Festival, but it’s harder to assess the realism of the claims. What do we actually know now? Well, less than we should, because the claim is based on a press release and interviews about unpublished research. However, earlier research by the same group is available, with a bit of detective work.

In a conference abstract published in February, they report what they were trying to do: measure mutations in a specific gene in red blood cells. As the Herald story says:

Scientists at Swansea University have discovered that mutations occur in red blood cells way before any signs of cancer are evident.

But it’s more than that. Mutations in red blood cells occur before cancer even exists — another reason this test is potentially useful is for studying low levels of mutations that would have a very low chance of leading to cancer, so that the risk of realistic doses of potential carcinogens can be assessed. Since the test picks up mutations in the absence of cancer, there’s justification for worrying about false positives.

In the February abstract they had used the test on 121 people, and were claiming five-times-higher mutation rates in people with cancer than healthy people. Now they have 300 people and are claiming ten-times-higher rates — one possible explanation is that they’ve made the test more selective somehow and so are picking up fewer uninteresting mutations.  In any case, progress. The earlier data didn’t look as if it could support a useful test; the new data might be able to.

We still don’t know about the false-positive rate: with 300 people tested, it’s too early to say. The false-positive rate is important for another reason, though. The Independent has another story, quoting the lead researcher:

Professor Jenkins said they needed to find evidence that it would work for other cancers, but added it would be hard to imagine that it would not.

“It would be really difficult to think why it would only affect oesophageal cancer,” he said.

As he says, it’s hard to think why oesophageal cancer would be unique — though you might expect some cancers to be different. For example, in cervical cancer, the mutations are caused by a virus that only infects certain cell types, so it might not cause mutations that show up in red blood cells.  But if we assume many cancers show the same pattern of red blood cell mutations, assessing the usefulness of the test gets more difficult. Suppose a positive result means you’re going to get some type of cancer over the next ten years, but it could be almost any type. What would the next step be?

There’s another important point in the first sentence of the Herald story. It contains two numbers. One is bigger than the other.  As far as I can tell, this test is done on freshly-collected blood, and hasn’t been done on large numbers of healthy people yet. If the test is available within five years, it will, at best, only come with reliable information for five years after testing.
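
To see why the false-positive rate matters so much for a screening test, here is a rough sketch of the standard calculation. Every number in it is hypothetical: the prevalence, sensitivity, and false-positive rates are made up for illustration and do not come from the Swansea research, which hasn’t reported them.

```python
def positive_predictive_value(prevalence, sensitivity, false_positive_rate):
    """Fraction of positive test results that are true positives (Bayes' rule)."""
    true_positives = prevalence * sensitivity
    false_positives = (1 - prevalence) * false_positive_rate
    return true_positives / (true_positives + false_positives)


# HYPOTHETICAL inputs, chosen only to illustrate the calculation.
prevalence = 0.01    # assume 1 in 100 people tested will go on to develop the cancer
sensitivity = 0.90   # assume the test picks up 90% of those people

for false_positive_rate in (0.01, 0.05, 0.10):
    ppv = positive_predictive_value(prevalence, sensitivity, false_positive_rate)
    print(f"false-positive rate {false_positive_rate:.0%}: "
          f"{ppv:.0%} of positive results are real")
```

With a rare outcome, even a modest false-positive rate means most positive results are false alarms, which is why 300 people isn’t enough to tell whether the test would be useful.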

September 7, 2016

NRL Predictions for Finals Week 1

Team Ratings for Finals Week 1

The basic method is described on my Department home page.

Here are the team ratings prior to this week’s games, along with the ratings at the start of the season.

Team              Current Rating  Rating at Season Start  Difference
Raiders           10.54           -0.55                   11.10
Storm             9.17            4.41                    4.80
Cowboys           8.95            10.29                   -1.30
Panthers          5.73            -3.06                   8.80
Broncos           4.60            9.81                    -5.20
Sharks            3.91            -1.06                   5.00
Roosters          -0.08           11.20                   -11.30
Bulldogs          -0.32           1.50                    -1.80
Titans            -0.75           -8.39                   7.60
Eels              -0.82           -4.62                   3.80
Rabbitohs         -1.55           -1.20                   -0.30
Sea Eagles        -2.83           0.36                    -3.20
Wests Tigers      -4.05           -4.06                   0.00
Warriors          -6.26           -7.47                   1.20
Dragons           -7.44           -0.10                   -7.30
Knights           -17.13          -5.41                   -11.70

Performance So Far

So far there have been 192 matches played, 123 of which were correctly predicted, a success rate of 64.1%.
Here are the predictions for last week’s games.

Game  Match                     Date    Score    Prediction  Correct
1     Broncos vs. Roosters      Sep 01  24 – 14  7.20        TRUE
2     Bulldogs vs. Rabbitohs    Sep 02  10 – 28  7.70        FALSE
3     Dragons vs. Knights       Sep 03  28 – 26  14.40       TRUE
4     Cowboys vs. Titans        Sep 03  32 – 16  12.10       TRUE
5     Storm vs. Sharks          Sep 03  26 – 6   6.30        TRUE
6     Warriors vs. Eels         Sep 04  18 – 40  1.80        FALSE
7     Wests Tigers vs. Raiders  Sep 04  10 – 52  -6.90       TRUE
8     Panthers vs. Sea Eagles   Sep 04  36 – 6   8.60        TRUE
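
The Correct column and the success rate can be reproduced with a short sketch. It assumes a prediction counts as correct when its sign matches the sign of the actual home-minus-away margin (the home team is listed first), and it treats a draw as incorrect; that last rule is my assumption, not something stated in the post. The function names are illustrative.

```python
# (home, away, home_score, away_score, predicted_margin) for last week's NRL games.
games = [
    ("Broncos", "Roosters", 24, 14, 7.20),
    ("Bulldogs", "Rabbitohs", 10, 28, 7.70),
    ("Dragons", "Knights", 28, 26, 14.40),
    ("Cowboys", "Titans", 32, 16, 12.10),
    ("Storm", "Sharks", 26, 6, 6.30),
    ("Warriors", "Eels", 18, 40, 1.80),
    ("Wests Tigers", "Raiders", 10, 52, -6.90),
    ("Panthers", "Sea Eagles", 36, 6, 8.60),
]


def is_correct(home_score, away_score, predicted_margin):
    """True when the predicted winner (sign of the margin) matches the actual winner."""
    actual_margin = home_score - away_score
    # Assumption: a drawn game or a zero prediction counts as incorrect.
    return actual_margin * predicted_margin > 0


def success_rate(correct, played):
    """Percentage of matches correctly predicted, to one decimal place."""
    return round(100 * correct / played, 1)


results = [is_correct(h, a, p) for _, _, h, a, p in games]
print(results)
print(f"{sum(results)} of {len(games)} correct last week")
print(f"season to date: {success_rate(123, 192)}%")
```

Note that this only scores the winner: a prediction of 14.40 for a game won by 2 counts the same as one that gets the margin exactly right.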

 

Predictions for Finals Week 1

Here are the predictions for Finals Week 1. The prediction is my estimated expected points difference with a positive margin being a win to the home team, and a negative margin a win to the away team.

Game  Match                     Date    Winner    Prediction
1     Broncos vs. Titans        Sep 09  Broncos   8.40
2     Raiders vs. Sharks        Sep 10  Raiders   9.60
3     Storm vs. Cowboys         Sep 10  Storm     3.20
4     Panthers vs. Bulldogs     Sep 11  Panthers  6.10

Mitre 10 Cup Predictions for Round 4

Team Ratings for Round 4

The basic method is described on my Department home page.

Here are the team ratings prior to this week’s games, along with the ratings at the start of the season.

Team              Current Rating  Rating at Season Start  Difference
Canterbury        16.55           12.85                   3.70
Auckland          9.26            11.34                   -2.10
Taranaki          9.09            8.25                    0.80
Tasman            6.35            8.71                    -2.40
Counties Manukau  3.78            2.45                    1.30
Wellington        2.37            4.32                    -2.00
Otago             1.95            0.54                    1.40
Waikato           -2.91           -4.31                   1.40
Hawke’s Bay       -3.34           1.85                    -5.20
Bay of Plenty     -4.67           -5.54                   0.90
North Harbour     -6.57           -8.15                   1.60
Manawatu          -7.40           -6.71                   -0.70
Southland         -12.61          -9.71                   -2.90
Northland         -15.36          -19.37                  4.00

Performance So Far

So far there have been 22 matches played, 19 of which were correctly predicted, a success rate of 86.4%.
Here are the predictions for last week’s games.

Game  Match                             Date    Score    Prediction  Correct
1     Otago vs. Northland               Aug 31  33 – 28  26.40       TRUE
2     Hawke’s Bay vs. Counties Manukau  Sep 01  20 – 48  2.30        FALSE
3     Southland vs. Auckland            Sep 02  16 – 51  -14.10      TRUE
4     Tasman vs. Taranaki               Sep 03  25 – 20  0.40        TRUE
5     Wellington vs. North Harbour      Sep 03  21 – 17  14.90       TRUE
6     Northland vs. Canterbury          Sep 03  34 – 52  -32.00      TRUE
7     Bay of Plenty vs. Otago           Sep 04  32 – 33  -4.90       TRUE
8     Waikato vs. Manawatu              Sep 04  19 – 10  8.40        TRUE

Predictions for Round 4

Here are the predictions for Round 4. The prediction is my estimated expected points difference with a positive margin being a win to the home team, and a negative margin a win to the away team.

Game  Match                             Date    Winner            Prediction
1     Hawke’s Bay vs. Auckland          Sep 07  Auckland          -8.60
2     Taranaki vs. Southland            Sep 08  Taranaki          25.70
3     Bay of Plenty vs. Northland       Sep 09  Bay of Plenty     14.70
4     Counties Manukau vs. Wellington   Sep 09  Counties Manukau  5.40
5     North Harbour vs. Manawatu        Sep 10  North Harbour     4.80
6     Otago vs. Tasman                  Sep 10  Tasman            -0.40
7     Canterbury vs. Hawke’s Bay        Sep 11  Canterbury        23.90
8     Auckland vs. Waikato              Sep 11  Auckland          16.20

Currie Cup Predictions for Round 6

Team Ratings for Round 6

The basic method is described on my Department home page.

Here are the team ratings prior to this week’s games, along with the ratings at the start of the season.

Team              Current Rating  Rating at Season Start  Difference
Lions             9.55            9.69                    -0.10
Western Province  5.25            6.46                    -1.20
Blue Bulls        1.79            1.80                    -0.00
Sharks            1.79            -0.60                   2.40
Cheetahs          0.90            -3.42                   4.30
Griquas           -11.01          -12.45                  1.40
Pumas             -11.15          -8.62                   -2.50
Cavaliers         -12.27          -10.00                  -2.30
Kings             -16.30          -14.29                  -2.00

Performance So Far

So far there have been 19 matches played, 12 of which were correctly predicted, a success rate of 63.2%.
Here are the predictions for last week’s games.

Game  Match                        Date    Score    Prediction  Correct
1     Cavaliers vs. Pumas          Sep 02  25 – 22  2.20        TRUE
2     Cheetahs vs. Kings           Sep 02  57 – 25  19.20       TRUE
3     Blue Bulls vs. Lions         Sep 03  31 – 17  -6.50       FALSE
4     Western Province vs. Sharks  Sep 03  34 – 27  7.00        TRUE

Predictions for Round 6

Here are the predictions for Round 6. The prediction is my estimated expected points difference with a positive margin being a win to the home team, and a negative margin a win to the away team.

Game  Match                        Date    Winner      Prediction
1     Lions vs. Western Province   Sep 09  Lions       7.80
2     Griquas vs. Cavaliers        Sep 10  Griquas     4.80
3     Sharks vs. Cheetahs          Sep 10  Sharks      4.40
4     Pumas vs. Blue Bulls         Sep 10  Blue Bulls  -9.40

September 6, 2016

Briefly

  • A preview of Cathy O’Neil’s book about data science and its potential dangers, coming out tomorrow.
  • A map of the world’s languages — showing the difficulties in definition, since all the Chinese languages are lumped in together when probably equally distinctive languages from different countries are given separately.
    [Image: map of the world’s languages]
September 5, 2016

Stat of the Week Competition: September 3 – 9 2016

Each week, we would like to invite readers of Stats Chat to submit nominations for our Stat of the Week competition and be in with the chance to win an iTunes voucher.

Here’s how it works:

  • Anyone may add a comment on this post to nominate their Stat of the Week candidate before midday Friday September 9 2016.
  • Statistics can be bad, exemplary or fascinating.
  • The statistic must be in the NZ media during the period of September 3 – 9 2016 inclusive.
  • Quote the statistic, when and where it was published and tell us why it should be our Stat of the Week.

Next Monday at midday we’ll announce the winner of this week’s Stat of the Week competition, and start a new one.

(more…)

September 2, 2016

A game changer?

There are stories on Stuff and the Herald about early studies of a potential Alzheimer’s drug. There was also a story on One News last night, but the video doesn’t seem to be up, and there’s one on Newshub.

The drug, aducanumab, reduced amyloid plaque buildup in people with early-stage disease. According to the most widely believed theory about Alzheimer’s, that could slow or even stop the progression of disease. And, as the stories say, if the treatment turns out to be successful in future trials, it will be a game changer.

We’ve never had a successful treatment that modifies the disease process in Alzheimer’s, but we’ve had a range of promising candidates that failed as soon as the test went beyond biochemistry to improvements in memory or the ability to handle daily life.  Aducanumab might be different. Let’s hope so.

September 1, 2016

Transport numbers

Auckland Transport released new patronage data, and FigureNZ tidied it up to make it easily computer-readable, so I thought I’d look at some of it.  What I’m going to show is a decomposition of the data into overall trends, seasonal variation, and random stuff just happening. As usual, click to embiggen the pictures.

First, the trends: rides are up.

[Figure: trends in bus, train, and ferry rides]

It’s hard to see the trend in ferry use, so here’s a version on a log scale, meaning that the same proportional trend would look the same for all three modes of transport:

[Figure: trends in bus, train, and ferry rides, log scale]

Train use is increasing (relatively) faster than bus or ferry use.  There’s also an interesting bump in the middle that we’ll get back to.
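
The point about log scales can be checked with a few lines of arithmetic. In this sketch the ride counts and the 5% growth rate are invented, not taken from the Auckland Transport data: two series growing at the same proportional rate have the same slope once you take logs, no matter how different their sizes.

```python
import math


def log_slope(start, annual_growth, years=5):
    """Average yearly change in log(rides) for a series growing at a constant rate."""
    values = [start * (1 + annual_growth) ** t for t in range(years + 1)]
    logs = [math.log(v) for v in values]
    return (logs[-1] - logs[0]) / years


# HYPOTHETICAL series: a large mode and a small mode, both growing 5% a year.
large_mode = log_slope(50_000_000, 0.05)
small_mode = log_slope(5_000_000, 0.05)

print(f"large mode slope: {large_mode:.4f}")
print(f"small mode slope: {small_mode:.4f}")
print(f"log(1.05):        {math.log(1.05):.4f}")
```

On the original scale the small series would look almost flat next to the large one; on the log scale the two lines are parallel.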

Now, the seasonal patterns. Again, these are on a logarithmic scale, so they show relative variation:

[Figure: seasonal patterns for each mode, log scale]

The clearest signal is that ferry use peaks in summer, when the other modes are at their minimum. Also, the Christmas minimum is a bit lower for trains: to see this, we can combine the two graphs:

[Figure: seasonal patterns for all modes combined on one graph]

It’s not surprising that train use falls by more: they turn the trains off for a lot of the holiday period.

Finally, what’s left when you subtract the seasonal and trend components:

[Figure: residual variation after removing trend and seasonal components]

The highest extra variation in both train and ferry rides was in September and October 2011: the Rugby World Cup.
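
For readers who want to try this kind of decomposition themselves, here is a minimal sketch of the textbook version: a centred moving average for the trend, month-by-month averages for the seasonal pattern, and whatever is left as the residual, all on the log scale. It is not necessarily the method used for the graphs above, and it runs on made-up monthly data rather than the real patronage figures.

```python
import math
import random


def decompose(values, period=12):
    """Split log(values) into trend + seasonal + residual (classical additive method).

    Assumes an even period such as 12. Trend and residual are None for the
    first and last half-period, where the centred moving average is undefined.
    """
    y = [math.log(v) for v in values]
    n = len(y)
    half = period // 2

    # Trend: centred moving average over one full period (end points get half weight).
    trend = [None] * n
    for t in range(half, n - half):
        window = y[t - half:t + half + 1]
        trend[t] = (0.5 * window[0] + sum(window[1:-1]) + 0.5 * window[-1]) / period

    # Seasonal: average detrended value at each position in the cycle, centred on zero.
    by_position = [[] for _ in range(period)]
    for t in range(n):
        if trend[t] is not None:
            by_position[t % period].append(y[t] - trend[t])
    means = [sum(vals) / len(vals) for vals in by_position]
    centre = sum(means) / period
    seasonal = [means[t % period] - centre for t in range(n)]

    # Residual: what is left after removing trend and seasonal components.
    residual = [None if trend[t] is None else y[t] - trend[t] - seasonal[t]
                for t in range(n)]
    return trend, seasonal, residual


# SYNTHETIC data: six years of monthly rides with steady growth, a yearly cycle, and noise.
random.seed(1)
months = range(72)
true_seasonal = [0.1 * math.sin(2 * math.pi * m / 12) for m in months]
rides = [math.exp(10 + 0.005 * m + true_seasonal[m] + random.gauss(0, 0.01))
         for m in months]

trend, seasonal, residual = decompose(rides)
print([round(s, 3) for s in seasonal[:12]])
```

A one-off event like a Rugby World Cup would show up in the residual, since it belongs to neither the smooth trend nor the repeating yearly pattern.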

 

August 31, 2016

Be afraid

From the Herald (from the Daily Mail)

Patients should be warned about the dangers of chemotherapy after research showed that cancer drugs are killing up to 50 per cent of patients in some UK hospitals.

That’s almost completely untrue.

Firstly, the research looked at deaths from any cause within 30 days of starting treatment, and did not claim these were all due to chemotherapy. Secondly, the 50% figure was in one hospital. Thirdly, it was for a subset of one particular type of cancer. And the conclusion drawn in the news story cannot be found anywhere in the research paper.

The researchers do think that chemotherapy is probably being used suboptimally in some of the hospitals, including the one where about 5 out of 10 of the patients being treated ‘with curative intent’ for lung cancer died within 30 days. That hospital stood out, despite the tiny numbers, because the average death rate across all hospitals for similar patients was about 3%.
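
A rough calculation shows why that hospital stands out even with tiny numbers. This sketch asks how likely it would be to see 5 or more deaths among 10 patients if each had the average 3% risk. It uses the approximate figures quoted above, assumes patients are independent with identical risk, and ignores the fact that many hospitals were compared, so treat it as an illustration rather than the researchers’ analysis.

```python
from math import comb


def prob_at_least(k, n, p):
    """Binomial probability of k or more events in n independent trials with risk p."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))


# Approximate figures from the post: about 5 deaths in 10 patients, average rate about 3%.
p_tail = prob_at_least(5, 10, 0.03)
print(f"chance of 5 or more deaths in 10 at a 3% rate: {p_tail:.1e}")
```

That works out at roughly five in a million, so chance alone is a poor explanation, though the calculation can’t say whether the cause is the treatment decisions or the mix of patients.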

As the researchers say:

The identification of hospitals with significantly higher 30-day mortality rates will promote review of clinical decision making in these hospitals.

It probably will, but that doesn’t tell us much about risks here on the other side of the world.