Posts from October 2012 (66)

October 15, 2012

Think of a number, then add 50%

The Herald tells us:

More than 700 drivers have been nabbed for drug-driving since a new law came into effect.

Figures released under the Official Information Act show 575 motorists were charged with drug-driving from when new legislation was introduced on November 1, 2009 to July this year.

During the same period, another 134 motorists were charged under older legislation.

That’s a 20-month period, which, as usual, makes no particular sense.  We heard about  429 of the 575 motorists charged under the new law back in February.  If that was for the first year (which makes sense given the lag in the current figures), the rate is going down.  In fact, even if the 429 were through the end of January, which would be very fast data collection, the rate is still down, though not statistically significantly.

Stat of the Week Competition: October 13 – 19 2012

Each week, we would like to invite readers of Stats Chat to submit nominations for our Stat of the Week competition and be in with the chance to win an iTunes voucher.

Here’s how it works:

  • Anyone may add a comment on this post to nominate their Stat of the Week candidate before midday Friday October 19 2012.
  • Statistics can be bad, exemplary or fascinating.
  • The statistic must be in the NZ media during the period of October 13 – 19 2012 inclusive.
  • Quote the statistic, when and where it was published and tell us why it should be our Stat of the Week.

Next Monday at midday we’ll announce the winner of this week’s Stat of the Week competition, and start a new one.


Stat of the Week Competition Discussion: October 13 – 19 2012

If you’d like to comment on or debate any of this week’s Stat of the Week nominations, please do so below!

October 14, 2012

One of the most important meals of the day

Stuff is reporting “Food and learning connection shot down”, based on a local study:

Researchers at Auckland University’s School of Population Health studied 423 children at decile one to four schools in Auckland, Waikato and Wellington for the 2010 school year.

They were given a free daily breakfast – Weet-Bix, bread with honey, jam or Marmite, and Milo – by either the Red Cross or a private sector provider.

My first reaction on reading this was: why didn’t they take this opportunity to do a randomised trial, so we could actually get reliable data?  So I went to the Cochrane Library to see what randomised trials had been done in the past. These have mostly been in developing countries and have found improvements in growth, but smaller differences in school performance.

Then I tried asking the Google, and its second link was a paper by Dr Ni Mhurchu, the researcher mentioned in the story, detailing the plans for a randomised trial of school breakfasts in Auckland.  At that point it was easy to find the results, and see that in fact Stuff is talking about a randomised trial. They just didn’t think it was important enough to mention that detail.

To the extent that one can trust the Stuff story at this point, there seem to be three reactions:

  • I don’t believe it because my opinions are more reliable than this research
  • Lunch would work even if breakfast didn’t
  • We should be making sure kids have breakfast even if it doesn’t improve school performance.

The latter two responses are perfectly reasonable positions to take (though they’re more convincing where they were taken before the results came out).  School lunches might be more effective than breakfasts, and the US (hardly a hotbed of socialism) has had a huge school nutrition program for 60 years.

Still, if we’re going to supply subsidised meals to school kids, we do need to know why we’re doing it and what we expect to gain.    This study is one of the first to go beyond just saying that the benefits are obvious.

 

October 12, 2012

Even better than chocolate

You can do even better than chocolate consumption in finding correlations with Nobel Prizes per capita.  With a few minutes on the Wikipedia entry used by Franz Messerli, I came up with a correlation of 0.921, much better than his 0.721. Here’s the graph (without lots of little flags, sorry):

[scatterplot of letters in the country’s name per capita against Nobel Prizes per capita]

The number of letters in the country’s name divided by total population is a much better predictor than the total chocolate consumption divided by total population.  Admittedly, changing the name of a country is usually more expensive than just eating more chocolate.

Turning a number into a rate or proportion helps with simple comparisons (this has a tag of its own on StatsChat), but simple ratio-based standardisation of two variables can create strong spurious correlations between them, something that medical researchers should be aware of.
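If you want to try the letters-per-capita version yourself, here is a minimal sketch of the calculation. It assumes the Wikipedia table has been saved as a CSV with columns country, population and nobel_laureates; the file name and column names are my own placeholders, not part of the original analysis.

    import pandas as pd

    # Hypothetical input: one row per country, scraped from the Wikipedia
    # list of Nobel laureates per capita (file and column names are assumed).
    df = pd.read_csv("nobel_by_country.csv")

    # Both variables are ratios with population as the shared denominator,
    # which is exactly what makes correlations like this easy to find.
    df["letters_per_capita"] = df["country"].str.len() / df["population"]
    df["nobels_per_capita"] = df["nobel_laureates"] / df["population"]

    print(df[["letters_per_capita", "nobels_per_capita"]].corr())

The exact value you get will depend on which countries and population figures you use; the point is just that almost any numerator divided by population will track Nobel Prizes per capita reasonably well.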

 

There’s nothing like a good joke.

Q:  Have you started eating more chocolate yet?

A: I assume this is about the New England Journal paper.

Q: Of course.  You could increase your chance of a Nobel Prize.

A: There are several excellent reasons why I am not going to get a Nobel Prize, but in any case I don’t have to eat the chocolate: anyone in Australia or New Zealand would do just as well. You can have my share.

Q:  What do you mean?

A: The article didn’t look at chocolate consumption by Nobel Prize winners, it looked at chocolate consumption in countries named in the official biographical information about Nobel Prize winners.  This typically includes where they were born and where they worked when they did the prize-winning research, and in some cases yet another country where they currently work.

Q: Does the article admit this?

A: In part.  The author admits that this is just per-capita data, not individual data.  Because he just got the Nobel Prize data from Wikipedia, rather than from the primary source, he doesn’t seem to have noticed that a single laureate can be counted under more than one country.

Q: Would the New England Journal of Medicine usually accept Wikipedia as a data source when the primary data are easily available?

A: No.

Q: What about the chocolate data?

A: The author doesn’t say whether the chocolate consumption measures weight as consumed (ie, including milk and sugar) or weight of actual chocolate content. That’s especially sloppy since he goes on and on about flavanols. Also, the Nobel Prize data is for 1901-2011 and the chocolate data is mostly just from 2010 or 2011: chocolate consumption in many countries has changed over the past century.

Q: Do you want to say something about correlation and causation now?

A: No, that’s what you say when you don’t know what causes spurious correlations.

Q: So what did cause this correlation?

A: There are at least two likely contributions.  The first is just that wealthy countries tend to have more chocolate consumption and more Nobel Prizes.  Chocolate and research are expensive.  The second is more interesting: it’s the same reason that storks per capita and birth rates are correlated.

Q: Storks bring chocolate as well as babies?

A: Not quite.  Birth rates and storks per capita tend to be correlated because they are both multiples of the reciprocal of population size.   Jerzy Neyman pointed this out in the prehistory of statistics, and Richard Kronmal brought it up again in 1993.  More recently, someone has done the computation with real data (p=0.008). Imperfect standardisation will induce correlation, and since Nobel Prizes almost certainly don’t depend linearly on population, the correction is bound to be imperfect.
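Here is a quick simulation, with entirely made-up numbers, of that mechanism: two counts generated independently of each other become strongly correlated once both are divided by the same (highly variable) population.

    import numpy as np

    rng = np.random.default_rng(1)
    n = 50  # hypothetical districts

    # Population varies a lot between districts; storks and births are
    # generated independently of each other.
    population = rng.lognormal(mean=10, sigma=1, size=n)
    storks = rng.poisson(50, size=n)
    births = rng.poisson(2000, size=n)

    storks_per_capita = storks / population
    birth_rate = births / population

    # Raw counts are independent, so their sample correlation should be
    # close to zero; the per-capita versions are both dominated by the
    # shared factor 1/population, so theirs comes out close to 1.
    print(np.corrcoef(storks, births)[0, 1])
    print(np.corrcoef(storks_per_capita, birth_rate)[0, 1])

With these settings the per-capita correlation is close to 1 even though the counts themselves have nothing to do with each other, which is the Neyman/Kronmal point in miniature.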

Q: Why did the New England Journal publish this article?

A: It wasn’t published as a research article; it was in their ‘Occasional Notes’ series, which the journal describes as “accounts of personal experiences or descriptions of material from outside the usual areas of medical research and analysis.”

Q: Isn’t it good that stuffy medical journals do this sort of thing occasionally? There’s nothing like a good joke.

A: Well, you might hope they would do it better, like the BMJ does.  This is nothing like a good joke.

 

October 11, 2012

Make your own data maps

Indiemapper is a web-based tool for creating maps for data visualisation, based on your own data or its built-in files. Here’s an example, showing cumulative inflation 2001-2008:

It knows about projections, sensible colour schemes, and ways of representing information on maps.  It’s a bit slow, since it has to run in your browser, but it’s well worth trying.

October 10, 2012

Classification problems

I was interested in how the new psychoactive substances laws were going to handle the problem of, on one hand, the unsafe legal highs that they don’t want to ban, and on the other hand, the potentially psychoactive substances that they don’t want to have to regulate.  Safety testing for new medications is a complicated scientific and statistical problem and hard to get right even when you aren’t trying to gerrymander it.

The regulatory impact statement says they are just going to do all this by fiat. Alcohol, tobacco, caffeine (and presumably kava) will be exempted so they can be handled by existing law; currently-banned drugs will still be banned;  and things like nutmeg and a range of ornamental plants will be classified by fiat as not psychoactive if anyone raises the issue.  In the case of any ambiguity, the regulator will get to just decide. I suppose that’s the only practical way to do it, given the goals.

The headlines so far have been about the cost of approval, which is about twice what MEDSAFE charges for new medications. That’s  not unreasonable considering that legal highs are likely to be less chemically and biologically familiar than most medications.  However, the costs are basically irrelevant unless the safety criteria are written loosely enough that some psychoactive compound could conceivably pass them.  Since the criteria don’t have to be consistent with any of the rest of drug and food laws, and it’s unlikely that anyone will come up with the testing budget, there’s no upside to making them realistic.

It will still be interesting to see how the criteria end up being written, and whether caffeine or nutmeg (or, in the other direction, some cannabis preparation) would be able to pass them. Obviously alcohol and tobacco wouldn’t.

Nutrition polls

The clicky poll accompanying the story about nutrition in the younger generation now looks like this:

There are slightly dubious surveys, like the Weight Watchers one, and then there are bogus polls strictly for entertainment purposes only.

Ignorance surveys

In the previous post I was sceptical about the importance of young Kiwis not being on first-name terms with their zucchini.  A post by Mark Liberman on Language Log suggests that ignorance surveys are even worse than I realised.

There’s the well-known problem of misreporting:

A new survey conducted by Chicago’s McCormick Tribune Freedom Museum, which has yet to open, finds that only 28 percent of Americans are able to name one of the constitutional freedoms, yet 52 percent are able to name at least two Simpsons family members.

when in fact the figure in that survey was 73%, not 28%.  What’s new is how bad the coding of responses can be even in very respectable surveys.

The way it works is that the survey designers craft a question like this one (asked at a time when William Rehnquist was the Chief Justice of the United States):

“Now we have a set of questions concerning various public figures. We want to see how much information about them gets out to the public from television, newspapers and the like….
What about William Rehnquist – What job or political office does he NOW hold?”

Answers scored as incorrect included:

Supreme Court justice. The main one.
He’s the senior judge on the Supreme Court.
He is the Supreme Court justice in charge.
He’s the head of the Supreme Court.
He’s top man in the Supreme Court.

Mark Liberman concludes:

  • When you read or hear in the mass media that “Only X% of Americans know Y”, don’t believe it without checking the references — it’s probably false even as a report of the survey statistics.
  • When you read survey results claiming that “Only X% of Americans know Y”, don’t believe the claims unless the survey publishes (a) the exact questions asked; (b) the specific coding instructions used to score the answers; (c) a measure of inter-annotator agreement in blind tests; and (d) the raw response transcripts.

That might be going a bit far, but at least (a) and (b) are really important. If you call the elongated green vegetable a ‘courgette’, is that scored as right or wrong? What if you are from the US and call it ‘summer squash’, or from South Africa and (according to Wikipedia) call it ‘baby marrow’?
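To make (c) a bit more concrete: a standard summary of inter-annotator agreement is Cohen’s kappa, which compares how often two coders agree with how often they would be expected to agree by chance, given each coder’s overall rate of ‘right’ and ‘wrong’. Here is a minimal sketch in Python; the answers and the codings are invented purely for illustration, not taken from any actual survey.

    from collections import Counter

    def cohen_kappa(coder1, coder2):
        """Cohen's kappa for two raters labelling the same items."""
        n = len(coder1)
        observed = sum(a == b for a, b in zip(coder1, coder2)) / n
        # Chance agreement, from each coder's marginal label frequencies
        c1, c2 = Counter(coder1), Counter(coder2)
        expected = sum(c1[lab] * c2[lab] for lab in set(coder1) | set(coder2)) / n ** 2
        return (observed - expected) / (1 - expected)

    # Invented example: two coders scoring answers to "name this vegetable"
    answers = ["zucchini", "courgette", "baby marrow", "summer squash", "cucumber", "zucchini"]
    coder_a = ["right", "right", "right", "wrong", "wrong", "right"]
    coder_b = ["right", "wrong", "wrong", "wrong", "wrong", "right"]

    for answer, a, b in zip(answers, coder_a, coder_b):
        if a != b:
            print("coders disagree on:", answer)
    print("kappa =", cohen_kappa(coder_a, coder_b))  # about 0.4 here

If the coding instructions don’t say what to do with ‘courgette’ or ‘baby marrow’, a blind test like this is exactly where the disagreement would show up.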