Posts from April 2013 (67)

April 23, 2013

When ‘self-selected’ isn’t bogus

Two opportunities for public comment will expire soon, and StatsChat readers might have something to say on both:

  • Stats New Zealand wants to hear from people who use Census data.  They have a questionnaire on how you use the data, and how this might be affected if they change the Census in various ways. It’s open until Friday May 3.
  • Public submissions on the new ‘legal highs’ bill close on Wednesday May 1.  The bill is here. You can make a submission here.  The Drug Foundation have a description and recommendations here

This sort of public comment is qualitative, rather than quantitative.  Neither the Select Committee nor Stats New Zealand is likely to count up the number of submissions taking a particular view and use this as a population estimate, because that would be silly.  What they should be aiming for is a qualitatively exhaustive sample, one that includes all the arguments for or against the bill, or all the different ways people use Census data.

April 22, 2013

Briefly

  • Roger Peng comparing the recent Excel Economics incident to an earlier case of scientific fraud in cancer bioinformatics

One has to wonder if the academic system is working in this regard. In both cases, it took a minor, but personal failing, to bring down the entire edifice. But the protestations of reputable academics, challenging the research on the merits, were ignored. I’d say in both cases the original research conveniently said what people wanted to hear (debt slows growth, personalized gene signatures can predict response to chemotherapy), and so no amount of research would convince people to question the original findings.

Stat of the Week Winner: April 13 – 19 2013

Congratulations to Steve Black for his highly topical nomination for Stat of the Week:

Statistic: A bogus txt-in poll on Campbell Live shows 22% support for the same sex marriage bill, 78% opposed.
Source: TV3 Campbell Live
Date: 17 April 2013

A bogus txt-in poll on Campbell Live shows 22% support for the same sex marriage bill, 78% opposed. Nice illustration of just how far off bogus polls can be from general population results done by proper sampling techniques. Ironically, it came on the same day as the legislation passed 77 to 44. I can’t find a trace of it left on the TV3 web page, but I’m still collecting references. Note that the conservative blog I referenced considers it “The best indication yet, short of a referendum, of what the public actually think about same-sex marriage”, thus compounding the mistake of bogus polls taken as meaningful.

This bogus poll would be a perfect example for our first year introductory statistics course on non-sampling errors causing biased results.

Special mention also to Nick’s nomination of Statistics New Zealand’s latest New Zealand Period Life Tables.

Stat of the Week Competition: April 20 – 26 2013

Each week, we would like to invite readers of Stats Chat to submit nominations for our Stat of the Week competition and be in with the chance to win an iTunes voucher.

Here’s how it works:

  • Anyone may add a comment on this post to nominate their Stat of the Week candidate before midday Friday April 26 2013.
  • Statistics can be bad, exemplary or fascinating.
  • The statistic must be in the NZ media during the period of April 20 – 26 2013 inclusive.
  • Quote the statistic, when and where it was published and tell us why it should be our Stat of the Week.

Next Monday at midday we’ll announce the winner of this week’s Stat of the Week competition, and start a new one.

Stat of the Week Competition Discussion: April 20 – 26 2013

If you’d like to comment on or debate any of this week’s Stat of the Week nominations, please do so below!

April 19, 2013

Are Adam and Steve waiting out there?

Graeme Edgeler says on Twitter:

I know many gay couples will want to marry quickly, but there *must* be a couple named Adam & Steve and we should totally let them go first.

Should we expect an Adam and Stephen couple? This is an opportunity to use public data and simple probability to get a rough estimate.

StatsNZ reported just over 5000 cohabiting male couples in 2006. That’s an underestimate of male couples, but probably an overestimate of those planning to marry soon.

I remembered seeing Project Steve, from the National Center for Science Education.  They collect signatures supporting the teaching of evolution from scientists named Stephen (after Stephen J. Gould) — they are currently up to 1268 — and make the point that under 1% of US males are named Stephen.

It turns out that they get this information from the US Census.  The most recent data is 1990 (and, of course, is US) so it’s not ideal, but it will give us a rough idea.  Stephen comes in at 0.54%, and when you add in Stephan, Esteban, Stefano, it still is no more than 0.6%.  Adam is 0.259%.

Under random assignment, then, there would be less than a 1 in 10 chance that there’s a couple called Adam and Steve living together in NZ, and even then they might well not be planning to get married.
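The arithmetic behind that rough estimate isn’t shown above, so here is a minimal sketch of one way to do it. It assumes names are assigned to partners independently at the 1990 US Census frequencies quoted above (0.259% for Adam, and 0.6% as the upper bound for Stephen and its variants) and uses 5000 as a round number of couples. The function name and the `either_order` switch are hypothetical, not anything from the original calculation. Counting only one ordering gives a bit under 1 in 10; counting a couple as a match whichever partner is Adam roughly doubles that. Either way the chance is small.

```python
def chance_of_couple(p_a, p_b, n_couples, either_order=True):
    """Chance that at least one of n_couples independent couples has one
    partner with name A and the other with name B.

    Assumes names are assigned to partners independently (the 'random
    assignment' in the post). either_order=True counts (A, B) and (B, A).
    """
    per_couple = p_a * p_b * (2 if either_order else 1)
    return 1 - (1 - per_couple) ** n_couples


# Frequencies quoted in the post (1990 US Census); 5000 is a round-number
# assumption standing in for "just over 5000" couples.
P_ADAM = 0.00259
P_STEVE = 0.006
N_COUPLES = 5000

one_order = chance_of_couple(P_ADAM, P_STEVE, N_COUPLES, either_order=False)
both_orders = chance_of_couple(P_ADAM, P_STEVE, N_COUPLES, either_order=True)

print(f"one ordering only: {one_order:.3f}")
print(f"either ordering:   {both_orders:.3f}")
```

Swapping in NZ name frequencies, if you can find them, only needs the two `P_` constants changed.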

[Update: Brendon correctly points out that I missed ‘Steven’, which is actually the most common variant. Apart from demonstrating that I’m an idiot, this doesn’t change the basic message.]

April 18, 2013

Briefly

[Image: chch]

April 17, 2013

Visualising New York income inequality

From the New Yorker, a set of graphs showing how median household income varies along each subway line, based on the census tract containing each station.

Here’s the graph if you take the A-train:

[Figure: median household income along the A train]

(via @brettkeller)

Drawing the wrong conclusions

A few years ago, economists Carmen Reinhart and Kenneth Rogoff wrote a paper on national debt, where they found that there wasn’t much relationship to economic growth as long as debt was less than 90% of GDP, but that above this level economic growth was lower.  The paper was widely cited as support for economic strategies of ‘austerity’.

Some economists at the University of Massachusetts attempted to repeat their analysis, and didn’t get the same result.  Reinhart and Rogoff sent them the data and spreadsheets they had used, and it turns out that the analysis they had done didn’t quite match the description in the paper.  Part of the discrepancy was an error in an Excel formula that accidentally excluded a bunch of countries, but Reinhart and Rogoff also deliberately excluded some countries and times that had high growth and high debt (including Australia and NZ immediately post-WWII), and gave each country the same weight in the analysis regardless of the number of years of data included. (paper — currently slow to load, summary by Mike Konczal)
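To see why the weighting choice matters, here is a toy sketch. The numbers are invented for illustration and are not from Reinhart and Rogoff’s spreadsheet or from the Massachusetts paper; countries "A" and "B" and both function names are hypothetical. One country contributes ten high-debt years, the other a single bad year.

```python
from statistics import mean

# Hypothetical growth rates (% per year) in high-debt years.
# These are made-up numbers, NOT data from either paper.
growth_by_country = {
    "A": [2.5, 2.6, 2.4, 2.5, 2.7, 2.3, 2.5, 2.5, 2.6, 2.4],  # ten years
    "B": [-7.0],                                              # one year
}


def equal_country_weight(data):
    """Average each country's years first, then average the country means.
    Every country counts the same, however many years it contributes."""
    return mean(mean(years) for years in data.values())


def equal_year_weight(data):
    """Pool all country-years and average them, so a country with more
    years of data has proportionally more influence."""
    return mean(g for years in data.values() for g in years)


print(f"equal weight per country:      {equal_country_weight(growth_by_country):.2f}")
print(f"equal weight per country-year: {equal_year_weight(growth_by_country):.2f}")
```

With these made-up figures the single bad year drags the per-country average below zero, while the pooled average stays positive. Neither weighting is automatically right; the point is only that the choice can change the headline number.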

Some points:

  • The ease of making this sort of error in Excel is exactly why a lot of statisticians don’t like Excel (despite its other virtues), so that has received a lot of publicity.
  • Reinhart and Rogoff point out that they only claimed to find an association, not a causal relationship, but they certainly knew how the paper was being used, and if they didn’t think it provided evidence of a causal relationship they should have said something a lot earlier. (I think Dan Davies on Twitter put it best)
  • Chris Clary, who is a PhD student at MIT, points out that the first author (Thomas Herndon) on the paper demonstrating the failure to replicate is also a grad student, and notes that replicating things is a job often left to grad students.
  • The Reinhart and Rogoff paper wasn’t the primary motivation for, say, the UK Conservative Party to want to cut taxes and government spending. The Conservatives have always wanted to cut taxes and government spending. Cutting taxes and spending is a significant part of their basic platform. The paper, at most, provided a bit of extra intellectual cover.
  • The fact that the researchers handed over their spreadsheet pretty much proves they weren’t deliberately deceptive, but it’s a lot easier to convince yourself to spend a lot of time checking all the details of a calculation when you don’t like the answer than when you do.

Roger Peng, at Johns Hopkins, has also written about this incident. It would, in various ways, have been tactless for him to point out some relevant history, so I will.

The Johns Hopkins air pollution research group conducted the largest and most comprehensive study of health effects of particulate air pollution, looking at deaths and hospital admissions in the 90 largest US cities.  This was a significant part of the evidence used in setting new, stricter, air pollution standards, an important and politically sensitive topic, though a few orders of magnitude less so than austerity economics.  One of Roger’s early jobs at Johns Hopkins was to set up a system that made it easy for anyone to download their data and reproduce or vary their analyses. The size of the data and the complexity of some of the analyses meant just emailing a spreadsheet to people was not even close to acceptable.

Their research group became obsessive (in a good way) about reproducibility long before other researchers in epidemiology.  One likely reason is a traumatic experience in 2002, when they realised that the default settings for the software they were using had led to incorrect results for a lot of their published air pollution time series analyses.  They reported the problem to the EPA and their sponsors, fixed the problem, and reran all the analyses in a couple of weeks; the qualitative conclusions fortunately did not change.  You could make all sorts of comparisons with the economists’ error, but that is left as an exercise for the reader.

Open data on the West Island

If you want to get Australian census summary data, you can download it from the Australian Bureau of Statistics, or buy a DVD for A$250.

An article in iTNews explains why someone might pay rather than download:

“You have to click to download each pack individually, and they’ve set the site up deliberately to make it difficult to use a browser plugin to download everything that is contained on the released DVD image,” Bowland told iTNews.

That’s not hyperbole: Grahame Bowland quotes JavaScript code comments that actually say they are trying to make automatic downloading difficult.

Or, the data release is now available using BitTorrent, thanks to Bowland, who bought the DVD (this is perfectly legit: the data are Creative Commons licensed).

(via @keith_ng)