Posts from September 2012 (71)

September 28, 2012

Coincidences

There’s been a lot of coverage around the world of the Oksnes family in Norway: three of them have won significant sums in the lottery, all three at times close to when one family member, Hege Jeanette, was giving birth.

We’ve been asked what the probability was. This isn’t really a well-defined question, because it’s hard to say what would count as the same event. Presumably a different Norwegian family would still count, and probably a Chilean family. What if the family members won near their own 30th birthdays rather than near the time their child, niece or nephew was born? Or if they’d each won on the day they graduated from college? Even if we could agree on what counts as the same event, it would still be hard to work out the probability, because we’d need data on the number of lottery players all around the world, and on how many of them had children.

There are some things we can compute. Suppose that there is one lottery prize a week in Norway, and that about 1 million of the country’s 4.9 million people play. Divide them up into groups of six people. The chance that three prizes end up in the same group of six over ten years would be about 1.5 in ten thousand. Extending this to the whole world, it’s pretty likely that three people in the same family have won somewhere. That doesn’t cover the pregnancy, which restricts us to three periods of a few weeks in the ten years. Suppose we say that three three-week periods would give a close enough match. The chance of all the wins lining up with the births would then be about five in a million. So, if we specified that the coincidence had to be about giving birth, it’s pretty unlikely.
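The five-in-a-million figure can be reproduced by one simple reading of the assumptions: treat each win as equally likely to fall in any of the 520 weekly draws over ten years, independently of the others, and count it as a match if it lands in any of the nine weeks covered by the three three-week windows. This is a reconstruction under those assumptions, not necessarily the exact calculation behind the post, and `chance_all_in_windows` is a hypothetical helper name.

```python
def chance_all_in_windows(n_wins: int, window_weeks: float, total_weeks: float) -> float:
    """Chance that every one of n_wins independent, uniformly timed wins
    lands inside the combined windows (window_weeks out of total_weeks)."""
    return (window_weeks / total_weeks) ** n_wins


# Assumed: three three-week windows (nine weeks) in ten years of weekly draws.
p = chance_all_in_windows(3, window_weeks=3 * 3, total_weeks=10 * 52)
print(f"{p:.1e}")  # 5.2e-06, about five in a million
```

The cube is what makes the number small: each win on its own has less than a 2% chance (9/520) of landing in one of the windows.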

Alternatively, we could ask how likely it is that a coincidence remarkable enough to get reported around the world would happen in a lottery. The probability of that is pretty high, and we can tell, because lots of unlikely lottery coincidences do get reported.

Finally, we could ask what the probability is that the coincidence was really just due to chance. The answer to this one is easy: 100%.

Visualising health findings

The Cochrane Collaboration are holding their annual conference in Auckland starting on Sunday. They are a decentralised, grassroots effort to collate and summarise all randomised clinical trials, to make sure that the information isn’t buried, but is available to clinicians and patients. The online Cochrane Library of Systematic Reviews is available free to anyone in New Zealand, thanks to funding from the DHBs and the Ministry of Health. As with many organisations, they award a variety of prizes in their field of work. In contrast to many organisations, one of the prizes is awarded for the best criticism of the organisation’s work.

Anyway, the conference is an excuse to link to a video by the Cambridge “Understanding Uncertainty” group.  They are working on animations to further improve the summaries of health findings from the Cochrane systematic reviews.

DIY statistics

From a Herald editorial

There is much intolerance of any use of this “ropey” information. A high priesthood of data analysis bemoans news media interest, however hedged with caveats, as betraying the apple in favour of the orange. Yet the combined “wisdom of the crowd” of thousands of schools and teachers, warts and all, does suggest, for example, fewer children meet standards in writing nationally than reading or mathematics.

I, like many people, was against the use of the data for league tables, though I thought it was probably inevitable.  But if the high priesthood of data analysis has issued any edicts on analysis of the data, they forgot to copy me on the email. Perhaps it’s because I wasn’t wearing the high priestly hat.

At StatsChat we’re in favour of more people doing DIY statistics, which is why we keep linking to data sources when newspapers don’t provide them.  As with any form of DIY, though, the results will be better if you have the right materials for the job at hand.

For any given set of data there are some questions that obviously can be answered (do fewer kids meet the writing standards?), and some that obviously can’t (are the writing standards just harder?).   There are also many questions where the results will be unclear because it’s not possible to reliably separate out the huge socioeconomic effects.  For example, it looks as though Maori children perform worse than non-minority children even within the same decile, but ‘within the same decile’ is a pretty broad range of schools, and the conclusion has to be pretty weak.

September 27, 2012

How is the beer up (down) here?

UBS economists have produced a nice graph showing how many minutes it takes to earn a beer. There is one fatal flaw, however: it doesn’t include New Zealand! I thought we had better remedy this (maybe I am just avoiding something). The graph takes the average price of 500mL of beer and divides it by the median hourly wage. The dollar figures, I assume, are all converted to US dollars so that everything is on the same scale.

Statistics New Zealand helpfully provides us with the average hourly wage for 2011, NZD20.38, and pricepint.com uses the power of crowd-sourcing to give us the average price of a pint in New Zealand, GBP2.36. Converting both of these figures into US dollars at today’s rates from xe.net gives USD16.8241 and USD3.82047 respectively. This means that on average it takes 13.62 minutes to earn a pint in New Zealand. There are no figures on the plot, but we seem to sit somewhere between Australia and Argentina, our fellow Rugby Championship competitors, but a long way below South Africa.
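The calculation is a single ratio: the price divided by the hourly wage gives the fraction of an hour’s work needed, and multiplying by 60 turns that into minutes. A minimal sketch using the US-dollar figures quoted above (the function name is ours, for illustration):

```python
def minutes_to_earn(price_usd: float, hourly_wage_usd: float) -> float:
    """Minutes of work needed to pay for one drink at a given hourly wage."""
    hours = price_usd / hourly_wage_usd  # fraction of an hour's pay
    return hours * 60


# Figures quoted in the post, already converted to US dollars.
nz_hourly_wage_usd = 16.8241  # 2011 average hourly wage (NZD20.38)
nz_pint_price_usd = 3.82047   # crowd-sourced average pint price (GBP2.36)

minutes = minutes_to_earn(nz_pint_price_usd, nz_hourly_wage_usd)
print(f"{minutes:.2f} minutes")  # 13.62 minutes
```

Because both figures are divided one by the other, only the GBP/NZD cross rate actually affects the answer; converting both to US dollars scales the numerator and denominator in a way that leaves the ratio depending on that cross rate alone.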

Something beginning with ‘A’?

The Herald website front page asks

New Zealand has been allocated 2000 of the 10,000 places available for the 2015 dawn service at Gallipoli – guess which country gets the rest?

The guess isn’t that hard, and the allocation seems pretty fair to NZ. Australia gets 4 times as many places. It has 4.9 times as many people now, had 5.9 times as many serving in the campaign, and 2.2 times as many died.

September 26, 2012

Betting on a sure thing

Intrade is a company that hosts betting on political events, to “tap the wisdom of crowds”. Matthew Yglesias posted a graph showing the difference between the Intrade betting percentages on “Barack Obama wins the presidential election” and “The Democrats win the presidential election”.

The spikes above zero could theoretically just about be rational: if Obama died before the election, the Democrats could win without Obama winning, although even a 1% probability of this seems too high. The spikes below zero imply that Obama wins without the Democrats winning, which really isn’t conceivable. Whenever Obama was priced above the Democrats, betting on the Democrats and against Obama would have made guaranteed free money, and at the other spikes the reverse bet would have been almost as safe.

The problem is that Intrade is small enough that it isn’t worth the effort for people with lots of money to try to exploit minor irrationalities, even for big events like the presidential election, especially once you take transaction costs into account. Betting markets tend not to leave $20 notes lying on the table, but they can drop the occasional handful of change.
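The guaranteed-money argument can be checked by listing what a two-contract hedge pays in each possible outcome. This sketch assumes each contract pays 1 unit if it comes true and is priced as a probability between 0 and 1, and that selling is as easy as buying; `fee` is a hypothetical per-contract transaction cost, not Intrade’s actual fee schedule. “Obama wins but the Democrats lose” is left out because it can’t happen.

```python
# Possible outcomes: (description, payout of "Obama wins", payout of "Democrats win").
OUTCOMES = [
    ("Obama wins", 1.0, 1.0),
    ("another Democrat wins", 0.0, 1.0),
    ("the Democrats lose", 0.0, 0.0),
]


def hedge_profits(p_obama: float, p_dem: float, fee: float = 0.0) -> dict:
    """Profit in each outcome from buying the cheaper contract and selling
    the dearer one, one contract each, after paying `fee` per contract."""
    profits = {}
    for name, obama_pays, dem_pays in OUTCOMES:
        if p_obama > p_dem:
            # Buy "Democrats win" at p_dem, sell "Obama wins" at p_obama.
            profit = (p_obama - p_dem) + dem_pays - obama_pays
        else:
            # Buy "Obama wins" at p_obama, sell "Democrats win" at p_dem.
            profit = (p_dem - p_obama) + obama_pays - dem_pays
        profits[name] = profit - 2 * fee
    return profits


def is_risk_free(profits: dict) -> bool:
    """True if the hedge makes money whatever happens."""
    return min(profits.values()) > 0


# Obama priced above the Democrats: a sure profit.
print(is_risk_free(hedge_profits(0.60, 0.58)))             # True
# The same gap, wiped out by a 1.5-cent fee on each contract.
print(is_risk_free(hedge_profits(0.60, 0.58, fee=0.015)))  # False
# Democrats priced above Obama: loses if another Democrat wins.
print(is_risk_free(hedge_profits(0.58, 0.60)))             # False
```

The last case corresponds to the spikes above zero: the hedge only loses if Obama drops out and another Democrat wins, so it is nearly, but not quite, free money. The middle case shows how a small transaction cost can make a small mispricing not worth chasing.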

Teaching statistics

I haven’t had the time or energy to do any analyses of the National Standards data, but other statistical bloggers haven’t had the same problem.

Luis Apiolaza has some dramatic graphics, such as one showing the distribution, by decile, of the proportion of students at each school achieving at or above the maths standard. The best-performing decile 1 school is below the median for deciles 9 and 10, and the upper quartile for decile 1 is about the same as the lower quartile for decile 4.

Eric Crampton has been doing regression modelling. He’s using Stata rather than R and has fewer pictures, because he’s an economist (but we like him anyway).

Eric comments on the strong decile differences, and notes that these make it hard to be confident about ethnic differences (schools with more Maori and Pacific students do worse, but on reading and perhaps on writing so do schools with more Asian students).  He also notes that there’s a lot of variation between schools that isn’t explained by the available socioeconomic data.  I’d be interested to know how much of this is random variation based on the limited number of students per school and how much is real variation that could be explained but isn’t.
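One rough way to approach the question of random versus real variation is to compare the observed spread of school pass rates with the spread that binomial sampling alone would produce if every school had the same underlying rate. The sketch below is a simple method-of-moments split, not Eric’s model; it ignores the small bias from estimating the overall rate, and the school counts in the example are made up for illustration.

```python
import statistics


def split_variance(passed: list[int], assessed: list[int]) -> tuple[float, float, float]:
    """Return (observed, expected_from_chance, excess) variance of school pass rates.

    passed[i]   -- students at or above the standard at school i
    assessed[i] -- students assessed at school i
    """
    rates = [p / n for p, n in zip(passed, assessed)]
    observed = statistics.pvariance(rates)
    # Pooled rate, as if every school shared one true rate.
    pooled = sum(passed) / sum(assessed)
    # Binomial sampling variance p(1 - p)/n, averaged over schools.
    expected = statistics.fmean([pooled * (1 - pooled) / n for n in assessed])
    # Whatever is left over is a rough estimate of real between-school variation.
    excess = max(observed - expected, 0.0)
    return observed, expected, excess


# Hypothetical schools, not real National Standards data.
observed, expected, excess = split_variance([18, 40, 9, 70], [30, 50, 25, 90])
print(f"observed {observed:.4f}, chance alone {expected:.4f}, excess {excess:.4f}")
```

Small schools contribute large sampling variance, so a school-level spread that looks dramatic can be mostly chance when many schools assess only a few dozen students.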

Both Eric and Luis have put their data files and code where anyone else can easily get them, and they have left the data in much better shape than they found them, for the benefit of anyone else who might want to do some analysis.

NRL Predictions, Grand Final

Team Ratings for the Grand Final

Here are the team ratings prior to the Grand Final, along with the ratings at the start of the season. I have created a brief description of the method I use for predicting rugby games. Go to my Department home page to see this.

Current Rating Rating at Season Start Difference
Bulldogs 8.55 -1.86 10.40
Storm 7.63 4.63 3.00
Cowboys 6.35 -1.32 7.70
Sea Eagles 4.66 9.83 -5.20
Rabbitohs 4.48 0.04 4.40
Raiders 1.12 -8.40 9.50
Knights 0.01 0.77 -0.80
Dragons -0.37 4.36 -4.70
Broncos -0.98 5.57 -6.50
Sharks -2.05 -7.97 5.90
Titans -2.20 -11.80 9.60
Wests Tigers -2.74 4.52 -7.30
Roosters -5.43 0.25 -5.70
Panthers -6.45 -3.40 -3.00
Warriors -8.08 5.28 -13.40
Eels -8.25 -4.23 -4.00

Performance So Far

So far there have been 200 matches played, 123 of which were correctly predicted, a success rate of 61.5%.

Here are the predictions for last week’s games.

Game Date Score Prediction Correct
1 Storm vs. Sea Eagles Sep 21 40 – 12 3.56 TRUE
2 Bulldogs vs. Rabbitohs Sep 22 32 – 8 0.27 TRUE
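The “Correct” column and the overall success rate can be reproduced under one assumption about the table: that the Prediction column is the predicted winning margin for the first-named team, so a prediction counts as correct when it has the same sign as the actual margin. That reading is an assumption; the rating method itself is described on the author’s Department home page and isn’t reproduced here.

```python
# Last week's games, from the table above:
# (first team, second team, first team's score, second team's score, prediction)
games = [
    ("Storm", "Sea Eagles", 40, 12, 3.56),
    ("Bulldogs", "Rabbitohs", 32, 8, 0.27),
]


def prediction_correct(score_a: int, score_b: int, predicted_margin: float) -> bool:
    """Assumes predicted_margin > 0 means the first-named team is tipped to win.
    A draw (zero actual margin) counts as incorrect here."""
    return (score_a - score_b) * predicted_margin > 0


for team_a, team_b, score_a, score_b, margin in games:
    print(team_a, "vs.", team_b, prediction_correct(score_a, score_b, margin))

# Season to date: 123 correct out of 200 matches.
print(f"{123 / 200:.1%}")  # 61.5%
```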

Prediction for the Grand Final

Here is my prediction for the Grand Final.

Game Date Winner Prediction
1 Bulldogs vs. Storm Sep 30 Bulldogs 5.40

September 24, 2012

Stat of the Week Winner: September 15-21 2012

Thanks for your nominations last week in our Stat of the Week competition. We’ve selected Alan Keegan’s nomination of car theft data that didn’t take the number of vehicles into account when assessing how likely they were to be stolen.
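The problem in the nominated story is a missing denominator: raw theft counts largely reflect how many vehicles of each kind there are. A minimal sketch of the fix, converting counts into a rate per 1,000 vehicles; the vehicle groups and numbers are hypothetical.

```python
def theft_rate_per_1000(thefts: dict[str, int], on_road: dict[str, int]) -> dict[str, float]:
    """Thefts per 1,000 vehicles in each group."""
    return {group: 1000 * thefts[group] / on_road[group] for group in thefts}


# Hypothetical figures: group A has more thefts in total only because
# there are far more of them.
thefts = {"group A": 500, "group B": 200}
on_road = {"group A": 100_000, "group B": 10_000}

rates = theft_rate_per_1000(thefts, on_road)
print(max(thefts, key=thefts.get))  # group A: most thefts
print(max(rates, key=rates.get))    # group B: highest risk per vehicle
```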

(Thanks Brent for noticing the date typo in the competition, which has now been corrected. We’d like to think that Stats Chat is not about pointing out others’ typos or mistakes, but intelligently discussing the statistics behind the news and the issues surrounding them.)

Data Science in Harvard Business Review

They say it’s the sexiest job of the 21st century.  Perhaps if people are calling statistics “data science” they will stop calling it “business intelligence” (via)