Posts from March 2012 (64)

March 15, 2012

Prediction is hard, especially about buses?

The bus prediction system is potentially a very good idea — research in other cities has shown that it reduces actual waiting time, and reduces perceived waiting time even more.

Unfortunately, the illusory sense of control over one’s fate that the prediction system gives is easy to shatter.  This morning it was a stealth bus: arriving when it was allegedly still ten minutes away.   On other occasions the waiting time counts down steadily to DUE and then disappears, indicating that a ghost bus has gone by.

Bus prediction involves some hard engineering problems: the buses need to know where they are, they need to be able to tell the central office, and the central office needs to be able to get the information to transit users.  Fortunately, the first problem is solved by GPS (and odometers), the second by packet radio, and the third by the internet (and text-message gateways).

What remains is mostly a statistical problem, and partly a problem in applied psychology. The data are fairly straightforward: the system knows approximately where the bus was every couple of minutes into the past, and this needs to be projected forward. The Seattle MyBus project did a good job of implementing a simple version about ten years ago: the prediction is a weighted average of when the bus would arrive at its current speed and when it should arrive by schedule, and the weights come from a large collection of actual bus trips. There’s actually a lot more information available. For example, it’s hard to predict from a bus’s performance along Manukau Rd how long it will take to get through Newmarket, but since there’s a bus along Broadway every few minutes the system potentially has up-to-date information on congestion and crowding.
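To make the weighted-average idea concrete, here is a minimal sketch. It is not the MyBus implementation: the function name, the inputs, and the weight of 0.75 are all made up for illustration, and a real system would estimate the weight from historical trips.

```python
def blended_arrival_minutes(speed_based_eta, schedule_based_eta, weight_on_speed):
    """Weighted average of two arrival estimates, both in minutes from now.

    weight_on_speed is hypothetical here; a real system would estimate it
    from a large collection of past trips.
    """
    if not 0.0 <= weight_on_speed <= 1.0:
        raise ValueError("weight_on_speed must be between 0 and 1")
    return (weight_on_speed * speed_based_eta
            + (1.0 - weight_on_speed) * schedule_based_eta)


# Illustrative inputs: at its current speed the bus is 12 minutes away,
# but the timetable says it should arrive in 8 minutes.
eta = blended_arrival_minutes(12.0, 8.0, weight_on_speed=0.75)
print(f"Predicted arrival in {eta:.1f} minutes")
```

A weight near 1 trusts the bus's current progress; a weight near 0 assumes it will get back on schedule. That second assumption seems to be where an over-optimistic system would go wrong.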

My guess, based on the relatively high frequency of buses that apparently go backwards, is that the Auckland system is a bit over-optimistic about how fast a late bus can return to schedule, and isn’t using the congestion information.  It also doesn’t seem to know which incoming bus will be running each route out of the city, so the city-center predictions can be a bit useless.  The real problem, though, is how to incorporate this uncertainty when presenting the results.   Even when you’re using all the available information there’s always prediction error, and sometimes a bus will stop transmitting information either for a few minutes or for its whole route.   In that case, the system has to fall back on the timetable, but the Auckland system doesn’t tell you it’s done that.

OneBusAway, in Seattle, distinguishes clearly between real predictions and timetable predictions. It also does helpful things like indicating which buses have just left, and it doesn’t seem to have ghost buses or stealth buses.  It’s also based on an open data stream that anyone can use — both the real-time bus location data and the predictions are accessible to anyone who wants to set up an improved system or just write a better app.

 

 

March 14, 2012

Big data in the papers

The New York Times has another article about the importance of ‘big data’ in the modern world:

GOOD with numbers? Fascinated by data? The sound you hear is opportunity knocking.

and in the UK, the Guardian’s DataBlog is asking “What is a data scientist?”, motivated by a conference in California on “Making Data Work”.

 

March 13, 2012

Stat of the Week Competition Discussion: March 10-16 2012

If you’d like to comment on or debate any of this week’s Stat of the Week nominations, please do so below!

Current Nominations:

Cam Slater nominates Stuff’s article “Perceptions clash with facts over abuse”:

“Her masters thesis at Massey University found about half of the children killed in New Zealand died at the hands of a Pakeha abuser.”

“Maori make up 14.6% of the population but kill and abuse their kids at the same rates and everyone else. The split is about 50/50. Her research clearly shows that child abuse most certainly is a cultural issue with Maori hugely more likely than everyone else to kill or abuse their children.”

I comment at: http://www.whaleoil.co.nz/2012/03/yes-it-is-a-cultural-issue/

David Farrar also comments on this: http://www.kiwiblog.co.nz/2012/03/child_abuse_stats.html

The thesis says that the ethnicity of those convicted of assaulting children are Maori 48%, European 28%, PI 19%. To get a prevalence figure, I will use the population figures for under 14s. This is 21% Maori, 58% European, 11% PI and 9% Asian.

This works out to a prevalence rate for Maori that is 4.8 times that of Europeans. It is also 3.4 times that of Pacific Islanders. Or to compare all three, the comparative rates are Maori 4.8, PI 1.4, European 1.0.

March 12, 2012

Run along and play

The Herald (and others) are reporting the Milo State of Play survey, which says kids don’t play enough. In fact, the Herald says

The report outlines how a lack of play can deprive children of an activity crucial to healthy brain development.

Actually, it doesn’t. The report states that play is important to social development, and uses the phrase ‘prefrontal cortex’, but that’s all. Perfectly reasonable, since that’s not what the survey was about. The survey (once you find the actual report that backs up the YouTube animation) interviewed three samples (children, parents, and grandparents) about activities and attitudes. These were sampled in a nicely representative way, though from an online survey panel, so they may be more technologically aware and active than Kiwis as a whole.

According to the YouTube video, 96% of the participants agreed that active play was essential for development, and about half  (52%) of the parents, but less than half the children (40%)  or grandparents (31%), said children should have more playtime outside than they currently do.  That’s not really the impression that the story gives.

Some of the detailed findings are interesting:

More than half (56%) of parents think children enjoy playing games from their parents’ childhood. Where in fact, 96% of children state they do enjoy playing games from their parents’ youth.

and most children said they would like to spend more time playing with their parents.

The basic message of spending more time playing and having more unstructured time is one that has been promoted before, and seems sensible, but the report doesn’t really provide any more evidence for it than we had previously.

As a final note, the Herald story leads with

New Zealand children risk weight and brain development issues as a new study shows nearly half of Kiwi kids are not playing every day.

Is it just me that finds it ironic to get warnings like this from a survey sponsored by a multinational founded on selling chocolate and baby formula?

Stat of the Week Winner: March 3-9 2012

This week’s winner was tough to decide – there was a fantastic selection of nominations. Thank you to all who entered!

The winner is Eric Crampton’s nomination of the Dominion Post’s burglary statistics:

“Wellingtonians were far less likely to be burgled than their Auckland counterparts, with 31 per cent of all burglaries taking place in the Auckland region, compared with just under 9 per cent in Wellington.”

Eric points to Bill Kaye-Blake’s thorough critique here.

Stat of the Week Competition: March 10-16 2012

Each week, we would like to invite readers of Stats Chat to submit nominations for our Stat of the Week competition and be in with the chance to win an iTunes voucher.

Here’s how it works:

  • Anyone may add a comment on this post to nominate their Stat of the Week candidate before midday Friday March 16 2012.
  • Statistics can be bad, exemplary or fascinating.
  • The statistic must be in the NZ media during the period of March 10-16 2012 inclusive.
  • Quote the statistic, when and where it was published and tell us why it should be our Stat of the Week.

Next Monday at midday we’ll announce the winner of this week’s Stat of the Week competition, and start a new one.


March 9, 2012

NRL Predictions, Round 2

Team Ratings for Round 2

Here are the team ratings prior to Round 2, along with the ratings at the start of the season.
I have created a brief description of the method I use for predicting rugby games. Go to my Department home page to see this.

Team            Current Rating  Rating at Season Start
Sea Eagles               10.30                    9.83
Broncos                   6.10                    5.57
Warriors                  4.80                    5.28
Dragons                   4.51                    4.36
Storm                     4.34                    4.63
Wests Tigers              3.24                    4.52
Roosters                  0.92                    0.25
Knights                   0.62                    0.77
Rabbitohs                -0.62                    0.04
Bulldogs                 -0.98                   -1.86
Cowboys                  -3.96                   -1.32
Panthers                 -4.28                   -3.40
Eels                     -4.77                   -4.23
Sharks                   -6.69                   -7.97
Raiders                  -8.12                   -8.40
Titans                   -9.16                  -11.80

 

Performance So Far

So far there have been 8 matches played, 4 of which were correctly predicted, a success rate of 50%.

Here are the predictions for the games so far.

Game                       Date    Score    Prediction  Correct
1 Knights vs. Dragons      Mar 01  14 – 15        0.91  FALSE
2 Eels vs. Broncos         Mar 02  6 – 18        -5.30  TRUE
3 Raiders vs. Storm        Mar 03  19 – 24       -8.53  TRUE
4 Panthers vs. Bulldogs    Mar 03  14 – 22        2.96  FALSE
5 Cowboys vs. Titans       Mar 03  0 – 18        14.98  FALSE
6 Warriors vs. Sea Eagles  Mar 04  20 – 26       -0.05  TRUE
7 Wests Tigers vs. Sharks  Mar 04  17 – 16       16.99  TRUE
8 Rabbitohs vs. Roosters   Mar 05  20 – 24        4.29  FALSE

 

Predictions for Round 2

Here are the predictions for Round 2.

Game                           Date    Winner      Prediction
1 Sea Eagles vs. Wests Tigers  Mar 09  Sea Eagles       11.60
2 Broncos vs. Cowboys          Mar 09  Broncos          14.60
3 Titans vs. Raiders           Mar 10  Titans            3.50
4 Bulldogs vs. Dragons         Mar 10  Dragons          -1.00
5 Sharks vs. Knights           Mar 11  Knights          -2.80
6 Roosters vs. Panthers        Mar 11  Roosters          9.70
7 Storm vs. Rabbitohs          Mar 11  Storm             9.50
8 Eels vs. Warriors            Mar 12  Warriors         -5.10

 

Newsflash: Auckland is larger than Wellington

The denominator problem shows up yet again, this time in a press release from AA Insurance, leading to stories in the Dominion Post, the Herald, the Aucklander, and probably others, and a Stat of the Week nomination for the Groping Towards Bethlehem blog (via Eric Crampton).

There are statistical problems in the press release, but the newspapers came up with additional bonus examples.

The press release says

Between 2009 and 2011 AA Insurance received the highest number of burglary and theft from vehicle claims from Auckland, Hamilton, Wellington, and Christchurch.

and the Dominion Post amplifies this to

But Wellingtonians were far less likely to be burgled than their Auckland counterparts, with 31 per cent of all burglaries taking place in the Auckland region, compared with just under 9 per cent in Wellington.

Auckland has three times as many burglaries as Wellington, which sounds bad until you consider that Auckland is larger than Wellington, by a factor of about, um, three.  Using population at the last census, the rate of burglaries per capita is still higher in Auckland, but by only 20%.   If we compare Auckland to the whole population of New Zealand, the burglary rate per capita is slightly lower in Auckland; and since the Wellington rate is lower, if we combine Auckland and Wellington the rate is also lower than for the rest of the country. This tends to cast doubt on the comment
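The per-capita comparison can be sketched in a few lines. The burglary shares are the ones quoted in the story, with "just under 9 per cent" rounded to 9%. The population figures are my assumption: rounded regional totals from the 2006 census, so the output is only approximate.

```python
# Shares of all burglaries quoted in the Dominion Post story
# ("just under 9 per cent" is rounded to 0.09 here).
burglary_share = {"Auckland": 0.31, "Wellington": 0.09}

# ASSUMPTION: approximate regional populations at the 2006 census, in millions.
# These rounded figures are for illustration and are not taken from the post.
population_millions = {"Auckland": 1.30, "Wellington": 0.45}
nz_population_millions = 4.03


def relative_rate(region):
    """Per-capita burglary rate in a region relative to the national rate (1.0 = average)."""
    population_share = population_millions[region] / nz_population_millions
    return burglary_share[region] / population_share


auckland = relative_rate("Auckland")
wellington = relative_rate("Wellington")
ratio = auckland / wellington

print(f"Auckland relative to NZ:   {auckland:.2f}")
print(f"Wellington relative to NZ: {wellington:.2f}")
print(f"Auckland vs Wellington:    {ratio:.2f}")
```

With these inputs the Auckland rate comes out a little below the national average and roughly 20% above Wellington's. The threefold gap in raw counts is almost entirely population.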

AA Insurance head of operations Martin Fox said daytime robberies were more common in big cities, where most people did not head home for lunch.

This could be true if night-time burglaries [I assume he means burglaries, not robberies] were much more common outside big cities, but we aren’t given any data to support this, and the data we do have argue against it.

So far this is mostly fluff, but the interesting bit of news is

Security alarms had proven effective for preventing burglaries, with 60 per cent of claims between 2009-11 coming from homes without alarm systems.

AA Insurance presumably know how many of their customers have security alarms, so they might have evidence for this claim. Perhaps only  30% of insured homes lack alarm systems, so the 60% of claims from such homes is notable.   We can’t tell, because they don’t explicitly give any comparisons of rates, they don’t give information that we could use to compute rates, and they sure haven’t given us any reason to trust them on the handling of denominators.
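A small sketch shows why the denominator matters here. The 60% share of claims is from the press release. The share of insured homes without alarms is not given, so the values tried below (30%, 60%, 80%) are purely hypothetical, and the function name is mine.

```python
def relative_claim_rate(claims_share_no_alarm, homes_share_no_alarm):
    """Claim rate for homes without alarms divided by the rate for homes with alarms."""
    rate_without = claims_share_no_alarm / homes_share_no_alarm
    rate_with = (1.0 - claims_share_no_alarm) / (1.0 - homes_share_no_alarm)
    return rate_without / rate_with


# 60% of claims came from homes without alarms (press release figure).
# The share of insured homes without alarms is unknown, so try a few values.
for homes_share in (0.30, 0.60, 0.80):
    ratio = relative_claim_rate(0.60, homes_share)
    print(f"{homes_share:.0%} of homes without alarms -> relative claim rate {ratio:.2f}")
```

If only 30% of insured homes lack alarms, those homes claim at 3.5 times the rate of alarmed homes. At 60% there is no difference at all, and at 80% homes without alarms actually claim less often. The 60% figure alone cannot distinguish between these cases.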

If we did have rates, there would still be a problem of causation vs correlation.  A Ministry of Justice survey in 2004 did find lower rates of burglary in houses with alarms, but they also found

The security measure most strongly associated with lowered rates of burglary was ‘telling neighbours when everyone will be away’. As only a small proportion of burglaries occurred while the occupants were away (Section 6.5.1), presumably this measure was an indicator for a more general relationship, such as a lowered risk of burglary when neighbours are known and when neighbours look out for one another.

Super 15 Predictions, Week 3

Team Ratings for Week 3

Here are the team ratings prior to Week 3, along with the ratings at the start of the season.
I have created a brief description of the method I use for predicting rugby games. Go to my Department home page to see this.

Team            Current Rating  Rating at Season Start
Crusaders                 9.22                   10.46
Bulls                     6.38                    4.16
Stormers                  6.03                    6.59
Reds                      4.90                    5.03
Waratahs                  4.35                    4.98
Blues                     1.78                    2.87
Sharks                    1.66                    0.87
Chiefs                   -0.95                   -1.17
Hurricanes               -2.05                   -1.90
Highlanders              -3.57                   -5.69
Force                    -4.08                   -4.95
Cheetahs                 -4.45                   -1.46
Brumbies                 -6.72                   -6.66
Lions                   -10.12                  -10.82
Rebels                  -15.68                  -15.64

 

Performance So Far

So far there have been 14 matches played, 9 of which were correctly predicted, a success rate of 64.3%.

Here are the predictions for the games so far.

Game                          Date    Score    Prediction  Correct
1  Blues vs. Crusaders        Feb 24  18 – 19       -3.10  TRUE
2  Brumbies vs. Force         Feb 24  19 – 17        2.80  TRUE
3  Bulls vs. Sharks           Feb 24  18 – 13        7.80  TRUE
4  Chiefs vs. Highlanders     Feb 25  19 – 23        9.00  FALSE
5  Waratahs vs. Reds          Feb 25  21 – 25        4.40  FALSE
6  Stormers vs. Hurricanes    Feb 25  39 – 26       13.00  TRUE
7  Lions vs. Cheetahs         Feb 25  27 – 25       -4.90  FALSE
8  Chiefs vs. Blues           Mar 02  29 – 14       -0.70  FALSE
9  Rebels vs. Waratahs        Mar 02  19 – 35      -15.40  TRUE
10 Lions vs. Hurricanes       Mar 02  28 – 30       -3.90  TRUE
11 Highlanders vs. Crusaders  Mar 03  27 – 24      -10.40  FALSE
12 Reds vs. Force             Mar 03  25 – 20       15.10  TRUE
13 Cheetahs vs. Bulls         Mar 03  19 – 51       -1.40  TRUE
14 Stormers vs. Sharks        Mar 03  15 – 12       10.00  TRUE

 

Predictions for Week 3

Here are the predictions for Week 3. The prediction is my estimated points difference with a positive margin being a win to the home team, and a negative margin a win to the away team.

Game                        Date    Winner     Prediction
1 Crusaders vs. Chiefs      Mar 09  Crusaders       14.70
2 Force vs. Hurricanes      Mar 09  Force            2.50
3 Brumbies vs. Cheetahs     Mar 10  Brumbies         2.20
4 Highlanders vs. Waratahs  Mar 10  Waratahs        -3.40
5 Reds vs. Rebels           Mar 10  Reds            25.10
6 Sharks vs. Lions          Mar 10  Sharks          16.30
7 Bulls vs. Blues           Mar 10  Bulls            9.10
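The sign convention also determines how the "Correct" column in the results table is scored. As a rough sketch (not the actual code behind these predictions), the check below uses the 14 Super 15 results listed earlier in this post. It assumes the first-named team is the home side and that a drawn game would count as a miss; neither assumption is stated in the post.

```python
# (home score, away score, predicted home-minus-away margin) for the
# 14 Super 15 games played so far, copied from the results table.
# ASSUMPTION: the first-named team in each fixture is the home side.
games = [
    (18, 19, -3.10), (19, 17, 2.80), (18, 13, 7.80), (19, 23, 9.00),
    (21, 25, 4.40), (39, 26, 13.00), (27, 25, -4.90), (29, 14, -0.70),
    (19, 35, -15.40), (28, 30, -3.90), (27, 24, -10.40), (25, 20, 15.10),
    (19, 51, -1.40), (15, 12, 10.00),
]


def is_correct(home_score, away_score, predicted_margin):
    """True when the predicted margin has the same sign as the actual margin."""
    actual_margin = home_score - away_score
    # ASSUMPTION: a draw (or a predicted margin of exactly zero) counts as a miss.
    return actual_margin * predicted_margin > 0


n_correct = sum(is_correct(*game) for game in games)
success_rate = 100.0 * n_correct / len(games)
print(f"{n_correct} of {len(games)} correct ({success_rate:.1f}%)")
```

This reproduces the 9 out of 14 (64.3%) reported above. Only the sign of the prediction is scored, so a predicted margin of 16.99 that turns into a one-point win still counts as correct.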


 

Current nominations for Stat of the Week – add yours before midday

Thanks for all the nominations in our Stat of the Week competition. There are still a couple of hours left to add yours before this week’s competition closes.

Nominations to date which qualify are:

Online NZ Herald poll: Port workers’ sacking – who do you support?

Sammy Jia points out that the options are only “The sacked workers” and “Ports of Auckland”. There is no option for saying “Neither side” or “Don’t know”.

Women: good at budgets, bad savers

Ksenia Kovaleva rants:

“This is an example of very limited numbers/statistics being used by the media to suggest something I find misogynistic. The actual survey and numbers are very shaky evidence to draw any conclusions, especially about Kiwi women. This stat is affected by selection bias, self-reporting bias and possibly even a social desirability bias element (in that are men likely to admit to feeling insecure when not that long ago a man’s ‘role’ was thought to be to provide financial stability?)

I find this stat an example of irresponsible journalism to take what I would deem an unreliable and limited statistical result, rip it apart from any current sociological context, and pair it with a quote as if those numbers support that one person’s view when they don’t.

A badly designed survey is being published in a national newspaper paired with a quote perpetuating stereotypes and the sad thing is unless you have actually studied statistics, I think most people will simply take it at face value.” Read the full nomination »

Attractiveness of beards

Jordan Yates critiques an article in the NZ Herald on the attractiveness of beards.

Rubbish statistics

Manakaetau ‘Otai says statistics in a NZ Herald article:

“may mislead and confuse NZ Herald readers, but overall the story is a positive use of statistics but there is no real reputable source for the statistics to back up their claims”

Let us know what you think of the nominations!