How close is the nearest road in New Zealand?
Gareth Robins has answered this question with a very beautiful visualization generated with a surprisingly compact piece of R code.
Check out the full size image and the code here.
From Stuff:
The road toll has moved into triple figures for 2014 following the deadliest April in four years.
Police are alarmed by the rising number of deaths, that are a setback after the progress in 2013 when 254 people died in crashes in the whole year – the lowest annual total since 1950.
So far this year 102 people have died on the roads, 15 more than at the same point in 2013, Assistant Commissioner Road Policing Dave Cliff said today.
The problem with this sort of story is that it omits the role of random variation: bad luck. The Police are well aware that driving mistakes usually do not lead to crashes, and that whether a mistake turns into a crash is substantially a matter of luck; that awareness is key to their distracted-driver campaign. As I wrote recently, their figures on the risks from distracted driving come from a large US study that grouped a small number of actual crashes together with a large number of incidents of risky driving that had no real consequence.
The importance of bad luck in turning bad driving into disaster means that the road toll will vary a lot. The margin of error around a count of 102 is about +/- 20, so it’s not clear we’re seeing more than misfortune in the change. This is especially true because last year was the best on record, ever. We almost certainly had good luck last year, so the fact that it’s wearing off a bit doesn’t mean there has been a real change in driver behaviour.
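For anyone who wants to check the arithmetic, here is a minimal R sketch of where that +/- 20 comes from, assuming a simple Poisson model for the year-to-date count; the only numbers used are the 102 and 87 quoted above.

```r
## Treating the year-to-date toll as a Poisson count, an approximate
## 95% margin of error is about 2 * sqrt(count).
toll_2014 <- 102
toll_2013 <- 87          # 15 fewer at the same point last year, per the story

moe <- 2 * sqrt(toll_2014)
round(moe, 1)            # roughly +/- 20

## Last year's figure falls inside this interval, so the increase is
## consistent with nothing more than bad luck.
c(lower = toll_2014 - moe, upper = toll_2014 + moe)
```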
It was a terrible, horrible, no good, very bad month on the roads, but some months are like that. Even in New Zealand.
As many of you will know, I’m on the working group developing the content of a unit standard in statistical concepts for the National Diploma in Applied Journalism, which journalists with basic qualifications pursue while on the job to extend their skills.
It’s now time for me to ask you to send New Zealand examples of well-written statistically-based stories and poorly-written statistically-based stories that we can use. BUT you need to be able to send me the source data and an explanation (however rough) of what’s wrong with the story and how it could be improved.
I think that teachers may have existing examples that might do, and I would love to see them. You can email me your juicy contributions to statschat@gmail.com.
Today, Wiki New Zealand came to our department to talk about their work. I mentioned them in late 2012 when they first went live, but they’ve developed a lot since then.
The aim of Wiki New Zealand is to have ALL THE DATAS about New Zealand in graphical form, so that people who aren’t necessarily happy with spreadsheets and SQL queries can browse the information. Their front page at the moment has data on cannabis use, greenhouse gas emissions, wine grape and olive plantings, autism, and smoking.
From Stuff: “New Zealand’s worst air is not where you think“. That’s not actually true. New Zealand’s worst air, according to the story, is pretty much where I thought it would be, in coastal Canterbury and Otago. However, if you search the Stuff website for the term “air pollution”, you get:
So if you expected Auckland to be the worst, you know who to blame.
This table is from a University of California alumni magazine.
Jeff Leek argues at Simply Statistics that the big problem with Big Data is they, too, forgot statistics.
Two tweets in my timeline this morning linked to this report about this research paper, saying “americans have stopped searching on forbidden words”.
That’s a wild exaggeration, but what the research found was interesting. They looked at Google Trends search data for words and phrases that might be privacy-related in various ways: for example, searches that might be of interest to the US government security apparat, or searches that might be embarrassing if a friend knew about them.
In the US (but not in other countries) there was a small but definite change in searches at around the time of Edward Snowden’s NSA revelations. Search volume in general kept increasing, but searches on words that might be of interest to the government decreased slightly.
The data suggest that some people in the US became concerned that the NSA might care about them. Given that there presumably aren’t enough terrorists in the US to explain the difference, knowledge of the NSA surveillance appears to be affecting the political behaviour of (a subset of) ordinary Americans.
There is a complication, though. A similar fall was seen in the other categories of privacy-sensitive data, so either the real answer is something different, or people are worried about the NSA seeing their searches for porn.
The basic method is described on my Department home page. I have made some changes to the methodology this year, including shrinking the ratings between seasons.
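For readers who haven’t seen the methodology document, the between-season shrinkage is just a pull of each team’s rating back towards the overall mean before the new season starts. A minimal R sketch of the idea, with a made-up shrinkage factor and made-up ratings (the real values are in the methodology write-up, not here):

```r
## Illustrative only: shrink each rating towards the overall mean.
## The factor 0.7 and the example ratings are assumptions for this sketch.
shrink_ratings <- function(ratings, factor = 0.7) {
  centre <- mean(ratings)
  centre + factor * (ratings - centre)
}

end_of_last_season <- c(TeamA = 9.0, TeamB = 2.0, TeamC = -7.5)
shrink_ratings(end_of_last_season)
```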
Here are the team ratings prior to this week’s games, along with the ratings at the start of the season.
Team | Current Rating | Rating at Season Start | Difference |
---|---|---|---|
Crusaders | 8.08 | 8.80 | -0.70 |
Sharks | 5.49 | 4.57 | 0.90 |
Chiefs | 4.24 | 4.38 | -0.10 |
Brumbies | 4.08 | 4.12 | -0.00 |
Waratahs | 3.05 | 1.67 | 1.40 |
Bulls | 1.99 | 4.87 | -2.90 |
Hurricanes | 1.63 | -1.44 | 3.10 |
Blues | -0.10 | -1.92 | 1.80 |
Stormers | -0.21 | 4.38 | -4.60 |
Highlanders | -1.71 | -4.48 | 2.80 |
Force | -2.36 | -5.37 | 3.00 |
Reds | -2.86 | 0.58 | -3.40 |
Cheetahs | -2.90 | 0.12 | -3.00 |
Rebels | -4.74 | -6.36 | 1.60 |
Lions | -6.69 | -6.93 | 0.20 |
So far there have been 74 matches played, 48 of which were correctly predicted, a success rate of 64.9%.
Here are the predictions for last week’s games.
 | Game | Date | Score | Prediction | Correct |
---|---|---|---|---|---|
1 | Blues vs. Reds | May 02 | 44 – 14 | 3.70 | TRUE |
2 | Rebels vs. Sharks | May 02 | 16 – 22 | -6.30 | TRUE |
3 | Crusaders vs. Brumbies | May 03 | 40 – 20 | 6.30 | TRUE |
4 | Chiefs vs. Lions | May 03 | 38 – 8 | 12.90 | TRUE |
5 | Waratahs vs. Hurricanes | May 03 | 39 – 30 | 4.80 | TRUE |
6 | Stormers vs. Highlanders | May 03 | 29 – 28 | 6.20 | TRUE |
7 | Bulls vs. Cheetahs | May 03 | 26 – 21 | 7.80 | TRUE |
Here are the predictions for Round 13. The prediction is my estimated expected points difference, with a positive margin being a win to the home team and a negative margin a win to the away team (a rough sketch of how this rule is read follows the table).
 | Game | Date | Winner | Prediction |
---|---|---|---|---|
1 | Chiefs vs. Blues | May 09 | Chiefs | 6.80 |
2 | Rebels vs. Hurricanes | May 09 | Hurricanes | -2.40 |
3 | Highlanders vs. Lions | May 10 | Highlanders | 9.00 |
4 | Brumbies vs. Sharks | May 10 | Brumbies | 2.60 |
5 | Cheetahs vs. Force | May 10 | Cheetahs | 3.50 |
6 | Bulls vs. Stormers | May 10 | Bulls | 4.70 |
7 | Reds vs. Crusaders | May 11 | Crusaders | -6.90 |
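As promised above, here is a rough R sketch of how the prediction column is read and how the season-to-date success rate is computed. The helper function is my own illustration, not the actual prediction code; the only numbers used are those in the tables and text above.

```r
## Positive margin -> predicted home win; negative margin -> predicted away win.
predicted_winner <- function(home, away, margin) {
  ifelse(margin >= 0, home, away)
}

predicted_winner("Rebels", "Hurricanes", -2.40)   # "Hurricanes" (away win)
predicted_winner("Chiefs", "Blues", 6.80)         # "Chiefs" (home win)

## Season-to-date success rate quoted above: 48 correct out of 74 matches.
round(100 * 48 / 74, 1)                           # 64.9
```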
Wiki New Zealand, which has information on all sorts of things, has a graph showing animal use for research/testing/teaching in NZ over time. The data are from the annual report (PDF) of the National Animal Ethics Advisory Committee.
Here’s a slightly more detailed graph showing types of animals and who used them, over time.
It’s also important to remember that nearly all the livestock and domestic animals weren’t harmed significantly — research on things like different feed or stocking densities still counts. Most of the rodents and rabbits ended up dead, as did about a third of the fish.
The two big increases recently are commercial livestock (most of which are no worse off than they would be anyway as livestock) and fish at universities. The increase in fish is probably due at least in part to substitution of zebrafish for mice in some biological research.
No, I don’t know what the government departments did with 40,000 birds in 2009. [Update: thanks to James Green in comments, I now do.]
[Update: here’s the data in accessible form]