Posts from July 2014 (54)

July 31, 2014

Briefly

‘This is statistics’ website

The American Statistical Association is launching a public relations campaign to make people think statistics is less boring and pointless, which is good:

We want students and parents to have a better understanding of a field that is often unknown or misunderstood. Statistics is not just a collection of numbers or formulas. It’s not just lines, bars or points on a graph. It’s not just computing. Statistics is so much more. It’s an exciting—even fun—way of looking at the world and gaining insights through a scientific approach that rewards creative thinking.

That’s a quote from the shiny new website, ThisIsStatistics. It has stories about what statisticians do, and information about salary and job trends and stuff. There are videos of statisticians talking about their work: currently Roger Peng (Johns Hopkins, SimplyStatistics blog) and Genevra Allen (Rice University).

It’s slightly disappointing that more of the people on the site aren’t real, just stock photos, but I suppose that’s unavoidable. What’s a bit more annoying is one of the photos in particular:

[Image: About-ASA-cropped]

This looks as if it was constructed specially (the cup/mat/tablet/glasses are stock, eg).  It’s a rose chart, which is an ok way to display circular data (eg wind directions), but is not so good for comparison because of the way the wedges change shape as they get larger. The numeric labels are also a slightly strange choice for a circle measured in degrees (90 isn’t a multiple of 20).

Much more importantly, given the emphasis of the site on statistics as solving real problems, this is labelled as not being real: “data A” and “data B”.  Not helpful when we’re trying to tell people “It’s not just lines, bars or points on a graph”.

 

July 30, 2014

NRL Predictions for Round 21

Team Ratings for Round 21

The basic method is described on my Department home page. I have made some changes to the methodology this year, including shrinking the ratings between seasons.

Here are the team ratings prior to this week’s games, along with the ratings at the start of the season.

Team          Current Rating   Rating at Season Start   Difference
Sea Eagles              9.18                     9.10         0.10
Rabbitohs               7.48                     5.82         1.70
Roosters                7.09                    12.35        -5.30
Cowboys                 4.89                     6.01        -1.10
Warriors                4.38                    -0.72         5.10
Storm                   2.94                     7.64        -4.70
Broncos                 0.81                    -4.69         5.50
Panthers               -0.02                    -2.48         2.50
Bulldogs               -0.76                     2.46        -3.20
Knights                -1.30                     5.23        -6.50
Dragons                -2.44                    -7.57         5.10
Titans                 -5.06                     1.45        -6.50
Raiders                -6.23                    -8.99         2.80
Wests Tigers           -6.81                   -11.26         4.50
Sharks                 -7.26                     2.32        -9.60
Eels                   -8.66                   -18.45         9.80

 

Performance So Far

So far there have been 144 matches played, 79 of which were correctly predicted, a success rate of 54.9%.

Here are the predictions for last week’s games.

#   Game                       Date     Score     Prediction   Correct
1   Knights vs. Roosters       Jul 25   16 – 12        -5.80   FALSE
2   Broncos vs. Storm          Jul 25   8 – 30          7.40   FALSE
3   Panthers vs. Sharks        Jul 26   16 – 18        14.80   FALSE
4   Titans vs. Eels            Jul 26   18 – 24        11.20   FALSE
5   Bulldogs vs. Cowboys       Jul 26   12 – 20         0.50   FALSE
6   Warriors vs. Sea Eagles    Jul 27   12 – 22         1.90   FALSE
7   Wests Tigers vs. Dragons   Jul 27   12 – 28         3.60   FALSE
8   Raiders vs. Rabbitohs      Jul 28   18 – 34        -7.60   TRUE

 

Predictions for Round 21

Here are the predictions for Round 21. The prediction is my estimated expected points difference with a positive margin being a win to the home team, and a negative margin a win to the away team.

#   Game                       Date     Winner      Prediction
1   Sea Eagles vs. Broncos     Aug 01   Sea Eagles       12.90
2   Bulldogs vs. Panthers      Aug 01   Bulldogs          3.80
3   Sharks vs. Eels            Aug 02   Sharks            5.90
4   Cowboys vs. Titans         Aug 02   Cowboys          14.50
5   Roosters vs. Dragons       Aug 02   Roosters         14.00
6   Raiders vs. Warriors       Aug 03   Warriors         -6.10
7   Rabbitohs vs. Knights      Aug 03   Rabbitohs        13.30
8   Wests Tigers vs. Storm     Aug 04   Storm            -5.20
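
The sign convention described above can be sketched in a few lines of Python. This is only an illustration of how the Winner and Correct columns follow from a predicted margin, not the code behind the ratings. The function names are made up for the example, and the handling of a zero margin or a drawn game is an assumption.

```python
def predicted_winner(home, away, margin):
    """Positive margin favours the home team, negative the away team.

    Assumption: a margin of exactly 0 is given to the away team.
    """
    return home if margin > 0 else away


def prediction_correct(home_score, away_score, margin):
    """True when the actual and predicted margins have the same sign.

    Assumption: a drawn game counts as an incorrect prediction.
    """
    return (home_score - away_score) * margin > 0


# Round 21 fixtures from the table above: (home, away, predicted margin).
round_21 = [
    ("Sea Eagles", "Broncos", 12.90),
    ("Bulldogs", "Panthers", 3.80),
    ("Sharks", "Eels", 5.90),
    ("Cowboys", "Titans", 14.50),
    ("Roosters", "Dragons", 14.00),
    ("Raiders", "Warriors", -6.10),
    ("Rabbitohs", "Knights", 13.30),
    ("Wests Tigers", "Storm", -5.20),
]
winners = [predicted_winner(*game) for game in round_21]

# Last week's games: (home score, away score, predicted margin).
last_week = [
    (16, 12, -5.80), (8, 30, 7.40), (16, 18, 14.80), (18, 24, 11.20),
    (12, 20, 0.50), (12, 22, 1.90), (12, 28, 3.60), (18, 34, -7.60),
]
correct_flags = [prediction_correct(*game) for game in last_week]

# Season success rate quoted earlier: 79 correct out of 144 matches.
success_rate = round(100 * 79 / 144, 1)

print(winners)
print(correct_flags)
print(success_rate)
```

Run on the figures in this post, the sketch should reproduce the Winner column for Round 21 and the single TRUE in last week's results.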

 

Super 15 Predictions for the Super Rugby Final

Team Ratings for the Super Rugby Final

The basic method is described on my Department home page. I have made some changes to the methodology this year, including shrinking the ratings between seasons.

Here are the team ratings prior to this week’s games, along with the ratings at the start of the season.

Team          Current Rating   Rating at Season Start   Difference
Waratahs               10.21                     1.67         8.50
Crusaders              10.20                     8.80         1.40
Sharks                  3.91                     4.57        -0.70
Hurricanes              2.89                    -1.44         4.30
Bulls                   2.88                     4.87        -2.00
Chiefs                  2.23                     4.38        -2.10
Brumbies                2.20                     4.12        -1.90
Stormers                1.68                     4.38        -2.70
Blues                   1.44                    -1.92         3.40
Highlanders            -2.54                    -4.48         1.90
Lions                  -3.39                    -6.93         3.50
Force                  -4.67                    -5.37         0.70
Reds                   -4.98                     0.58        -5.60
Cheetahs               -5.55                     0.12        -5.70
Rebels                 -9.53                    -6.36        -3.20

 

Performance So Far

So far there have been 124 matches played, 82 of which were correctly predicted, a success rate of 66.1%.

Here are the predictions for last week’s games.

#   Game                       Date     Score     Prediction   Correct
1   Crusaders vs. Sharks       Jul 26   38 – 6          7.40   TRUE
2   Waratahs vs. Brumbies      Jul 26   26 – 8          9.40   TRUE

 

Predictions for the Super Rugby Final

Here are the predictions for the Super Rugby Final. The prediction is my estimated expected points difference with a positive margin being a win to the home team, and a negative margin a win to the away team.

#   Game                       Date     Winner      Prediction
1   Waratahs vs. Crusaders     Aug 02   Waratahs          4.00

 

If you can explain anything, it proves nothing

An excellent piece from sports site Grantland (via Brendan Nyhan), on finding explanations for random noise and regression to the mean.

As a demonstration, they took ten baseball batters and ten pitchers who had apparently improved over the season so far, and searched the internet for news that would allow them to find an explanation. They got pretty good explanations for all twenty. Looking at past seasons, this sort of short-term improvement almost always turns out to be random noise, despite the convincing stories.
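
A small simulation shows how this happens. The sketch below is not Grantland's analysis: it assumes every batter has exactly the same true ability, so any "improvement" is noise by construction. The true average, at-bats per period, number of batters and random seed are all made-up values chosen for the example.

```python
import random

TRUE_AVERAGE = 0.270   # assumed: every simulated batter has the same true ability
AT_BATS = 300          # assumed at-bats per period
N_BATTERS = 300        # assumed number of batters
rng = random.Random(2014)  # fixed seed so the run is repeatable


def observed_average():
    """Batting average over one period for a batter with the fixed true ability."""
    hits = sum(rng.random() < TRUE_AVERAGE for _ in range(AT_BATS))
    return hits / AT_BATS


# Three periods per batter: last season, this season so far, rest of season.
batters = [
    (observed_average(), observed_average(), observed_average())
    for _ in range(N_BATTERS)
]

# Pick the ten batters with the biggest apparent improvement so far.
improvers = sorted(batters, key=lambda b: b[1] - b[0], reverse=True)[:10]


def mean(values):
    return sum(values) / len(values)


before = mean([b[0] for b in improvers])
so_far = mean([b[1] for b in improvers])
rest = mean([b[2] for b in improvers])

print(f"last season {before:.3f}, so far {so_far:.3f}, rest of season {rest:.3f}")
```

The ten "improvers" were selected for having had an unlucky period followed by a lucky one, so their rest-of-season average should fall back towards the common true average, however good the story for each of them looked.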

Having a good explanation for a trend feels like convincing evidence the trend is real. It feels that way to statisticians as well, but it isn’t true.

It’s traditional at this point to come up with evolutionary psychology explanations for why people are so good at over-interpreting trends, but I hope the circularity of that approach is obvious.

July 29, 2014

H.G. Wells and statistical thinking

A treatment for unsubstantiated claims

A couple of months ago, I wrote about a One News story on ‘drinkable sunscreen’.

In New Zealand, it’s very easy to make complaints about ads that violate advertising standards, for example by making unsubstantiated therapeutic claims. Mark Hanna submitted a complaint about the NZ website of the company selling the stuff.

The decision has been released: the complaint was upheld. Mark gives more description on his blog.

In many countries there is no feasible way for individuals to have this sort of impact. In the USA, for example, it’s almost impossible to do anything about misleading or unsubstantiated health claims, to the extent that summoning a celebrity to be humiliated publicly by a Senate panel may be the best option.

It can at least produce great television: John Oliver’s summary of the Dr Oz event is viciously hilarious.

July 28, 2014

Rise of the machines

Journalism

Data

The Automatic Statistician project (somewhat flaky website) is working to automate various types of statistical modelling. They have interesting research papers. They also have a demo that’s fairly limited but produces linear regression models, model checks, and descriptions that are reasonable from a predictive point of view.

Automating some bits of data analysis is an important problem, because there aren’t enough statisticians to go around. However (as Cathy O’Neil points out about competition sites like Kaggle), they aren’t tackling the hard bits of data analysis: getting the data ready, and more importantly, getting the question into a precisely-specified form that can be answered by fitting a model.

The Games: How we’re doing

Statistics New Zealand is running the numbers during the Glasgow 2014 Commonwealth Games to show how many medals countries are winning relative to their population.  At the time of posting, we were third on a per-million-of-population basis. Check it out here.
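
The per-million calculation is simple enough to sketch. The teams and figures below are hypothetical, chosen only to show how a small country can rank above a large one, and the function name is made up for the example. They are not Statistics New Zealand's numbers.

```python
def medals_per_million(medals, population):
    """Medal count divided by population in millions."""
    return medals / (population / 1_000_000)


# Hypothetical teams: (medals, population). Not real Games data.
teams = {
    "Large country": (60, 50_000_000),
    "Small country": (9, 4_500_000),
}

# Rank teams from most to fewest medals per million people.
ranked = sorted(
    teams,
    key=lambda name: medals_per_million(*teams[name]),
    reverse=True,
)

for name in ranked:
    print(name, round(medals_per_million(*teams[name]), 2))
```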

Misleading maps

This map, from Reddit, shows the most common name in each county of England and Wales in 1881, based on the 1881 census.

[Image: jones]

Matthew Yglesias at Vox.com says “what’s remarkable is how nearly perfectly the Smith/Jones divide lines up with the political boundary between England and Wales”. I think it’s remarkable that he thinks it’s remarkable (I think of ‘Jones’ as the stereotypical Welsh name), but obviously associations are different in the US. It is worth pointing out that the line-up isn’t as good as you might think if you weren’t careful: three of the light-green counties are actually in England, not in Wales.

Yglesias also says that the names “seem to show pretty distinctively what part of the British Isles your male line hails from.” That’s an example of how maps are systematically misleading: the conclusion may be true, but the map doesn’t support it as strongly as it seems to. The map shows the most common name in each county, and most of the counties where Jones is the most common name are Welsh. However, that doesn’t mean most people called Jones were in Wales. In fact, based on search counts from UKCensusOnline.com, Lancashire had more Joneses than any Welsh county, and London had more than all but two Welsh counties. Overall, only 51% of Joneses were in Wales, going up to 60% if you include the three English counties coloured light green on the map.

In this particular case, many non-Welsh Joneses probably did have Welsh ancestors who had left Wales well before 1881, but not all of them — according to Wikipedia, the name came from Norman French and the first recorded use was in England.
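
The gap between "the most common name in a county" and "where most people with that name live" is easy to reproduce with toy numbers. The counties and counts below are hypothetical, invented for the example and not taken from the 1881 census. They are set up so that a big English county holds more Joneses than any Welsh one while still being coloured Smith on the map.

```python
# Hypothetical counts of people per surname in four made-up counties.
counts = {
    "Welsh county A": {"Jones": 500, "Smith": 100},
    "Welsh county B": {"Jones": 400, "Smith": 150},
    "English county C": {"Jones": 1000, "Smith": 3000},
    "English county D": {"Jones": 600, "Smith": 2000},
}
welsh = {"Welsh county A", "Welsh county B"}

# What the map shows: the single most common name in each county.
most_common = {
    county: max(names, key=names.get) for county, names in counts.items()
}

# What the map hides: where the people called Jones actually live.
jones_total = sum(names["Jones"] for names in counts.values())
jones_in_wales = sum(counts[county]["Jones"] for county in welsh)
share_in_wales = jones_in_wales / jones_total

biggest_jones_county = max(counts, key=lambda county: counts[county]["Jones"])

print(most_common)
print(f"Share of Joneses in Welsh counties: {share_in_wales:.0%}")
print("County with the most Joneses:", biggest_jones_county)
```

In this toy example the map would look like a clean Jones/Smith split along the border, even though most of the Joneses live on the English side.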