Posts from October 2015 (50)

October 12, 2015

Stat of the Week Competition Discussion: October 10 – 16 2015

If you’d like to comment on or debate any of this week’s Stat of the Week nominations, please do so below!

October 11, 2015

With the potential to miss us completely

Q: Did you see there’s a giant rock with the potential to end life on Earth?

A: This one?

Q: Yes. Are they exaggerating?

A: Depends what you mean. In a sense it does have the potential to end human life on Earth, but it would have to actually hit Earth to do that.

Q: But it’s “similar to the 1862 Apollo asteroid which was classified as a potentially hazardous object”

A: Similar except for being a lot further away. As the story says, “Potentially Hazardous Objects” approach closer than 7,402,982km, and this one is about 25 million km away at its closest.

Q: That’s an awfully precise number, 7,402,982, isn’t it? Why do they need it to the nearest kilometre?

A: They don’t. It’s 0.05 Astronomical Units, and whoever did the conversion doesn’t understand about significant digits. Wikipedia, for example, rounds it to 7.5 million km.

Q: And the other really precise numbers? It says the asteroid is moving at 64,374km/hr, but surely the speed will change more than 1km/hr because, you know, gravity and physics and stuff?

A: That’s 40,000 miles per hour. Again, looks like one significant digit in the original.
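The arithmetic behind those suspiciously precise figures is easy to reproduce. This sketch assumes the story's numbers went through miles: 0.05 AU is about 4.6 million miles to two significant figures, and 4.6 million miles converts to exactly the 7,402,982 km quoted. Converting 0.05 AU directly gives about 7.48 million km. That route is a guess about how the number arose, not something the story says. The constants are the IAU value of the astronomical unit and the international mile.

```python
from math import floor, log10

AU_KM = 149_597_870.7  # astronomical unit in km (IAU definition)
MILE_KM = 1.609344     # international mile in km


def round_sig(x: float, sig: int) -> float:
    """Round x to `sig` significant figures."""
    if x == 0:
        return 0.0
    return round(x, sig - 1 - floor(log10(abs(x))))


# The hazard threshold of 0.05 AU, converted directly.
pho_km = 0.05 * AU_KM
# The same threshold in miles, rounded to two significant figures.
pho_miles = round_sig(pho_km / MILE_KM, 2)
# Assumed route for the story's figure: the rounded miles, converted back to km.
via_miles_km = pho_miles * MILE_KM
# The speed: 40,000 mph converted with every digit kept.
speed_kmh = 40_000 * MILE_KM

print(f"0.05 AU directly:       {pho_km:,.0f} km")
print(f"via {pho_miles:,.0f} miles: {via_miles_km:,.0f} km")
print(f"honest rounding:        {round_sig(pho_km, 2):,.0f} km")
print(f"40,000 mph:             {speed_kmh:,.0f} km/h, "
      f"about {round_sig(speed_kmh, 1):,.0f} km/h to one significant figure")
```

Keeping every digit of a conversion makes a one- or two-digit estimate look like a measurement; rounding the result back to the precision of the input avoids that.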

Q: So how far away is this asteroid compared to, say, the moon?

A: To one significant figure, 70 times further away.

Q: That’s quite a lot. Why is NASA making a fuss about this asteroid?

A: They aren’t. They issued a press release about asteroid rumours in August, headlined “There is no asteroid threatening Earth”. The NASA @asteroidwatch twitterwallah is getting a bit tetchy about the whole thing.

Q: Does the asteroid have something to do with the “blood moon” we had recently?

A: Only in the sense that they were both completely unsurprising and harmless astronomical events.

(h/t @philiplyth)

Gay gene update

Yesterday I wrote about a ‘gay epigenetics’ story in the Herald. I wasn’t convinced there was anything worth publicising at this point, and there wasn’t enough detail to interpret the results.

Ed Yong, a science journalist who was actually at the conference, has a story today in the Atlantic. He fingers the conference as the responsible party for the publicity (here’s their press release), though with the active cooperation of the researchers.

His story has more detail and makes it clear that there’s very little evidence, and more importantly that the lead researcher knew this:

“The reality is that we had basically no funding,” he said. “The sample size was not what we wanted. But do I hold out for some impossible ideal or do I work with what I have? I chose the latter.”

For pilot research presented to consenting scientists that might be reasonable, but for press releases it isn’t.

Epigenetics is an area of science where New Zealand has an international reputation. It would be a pity if it ended up as one of the areas where you can be sure that basically nothing that makes it to the newspapers is true.

October 10, 2015

Return of the brother of the gay gene

From the Herald (from the Telegraph)

Factors ranging from exposure to certain chemicals to childhood abuse, diet and exercise may affect the DNA controlling sexuality, according to research being presented at a US conference on genetics.

They believe they can predict with 70 per cent accuracy whether a man is gay or straight, simply by looking at those parts of the genome.

[There’s a slightly better story in Nature News.]

70% accuracy doesn’t seem all that impressive. Using the usual figures on the proportion of men who are gay, the approach of assuming everyone is straight unless you are told otherwise is better than 90% accurate, and doesn’t need expensive genetics. Presumably they mean something different by 70% accuracy, but we don’t know what.
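Why the base rate matters can be shown with a short calculation. Assuming, purely for illustration, that 5% of men are gay (an assumed figure, not one from the story), the rule "predict straight for everyone" is right 95% of the time. In a hypothetical sample with equal numbers of gay and straight men, the same rule scores only 50%, so 70% would mean something quite different there. Nothing here says which kind of sample the researchers used.

```python
def accuracy(predictions, truth):
    """Fraction of predictions that match the true labels."""
    return sum(p == t for p, t in zip(predictions, truth)) / len(truth)


# Hypothetical population with an assumed 5% prevalence.
population = ["gay"] * 5 + ["straight"] * 95
print("general population:", accuracy(["straight"] * len(population), population))

# Hypothetical balanced sample: half gay, half straight.
balanced = ["gay"] * 50 + ["straight"] * 50
print("balanced sample:   ", accuracy(["straight"] * len(balanced), balanced))
```

An accuracy figure only means something next to the accuracy of the trivial rule on the same sample.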

More importantly, this is research in identical twins. If you take pairs of people who are genetically identical, had the same environment in the womb, and then very similar environments in infancy and childhood, you’ve stripped out nearly all the other factors that could affect sexual orientation. That’s the point of doing the research this way: you get a clearer view of potentially small differences. But it’s a limitation when you’re trying to make claims about people in general.

Also, there’s an important difference between genetics and epigenetics here. The epigenetic markers, as the story says, can be affected by things that happen to you during childhood. But that means we can’t necessarily assume the correlations between epigenetic differences and sexual orientation are causal. The “factors ranging from exposure to certain chemicals to childhood abuse, diet and exercise” that can affect epigenetic markers could also affect sexual orientation directly, especially since the epigenetic markers were measured in cells from the lining of the mouth, not in, say, the brain.
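A small simulation shows how a shared cause can produce a correlation with no causal link. In this sketch a hypothetical childhood exposure shifts an epigenetic marker score and, separately, raises the chance of an outcome; the marker itself has no effect on the outcome at all. The variable names, probabilities and effect sizes are invented for illustration and are not taken from the study.

```python
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5


rng = random.Random(2015)  # fixed seed so the run is reproducible
exposures, markers, outcomes = [], [], []
for _ in range(10_000):
    exposed = rng.random() < 0.3  # hypothetical shared cause
    # The exposure shifts the marker score...
    marker = rng.gauss(1.0 if exposed else 0.0, 1.0)
    # ...and separately raises the outcome probability; the marker plays no part.
    outcome = rng.random() < (0.3 if exposed else 0.05)
    exposures.append(exposed)
    markers.append(marker)
    outcomes.append(1.0 if outcome else 0.0)

print(f"overall correlation: {pearson(markers, outcomes):.2f}")

# Among the exposed only, the shared cause is held fixed,
# so any remaining correlation should be close to zero.
m_exp = [m for m, e in zip(markers, exposures) if e]
o_exp = [o for o, e in zip(outcomes, exposures) if e]
print(f"correlation among the exposed: {pearson(m_exp, o_exp):.2f}")
```

The marker "predicts" the outcome in the whole group only because both depend on the exposure.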

On top of all that, this is another annoying example of research being publicised before it’s published. It’s not at all impossible that the claims are true, but there isn’t enough public information to tell. The research was presented at the conference of the American Society of Human Genetics. People at the conference would have been able to see more detail, and maybe ask questions. We can’t. We won’t be able to until there’s a published research paper. That would have been the time for publicity.

And finally, there’s an interesting assumption revealed in the headline “Boys ‘turned gay by childhood shift in genes’”. The research looked at differences between identical twins. It says absolutely nothing about which twin changed and which one stayed the same: you could equally well say “Boys turned straight by childhood shift in genes”.

Predicting abortion attitudes

Quartz has an interesting analysis of a recent Twitter storm over abortion, triggered by the US Republicans’ attempts to defund Planned Parenthood. The headline is striking: “How to tell whether a Twitter user is pro-choice or pro-life without reading any of their tweets.”

The writers describe how they could use words in Twitter profiles to predict people’s attitudes. They also found that social network structure was a very strong predictor: people shared the views of those they followed. They write, “so polarized is the social network structure that even very basic, obvious characteristics stop mattering if we know who your friends are”.

It might seem strange that you could do so well in predicting attitudes across multiple countries on a controversial topic. It would be strange, except that the data they used was restricted to a small group of people who were participating in a Twitter argument about abortion. The story admits this, but not until near the end.

In real life, you probably can’t learn that much about someone’s views on abortion from whether they tweet about cats or football. In the context of a small, highly polarised argument, you probably can. In real life, people don’t necessarily agree with the views of the people they follow on Twitter, but in that context it’s not surprising that they do. And in real life, if someone wants to find out your views on a controversial topic they’d probably be better off asking you than tracking down all your friends and asking them.
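The point about context can be illustrated with a toy model. In this sketch each hypothetical user's stance is guessed as the majority stance among the accounts they follow. In a network where people mostly follow their own side, as in a heated argument, the rule looks impressive; where following is unrelated to stance, it does no better than a coin toss. This is not the method Quartz used, and the network size, number of follows and follow probabilities are all invented.

```python
import random


def neighbour_vote_accuracy(p_same_side: float, n_users: int = 2000,
                            n_follows: int = 15, seed: int = 1) -> float:
    """Accuracy of guessing each user's stance (0 or 1) as the majority
    stance of the accounts they follow.

    Each follow goes to someone on the user's own side with probability
    `p_same_side`, independently. `n_follows` is odd, so there are no ties.
    """
    rng = random.Random(seed)
    correct = 0
    for user in range(n_users):
        stance = user % 2  # half the users on each side
        same_side = sum(rng.random() < p_same_side for _ in range(n_follows))
        other_side = n_follows - same_side
        predicted = stance if same_side > other_side else 1 - stance
        correct += predicted == stance
    return correct / n_users


print("polarised argument:         ", neighbour_vote_accuracy(0.9))
print("follows unrelated to stance:", neighbour_vote_accuracy(0.5))
```

The predictor is identical in both runs; only the network changes, which is why results from a polarised argument don't transfer to Twitter users in general.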

October 9, 2015

Predictive analytics and the rise of the machines

Some cautionary tales

  • “I would like to challenge this picture, and ask you to imagine data not as a pristine resource, but as a waste product, a bunch of radioactive, toxic sludge that we don’t know how to handle.” A talk by Maciej Ceglowski
  • How do you measure whether automated decision making ends up discriminating by race, when it doesn’t explicitly use race as an input? Two posts by Cathy O’Neil
  • A computer program that was accidentally trained to discriminate by gender and ethnicity
  • Why modern predictive analytics doesn’t give ‘algorithms’ in the sense of ‘recipes’, by Suresh Venkat (via @ndiakopoulos)

Briefly

  • A 2010 post complaining about a continuing problem: when the media report on scientific papers that the journals haven’t yet made available to scientists.
  • Which bar is closest to a whole number in length?
    That’s right, the smallest one is exactly 1.0 and the others are all slightly larger than a whole number. Inspired by one of Kieran Healy’s examples.
  • Linguistic statistics: G K Chesterton almost never used feminine pronouns in his novels.
  • The famous London Underground map, labelled with rents in the neighbourhood of each station. Would be interesting to see an Auckland map using trains and major bus routes. (via Flowing Data)

October 8, 2015

He’s a lumberjack and he’s inconsistently counted

Official statistics agencies publish lots of useful information that gets used by researchers, by educators, by businesses, by journalists, and (with the help of groups like Figure.NZ) by everyone else. A dilemma for these agencies is how to handle changes in the best ways to measure something. If you never change the definitions you get perfectly consistent reports of no-longer-useful information. If you do change the definitions, things don’t match up.

This graph is from a blog post by a Canadian economist, Livio Di Matteo. It shows the number of Canadians employed in the lumber industry over time, patched together from several Statistics Canada time series.

[Graph: number of Canadians employed in the lumber industry over time, patched together from several Statistics Canada series]

Dr Di Matteo is a professional and wasn’t trying to do anything subtle here; he just wanted a lecture slide. Much of this data also comes from the time when Stats Canada was among the best in the world, so it’s not a problem that’s easy to avoid. It’s just harder than it sounds to define who works in the lumber industry. For example, are the log drivers in the lumber industry, or are they something like “transport workers, not elsewhere classified”?
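One common workaround when a definition changes is to splice the series at a year measured under both definitions, rescaling the older series so the two agree in that year. This is a sketch of that generic technique with made-up numbers, not what Dr Di Matteo or Statistics Canada did. It assumes the ratio between the two definitions stays constant over time, which is exactly the kind of assumption that can fail.

```python
def splice(old: dict[int, float], new: dict[int, float],
           link_year: int) -> dict[int, float]:
    """Ratio-splice two series that overlap in `link_year`.

    Years before `link_year` come from `old`, rescaled so the two series
    agree in the link year; years from `link_year` on come from `new`.
    """
    factor = new[link_year] / old[link_year]
    spliced = {y: v * factor for y, v in old.items() if y < link_year}
    spliced.update({y: v for y, v in new.items() if y >= link_year})
    return dict(sorted(spliced.items()))


# Hypothetical employment counts (thousands) under two definitions.
old_definition = {1990: 100.0, 1991: 104.0, 1992: 108.0}
new_definition = {1992: 80.0, 1993: 82.0, 1994: 85.0}

# Naive patching: the old series wherever it exists, the new one afterwards.
naive = dict(sorted({**new_definition, **old_definition}.items()))
spliced = splice(old_definition, new_definition, 1992)

print("naive:  ", naive)
print("spliced:", {y: round(v, 1) for y, v in spliced.items()})
```

The naive version shows an apparent collapse between 1992 and 1993 that is entirely due to the change of definition.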

Rugby World Cup Predictions, 09 October 2015 to 11 October 2015

Team Ratings at 09 October

The basic method is described on my Department home page.

Here are the team ratings prior to 09 October along with the ratings at the start of the Rugby World Cup.

Team          Rating at 09 October  Rating at RWC Start  Difference
New Zealand                  26.72                29.01       -2.30
South Africa                 23.39                22.73        0.70
Australia                    21.59                20.36        1.20
Ireland                      16.82                17.48       -0.70
England                      16.17                18.51       -2.30
Wales                        13.37                13.93       -0.60
France                       10.94                11.70       -0.80
Argentina                     9.69                 7.38        2.30
Scotland                      5.82                 4.84        1.00
Fiji                         -2.19                -4.23        2.00
Samoa                        -4.83                -2.28       -2.50
Italy                        -6.04                -5.86       -0.20
Tonga                        -8.60                -6.31       -2.30
Japan                        -9.31               -11.18        1.90
USA                         -16.91               -15.97       -0.90
Georgia                     -17.74               -17.48       -0.30
Canada                      -17.89               -18.06        0.20
Romania                     -19.77               -21.20        1.40
Uruguay                     -31.41               -31.04       -0.40
Namibia                     -33.09               -35.62        2.50

Performance So Far

So far there have been 32 matches played, 26 of which were correctly predicted, a success rate of 81.2%.
Here are the predictions for previous games.

Game Date Score Prediction Correct
1 England vs. Fiji Sep 18 35 – 11 29.20 TRUE
2 Tonga vs. Georgia Sep 19 10 – 17 11.20 FALSE
3 Ireland vs. Canada Sep 19 50 – 7 35.50 TRUE
4 South Africa vs. Japan Sep 19 32 – 34 33.90 FALSE
5 France vs. Italy Sep 19 32 – 10 17.60 TRUE
6 Samoa vs. USA Sep 20 25 – 16 13.70 TRUE
7 Wales vs. Uruguay Sep 20 54 – 9 51.50 TRUE
8 New Zealand vs. Argentina Sep 20 26 – 16 21.60 TRUE
9 Scotland vs. Japan Sep 23 45 – 10 14.40 TRUE
10 Australia vs. Fiji Sep 23 28 – 13 24.10 TRUE
11 France vs. Romania Sep 23 38 – 11 33.30 TRUE
12 New Zealand vs. Namibia Sep 24 58 – 14 64.00 TRUE
13 Argentina vs. Georgia Sep 25 54 – 9 24.60 TRUE
14 Italy vs. Canada Sep 26 23 – 18 12.50 TRUE
15 South Africa vs. Samoa Sep 26 46 – 6 23.90 TRUE
16 England vs. Wales Sep 26 25 – 28 11.20 FALSE
17 Australia vs. Uruguay Sep 27 65 – 3 50.30 TRUE
18 Scotland vs. USA Sep 27 39 – 16 21.40 TRUE
19 Ireland vs. Romania Sep 27 44 – 10 38.80 TRUE
20 Tonga vs. Namibia Sep 29 35 – 21 27.40 TRUE
21 Wales vs. Fiji Oct 01 23 – 13 23.80 TRUE
22 France vs. Canada Oct 01 41 – 18 29.60 TRUE
23 New Zealand vs. Georgia Oct 02 43 – 10 44.90 TRUE
24 Samoa vs. Japan Oct 03 5 – 26 7.10 FALSE
25 South Africa vs. Scotland Oct 03 34 – 16 16.00 TRUE
26 England vs. Australia Oct 03 13 – 33 3.30 FALSE
27 Argentina vs. Tonga Oct 04 45 – 16 17.00 TRUE
28 Ireland vs. Italy Oct 04 16 – 9 24.70 TRUE
29 Canada vs. Romania Oct 06 15 – 17 2.70 FALSE
30 Fiji vs. Uruguay Oct 06 47 – 15 28.60 TRUE
31 South Africa vs. USA Oct 07 64 – 0 37.90 TRUE
32 Namibia vs. Georgia Oct 07 16 – 17 -17.00 TRUE
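Judging from the Correct column, a prediction counts as correct when its sign matches the sign of the actual margin, so an upset counts as a miss, and so does a draw (Wellington vs. Hawke’s Bay, 22 – 22, is marked FALSE in the ITM Cup results). This rule is inferred from the tables, not taken from the method description; the sketch applies it to a few rows copied from the table.

```python
def prediction_correct(first_score: int, second_score: int,
                       predicted_margin: float) -> bool:
    """True if the predicted margin has the same sign as the actual margin.

    A positive margin favours the first-named team. A draw has an actual
    margin of zero, so it never counts as correct.
    """
    actual_margin = first_score - second_score
    return actual_margin * predicted_margin > 0


# A few rows copied from the Rugby World Cup table.
games = [
    ("England vs. Fiji", 35, 11, 29.20),
    ("Tonga vs. Georgia", 10, 17, 11.20),
    ("South Africa vs. Japan", 32, 34, 33.90),
    ("Namibia vs. Georgia", 16, 17, -17.00),
]
for name, s1, s2, pred in games:
    print(name, prediction_correct(s1, s2, pred))

hits = sum(prediction_correct(s1, s2, pred) for _, s1, s2, pred in games)
print(f"{hits} of {len(games)} correct, {hits / len(games):.0%}")
```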

Predictions for 09 October to 11 October

The prediction is my estimated expected points difference with a positive margin being a win to the first-named team, and a negative margin a win to the second-named team.

Game Date Winner Prediction
1 New Zealand vs. Tonga Oct 09 New Zealand 35.30
2 Samoa vs. Scotland Oct 10 Scotland -10.70
3 Australia vs. Wales Oct 10 Australia 8.20
4 England vs. Uruguay Oct 10 England 54.10
5 Argentina vs. Namibia Oct 11 Argentina 42.80
6 Italy vs. Romania Oct 11 Italy 13.70
7 France vs. Ireland Oct 11 Ireland -5.90
8 USA vs. Japan Oct 11 Japan -7.60
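The predicted margins in this table line up with the difference between the two teams’ ratings at 09 October: New Zealand (26.72) minus Tonga (-8.60) gives 35.32, shown as 35.30. England vs. Uruguay is the exception, about 6.5 points more than the rating gap, which looks like an allowance for England playing at home. That figure is inferred from the table, not stated in the post. A minimal sketch, with the host allowance as an assumed parameter:

```python
# Ratings at 09 October, copied from the ratings table.
ratings = {
    "New Zealand": 26.72, "Tonga": -8.60, "Samoa": -4.83, "Scotland": 5.82,
    "England": 16.17, "Uruguay": -31.41, "France": 10.94, "Ireland": 16.82,
}

HOST = "England"
HOST_ADVANTAGE = 6.5  # assumed: inferred from the England rows, not from the post


def predict(first: str, second: str) -> tuple[str, float]:
    """Expected margin (positive favours `first`) and the predicted winner."""
    margin = ratings[first] - ratings[second]
    if first == HOST:
        margin += HOST_ADVANTAGE
    elif second == HOST:
        margin -= HOST_ADVANTAGE
    winner = first if margin > 0 else second
    return winner, round(margin, 1)


for first, second in [("New Zealand", "Tonga"), ("Samoa", "Scotland"),
                      ("England", "Uruguay"), ("France", "Ireland")]:
    print(first, "vs.", second, predict(first, second))
```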

October 7, 2015

ITM Cup Predictions for Round 9

Team Ratings for Round 9

The basic method is described on my Department home page.

Here are the team ratings prior to this week’s games, along with the ratings at the start of the season.

Team              Current Rating  Rating at Season Start  Difference
Taranaki                   12.77                    7.70        5.10
Canterbury                 12.56                   10.90        1.70
Tasman                      8.40                   12.86       -4.50
Auckland                    8.30                    5.14        3.20
Wellington                  4.76                   -4.62        9.40
Hawke’s Bay                 4.14                   -0.57        4.70
Counties Manukau            4.05                    7.86       -3.80
Otago                       1.46                   -4.84        6.30
Waikato                    -6.41                   -6.96        0.50
Bay of Plenty              -6.46                   -9.77        3.30
Manawatu                   -8.81                   -1.52       -7.30
North Harbour              -9.59                  -10.54        0.90
Southland                 -10.77                   -6.01       -4.80
Northland                 -18.37                   -3.64      -14.70

Performance So Far

So far there have been 62 matches played, 44 of which were correctly predicted, a success rate of 71%.
Here are the predictions for last week’s games.

Game Date Score Prediction Correct
1 Wellington vs. Hawke’s Bay Sep 30 22 – 22 3.70 FALSE
2 North Harbour vs. Otago Oct 01 32 – 39 -7.10 TRUE
3 Waikato vs. Counties Manukau Oct 02 9 – 30 -3.30 TRUE
4 Tasman vs. Canterbury Oct 03 25 – 41 3.30 FALSE
5 Manawatu vs. Taranaki Oct 03 10 – 44 -14.00 TRUE
6 Auckland vs. Northland Oct 03 64 – 21 28.00 TRUE
7 Southland vs. Hawke’s Bay Oct 04 28 – 35 -11.40 TRUE
8 Bay of Plenty vs. Wellington Oct 04 13 – 31 -5.20 TRUE

Predictions for Round 9

Here are the predictions for Round 9. The prediction is my estimated expected points difference with a positive margin being a win to the home team, and a negative margin a win to the away team.

Game Date Winner Prediction
1 Northland vs. Otago Oct 07 Otago -15.80
2 Taranaki vs. Tasman Oct 08 Taranaki 8.40
3 Hawke’s Bay vs. Waikato Oct 09 Hawke’s Bay 14.60
4 Canterbury vs. Southland Oct 10 Canterbury 27.30
5 Wellington vs. Manawatu Oct 10 Wellington 17.60
6 Counties Manukau vs. Auckland Oct 10 Auckland -0.30
7 North Harbour vs. Northland Oct 11 North Harbour 12.80
8 Otago vs. Bay of Plenty Oct 11 Otago 11.90
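The Round 9 predictions are consistent with a simple rule: home team rating minus away team rating, plus a constant home-ground advantage. Averaging what is left of each prediction after subtracting the rating difference gives an estimate of that constant. This reverse-engineers the published numbers; it is not the method described on the Department home page.

```python
# Current ratings and Round 9 predictions, copied from the tables.
ratings = {
    "Taranaki": 12.77, "Canterbury": 12.56, "Tasman": 8.40, "Auckland": 8.30,
    "Wellington": 4.76, "Hawke's Bay": 4.14, "Counties Manukau": 4.05,
    "Otago": 1.46, "Waikato": -6.41, "Bay of Plenty": -6.46,
    "Manawatu": -8.81, "North Harbour": -9.59, "Southland": -10.77,
    "Northland": -18.37,
}
round_9 = [  # (home, away, predicted margin)
    ("Northland", "Otago", -15.80),
    ("Taranaki", "Tasman", 8.40),
    ("Hawke's Bay", "Waikato", 14.60),
    ("Canterbury", "Southland", 27.30),
    ("Wellington", "Manawatu", 17.60),
    ("Counties Manukau", "Auckland", -0.30),
    ("North Harbour", "Northland", 12.80),
    ("Otago", "Bay of Plenty", 11.90),
]

# What is left of each prediction after removing the rating difference.
leftovers = [pred - (ratings[home] - ratings[away]) for home, away, pred in round_9]
home_advantage = sum(leftovers) / len(leftovers)
print(f"estimated home advantage: {home_advantage:.2f} points")
print("range:", round(min(leftovers), 2), "to", round(max(leftovers), 2))
```

The leftovers are all within about 0.05 of 4 points, so the small spread looks like rounding in the published ratings.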