May 17, 2016

Briefly

  • You’ve probably seen this, but Facebook’s news feed editing wasn’t as algorithmic as they were suggesting. Of course, that tells you nothing one way or the other about bias, as people including Cathy O’Neil point out.
  • The difficulties of turning data science into gobs and gobs of money, as illustrated by Palantir. From Roger Peng at Simply Statistics.
  • Finally, for stats/literature dual nerds, an excerpt from the new book by historian of statistics Stephen Stigler.

May 16, 2016

Stat of the Week Competition: May 14 – 20 2016

Each week, we would like to invite readers of Stats Chat to submit nominations for our Stat of the Week competition and be in with the chance to win an iTunes voucher.

Here’s how it works:

  • Anyone may add a comment on this post to nominate their Stat of the Week candidate before midday Friday May 20 2016.
  • Statistics can be bad, exemplary or fascinating.
  • The statistic must be in the NZ media during the period of May 14 – 20 2016 inclusive.
  • Quote the statistic, when and where it was published and tell us why it should be our Stat of the Week.

Next Monday at midday we’ll announce the winner of this week’s Stat of the Week competition, and start a new one.

May 13, 2016

Aggregation, not ok?

You’ve probably heard of OkCupid, a dating site. People give sites like that a lot of personal information. And, in a sense, the information is obviously not going to be kept secret — after all, the point of using a dating site is to be found by people you don’t already know.  When someone writes a script to collect the data from large numbers of users, and then publishes it in a convenient and easy to process format, you can just about see how they’d think that was ok. It’s harder to see how they’d be surprised not everyone feels that way.

Aggregation makes a difference because we can search, match, and analyse the data by computer. That’s important for two reasons.

First, it’s quicker and easier — you can get a set of records grouped by sexual preference or other interests almost as quickly as you can think of the question, and you can link usernames or other information to other datasets. The database includes potential matching variables like income, education level, age, job, country, city, which you could still use just taking down data one person at a time by hand, but it would be slow and boring.

Second, the database is impersonal. If you stood outside a gay bar watching who went in and out, you couldn’t really pretend you were innocently using publicly visible information.  If you signed up and went through dating profiles one at a time, it would be easier to pretend, but you’d still tend to see the people behind the data. When it’s a big spreadsheet, it’s easier to ignore how the people would feel about it.

Sometimes people aggregate and publish data knowing it may do harm, because they think there’s a higher interest involved in getting the data out — even if the data release is obviously illegal. This release isn’t obviously illegal (though there are possibilities), but the higher interest is pretty obscure too. The accompanying research paper says

As an example of the analyses one can do with the dataset, a cognitive ability test is constructed from 14 suitable items. To validate the dataset and the test, the relationship of cognitive ability to religious beliefs and political interest/participation is examined.

Those variables are so not what’s going to attract people to these data. But even if you think it’s important for anyone on the internet to be able to do that sort of correlation for variables such as sexual orientation and drug use, it’s hard to think of a reason to include the OkCupid username.

May 12, 2016

Stretching it a bit

Q: Did you see yoghurt prevents cancer?

A: Where?

Q: The Herald (from the Daily Telegraph): “8 ways to lower your cancer risk.” Number one is “Eat yoghurt”. And they even have a link to research. How’s that for impressive?

A: Not exactly a link. They mention the name of a journal, but don’t even give the researchers’ names.

Q: Can’t you find them?

A: Of course. It’s even open-access.

Q: So, how much yoghurt did the people have to eat?

A: No yoghurt was harmed in this experiment. Also no people.

Q: Mice?

A: Mice.

Q: But yoghurt?

A: No. Some of the mice were set up with a restricted set of gut bacteria (missing known nasty ones) by being raised in a mouse colony who all had the restricted set.

Q: But the story says “gave one group of mice beneficial bacteria through probiotic supplements and the other non-beneficial bacteria.”

A: Yes, it does. The research paper, not so much. Nor even the press release.

Q: So why yoghurt?

A: One of the bacteria that was more common in the mice with the restricted set is a Lactobacillus strain. Other Lactobacillus strains, even sometimes from the same species, are involved in making yoghurt, sourdough, sauerkraut, kimchi, etc.

Q: And you could use the mouse bacteria to make these foods?

A: In principle, probably, though you might not want to advertise it that way.

Q: So, the mice with more Lactobacillus were less likely to get cancer?

A: These were mutant mice who all get cancer, so that’s not really the question. They took longer to get cancer.

Q: So we can’t really be confident yoghurt would prevent normal mice from getting cancer?

A: No, it’s too soon to tell.

Q: Good thing normal mice don’t read the newspapers, then.

May 11, 2016

Super 18 Predictions for Round 12

 

Team Ratings for Round 12

The basic method is described on my Department home page.

Here are the team ratings prior to this week’s games, along with the ratings at the start of the season.

Team          Current Rating   Rating at Season Start   Difference
Crusaders              10.09                     9.84         0.30
Hurricanes              7.09                     7.26        -0.20
Highlanders             6.86                     6.80         0.10
Chiefs                  5.22                     2.68         2.50
Waratahs                3.78                     4.88        -1.10
Brumbies                2.75                     3.15        -0.40
Stormers                1.62                    -0.62         2.20
Sharks                  1.29                    -1.64         2.90
Lions                   0.00                    -1.80         1.80
Bulls                  -0.10                    -0.74         0.60
Blues                  -3.68                    -5.51         1.80
Rebels                 -5.29                    -6.33         1.00
Cheetahs               -7.08                    -9.27         2.20
Jaguares               -7.24                   -10.00         2.80
Reds                   -9.61                    -9.81         0.20
Force                 -10.87                    -8.43        -2.40
Sunwolves             -17.32                   -10.00        -7.30
Kings                 -20.75                   -13.66        -7.10

 

Performance So Far

So far there have been 85 matches played, 58 of which were correctly predicted, a success rate of 68.2%.
Here are the predictions for last week’s games.

Game  Match                   Date    Score     Prediction   Correct
1     Crusaders vs. Reds      May 06  38 – 5         22.40   TRUE
2     Brumbies vs. Bulls      May 06  23 – 6          5.50   TRUE
3     Sunwolves vs. Force     May 07  22 – 40        -0.30   TRUE
4     Chiefs vs. Highlanders  May 07  13 – 26         3.90   FALSE
5     Waratahs vs. Cheetahs   May 07  21 – 6         14.80   TRUE
6     Sharks vs. Hurricanes   May 07  32 – 15        -4.40   FALSE
7     Kings vs. Blues         May 07  18 – 34       -12.70   TRUE
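
As a rough check on the "Correct" column and the season success rate, here is a minimal Python sketch. It assumes a prediction counts as correct when the predicted margin has the same sign as the actual home-minus-away margin; the function name and the handling of draws (ignored here, since none occurred) are my own choices for illustration, not something stated in the post.

```python
# Last week's games as (home, away, home_score, away_score, predicted_margin),
# copied from the table above.
games = [
    ("Crusaders", "Reds", 38, 5, 22.40),
    ("Brumbies", "Bulls", 23, 6, 5.50),
    ("Sunwolves", "Force", 22, 40, -0.30),
    ("Chiefs", "Highlanders", 13, 26, 3.90),
    ("Waratahs", "Cheetahs", 21, 6, 14.80),
    ("Sharks", "Hurricanes", 32, 15, -4.40),
    ("Kings", "Blues", 18, 34, -12.70),
]


def prediction_correct(home_score, away_score, predicted_margin):
    """Assumed rule: correct when predicted and actual margins share a sign."""
    actual_margin = home_score - away_score
    return (actual_margin > 0) == (predicted_margin > 0)


correct = [prediction_correct(h, a, p) for _, _, h, a, p in games]
print(sum(correct), "of", len(correct), "correct last week")

# Season-to-date figures quoted in the text: 58 correct out of 85 matches.
season_rate = 58 / 85
print(f"Season success rate: {season_rate:.1%}")
```

Under that rule the sketch reproduces the TRUE/FALSE column (5 of 7 last week) and the quoted 68.2%.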

 

Predictions for Round 12

Here are the predictions for Round 12. The prediction is my estimated expected points difference with a positive margin being a win to the home team, and a negative margin a win to the away team.

Game  Match                      Date    Winner       Prediction
1     Highlanders vs. Crusaders  May 13  Highlanders        0.30
2     Rebels vs. Brumbies        May 13  Brumbies          -4.50
3     Hurricanes vs. Reds        May 14  Hurricanes        20.70
4     Waratahs vs. Bulls         May 14  Waratahs           7.90
5     Sunwolves vs. Stormers     May 14  Stormers         -14.90
6     Cheetahs vs. Kings         May 14  Cheetahs          17.20
7     Lions vs. Blues            May 14  Lions              7.70
8     Jaguares vs. Sharks        May 14  Sharks            -4.50
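
The post does not give the formula behind these numbers (it points to the author's home page), so the following Python sketch is only a guess that happens to fit the published figures. It assumes the predicted margin is the home rating minus the away rating plus a home-ground bonus, and the two bonus values (3.5 points when both teams are from the same country, 4.0 otherwise) are my own back-fitted assumptions, not stated anywhere in the post. The `country` lookup is a helper added for illustration.

```python
# Current ratings for the teams playing in Round 12, from the ratings table.
ratings = {
    "Highlanders": 6.86, "Crusaders": 10.09, "Rebels": -5.29, "Brumbies": 2.75,
    "Hurricanes": 7.09, "Reds": -9.61, "Waratahs": 3.78, "Bulls": -0.10,
    "Sunwolves": -17.32, "Stormers": 1.62, "Cheetahs": -7.08, "Kings": -20.75,
    "Lions": 0.00, "Blues": -3.68, "Jaguares": -7.24, "Sharks": 1.29,
}

# Home country of each team (illustrative helper, not part of the post).
country = {
    "Highlanders": "NZ", "Crusaders": "NZ", "Hurricanes": "NZ", "Blues": "NZ",
    "Rebels": "AU", "Brumbies": "AU", "Reds": "AU", "Waratahs": "AU",
    "Bulls": "ZA", "Stormers": "ZA", "Cheetahs": "ZA", "Kings": "ZA",
    "Lions": "ZA", "Sharks": "ZA", "Sunwolves": "JP", "Jaguares": "AR",
}

# ASSUMED home-ground bonuses, chosen only because they match the table.
HOME_BONUS_SAME_COUNTRY = 3.5
HOME_BONUS_OTHER = 4.0


def predicted_margin(home, away):
    """Assumed model: rating difference plus a home-ground bonus."""
    same = country[home] == country[away]
    bonus = HOME_BONUS_SAME_COUNTRY if same else HOME_BONUS_OTHER
    return ratings[home] - ratings[away] + bonus


# (home, away, published prediction) from the Round 12 table.
fixtures = [
    ("Highlanders", "Crusaders", 0.30), ("Rebels", "Brumbies", -4.50),
    ("Hurricanes", "Reds", 20.70), ("Waratahs", "Bulls", 7.90),
    ("Sunwolves", "Stormers", -14.90), ("Cheetahs", "Kings", 17.20),
    ("Lions", "Blues", 7.70), ("Jaguares", "Sharks", -4.50),
]

for home, away, published in fixtures:
    margin = predicted_margin(home, away)
    winner = home if margin > 0 else away  # positive margin = home win
    print(f"{home} vs. {away}: {margin:+.2f} (published {published:+.2f}), pick {winner}")
```

With those assumed bonuses, each computed margin lands within 0.05 of the published value, and the sign gives the same predicted winner. That is suggestive, not proof that this is how the ratings are actually used.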

 

May 10, 2016

Foreign real-estate investment

The first data under the new real-estate ownership reporting scheme is out. The Herald has a story and also includes a full copy of the report.

So, what proportion of Auckland property sales were reported as being to China?

In Auckland, the level of foreign investment was slightly higher than the national level, at 4 per cent, or 474 properties. Nearly 60 per cent of these properties went to Chinese tax residents.

That’s 60% of 4%, or a bit under 2.5%.  Auckland is different; in the rest of New Zealand the majority of foreign (tax-residency) investors are Australians.
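
The arithmetic is easy to check. This small Python sketch uses only the figures quoted from the story; because the story says "nearly 60 per cent", treating the share as exactly 0.60 is an assumption that makes the results slight overestimates. The variable names are mine.

```python
# Figures quoted from the Herald story about Auckland sales.
foreign_share = 0.04             # 4 per cent of sales went to foreign tax residents
foreign_count = 474              # number of those properties
chinese_share_of_foreign = 0.60  # "nearly 60 per cent", so treat as an upper bound

# Share of all Auckland sales reported as going to Chinese tax residents.
chinese_share_of_all = foreign_share * chinese_share_of_foreign
print(f"{chinese_share_of_all:.1%} of Auckland sales")

# The same proportion expressed as a count of properties.
chinese_count = foreign_count * chinese_share_of_foreign
print(f"at most about {round(chinese_count)} properties")
```

That gives 2.4% and roughly 284 properties as upper bounds, which is where "a bit under 2.5%" comes from.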

The LINZ report does a good job explaining the real limitations of ‘tax residence’ as a criterion, but it’s a lot better than any previous data we’ve had.

There were also questions about actual residency and intention to occupy a home, but these were harder to interpret because of property bought by companies or trusts, where the questions didn’t have a good answer.

I’d suggest starting with the report rather than the news coverage.

 

May 9, 2016

What’s wrong with science news

“Coffee today is like God in the Old Testament”, says John Oliver, reviewing the positive and negative headlines over the past year or so.  It’s excellent, if a little overblown in places.

On a related note, another site has fallen for the ‘cheese addiction/casomorphin’ hoax that we’ve seen before a few times. This time it’s Pharmacy Times.

Stat of the Week Competition: May 7 – 13 2016

Each week, we would like to invite readers of Stats Chat to submit nominations for our Stat of the Week competition and be in with the chance to win an iTunes voucher.

Here’s how it works:

  • Anyone may add a comment on this post to nominate their Stat of the Week candidate before midday Friday May 13 2016.
  • Statistics can be bad, exemplary or fascinating.
  • The statistic must be in the NZ media during the period of May 7 – 13 2016 inclusive.
  • Quote the statistic, when and where it was published and tell us why it should be our Stat of the Week.

Next Monday at midday we’ll announce the winner of this week’s Stat of the Week competition, and start a new one.

May 7, 2016

Open data: baby names

The Herald has a headline “Emma and Noah continue to be tops for baby names”, with this link from the web front page

[Image: the link on the Herald web front page]

In fact, Noah was number 11 as a baby boy’s name, and Emma didn’t make the top hundred names for baby girls in New Zealand.  The top names in NZ, as in this Stuff story from the first week of January, were Oliver and Olivia. That story also had tables and graphs from the Dept of Internal Affairs data.

The new Herald story is about the USA, where they take longer to accumulate and release the baby-name data, but where they have the indefatigable Laura Wattenberg to make sure it gets publicised.

In fact, it’s kind of surprising how much difference there is between the US and NZ lists. Enough to make it worth pointing out in the story.  UK data won’t be out for another few months. Based on last year, it’s a bit more similar to NZ. Maybe we’ll get another story then.

 

May 6, 2016

Reach out and touch someone

Q: Did you see in the Herald that texting doesn’t help relationships?

A: That’s what they said, yes.

Q: And is it what they found?

A: Hard to tell. There aren’t any real descriptions of the results.

Q: What did they do?

A: Well, a couple of years ago, the researcher had a theory that “sending just one affectionate text message a day to your partner could significantly improve your relationship.”

Q: So the research changed her mind?

A: Sounds like.

Q: That’s pretty impressive, isn’t it?

A: Yes, though it doesn’t necessarily mean it should change our minds.

Q: It sounds like a good study, though. Enrol some people and regularly remind half of them to send affectionate text messages.

A: Not what they did.

Q: They enrolled mice?

A: I don’t think there are good animal models for assessing affectionate text messages. Selfies, maybe.

Q: Ok, so that publicity item about the research is headlined “Could a text a day keep divorce away?”

A: Yes.

Q: Did they ask people about their text-messaging behaviour and then wait to see who got divorced?

A: It doesn’t look like it.

Q: What did they do?

A: It’s not really clear: there are no details in the Herald story or in the Daily Mail story they took it from.  But they were recruiting people for an online survey back in 2014.

Q: A bogus poll?

A: Well, if you want to put it that way, yes. It’s not as bogus when you’re trying to find out if two things are related rather than how common one thing is.

Q: <dubiously> Ok. And then what?

A: It sounds like they interviewed some of the people, and maybe asked them about the quality of their relationships. And that people who didn’t see their partners or who didn’t get affection in person weren’t as happy even if they got a lot of texts.

Q: Isn’t that what you’d expect anyway? I mean, even if the texts made a huge difference, you’d still wish that you had more time together or that s/he didn’t stop being affectionate when they got off the phone.

A: Pretty much. The research might have considered that, but we can’t tell from the news story. There doesn’t even seem to be an updated press release, let alone any sort of publication.

Q: So people shouldn’t read this story and suddenly stop any social media contact with their sweetheart?

A: No. That was last week’s story.