Posts from June 2014 (44)

June 30, 2014

Stat of the Week Competition: June 28 – July 4 2014

Each week, we would like to invite readers of Stats Chat to submit nominations for our Stat of the Week competition and be in with a chance to win an iTunes voucher.

Here’s how it works:

  • Anyone may add a comment on this post to nominate their Stat of the Week candidate before midday Friday July 4 2014.
  • Statistics can be bad, exemplary or fascinating.
  • The statistic must be in the NZ media during the period of June 28 – July 4 2014 inclusive.
  • Quote the statistic, when and where it was published and tell us why it should be our Stat of the Week.

Next Monday at midday we’ll announce the winner of this week’s Stat of the Week competition, and start a new one.

Stat of the Week Competition Discussion: June 28 – July 4 2014

If you’d like to comment on or debate any of this week’s Stat of the Week nominations, please do so below!

Briefly

June 29, 2014

Not yet news

When you read “The university did not reveal how the study was carried out” in a news story about a research article, you’d expect the story to be covering some sort of scandal. Not this time.

The Herald story is about broccoli and asthma:

They say eating up to two cups of lightly steamed broccoli a day can help clear the airways, prevent deterioration in the condition and even reduce or reverse lung damage.

Other vegetables with the same effect include kale, cabbage, brussels sprouts, cauliflower and bok choy.

Using broccoli to treat asthma may also help for people who don’t respond to traditional treatment.

‘How the study was carried out’ isn’t just a matter of detail: if they just gave people broccoli, they wouldn’t know what other vegetables had the same effect, so maybe it wasn’t broccoli but some sort of extract? Was it even experimental or just observational? And did they actually test people who don’t respond to traditional treatment? And what exactly does that mean — failing to respond is pretty rare, though failing to get good control of asthma attacks isn’t.

The Daily Mail story was actually more informative (and that’s not a sentence I like to find myself writing). They reported a claim that wasn’t in the press release:

The finding due to sulforaphane naturally occurring in broccoli and other cruciferous vegetables, which may help protect against respiratory inflammation that can cause asthma.

Even then, it isn’t clear whether the research really found that sulforaphane was responsible, or whether that’s just their theory about why broccoli is effective. 

My guess is that the point of the press release is the last sentence:

Ms Mazarakis will be presenting the research findings at the 2014 Undergraduate Research Conference about Food Safety in Shanghai, China.

That’s a reasonable basis for a press release, and potentially for a story if you’re in Melbourne. The rest isn’t. It’s not science until they tell you what they did.

Ask first

Via The Atlantic, there’s a new paper in PNAS (open access) that I’m sure is going to be a widely cited example by people teaching research ethics, and not in a good way:

 In an experiment with people who use Facebook, we test whether emotional contagion occurs outside of in-person interaction between individuals by reducing the amount of emotional content in the News Feed. When positive expressions were reduced, people produced fewer positive posts and more negative posts; when negative expressions were reduced, the opposite pattern occurred. These results indicate that emotions expressed by others on Facebook influence our own emotions, constituting experimental evidence for massive-scale contagion via social networks.

More than 650,000 people had their Facebook feeds meddled with in this way, and as that paragraph from the abstract makes clear, it made a difference.

The problem is consent.  There is a clear ethical principle that experiments on humans require consent, except in a few specific situations, and that the consent has to be specific and informed. It’s not that uncommon in psychological experiments for some details of the experiment to be kept hidden to avoid bias, but participants still should be given a clear idea of possible risks and benefits and a general idea of what’s going on. Even in medical research, where clinical trials are comparing two real treatments for which the best choice isn’t known, there are very few exceptions to consent (I’ve written about some of them elsewhere).

The need for consent is especially clear in cases where the research is expected to cause harm. In this example, the Facebook researchers expected in advance that their intervention would have real effects on people’s emotions; that it would do actual harm, even if the harm was (hopefully) minor and transient.

Facebook had its research reviewed by an Institutional Review Board (the US equivalent of our Ethics Committees), and the terms of service say they can use your data for research purposes, so they are probably within the law. The psychologist who edited the study for PNAS said:

“I was concerned,” Fiske told The Atlantic, “until I queried the authors and they said their local institutional review board had approved it—and apparently on the grounds that Facebook apparently manipulates people’s News Feeds all the time.”

Fiske added that she didn’t want “the originality of the research” to be lost, but called the experiment “an open ethical question.”

To me, the only open ethical question is whether people believed their agreement to the Facebook Terms of Service allowed this sort of thing. This could be settled empirically, by a suitably-designed survey. I’m betting the answer is “No.” Or, quite likely, “Hell, no!”.

[Update: Story in the Herald]

June 26, 2014

Want to learn data analysis? No stats experience required

Interested in learning to do data analysis but don’t know where to start? Try out the Department of Statistics’ new MOOC (massive open online course) called From Data to Insight: An Introduction to Data Analysis. It’s free – yep, it won’t cost you a bean – starts on October 6, takes just three hours a week, and will be led by our resident world-renowned statistics educator Prof Chris Wild.

The blurb says, in part:

“The course focuses on data exploration and discovery, showing you what to look for in statistical data, however large it may be. We’ll also teach you some of the limitations of data and what you can do to avoid being misled. We use data visualisations designed to teach you these skills quickly, and introduce you to the basic concepts you need to start understanding our world through data.

“This course assumes very little experience with statistical ideas and concepts. You will need to be comfortable thinking in terms of percentages, have basic Microsoft Excel skills, and a Windows or Macintosh computer to download and install our iNZight software.”

And that’s all you need. Spread the word.

Slightly too Open Data

  1. The Atlantic published some visualisations of taxi rides in New York.
  2. Chris Whong asked for the data under Freedom-of-Information laws, and got it. Of course, the taxi and driver ids were anonymised.
  3. Vijay Pandurangan noticed that the driver id and taxi id were really, really weakly anonymised.
  4. You can find out a lot once you know the taxi id.

The NY Taxi & Limousine Commission had run the ids through a cryptographic hash function, MD5. Hash functions are designed so that if you don’t know anything about the input you can’t reconstruct it from the output, but if you know the input exactly, you can easily verify that it gives the same output. The problem comes when you know a lot about the input, but not everything. In this case, there are only about two million possible id numbers, and you can just try them all. Once you have the ids, you can look them up.
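To see why a small input space defeats the hash, here is a minimal sketch in Python. It assumes, purely for illustration, that ids are zero-padded six-digit numbers; the real id formats aren't described in this post, and the function names are made up for the example.

```python
import hashlib


def md5_hex(text):
    """Return the MD5 digest of an ASCII string as lowercase hex."""
    return hashlib.md5(text.encode("ascii")).hexdigest()


def build_lookup(num_digits):
    """Hash every possible id and map each digest back to its id.

    Assumption for illustration only: ids are zero-padded decimal
    strings of a fixed length, so there are 10**num_digits candidates.
    """
    lookup = {}
    for n in range(10 ** num_digits):
        candidate = f"{n:0{num_digits}d}"
        lookup[md5_hex(candidate)] = candidate
    return lookup


def deanonymise(hashed_ids, lookup):
    """Recover each original id, or None if the digest is not in the table."""
    return [lookup.get(h) for h in hashed_ids]


if __name__ == "__main__":
    # One million candidates: roughly a second or two on a laptop.
    table = build_lookup(6)
    released = [md5_hex("012345"), md5_hex("999999")]
    print(deanonymise(released, table))  # ['012345', '999999']
```

Nothing here attacks MD5 itself. The same table-building trick would work against any unsalted hash function when the list of possible inputs is this short.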

Even if the taxi authorities had done the anonymisation correctly — replacing each id with a random number — it would inevitably have been possible to extract some of the ids with a bit of work.  That’s not the same as being able to extract all of them with a few hours’ computer time.
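As a rough sketch of what replacing each id with a random number could look like, the Python below gives every distinct id its own random token. The function name and token length are my own choices for illustration, not anything the taxi authorities used.

```python
import secrets


def pseudonymise(ids):
    """Replace each distinct id with an unrelated random token.

    The token has no mathematical relationship to the id, so it cannot be
    reversed by trying every candidate. The mapping is the only link back
    and must be kept secret or destroyed.
    """
    mapping = {}
    used = set()
    for original in ids:
        if original in mapping:
            continue
        token = secrets.token_hex(8)  # 64 random bits as 16 hex characters
        while token in used:          # vanishingly unlikely, but be safe
            token = secrets.token_hex(8)
        used.add(token)
        mapping[original] = token
    return [mapping[i] for i in ids], mapping


if __name__ == "__main__":
    trips = ["A1", "B2", "A1", "C3"]
    released, secret_mapping = pseudonymise(trips)
    print(released)  # same token for both "A1" trips, different for the rest
```

The same driver still gets the same token on every trip, which is what makes the data useful, and also why some ids could still be recovered by matching trips against outside information.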

Roundup return

The Séralini et al paper on Roundup and Roundup-resistant GM corn is back. The NZ Science Media Centre has comments, as does Retraction Watch.

June 25, 2014

NRL Predictions for Round 16

Team Ratings for Round 16

The basic method is described on my Department home page. I have made some changes to the methodology this year, including shrinking the ratings between seasons.

Here are the team ratings prior to this week’s games, along with the ratings at the start of the season.

Team         Current Rating  Rating at Season Start  Difference
Roosters               9.74                   12.35       -2.60
Rabbitohs              7.89                    5.82        2.10
Sea Eagles             6.70                    9.10       -2.40
Broncos                4.31                   -4.69        9.00
Cowboys                3.41                    6.01       -2.60
Warriors               2.59                   -0.72        3.30
Panthers               2.25                   -2.48        4.70
Storm                  1.81                    7.64       -5.80
Bulldogs               1.71                    2.46       -0.80
Knights               -1.95                    5.23       -7.20
Wests Tigers          -4.77                  -11.26        6.50
Titans                -5.41                    1.45       -6.90
Eels                  -5.78                  -18.45       12.70
Dragons               -6.09                   -7.57        1.50
Raiders               -7.68                   -8.99        1.30
Sharks               -10.52                    2.32      -12.80

Performance So Far

So far there have been 110 matches played, 63 of which were correctly predicted, a success rate of 57.3%.

Here are the predictions for last week’s games.

Game  Teams                     Date    Score    Prediction  Correct
1     Raiders vs. Bulldogs      Jun 20  14 – 22       -4.10  TRUE
2     Warriors vs. Broncos      Jun 21  19 – 10        1.30  TRUE
3     Sharks vs. Sea Eagles     Jun 21  0 – 26        -9.80  TRUE
4     Storm vs. Eels            Jun 22  46 – 20        9.00  TRUE
5     Titans vs. Dragons        Jun 22  18 – 19        6.70  FALSE
6     Knights vs. Cowboys       Jun 23  36 – 28       -2.90  FALSE

Predictions for Round 16

Here are the predictions for Round 16. The prediction is my estimated expected points difference with a positive margin being a win to the home team, and a negative margin a win to the away team.

Game  Teams                     Date    Winner        Prediction
1     Sea Eagles vs. Roosters   Jun 27  Sea Eagles          1.50
2     Broncos vs. Sharks        Jun 27  Broncos            19.30
3     Wests Tigers vs. Raiders  Jun 28  Wests Tigers        7.40
4     Cowboys vs. Rabbitohs     Jun 28  Cowboys             0.00
5     Warriors vs. Panthers     Jun 29  Warriors            4.80
6     Eels vs. Knights          Jun 29  Eels                0.70
7     Dragons vs. Storm         Jun 30  Storm              -3.40
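For anyone who wants to check the sign convention, here is a small Python sketch that scores last week's NRL predictions from the table in this post and recomputes the season success rate. The helper names are hypothetical, and treating a drawn game or a prediction of exactly zero as "not correct" is my assumption, not something stated in the method.

```python
# Last week's NRL games as listed in this post:
# (home team, away team, home score, away score, predicted margin)
RESULTS = [
    ("Raiders", "Bulldogs", 14, 22, -4.10),
    ("Warriors", "Broncos", 19, 10, 1.30),
    ("Sharks", "Sea Eagles", 0, 26, -9.80),
    ("Storm", "Eels", 46, 20, 9.00),
    ("Titans", "Dragons", 18, 19, 6.70),
    ("Knights", "Cowboys", 36, 28, -2.90),
]


def is_correct(home_score, away_score, predicted_margin):
    """A prediction is correct when it has the same sign as the actual margin.

    Positive means a home win, negative an away win. Assumption: a draw or
    a predicted margin of exactly zero counts as not correct.
    """
    actual_margin = home_score - away_score
    return actual_margin * predicted_margin > 0


def success_rate(correct, played):
    """Percentage of correct predictions, rounded to one decimal place."""
    return round(100.0 * correct / played, 1)


if __name__ == "__main__":
    for home, away, hs, as_, pred in RESULTS:
        print(f"{home} vs. {away}: {is_correct(hs, as_, pred)}")
    print(success_rate(63, 110))  # 57.3
```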

 

Super 15 Predictions for Round 17

Team Ratings for Round 17

The basic method is described on my Department home page. I have made some changes to the methodology this year, including shrinking the ratings between seasons.

Here are the team ratings prior to this week’s games, along with the ratings at the start of the season.

Team         Current Rating  Rating at Season Start  Difference
Crusaders              8.68                    8.80       -0.10
Waratahs               5.97                    1.67        4.30
Sharks                 5.65                    4.57        1.10
Brumbies               3.64                    4.12       -0.50
Hurricanes             2.74                   -1.44        4.20
Bulls                  2.62                    4.87       -2.30
Stormers               1.98                    4.38       -2.40
Chiefs                 1.44                    4.38       -2.90
Blues                  0.21                   -1.92        2.10
Highlanders           -1.69                   -4.48        2.80
Force                 -2.85                   -5.37        2.50
Reds                  -4.13                    0.58       -4.70
Cheetahs              -4.52                    0.12       -4.60
Lions                 -6.02                   -6.93        0.90
Rebels                -6.73                   -6.36       -0.40

Performance So Far

So far there have been 101 matches played, 65 of which were correctly predicted, a success rate of 64.4%.

Here are the predictions for last week’s games.

Game  Teams                     Date    Score    Prediction  Correct
1     Crusaders vs. Force       May 30  30 – 7        14.40  TRUE
2     Reds vs. Highlanders      May 30  38 – 31        0.70  TRUE
3     Chiefs vs. Waratahs       May 31  17 – 33        1.60  FALSE
4     Blues vs. Hurricanes      May 31  37 – 24       -1.80  FALSE
5     Brumbies vs. Rebels       May 31  37 – 10       10.90  TRUE
6     Lions vs. Bulls           May 31  32 – 21       -8.50  FALSE
7     Sharks vs. Stormers       May 31  19 – 21        7.40  FALSE

Predictions for Round 17

Here are the predictions for Round 17. The prediction is my estimated expected points difference with a positive margin being a win to the home team, and a negative margin a win to the away team.

Game  Teams                     Date    Winner        Prediction
1     Highlanders vs. Chiefs    Jun 27  Chiefs             -0.60
2     Rebels vs. Reds           Jun 27  Reds               -0.10
3     Hurricanes vs. Crusaders  Jun 28  Crusaders          -3.40
4     Waratahs vs. Brumbies     Jun 28  Waratahs            4.80
5     Force vs. Blues           Jun 28  Force               0.90