November 24, 2016

Briefly

  • “The problem scientists have to face here isn’t whether the data is real, but whether this is an appropriate way to represent it.” On the sea-ice graphic that’s going around.
  • “Using the language of economics, judgment is a complement to prediction and therefore when the cost of prediction falls demand for judgment rises. We’ll want more human judgment.” Harvard Business Review
  • Apps blamed for rise in road deaths (NY Times)
  • The sort of basic search skills Tim O’Reilly describes can also be applied to non-political fake news. If you start with “Ice cream for breakfast makes you smarter, claims scientist” from the Herald, you can easily find the Japanese story that’s the source. If you look a little harder, as my brother did, you can find the 2013 story on the same Japanese site, which has a little more detail. With the help of Google Translate, you can see that the research was sponsored by an ice-cream company and that the source for the story is the company website. The researcher is real, but the research appears not to have been published, and there has been plenty of time since 2013. Ice cream doesn’t really matter, but the question of which stories in the newspaper we’re supposed to take seriously does matter.
November 23, 2016

Indigenous data – why is it important?

In a data-driven world, indigenous peoples are becoming increasingly concerned about who owns and represents statistics about indigenous people: that is, who has access to the data, how its cultural integrity is maintained, and how people’s privacy and autonomy are protected.

Not only do governments collect data about their citizens, but so, too, do indigenous peoples about themselves – just think of the data that iwi need to collect about their own people in this post-settlement era. As an example, I’m a registered member of Waikato-Tainui. The central administration knows six or so generations of my whakapapa because becoming registered means putting your links on paper that a kaumatua then signs off. It knows my home marae and all sorts of personal details such as where I live and my birth date. As I have been the privileged recipient of educational scholarships from the iwi, it also knows my academic record and quite a lot of personal stuff about my goals and aspirations.

So why is this important? Indigenous people have historically had a problematic relationship with researchers, academics and other data collectors. Researcher Andrew Sporle (Rangitāne, Ngāti Apa, Te Rārawa), pictured at right, recently told me that “From a Māori perspective, we were all too often the researched, not the researchers, and Māori realities were often portrayed as a strange and inferior ‘other’. Indigenous peoples are asserting the right to govern and protect the data that are so important to our development. We cannot afford to lose control of data about us.”

Data, he added, is a “highly valuable strategic asset” for Māori development. “In the age of big data, Māori want access to data to support our decision‐making and to be involved when big data is used to make decisions about us.”

In this field, things have been moving fast of late, and New Zealand statisticians are among the leaders. Andrew and Tahu Kukutai (Ngāti Maniapoto, Te Aupōuri), pictured left, are among the founding members of Te Mana Raraunga (the Māori Data Sovereignty Network), which was set up last year to assert Māori rights and interests in relation to data. Tahu is an Associate Professor at the Institute of Demographic and Economic Analysis, University of Waikato.

The group’s guiding motto is “He whenua hou, Te Ao Raraunga; Te Ao Raraunga, He whenua hou”, or “Data is a new world, a world of opportunity.”  It advocates “for the development of capacity and capability across the Māori data ecosystem, including data rights and interests, data governance, data storage and security, and data access and control”.

Andrew and Tahu attended last month’s  Indigenous Open Data Summit in Madrid, Spain, alongside independent statisticians Kirikowhai Mikaere (Tūhourangi, Ngāti Whakaue) and James Hudson (Ngāti Pukeko, Ngāti Awa, Ngāi Tai, Tūhoe), a researcher for Auckland Council’s Independent Māori Statutory Board. The summit, a first of its kind, provided a forum to discuss what action was being taken to protect the use of data about indigenous peoples.

Tahu and John Taylor, Emeritus Professor at the Centre for Aboriginal Economic Policy Research at the Australian National University,  have edited the just-released first book on indigenous data, titled Indigenous Data Sovereignty – Towards an Agenda, published by ANU Press.

It’s free to download and provides a comprehensive overview of why indigenous oversight of data is important, focusing largely on Australasia. It’s an interesting read and provides a perspective on data that has been missing for too long.

The local contributors include Darin Bishop (Ngāruahine, Taranaki), team leader of organisational knowledge at Te Puni Kōkiri, the Ministry of Māori Development; Dickie Farrar (Whakatōhea, Te Whānau ā Apanui, Te Aitanga ā Mahaki), CEO of the Whakatōhea Māori Trust Board; James Hudson, mentioned above; Maui Hudson (Ngāruahine, Te Mahurehure, Whakatōhea), Associate Professor in the Faculty of Māori and Indigenous Studies at the University of Waikato; GP Rawiri Jansen (Ngati Hinerangi); Lesley McLean (Whakatōhea, Te Whānau ā Apanui), tribal database coordinator for the Whakatōhea Māori Trust Board; and leading demographer Ian Pool, Emeritus Professor at Waikato University.

November 21, 2016

Stat of the Week Competition: November 19 – 25 2016

Each week, we would like to invite readers of Stats Chat to submit nominations for our Stat of the Week competition and be in with the chance to win an iTunes voucher.

Here’s how it works:

  • Anyone may add a comment on this post to nominate their Stat of the Week candidate before midday Friday November 25 2016.
  • Statistics can be bad, exemplary or fascinating.
  • The statistic must be in the NZ media during the period of November 19 – 25 2016 inclusive.
  • Quote the statistic, when and where it was published and tell us why it should be our Stat of the Week.

Next Monday at midday we’ll announce the winner of this week’s Stat of the Week competition, and start a new one.


November 20, 2016

Gained in translation

From a talk  at the workshop on Fairness, Accountability, and Transparency in Machine Learning, via Twitter

[Image: English machine translations of the Turkish sentences “O bir doktor” and “O bir hemşire”]

There’s obviously something wrong with these translations, but it’s also hard to do better.

To step back, there has classically been a translation problem where Greek and Latin have separate words for man as distinguished from woman and for man ‘as distinguished from beasts and angels’. It can be quite hard to guess which word was in the original source, if you’re working from the English translation.  This problem has a simple solution, since modern English also has a clear (and increasingly unavoidable) distinction between ‘man’ on the one hand and  ‘human’ or ‘person’ on the other.

This isn’t that problem.  It’s kind of the opposite.

The correct translation of “O bir doktor” is one of “He is a doctor”, “She is a doctor”, and “They are a doctor”, and the correct translation of “O bir hemşire” is one of “He is a nurse”, “She is a nurse”, and “They are a nurse”. Without more context, though, you can’t tell which, and none of them is unmarked or neutral. “He” and “She” are obviously too narrow, and while singular “they” has long been standard English for an unspecified individual, it has only recently become standard for a specific individual, when that person has asked to be referred to that way because of a non-binary gender identity.
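A toy sketch of the alternative: instead of choosing one pronoun, a translator could hand all the valid readings back to a human. The `english_readings` function and its two-word dictionary are hypothetical illustrations, not how Google Translate or any real system works, and the sketch only handles the single sentence pattern “O bir <noun>”.

```python
# Hypothetical toy example: Turkish "o" is a gender-neutral third-person
# pronoun, so "O bir <noun>" has several valid English readings.
# This two-entry dictionary is an assumption for illustration only.
NOUNS = {"doktor": "doctor", "hemşire": "nurse"}


def english_readings(sentence: str) -> list[str]:
    """Return every English reading of a Turkish 'O bir <noun>' sentence,
    rather than silently picking one pronoun."""
    words = sentence.strip().rstrip(".").split()
    if len(words) != 3 or words[0].lower() != "o" or words[1] != "bir":
        raise ValueError("expected a sentence of the form 'O bir <noun>'")
    noun = NOUNS[words[2]]
    return [f"He is a {noun}", f"She is a {noun}", f"They are a {noun}"]


print(english_readings("O bir doktor"))
print(english_readings("O bir hemşire"))
```

Choosing among the three readings is exactly the judgment a system trained on text frequencies will tend to make by following the stereotype.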

This is an example where the ambiguities probably have to be put back in by humans, because predictive analytics is unavoidably going to follow the stereotypes. Or, as a new Harvard Business Review article rather optimistically says about the impacts of machine learning:

Using the language of economics, judgment is a complement to prediction and therefore when the cost of prediction falls demand for judgment rises. We’ll want more human judgment.

November 18, 2016

Briefly

  • “So what I got from reading some of Clinton’s email is another piece of evidence confirming my intuition that political systems scale poorly.” (medium.com)
  • Cathy O’Neil on a program at Georgia State University: “Here’s the thing. One of the hallmark characteristics of a WMD [weapon of math destruction] is that it punishes the poor, the unlucky, the sick, or the marginalized. This algorithm does the opposite – it offers them help.”
November 15, 2016

Fake news and AI

From Russell Brown at Public Address

As Facebook moved from human curation to trusting artificial intelligence to sift its stories, fakery exploded. It was a Google algorithm, not an editor, that made a wholly false claim about the popular vote the “top” story in its rankings. The idea that AI will actually write most of the news we see is genuinely horrifying.

Links are good for you

The Herald has found some “surprising health benefits of beer.” It found them in the Telegraph, but otherwise we’re not given a lot of help tracking things down. There are eleven “surprising benefits”, labelled 1 to 10. Only two are even arguably new. None of them come with a link, or even with the absolute minimum of both a journal and a researcher name.

The unnumbered benefit is the one that’s closest to being new: that a new study of 80,000 people in China found higher HDL (good) cholesterol in people who drank a moderate amount of alcohol — which isn’t all that surprising, since that relationship has been studied for decades.  Here, the research is unpublished: it was presented at a conference this week. The basic conclusion was for moderate consumption of alcohol of any type, not just beer.

Number 1 (lowers the risk of kidney stones) seems to be true and about beer, though the story doesn’t mention that all the participants were smokers.

Number 2 (protects you from heart attacks) was about beer, but it wasn’t about heart attacks. It was about atherosclerosis. In hamsters.

Number 3 (reduces the risk of strokes) is hard to track down; “Harvard Medical School” isn’t very specific. They probably mean this research, which found a slightly lower risk of stroke in people (US male doctors) who drink small amounts of alcohol: not zero, but up to seven drinks a week. They probably don’t mean this Harvard research showing the risk goes up for the hour after consumption. Again, not specifically beer.

In number 4, the headline claim “strengthens your bones” is borderline true, but the later “significantly reduce your risk of fracturing bones” doesn’t seem to be supported. The research actually found an increase in bone mineral density, which you’d expect to lead to stronger bones but doesn’t always.

Number 5 is partly true: the research wasn’t specific to beer, but men who drink 1-2 standard units of alcohol per day are at lower risk of diabetes, though the evidence that the alcohol is responsible isn’t all that strong.

Number 6 is “reduces the risk of Alzheimer’s”. The story talks about research stretching back to 1977. The Alzheimer’s Society says “It is no longer thought that low to moderate alcohol consumption protects against dementia.” The story mentions silicon and aluminium. The Alzheimer’s Society says “Current medical and scientific opinion of the relevant research indicates that the findings do not convincingly demonstrate a causal relationship between aluminium and Alzheimer’s disease.”

Number 7 is about preventing insomnia. It glosses over the alcohol issue entirely.  The research was about the taste of beer, used a dose of less than a tablespoon, and didn’t measure insomnia. It didn’t even measure relaxation, just brain waves.

Number 8 (prevents cataracts). From the press release. “In tests with rat lenses, Trevithick’s laboratory found that antioxidants that act similarly to those in beer protect special parts of cells in the eye – called mitochondria. Damaged mitochondria can lead to an increased incidence of cataracts.” They weren’t looking at beer or even at chemicals in beer, or at cataracts. It’s a step forward to know that chemicals similar to those in beer can reduce damage in rats similar to the damage that causes cataracts, but that still leaves some gaps.

Number 9 (might cure cancer). Chemicals related to some chemicals in beer, at high enough doses, might potentially be turned into cancer treatments.  In order to do the first steps of the research, using the original beer chemicals, scientists need to be able to measure how much they have of them. To calibrate the measurements, they need pure synthetic versions. The research was about progress in working out the synthesis.

Number 10 is particularly special: “beer helps you lose weight”. Not only is there a specific and detailed explanation of why that’s false in the original press release, it’s even in the Herald story: the doses in the study were the equivalent of 3500 pints of beer per day.

November 14, 2016

Stat of the Week Competition: November 12 – 18 2016

Each week, we would like to invite readers of Stats Chat to submit nominations for our Stat of the Week competition and be in with the chance to win an iTunes voucher.

Here’s how it works:

  • Anyone may add a comment on this post to nominate their Stat of the Week candidate before midday Friday November 18 2016.
  • Statistics can be bad, exemplary or fascinating.
  • The statistic must be in the NZ media during the period of November 12 – 18 2016 inclusive.
  • Quote the statistic, when and where it was published and tell us why it should be our Stat of the Week.

Next Monday at midday we’ll announce the winner of this week’s Stat of the Week competition, and start a new one.


November 13, 2016

What polls aren’t good for

From Gallup, how Americans feel about the election

[Image: Gallup table of how Americans feel about the election result]

We can believe the broad messages that many people were surprised; that Trump supporters have positive feelings; that Clinton supporters have negative feelings; that there’s more anger and fear expressed than when Obama was first elected (though not more than when he was re-elected). The surprising details are less reliable.

I’ve seen people making a lot of the apparent 3% “buyer’s remorse” among Trump voters, with one tweet I saw saying those votes would have been enough to swing the election. First of all, Clinton already has more votes than Trump, just distributed suboptimally, so even if these were Trump voters who had changed their minds, it might not have made any difference to the result. More importantly, though, Gallup has no way of knowing whom the respondents voted for, or even whether they voted at all. The table is just based on what they said over the phone.
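A toy example of the “distributed suboptimally” point, with entirely made-up numbers for three hypothetical states and winner-take-all allocation assumed everywhere: candidate A wins more votes overall by running up a large margin in one state, but loses the electoral vote.

```python
# Hypothetical states: (votes for A, votes for B, electoral votes).
# These numbers are invented for illustration, not real 2016 results.
STATES = {
    "State 1": (900_000, 300_000, 10),
    "State 2": (480_000, 520_000, 12),
    "State 3": (490_000, 510_000, 12),
}


def tally(states):
    """Sum popular votes and winner-take-all electoral votes for A and B."""
    popular = {"A": 0, "B": 0}
    electoral = {"A": 0, "B": 0}
    for a_votes, b_votes, ev in states.values():
        popular["A"] += a_votes
        popular["B"] += b_votes
        winner = "A" if a_votes > b_votes else "B"  # assumes no exact ties
        electoral[winner] += ev
    return popular, electoral


popular, electoral = tally(STATES)
print(popular)    # {'A': 1870000, 'B': 1330000}
print(electoral)  # {'A': 10, 'B': 24}
```

Whether a small shift in votes changes the result depends on which states it happens in: moving votes within State 1 changes nothing, while a shift of a few tens of thousands in State 2 or State 3 would flip it.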

It could be that 3% of Trump voters regret it. It could also be that some Clinton voters or some non-voters claimed to have voted for Trump.  As we’ve seen in past examples even of high-quality social surveys, it’s very hard to estimate the size of a very small subpopulation from straightforward survey data.
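A back-of-envelope sketch of why a small apparent subgroup is fragile. All the rates below are hypothetical, chosen only to show the mechanism, and none comes from the Gallup data: if a small fraction of people who did not vote for Trump nonetheless say they did, and also say they regret it, they end up counted among the “regretful Trump voters”.

```python
def apparent_regret_share(trump_share, true_regret, misreport_rate, misreporter_regret):
    """Fraction of self-reported Trump voters who say they regret it.

    trump_share        -- fraction of respondents who really voted Trump
    true_regret        -- fraction of genuine Trump voters who regret it
    misreport_rate     -- fraction of everyone else who claims a Trump vote
    misreporter_regret -- fraction of those misreporters who also report regret
    All inputs are hypothetical assumptions.
    """
    genuine = trump_share
    fake = (1 - trump_share) * misreport_rate
    regretful = genuine * true_regret + fake * misreporter_regret
    return regretful / (genuine + fake)


# No genuine regret at all, but 2% of other respondents claim a Trump vote
# and half of those also report regret:
print(round(apparent_regret_share(0.45, 0.0, 0.02, 0.5), 4))  # 0.0119
```

In this made-up scenario no genuine Trump voter regrets anything, yet about 1.2% of self-described Trump voters appear to. When the quantity being estimated is only a few percent, errors of this size are a large fraction of it.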

November 12, 2016

Fizzy headlines

Herald (Daily Mail) headline: “How just one can of fizzy drink a day raises the risk of developing type 2 diabetes by 50pc”. Here’s the research abstract.

  1. The research was about pre-diabetes, or ‘elevated fasting glucose/impaired glucose tolerance’ as it used to be called, not diabetes.  They aren’t remotely the same thing. According to this other research, so-called pre-diabetes has about a 5-10% chance per year of turning into diabetes and about the same chance of just going away.  About half the people in the study  developed pre-diabetes over a seven-year period, even among those who didn’t drink any soft drinks.
  2. The researchers distinguished sugar-sweetened and diet drinks (they saw no suggestion of a risk increase for diet drinks) but did not distinguish fizzy from non-fizzy sugar-sweetened drinks. So the headline divides drinks up in a completely different way from the research. This wasn’t ‘fizzy drink’ research.
  3. The research paper reports multiple estimates of the risk increase. Some models said nearly 50%, but some said about 25%.
  4. There’s a lot of uncertainty even in the purely mathematical sense: the model that says nearly 50% increase came with an uncertainty interval that goes down to 16%, and the one that says 25% has an uncertainty interval going all the way down to zero.
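To see why pre-diabetes and diabetes aren’t interchangeable, here is a rough sketch that takes the 5-10% per-year figures at face value, uses 7.5% for both transitions, and assumes (as simplifications, not findings) that the rates stay constant and that people who move to diabetes or back to normal stay there. The function name is hypothetical.

```python
def prediabetes_outcomes(years, p_diabetes=0.075, p_recover=0.075):
    """Fractions of an initially pre-diabetic group in each state after `years`.

    Each year, a pre-diabetic person develops diabetes with probability
    p_diabetes, returns to normal with probability p_recover, and otherwise
    stays pre-diabetic. Diabetes and recovery are treated as permanent,
    which is a simplifying assumption.
    """
    pre, diabetic, normal = 1.0, 0.0, 0.0
    for _ in range(years):
        diabetic += pre * p_diabetes
        normal += pre * p_recover
        pre *= 1 - p_diabetes - p_recover
    return {"pre-diabetic": pre, "diabetic": diabetic, "normal": normal}


for state, frac in prediabetes_outcomes(7).items():
    print(f"{state}: {frac:.0%}")
# pre-diabetic: 32%
# diabetic: 34%
# normal: 34%
```

Under these assumptions, only about a third of a pre-diabetic group would have diabetes after seven years, so a risk estimate for pre-diabetes says much less about diabetes than the headline implies.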

The research itself is perfectly reasonable, providing a bit more evidence on the risks of a high-sugar diet (disclaimer: I know a few of the researchers). Even the story isn’t too bad, but the headline is basically completely wrong.