October 24, 2016

Why so negative?

My StatsChat posts, and especially the ‘Briefly’ links, tend to be pretty negative about big data and algorithmic decision-making. I’m a statistician, and I work with large-scale personal genomic data, so you’d expect me to be more positive. This post is about why.

The phrase “devil’s advocate” has come to mean a guy on the internet arguing insincerely, or pretending to argue insincerely, just for the sake of being a dick. That’s not what it once meant. In the late sixteenth century, Pope Sixtus V created the position of “Promoter of the Faith” to provide a skeptical examination of cases for sainthood. By the time a case for sainthood got to the Vatican, there would be a lot of support behind it, and one wouldn’t have to be too cynical to suspect there had been a bit of polishing of the evidence. The idea was to have someone whose actual job it was to ask the awkward questions — “devil’s advocate” was the nickname. Most non-Catholics and many Catholics would argue that the position obviously didn’t achieve what it aimed to do, but the idea was important.

In the research world, statisticians are often regarded this way. We’re seen as killjoys: people who look at your study and find ways to undermine your conclusions. And we do. In principle you could imagine statisticians looking at a study and explaining why the results were much stronger than the investigators thought, but since people are really good at finding favourable interpretations without help, that doesn’t happen so much.

Machine learning includes some spectacular achievements, and has huge potential for improving our lives. It also has a lot of built-in support both because it scales well to making a few people very rich, and because it fits in with the human desire to know things about the world and about other people.

It’s important to consider the risks and harms of algorithmic decision-making as well as the very real benefits. And it’s important that this isn’t left to people who can be dismissed as not understanding the technical issues. That’s why Cathy O’Neil’s book Weapons of Math Destruction is important, and on a much smaller scale it’s why you’ll keep seeing stories about privacy or algorithmic prejudice here on StatsChat. As Section 162(4)(a)(v) of the Education Act indicates, it’s my actual job.



Thomas Lumley (@tslumley) is Professor of Biostatistics at the University of Auckland. His research interests include semiparametric models, survey sampling, statistical computing, foundations of statistics, and whatever methodological problems his medical collaborators come up with. He also blogs at Biased and Inefficient.

Comments

  • Joseph Delaney

    I like your point about machine learning and big data. I’ll also point out that good ideas can be extended too far. Antibiotics are a great way to treat wound infections and make surgery much less risky, but if you give them out for common colds they may do more harm than good. If people actually got rewarded for prescribing these drugs for common colds, it could lead to problems (even with little to no incentive, we see some problems).

    So I think you (and Cathy) also do a service when you demarcate the boundary between “this is a good idea” and “the technique wasn’t designed to be pushed this far”.
