July 28, 2014

Rise of the machines

Journalism

Data

The Automatic Statistician project (somewhat flaky website) is working to automate various types of statistical modelling. The project has published interesting research papers, and it also has a demo that’s fairly limited but produces linear regression models, model checks, and descriptions that are reasonable from a predictive point of view.

Automating some bits of data analysis is an important problem, because there aren’t enough statisticians to go around. However (as Cathy O’Neil points out about competition sites like Kaggle), these tools aren’t tackling the hard bits of data analysis: getting the data ready and, more importantly, getting the question into a precisely specified form that can be answered by fitting a model.

Thomas Lumley (@tslumley) is Professor of Biostatistics at the University of Auckland. His research interests include semiparametric models, survey sampling, statistical computing, foundations of statistics, and whatever methodological problems his medical collaborators come up with. He also blogs at Biased and Inefficient.

Comments

  • Ian Wong

    This is great stuff. I had heard about Narrative Science, but not the other two.

    In the future I expect exploratory analysis and model building will be trivial for the majority of problems.

    I agree the hard bits are a little ignored, but undergraduate modelling really feels like something a machine would be better suited to doing.

    But, when data manipulation and feature space creation are replaced by machines…

    10 years ago

  •

    This has been out for a few years in sports reporting, too: http://www.nytimes.com/2011/09/11/business/computer-generated-articles-are-gaining-traction.html?pagewanted=all&_r=0. I sometimes feel I need this for my sampling designs and weighting reports :).

    10 years ago

    • Thomas Lumley

      Some years ago Lee Wilkinson was writing a system that did this for basic statistical reporting and analysis. I don’t know what happened to it.

      10 years ago