May 3, 2014

Optimising for the wrong goal

From Cathy O’Neil, at mathbabe.org

By contrast, let’s think about how most big data models work. They take historical information about successes and failures and automate them – rather than challenging their past definition of success, and making it deliberately fair, they are if anything codifying their discriminatory practices in code.

That is, data mining approaches to making decisions are blind to the prejudices attached to characteristics such as gender, ethnicity, age, so they will readily look at historical data, note that women (gays, immigrants, Maori, older people) haven’t been successful in positions like this in the past, and fail to wonder why.

Thomas Lumley (@tslumley) is Professor of Biostatistics at the University of Auckland. His research interests include semiparametric models, survey sampling, statistical computing, foundations of statistics, and whatever methodological problems his medical collaborators come up with. He also blogs at Biased and Inefficient.

Comments

  • Renate Thiede

    I suppose that makes sense, though. Data mining is not about ideology or prejudice, after all, but only about existing data. I wouldn’t say that data mining approaches necessarily have to consider the reasons behind the existence of certain data, or that not considering the reasons should constitute failure.
    Rather, data mining approaches should not be used exclusively in making decisions, but should be used in conjunction with socially aware approaches.
    This may be a semantic issue, but I don’t think we should criticize data mining approaches themselves, but rather the fact that they are being used exclusively.

    11 years ago

    • Thomas Lumley

      I agree, but I think many people who don’t have experience with the power of overfitting in data analysis may not realise how well algorithms can perpetuate whatever they are fed.

      11 years ago

      • Renate Thiede

        Thanks for your reply! I hadn’t thought of it that way before.
        I suppose it comes down to the problem of statistical literacy and understanding, which seems to be a pretty prevalent problem at this stage.

        11 years ago