January 7, 2017

Social data analytics: how not to do it

Over the holidays, problems began emerging with the new data-based approach to detecting benefit overpayments in Australia. I learned about this from @Asher_Wolf, an Australian privacy advocate. In a significant number of cases the computer system was wrong about whether people owed money. The documentation needed to correct the errors is the sort of thing a lot of people don’t have lying around (though perhaps technically they should), and in at least some cases the computer system didn’t allow the correct information to be submitted. The Sydney Morning Herald has a piece (warning: autoplaying audio ads) referencing Cathy O’Neil’s book Weapons of Math Destruction.

Australian regulations on government data-matching systems call for the development of a ‘program protocol’, including “description of the data to be provided and the methods used to ensure it is of sufficient quality for use in the program” and “a statement of the costs and benefits of the program.” However, in Appendix C describing the cost-benefit statement it’s made clear that only cash costs and benefits to the Commonwealth count. Monetary compliance costs to individuals don’t count, and non-monetary costs don’t count. Sending out more letters seems to count as beneficial as long as it raises more money than it costs to send them, whether or not that money is legally owed.

The ‘technical standards’ report is supposed to cover data integrity and risks “including, but not limited to, risks to the privacy of individuals, reputational risks, and risks relating to incorrect matches.” In particular, it’s supposed to describe “the sampling techniques used to verify the validity/accuracy of matches”. That would be interesting to see, given that it seems to take a lot of work to prove that a match is incorrect.
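
For readers wondering what a sampling check on matches could look like, here is a minimal sketch. It is not the method used by the Australian system, which I haven't seen. It assumes you can draw a simple random sample of matches and have a human verify each one, and it reports the estimated error rate with an approximate 95% Wilson interval. The function names and all the numbers (20,000 matches, 400 audited, 80 wrong) are invented for illustration.

```python
import math
import random


def wilson_interval(errors, n, z=1.96):
    """Approximate 95% Wilson score interval for a proportion errors/n."""
    if n <= 0:
        raise ValueError("sample size must be positive")
    p = errors / n
    denom = 1 + z**2 / n
    centre = (p + z**2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return max(0.0, centre - half), min(1.0, centre + half)


def audit_sample(match_ids, sample_size, seed=0):
    """Draw a simple random sample of matches (without replacement) to check by hand."""
    rng = random.Random(seed)
    return rng.sample(list(match_ids), sample_size)


# Hypothetical audit: every number here is made up for illustration.
sampled = audit_sample(range(20000), 400)   # which matches to verify manually
errors_found = 80                           # invented result of the manual check
low, high = wilson_interval(errors_found, len(sampled))
print(f"estimated error rate {errors_found / len(sampled):.0%}, "
      f"95% interval {low:.1%} to {high:.1%}")
```

The catch, as noted above, is the manual verification step: the interval is only as good as the process that decides whether each sampled match was really wrong.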

In principle this might all be worked out in the appeals process, by real humans (or, at least, the amounts of repayments might be). The stress inflicted on the recipients of the letters and the harm done to the reputation of Australia’s government data systems are harder to fix. In the short term, the former is (rightly) getting more attention; in the long term it might be the latter that does the greater damage.

Thomas Lumley (@tslumley) is Professor of Biostatistics at the University of Auckland. His research interests include semiparametric models, survey sampling, statistical computing, foundations of statistics, and whatever methodological problems his medical collaborators come up with. He also blogs at Biased and Inefficient.

Comments

  • Megan Pledger

    After the online census debacle, the Ozzies are starting to look pretty amateur.

    8 years ago

    • Thomas Lumley

      It wasn’t a good year for them, certainly.

      8 years ago

  • duncan hedderley

    Not a stat-y point exactly, but the Aussies do seem further down the track of interacting with government online or not at all than NZ.

    8 years ago