What should data use agreements look like?
After the news about Jarrod Gilbert being refused access to crime data, it’s worth looking at what data-use agreements should look like. I’m going to just consider agreements to use data for one’s own research — consulting projects and commissioned reports are different.
On Stuff, the police said:
“Police reserves the right to discuss research findings with the academic if it misunderstands or misrepresents police data and information,” Evans said.
Police could prevent further access to police resources if a researcher breached the agreement, he said.
“Our priority is always to ensure that an appropriate balance is drawn between the privacy of individuals and academic freedom.”
That would actually be reasonable if it only went that far: an organisation has confidential data, you get to see the data, they get to check whether you’ve reported anything that would breach their privacy restrictions. They can say “paragraph 2, on page 7, the street name together with the other information is identifying”, and you can agree or disagree, and potentially get an independent opinion from a mediator, ombudsman, arbitrator, or if it comes to that, a court.
The key here is that a breach of the agreement is objectively decidable and isn’t based on whether they like the conclusions. The problem comes with discretionary use of data. If the police have discretion about what analyses can be published, there’s no way to tell whether and to what extent they are misusing it. Even if they have only discretion about who can use the data, it’s hard to tell if they are using the implied threat of exclusion to persuade people to change results.
Medical statistics has a lot of experience with this sort of problem. That’s why the International Committee of Medical Journal Editors says, in its ‘conflict of interest’ recommendations:
Authors should avoid entering in to agreements with study sponsors, both for-profit and non-profit, that interfere with authors’ access to all of the study’s data or that interfere with their ability to analyze and interpret the data and to prepare and publish manuscripts independently when and where they choose.
Under the ICMJE rules, I believe the sort of data-use restrictions we heard about for crime data would have to be disclosed as a conflict of interest. The conflict wouldn’t necessarily lead to a paper being rejected, but it would be something for editors and reviewers to bear in mind as they looked at which results were presented and how they were interpreted.
Thomas Lumley (@tslumley) is Professor of Biostatistics at the University of Auckland. His research interests include semiparametric models, survey sampling, statistical computing, foundations of statistics, and whatever methodological problems his medical collaborators come up with. He also blogs at Biased and Inefficient.