Posts from December 2017 (17)

December 15, 2017

Public comments, petitions, and other self-selected samples

In the US, the Federal Communications Commission was collecting public comments about ‘net neutrality’ — an issue that’s commercially and politically sensitive in a country where many people don’t have any real choice about their internet provider.

There were lots of comments: from experts, from concerned citizens, from people who’d watched a John Oliver show. And from bots attaching real names and addresses to automated comments. The Wall Street Journal contacted a random sample of nearly 3000 commenters and found the majority of those they could get in contact with had not submitted the comment attached to their details. The StartupPolicyLab attempted to contact 450,000 submitters, and got a response from just over 8000. Of the 7000 responses about pro-neutrality comments, nearly all agreed they had made the comment, but of the 1000 responses about anti-neutrality comments, about 88% said they had not made the comment.

It’s obviously a bad idea to treat the comments as a vote. Even if the comments were from real US people, with one comment each, you’d need to do some sort of modelling of the vast majority who didn’t comment.  But what are they good for?

One real benefit is for people to provide ideas you hadn’t thought of.  The public comment process on proposed New Zealand legislation certainly allows for people like Graeme Edgeler to point out bugs in the drafting, and for people whose viewpoints were not considered to speak out.  For this, it doesn’t matter what the numbers of comments are, for and against. In fact, it helps if people who don’t have something to say don’t say it.

With both petitions and public comments there’s also some quantitative value in showing that concern about some issue you weren’t worrying about isn’t negligibly small; that thousands (in NZ) or hundreds of thousands (in the US) care about it.

But if it’s already established that an issue is important and controversial, and you care about the actual balance of public opinion, you should be doing a proper opinion poll.

Briefly

  • From the LA Times, week before last: “The Los Angeles Police Department asked drivers to avoid navigation apps, which are steering users onto more open routes — in this case, streets in the neighborhoods that are on fire.”  A related XKCD
  • From a Twitter thread about machine learning in the law (via Alex Hayes): For example, in crime prediction, X is not a sample from the population but instead from the subset of the population investigated by the police. Y is not the true outcome (e.g., did the person commit a crime), but the conclusion of the legal system. So a model predicting Y from X is predicting how the legal system will treat a person X that they have chosen to investigate. It is not predicting whether a person X’ drawn from the general population is guilty (Y’).  He’s not saying this is impossible to fix, but it won’t fix itself.
  • Wil Undy, who did the Herald’s poll aggregator in the NZ election, now has a blog.
  • A fascinating newly-discovered optical illusion
  • Interactive map of debt in the USA (via Alberto Cairo)
  • Thinking about ethics of machine-learning experiments such as determining sexual orientation from face images.
  • The Washington Post has exit polls for the Alabama senate election, including graphs.  I think it would be helpful to show the size of different groups, not just how they voted.

Di Cook: “I had advantages early on, and I feel like I need to pay that back”

Australian Di Cook @visnut was one of several leading women in data science who attended this week’s joint conference of the New Zealand Statistical Association, the International Association of Statistical Computing (Asian Regional Section) and the Operations Research Society of New Zealand at the University of Auckland, so we couldn’t miss the opportunity to talk with her. A brief bio: Di is a world leader in data visualisation and well-known for her work on interactive graphics. She is Professor of Business Analytics in the Department of Econometrics and Business Statistics at Monash University. She’s a Fellow of the American Statistical Association, elected member of the R Foundation and the Editor of the Journal of Computational and Graphical Statistics. Her research lies in data science, data visualisation, exploratory data analysis, data mining, high-dimensional methods and statistical computing.

Statschat: When did you first encounter statistics? Di: It was in my undergraduate degree. I studied mathematics with a plan to do math teaching. Statistics was one of the areas of mathematics that I could major in other than pure, or applied, mathematics. There was an extremely good female professor at the University of New England, Eve Bofinger, and I was drawn to some of the methods she was teaching, and that led me into statistics.

What was your career path after that?  I taught math at high school for about three months, then I had an offer from the Australian National University to go there as a research assistant, and that seemed a better fit. As a research assistant, I got to learn a lot more things, particularly computing. Computing, I think, is a critical aspect of data science today.

I spent a few years doing that and then realised I’d really like to make art, because some of the research-assistant work I was doing was computer graphics for data online. It fed into my art instincts from teenage years, so I spent some time as an artist before finding a graduate programme in statistics in the US that focused on data visualisation.

What sort of art do you do? I was painting – I haven’t done any for a long, long time, since I finished my PhD; it’s been too busy.

So your creative pursuits have fed into your career. Yeah – seeing that I could do data visualisation as a part of the statistics allowed me to realise that I could do a higher degree in stats; that merged my interests very well.

Where did you do your PhD? At Rutgers University in New Jersey.

You spent 22 years at Iowa State University in the US, and moved to Monash in Australia in 2015. What are your major projects there? I have a lot of projects. One of them is with Tennis Australia; we’ve been looking at tennis serves. So we have Hawk-Eye trajectory data and we visualise the tennis serves and look at how the players are different or similar.

That’s very cool – how’s that for applied statistics. Yeah, it’s fantastic, isn’t it. We’re also looking at face recognition in tennis video, to be able to detect the face through broadcast video, so that we can monitor emotions throughout a match and see how that affects performance.

We’re also looking at pedestrian sensor data that comes from a City of Melbourne (almost live) feed. One of my PhD students, Earo, has a new type of plot called a calendar plot; you make your data plots into a calendar format so that you can study things relative to holidays, and put it really on a human pattern basis.

Describe a typical day at work at Monash. We have a lot of meetings with students, so I would meet up with two or three students – PhD students or postdocs or research assistants – on projects that we’re working with, and meet up with other faculty. On some days I’m teaching data science classes to around 200 students. We often just go for a coffee with colleagues. We also play ping-pong on the conference table! I’ve got a good group of colleagues who play tennis, so we play tennis together.

It sounds very collegial. You’re a prominent woman in data science, and the field seems to appeal to women as a career path. Do you have any thoughts on that? I haven’t really looked at those numbers … but honestly, I think there’s too big of an emphasis on gender differences, and they’re not real when you look at the metrics. It’s just a perception. But one of the things I notice with the women that I work with is that they are interested in solving problems, and having an outcome of their work that makes life better for others. And that’s one thing that data science offers that pure statistics research is a bit removed from.

Do you have a family? I have one son. I moved to Monash after he graduated high school. He went off to college in the US, while I moved halfway across the globe, which he was quite happy about. He visits during the holidays, and last American summer found an internship at Monash University.

When he was small, how did you navigate work and life? It’s really difficult. I can’t imagine how single women do that – you need to have some sort of support mechanism. Day-care is amazing – and however much you spend on day-care, it’s worth it. And also partly because I think young kids early on really get a huge amount of benefit from being in the social mix of other kids the same age. He was in day-care from three months, part-time, and even at five months, if we were away for a week, when he’d get back, the other babies were over the moon – they recognised each other. I hadn’t realised how early on that socialisation happens.

So you weren’t concerned about day-care at all. Some women get tied up in knots about putting their kids in day-care. I know – there’s this thing about guilt. It is actually the best environment – they [pre-school educators] can do a much better job than me. If my time pressure is relieved by not having to have every moment dealing with all the stuff you have to deal with young kids … he’s come out as being a very sociable child and that he learnt from early on. Guaranteed when you’ve got the most important meeting, and your husband has a most important meeting at exactly the same time, that’ll be the time your kid gets sick. So you have to have a backup.

So what advice do you give other academic mums? Don’t stress – there are ways around. And the meeting you think is most important doesn’t have to be the most important. You just juggle everything you have as well as you can, and there are ways around any hurdle or hiccup. Just keep out there. It’s really important for other younger women to see women in senior roles.

Are universities doing the necessary to help women make the most of their talents in data science? I think it’s still a struggle. I think there’s been bureaucratic pushes for gender equality, which is really how I actually got an academic position in the first place in the US.

How so? Equal opportunity. Many statistics departments had no women, and it was a cultural shift in the early 1990s that many university administrations were forcing departments to hire women … or otherwise they couldn’t hire … if they [universities] were doing it well, they were not putting women in that situation of thinking, “Oh I was only hired because I was a woman”. They were doing it in the sense of making sure that women realised that they were talented, and wanted for their  talents, not just because of the administration push. But that wasn’t universal.

I thought things have been solved, but it’s not. Time and time again women are evaluated differently at promotion, and in classroom evaluations, they are not on average [rated to be] as good as the men, and that’s been shown again and again and again. So the thing is, don’t get put off by that; you will sometimes need to fight for your promotions and have people willing to fight for you.

Systemically, things are still not weighted fairly between men and women. It’s not. I’ve just finished studying some of the research-grant rates in Australia and the number given to women faculty are pitiful, from both the Australian Research Council and the National Health and Medical Research Council, which is the health sciences. That impacts whether women can get through to those higher ranks. That’s my next fight.

Would you see yourself as a crusader? How do you define yourself in exposing these inequalities? We’ve seen a lot of things [around sex, privilege and power discussed] in public in these last few months, with the sex scandals in Hollywood.  I’ve seen that all through my career in academia. I think we, hopefully, are on a cusp where the playing field for recognising talent among women becomes more level … I had advantages early on, and I feel like I need to pay that back.

I wouldn’t say I’m a crusader; I’m saying I see where we’ve come from, in terms of generations of women in my family, and where we are now, and we’ve come a long, long way. I’ve had so many more opportunities than my mum and my grandmother … I feel like I’ve got a responsibility to those generations to keep it moving in the right direction.

What advice would you give young women looking at a career in data science? What skills and attributes do they need to develop? Get onto the publicly available software – free software like R and Python – and get to know them. These are hugely powerful, and they give you power. There’s a number of courses you can do for free to help learn how to work with data.

Any particular courses that you would recommend? There’s DataCamp and Coursera and Software Carpentry, among others. Work with data. Play. Extract somebody’s tweets and analyse the text – there are really good resources for that. Pull data from the government web pages – they have lots of information. The New Zealand Herald has lots of data available. Just get comfortable finding data, making plots of it, and seeing whether it matches up with what the media is reporting about a problem. This is the sort of power you can get over your life if you can make decisions yourself, rather than being fed decisions.

Read more about Di Cook:

Her academic page

Wikipedia

Another Q & A

December 8, 2017

Attributing risk

Some time in the next week or so, we should be getting the ACC Christmas Sermon, where we get told about how many accidents happen on Christmas Day. From last year’s version in the Herald

Every year, more than 3400 claims are lodged with ACC for Christmas Day incidents, costing the country almost $3million.

As I always point out, that’s a lot less than the number lodged on a typical day that isn’t Christmas.  On the other hand, many of those 3400 are genuinely Christmas-caused injuries; accidents that would not have happened on some random day in summer.

You can look at Christmas-attributable risk by considering individual cases and counting the number that involve new toys, Christmas trees, batteries inserted inappropriately, misuse of wrapping paper, etc, etc. Or, you can compare Christmas to an otherwise similar day.

Rafa at Simply Statistics writes about a more serious example.

The official death toll from Hurricane María in Puerto Rico is 55. That’s 55 people whose death can be specifically and clearly attributed to the hurricane. However, the number of recorded deaths from all causes in September was 2838, which is 455 above the average for September in recent years. The next largest exceedance in the past seven years was just over 200 in November 2014.

Attributing deaths on a case-by-case basis to a disaster like María is hard; it would be hard to make those sorts of decisions even without the continuing post-hurricane disruption. Another example is deaths due to the 2003 power outage in New York, where there were 6 officially-attributed deaths but a spike of 90 in the total death statistics.

Sometimes we want to look at specifically attributable cases: when snow shuts down the roads, we probably want to count the number of snow-caused crashes without subtracting the number of snow-prevented ones. But for natural disasters it’s probably the total excess deaths we want.
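The two ways of counting can be put side by side with a few lines of arithmetic. This is only a sketch using the figures quoted above; the variable names are mine, and the "baseline" is simply what the quoted excess implies for an average September, not an independent estimate.

```python
# Figures quoted in the post for Puerto Rico, September 2017.
official_toll = 55          # deaths attributed case-by-case to the hurricane
september_deaths = 2838     # recorded deaths from all causes
excess_deaths = 455         # amount above the recent September average

# The implied baseline is just the total minus the quoted excess.
implied_baseline = september_deaths - excess_deaths
excess_to_official = excess_deaths / official_toll

print(f"Implied average September deaths: {implied_baseline}")
print(f"Excess deaths per officially attributed death: {excess_to_official:.1f}")

# The same comparison for the 2003 New York power outage figures.
ny_official, ny_excess = 6, 90
print(f"New York 2003 ratio: {ny_excess / ny_official:.0f}")
```

The ratios are not corrections to apply elsewhere; they only show how far apart the case-by-case count and the excess-deaths count can be for the same event.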

December 7, 2017

If this goes on…

[Image: atm]

If you click through, things are less local and immediate: ATMs could be extinct in Australia within 30 years

Apparently

A projection of data from the Reserve Bank of Australia by finder.com.au has found ATMs could be a distant memory in Australia by 2036.

2036 is in 19 years, and 19 is less than thirty, so I suppose that counts as within 30 years. So how did they do the projection? There’s not much detail in the story and I couldn’t find any on finder.com.au.

The story says

According to finder.com.au, the number of ATM withdrawals per month has fallen from a high of 73 million in 2010 to just 47 million this year. If the trend continues at the same rate, ATM use will reach zero in three decades.

Now, I can fit a straight line to data. They teach you this in statistics. They often also teach you not to do it with just two points, but whatever.

[Figure: atm-projection]

Ok, maybe finder.com.au had more data or more detailed data or something, but the information in the story is all we have, and it doesn’t really support either “2036” or “30 years”.
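For anyone who wants to check the two-point extrapolation, here is a minimal sketch. It assumes "this year" means 2017 and treats the two quoted figures as exact; the function names are made up for illustration, and this is not a reconstruction of whatever finder.com.au actually did.

```python
# The only two data points in the story (monthly ATM withdrawals, millions).
# Assumption: "this year" is 2017.
year_high, withdrawals_high = 2010, 73.0
year_now, withdrawals_now = 2017, 47.0


def linear_zero_year(x0, y0, x1, y1):
    """Year at which the straight line through two points reaches zero."""
    slope = (y1 - y0) / (x1 - x0)
    return x1 - y1 / slope


def geometric_projection(x0, y0, x1, y1, target_year):
    """Value at target_year if the same *percentage* decline per year continued."""
    yearly_ratio = (y1 / y0) ** (1 / (x1 - x0))
    return y1 * yearly_ratio ** (target_year - x1)


zero_year = linear_zero_year(year_high, withdrawals_high,
                             year_now, withdrawals_now)
print(f"Straight line reaches zero around {zero_year:.0f}")

still_left = geometric_projection(year_high, withdrawals_high,
                                  year_now, withdrawals_now, 2047)
print(f"Constant percentage decline leaves {still_left:.1f} million a month in 2047")
```

The straight line through these two points hits zero at roughly 2030, which matches neither 2036 nor three decades from now. A constant percentage decline, the other reading of "the same rate", never reaches zero at all.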

I don’t know how long ATMs will last. And I don’t think finder.com.au does either. But they do know how to get a free mention in the Herald.

December 4, 2017

Compared to what?

From Stuff

Your summer pavlova costs more than 40 per cent more to make this year than it did 10 years ago – and commentators think that trend will continue.

That’s true, but prices now and prices ten years ago are in different currencies, and so shouldn’t just be compared like numbers.

Using the RBNZ inflation calculator, about half the apparent price increase is just currency conversion; a 2007 dollar is worth about 1.2 2017 dollars.

On top of that, incomes have changed over the past ten years. The median annual household income is up about 36%, so if pavlova is less affordable than in 2007 it’s mostly because of something like housing costs, not the price of cream and kiwifruit.
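The adjustment is just a couple of divisions. The sketch below takes the numbers above at face value, with "more than 40 per cent" treated as exactly 40%, and it assumes the 36% income rise is in nominal dollars, which the post doesn’t spell out.

```python
# Ratios of 2017 values to 2007 values, as quoted in the post.
nominal_price_ratio = 1.40  # pavlova ingredients, unadjusted (assumed exactly 40%)
inflation_ratio = 1.20      # a 2007 dollar is worth about 1.2 2017 dollars
income_ratio = 1.36         # median household income (assumed nominal)

# Price change after converting both years to the same dollars.
real_price_ratio = nominal_price_ratio / inflation_ratio

# Price change relative to what the median household earns.
price_to_income_ratio = nominal_price_ratio / income_ratio

print(f"Inflation-adjusted increase: {100 * (real_price_ratio - 1):.0f}%")
print(f"Increase relative to median income: {100 * (price_to_income_ratio - 1):.0f}%")
```

Under those assumptions the inflation-adjusted increase is around 17%, and relative to median household income the pavlova costs only about 3% more than it did in 2007.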

December 1, 2017

Briefly

  • Testing drug sniffer dogs: “The dogs are mainly used to confirm what we already suspect,” says Fulmer. “When the dogs come out, about 99 percent of the time we get an alert. And it’s because we already know what’s in the car; we just need that confirmation to help us out with that.”  At least with the biosecurity beagles at Auckland Airport, there’s no incentive on the handler’s part for false positives.
  • Security researcher Matt Blaze talks to US Congress about voting security: “The most reliable and well-understood method to achieve this is through an approach called risk-limiting audits. In a risk limiting audit, a statistically significant randomized sample of precincts have their paper  ballots manually counted by hand and the results compared with the electronic tally. …The effect of risk-limiting audits is not to eliminate software vulnerabilities, but to ensure that the integrity of the election outcome does not depend on the herculean task of securing every software component in the system.” 
  • The Grattan Institute (in Australia) has a report (PDF) on adverse events in hospitals: Strengthening safety statistics: how to make hospital safety data more useful.  Peter Davis has an opinion piece in the Dominion Post on what NZ could do.
  • Statistical population genetics of New York rats: they stick to their neighbourhoods, just like the humans. Sarah Zhang in the Atlantic.
  • ProPublica found that Facebook won’t let you target ads based on race, or even on ethnicity — but it will let you target “African American” under “Behaviors”, sub-category “Multicultural Affinity”.   Facebook said “The rental housing ads purchased by ProPublica should have but did not trigger the extra review and certifications we put in place due to a technical failure.” The last five words of that sentence are interesting — they don’t actually add anything, but they kind of sound like they do.