Posts written by Atakohu Middleton (125)

Atakohu Middleton is an Auckland journalist with a keen interest in the way the media uses/abuses data. She happens to be married to a statistician.

January 22, 2019

Meet Lushi Cai, Statistics summer scholar

Every summer, the Department of Statistics offers scholarships to high-achieving students so they can work with staff on real-world projects. Lushi Cai is working with Professor Chris Wild on iNZight, the free data visualisation and analysis software he developed.

There’s a Chinese saying that goes “Travel ten thousand miles, read ten thousand books.” And that’s just what summer scholar Lushi Cai, in the picture, is doing.

Originally from Guangzhou in southern China, Lushi had done a year of undergraduate study in China before she moved with her family to New Zealand three years ago. She embarked on a Bachelor of Science majoring in computer science and statistics at the University of Auckland, finishing her degree last year. This year, Lushi will be on an honours programme.

As a summer scholar, Lushi is working on the Department of Statistics’ data analysis package, iNZight. This is a free, R-based environment started by statistics education expert Professor Chris Wild to help high-school students quickly and easily explore data and understand some statistical ideas. However, iNZight has grown, and now extends to multivariable graphics, time series, and generalised linear modelling, including modelling of data from complex surveys. It is available in web and desktop versions.

Lushi’s summer scholarship involves implementing interactive web graphs for R-generated statistics plots and enhancing the web version of iNZight by adding an interactive plot function. “Users tell iNZight what to do and what analysis output they want using iNZight’s GUI (graphical user interface),” she explains. “They don’t need to know how to write code.

“However, key modules also provide users with the R code that iNZight used to produce their output. This is great for learning how to do things in R, and it also makes iNZight analyses reproducible by others.”

But improvements are needed, she adds: “Unfortunately, the R code automatically generated by iNZight is not easy for humans to read. So I’m writing an auto-formatter that converts messy R code into tidy R code that’s easy to read.”

Students are a critical part of the development of iNZight, says Chris Wild. “It’s a student-driven project, so most of the big-scale changes occur over the New Zealand summer period. At other times, we mostly work on small changes and bug fixes.”

Lushi enjoys problem-solving, so this sort of project is a natural fit. In addition, “my interest is analysing huge data and producing a direct way, such as tables and graphs, to explore the features. I believe this is a powerful skill and can be applied to every field in the real world”.

  • For general information on University of Auckland summer scholarships, click here.

 

September 24, 2018

And while we’re talking about Lotto …

Our very own Liza Bolton was summoned to TV show The Project last week to reveal how to minimise the chance of sharing a First Division Lotto win with heaps of other people.

The invite came after last Wednesday’s Lotto draw, where 40 people shared the first division prize, getting only $25,000 each rather than something with one or two extra zeroes.

Liza had 90 seconds to share her top five tips – the first one is on the image.

Watch the clip here.

March 26, 2018

Ihaka Lecture Series 2018 – collected here for your viewing pleasure

The second annual edition of the Ihaka Lecture Series has just ended, and we are, once again, delighted with the turnout and engagement, in person and online. Our final speaker was Alberto Cairo, right, Knight Chair in Visual Journalism at the University of Miami, whose lecture on the dubious uses of data was thought-provoking and a bit worrying.

If you want to see how Trump supporters deluded themselves and misled others with graphics, it’s all laid bare here in Alberto’s lecture. And this brand of Trumpery is not the only example of statistics willfully used to mislead – Alberto delivers a few other eye-openers. And some laughs, as well – he is a very entertaining and engaging speaker. By the way, it’s not all bad news – there is much useful and thoughtful work being done, and Alberto shows what that is and where.

Alberto’s lecture is accessible to all. He uses non-technical language and, as Alberto says, he’s not a statistician. So if you are teaching secondary-school statistics (or citizenship or social studies … ) this would be a really good resource for your students.

Also, Alberto was yesterday interviewed by Colin Peacock, the long-time host of Radio New Zealand’s Mediawatch, and it’s recommended listening. The pic Mediawatch ran of Alberto on its webpage was so nice, we stole it. Nice image, RNZ’s Claire Eastham-Farrelly!

Of course, we also had two other incomparable speakers: our own Associate Professor Paul Murrell, one of the movers and shakers behind R, on the BrailleR package, which generates text descriptions of R plots (watch here) and Monash Professor Dianne Cook, who described some simple tools for helping to decide if patterns you think you are seeing in the data are really there (watch here).

And … in breaking news, the theme of next year’s Ihaka Lecture Series is … machine learning! Speakers will be announced at a later date.

+ Useful link: The 2017 Ihaka Lecture Series.

February 19, 2018

Ihaka Lecture Series – live and live-streamed in March

The theme of this year’s Ihaka Lecture Series is “A thousand words: Visualising statistical data”. The distillation of data into an honest and compelling graphic is an essential component of modern (data) science, and this year, we have three experts exploring different facets of data visualisation.

Each event begins at 6pm in the Large Chemistry Lecture Theatre, Building 301, 23 Symonds Street, Central Auckland, with drinks, nibbles and chat – just turn up – and the talks get underway at 6.30pm. Each one will be live-streamed – details will be on the info pages, the links to which are given below.

On March 7, Professor Dianne Cook from Monash University (right) looks at simple tools for helping to decide if the patterns you think you see in the data are really there. Details. Statschat interviewed Di last year about the woman behind the data work, and it was a very popular read. It’s here. Di’s website is here.

On March 14, Associate Professor Paul Murrell from the Department of Statistics, The University of Auckland (left) will embark on a daring statistical graphics journey featuring the BrailleR package for visually-impaired users, high-performance computing, te reo, and XKCD. Details. Paul was a student when R was being developed by Ross Ihaka and Robert Gentleman, and has been part of the R Core Development team since 1999.

On March 21, Alberto Cairo, the Knight Chair in Visual Journalism at the University of Miami (below right) teaches principles so we all become more critical and better informed readers of charts. This lecture is non-technical – if you have any journalist friends, let them know. Details. His website is here.

The series is named after Ross Ihaka, Associate Professor in the Department of Statistics at the University of Auckland. Ross, along with Robert Gentleman, co-created R – a statistical programming language now used by the majority of the world’s practising statisticians. It is hard to over-emphasise the importance of Ross’s contribution to our field, so we named this lecture series in his honour to recognise his work in perpetuity.

 

 

December 15, 2017

Jenny Bryan: “You need a huge tolerance for ambiguity”

Jenny Bryan @JennyBryan was one of several leading women in data science who attended this week’s joint conference of the New Zealand Statistical Association, the International Association of Statistical Computing (Asian Regional Section) and the Operations Research Society of New Zealand at the University of Auckland, so we couldn’t miss the opportunity to talk with her (Jenny’s conference presentation, titled “Zen and the aRt of workflow maintenance”, is here). A brief bio: Jenny is a software engineer at RStudio while on leave from her role as Associate Professor in Statistics at the University of British Columbia, where she was a biostatistician. Jenny serves in leadership positions with rOpenSci and Forwards and is a member of The R Foundation. She takes special delight in eliminating the small agonies of data analysis.

Statschat: When did you first encounter statistics as a young person? Jenny: I was an economics major which had exactly one required statistics paper, which I took, and then continued to try and make that degree as un-quantitative as I possibly could. I had started out thinking I would major in some form of engineering, and therefore was taking math and physics and the technical track.

I was one of very few women in the course, and the culture of the course was to pull an all-nighter once a week [to do the weekly problem set]. The average mark on the exam would be 20 out of 100, and I was mentally not prepared for this sort of stamina-driven culture.

Was it a macho culture? That’s how it felt to me, and you needed enough innate confidence to never worry about the fact that you were getting marks you had never seen before in your life – everyone failed miserably all the time. After the first semester or two of this, I decided it wasn’t for me and declared my major to be German literature, which I saw through. But in the last two years at university, I realised I needed to be employable when I graduated, so I added economics as a means to making sure I could make a living later.

I worked as a management consultant for a couple of years and that’s where I learned that I was actually at my happiest when I was locked in a room by myself with a huge spreadsheet and I had some data task ahead of me … and so then I gradually worked my way back to what I think I’m really good at.

Did you pursue statistics qualifications? I did. After my two years of management consulting, the normal track would be to be sent off to business school. But thanks to what I learned about myself, I was pretty sure that wasn’t the right track for me. But I had learned how to give talks, how to extract questions from people and go and make it quantitative and then translate my solution back into their language. So the management consulting experience was super-useful.

At that point, I had met my husband, and I followed him to his first postdoc with no particular plans. He’s a mathematician – he knew he wanted to be a mathematician when he was 6. I never had that kind of certainty about what I was meant to do! It took me a lot longer to figure it out.

So I followed him, and basically played a lot of tennis at first (laughs) while we were living in Southern California … I decided some form of statistics would be ideal for me, but I didn’t have enough of a math background to take the specialised math exams in the US, called the GREs [Graduate Record Examinations], that a lot of statistics departments want to see. So I started taking as many prerequisites as I could at the university where he was doing his postdoc. I did well and started working as a teaching assistant in these classes as well.

Then we moved together, two years later, for him to start his second postdoc and for me to start biostatistics grad school. Also during this time, I supported myself doing fancy Excel work as a temp … so I did a PhD in Biostatistics at Berkeley in five years – the first two years are the masters, and three years of writing the thesis.

What’s your academic career path been since then? I got my job at University of British Columbia before I graduated, and I was there until I went on leave earlier this year. I’ve since been working in Hadley Wickham’s group at RStudio. My title is software engineer, which I still find a bit peculiar.

Why? Because I feel I should have more formal training in engineering to have that title, but I’m getting more comfortable with it.

What’s the essence of your role there? I spend about two-thirds of my effort on package development and package maintenance. Hadley is starting to gradually give maintainership of his packages to other people … so I took over readxl. I already had an existing line of work in making R talk to Google APIs [application programming interface], so I worked with an intern this summer and we created a package from scratch so that you can use Google Drive from R. Now I’m revisiting some general tools for authenticating with Google APIs, and I have another package that talks to Google spreadsheets. I also do quite a bit of talking and teaching.

You put a lot of your work on the internet. Why do you feel it is important to share it this way? I decided this was how I was going to interpret what it meant to be a scholar. Several years ago, I decided that teaching people about the process of data analysis was super-important to me, and was being completely undertaught, and I was going to dedicate a lot of my time to it. Luckily, I already had tenure at that point, but it still looks a bit like career suicide to make this decision, because it means that you’re not producing conventional statistical outputs like methodological papers. I also felt like putting my stuff out there and having a public course webpage and pushing things out would be my defence against [any suggestion] that I wasn’t doing anything.

You’re clearly not satisfied that the current academic system is serving the subject well. Not at all! We have a really outdated notion that only publications matter, and publications where there’s novel methodology. I think that’s leaving a ton of value on the table – making sure that statistical methods that exist are actually used, or used correctly. But the field is not set up to reward that – the majority of papers are not widely read and cited, and many of these methods are not used or implemented in any practical way …. it’s been enshrined that academic papers are what counts, but they’re not a directly consumable good by society. We need knowledge-translation activity as well.

So you’re rebelling. Well, I felt that the only way you could do it was to start doing the things you thought were valuable. Being able to put your course material online, to have a dialogue with people in your field on Twitter … you can finally remove a lot of these gatekeepers from your life. They can keep doing their thing, but I know people care and read this stuff. Since I was able to wait until I had security of employment, I decided that if that meant I didn’t go from associate to full [professor], I could live with that. It’s not that my department isn’t [supportive] – it’s either neutral or positive on all this. But it’s true that everyone else I was hired with is a full professor and I’m not.

Does that bug you? Yes and no. I think I could have pushed harder. But every time you push on these things, you’re basically asked, “Well, can you make what you do look more like a statistics publication? Each package that you write, can you write a stats paper around it?” and I’ve decided the answer is, “No. Can we agree that is not a helpful way to evaluate this work? The only reason to repackage it in that way is to check some box.”

Academics are becoming increasingly dissatisfied with academic publishing structures. Do you think that perhaps data scientists might take the lead in dismantling structures that aren’t helping the subject? Maybe, and I think things are changing. But I decided that it’s like turning the Titanic and it’s not going to happen on a time-scale consistent with my career.  I can’t wait for academia to gradually reshape itself.

Is that one of the reasons you went off to RStudio? Oh, absolutely. I feel the things I do are tolerated in academia, and often found very useful, [but that said], I lost my grant funding the more applied I became. It’s harder to get promoted. You’re pressured to sell your work as something it’s not, just because that’s what the status quo rewards. Working at RStudio, I’m actually allowed to say what I do is what I do, and be proud of it, and be told that you are excellent at it, which is not currently possible in academic statistics.

So tell me about your typical day, working for RStudio. It’s a remote company. There is an office in Boston and a large enough group in Seattle that they rent a space, but the rest of us are on our own. So it’s just me alone at home working on my projects. We use Slack as a communication channel; the team I’m on maintains two channels for two separate groups of packages. We might have a group conversation going and it can be completely silent for three days, or we can have 100 messages in a morning. It really depends when someone raises an issue that other people care about, or can help out with. And then, I have private one-off conversations with Hadley or other members of the group, and similarly, they can be very quiet or suddenly light up.

Who do you live with? My husband’s a professor, so he’s mostly on campus but sometimes he’s around – we both like working at home and being alone together. The kids are all at home; they go to school from 9am until 3pm or 4pm. My oldest is 14 and I have twins who are about to turn 12.

So how do you manage work-life balance, given that you work from home? Well, I work when they are not there, then I try to work from 3pm to 6pm, or 4pm to 6pm, with mixed success, I would say. Then there are a couple of hours which are explicitly about driving people here and there. I do a second shift from 9pm to 1am or 2am.

Are you a night owl? Yeah, which I don’t love, but that’s just how things are in my life right now. I have to do it that way. I have one productive shift while the children are at school, then one productive shift after they go to bed.

Let’s talk about women in data science. I have the impression that maths remains male-dominated and that statistics is less so, but that data science appeals to women and that the numbers are quite good. What’s your take on that?  The reason I liked statistics, and particularly liked applied statistics, is I was never drawn to math for maths’ sake, or the inherent beauty of math. I enjoyed doing it in the service of some other thing that I care about … I think it’s possible that there’s something about me that’s typical of other women, where having that external motivation is what makes you interested in, or willing to do, the math and the programming. For its own sake, it never really appealed to me that much. Programming appeals to me more on its own than math does. Programming actually can motivate me just because I love the orderliness of it and accomplishing these little concrete tasks – I love checking lists (laughs) and being able to check my work and know that it is correct … When you combine it with, “This is going to enable us to answer some question”, then it’s really irresistible.

So it’s the real-world nature of it that is really appealing to you. Yeah – I care about that a lot.

What skills and attributes make a good data scientist? I think being naturally curious, doing something for the sake of answering the question versus a “will-this-be-in-the-test?” mentality – just trying to do the minimum.

You need a huge tolerance for ambiguity. This is a quality I notice that we’re spending a lot of time on in our Master of Data Science programme at UBC. Half the students have worked before and about half are straight out of undergrad, and the questions they ask us are so different. The people straight out of undergrad school expect everything to be precisely formulated, and the people who’ve worked get it, that you’re never going to understand every last thing; you’re never going to be given totally explicit instructions. Figuring out what you should be doing is part of your job. So the sooner you develop this tolerance for ambiguity [the better] – that makes you very successful, instead of waiting around to be given an incredibly precise set of instructions. Part of your job is to make that set of instructions.

How much room for creativity is there in data science?  I think there’s a ton. There’s almost never one right answer – there’s a large set of reasonable answers that reasonable people would agree are useful ways of looking at it. I think there’s huge scope to be creative. I also think being organised and pleased by order frequently makes this job more satisfying. People come to you with messy questions and messy data, and part of what you’re doing is this sort of data therapy, helping them organise their thoughts: “What is your actual question? Can the data you have actually answer that question? What’s the closest we can get?” Do that, then package it nicely, you do feel like you’ve reduced entropy! It feels really good.

You work from home and that suits you, but not every woman is able to do that. What needs to change to help women scientists progress through life and career, balancing what they need to balance? I don’t know how specific this is to data science, but three things were helpful to me. One is I live in Canada, where we have serious maternity leave – you can take up to a year, and because that’s what the Government makes possible, that means it’s normal. In both cases, I took between six and nine months – I was begging to come back before a year! But having a humane amount of time for maternity leave is important.

Also, what’s typical in Canada, and what UBC does, is that they pause any sort of career clock for a reasonable amount of time. So every time I went on maternity leave it added one year to my tenure clock.

You don’t end up out of synch with people who hadn’t been away. Yeah. It [parenthood] still slows your career down, but this helps immensely. So there are the structural policies.

Secondly, I do have a really supportive spouse. I feel like maybe I was lead parent when the kids were little, but since I made this career pivot and became much more interested in my work, he’s really taken the lead. I feel that there were many years where I was the primary parent organising the household, and now it’s really the other way around … that’s huge.

Third, I’m in my mid-late 40s now and I’m embarking on what feels to me like a second career; certainly, a second distinct part of my career and focusing more on software development. I think you also have to be willing to accept that women’s careers might unfold on a different time-scale. You might lose a few years in your 30s to having little kids … but you often find awards that are for people within five years of their PhD or for young investigators and they assume that you don’t have all this other stuff going on. I think another thing is [employers] being willing to realise that someone can still be effective, or haven’t reached their peak, in their 40s. The time-frame on which all of this happens needs to be adjusted. You need to be flexible about that.

Read more about Jenny Bryan:

Her academic page

A profile by rOpenSci.org

Di Cook: “I had advantages early on, and I feel like I need to pay that back”

Australian Di Cook @visnut was one of several leading women in data science who attended this week’s joint conference of the New Zealand Statistical Association, the International Association of Statistical Computing (Asian Regional Section) and the Operations Research Society of New Zealand at the University of Auckland, so we couldn’t miss the opportunity to talk with her. A brief bio: Di is a world leader in data visualisation and well-known for her work on interactive graphics. She is Professor of Business Analytics in the Department of Econometrics and Business Statistics at Monash University. She’s a Fellow of the American Statistical Association, elected member of the R Foundation and the Editor of the Journal of Computational and Graphical Statistics. Her research lies in data science, data visualisation, exploratory data analysis, data mining, high-dimensional methods and statistical computing.

Statschat: When did you first encounter statistics? Di: It was in my undergraduate degree. I studied mathematics with a plan to do math teaching. Statistics was one of the areas of mathematics that I could major in other than pure, or applied, mathematics. There was an extremely good female professor at the University of New England, Eve Bofinger, and I was drawn to some of the methods she was teaching, and that led me into statistics.

What was your career path after that?  I taught math at high school for about three months, then I had an offer from the Australian National University to go there as a research assistant, and that seemed a better fit. As a research assistant, I got to learn a lot more things, particularly computing. Computing, I think, is a critical aspect of data science today.

I spent a few years doing that and then realised I’d really like to make art, because some of the research-assistant work I was doing was computer graphics for data online. It fed into my art instincts from teenage years, so I spent some time as an artist before finding a graduate programme in statistics in the US that focused on data visualisation.

What sort of art do you do? I was painting – I haven’t done any for a long, long time, since I finished my PhD; it’s been too busy.

So your creative pursuits have fed into your career. Yeah – seeing that I could do data visualisation as a part of the statistics allowed me to realise that I could do a higher degree in stats; that merged my interests very well.

Where did you do your PhD? At Rutgers University in New Jersey.

You spent 22 years at Iowa State University in the US, and moved to Monash in Australia in 2015. What are your major projects there? I have a lot of projects. One of them is with Tennis Australia; we’ve been looking at tennis serves. So we have Hawk-Eye trajectory data and we visualise the tennis serves and look at how the players are different or similar.

That’s very cool – how’s that for applied statistics. Yeah, it’s fantastic, isn’t it. We’re also looking at face recognition in tennis video, to be able to detect the face through broadcast video, so that we can monitor emotions throughout a match and see how that affects performance.

We’re also looking at pedestrian sensor data that comes from a City of Melbourne (almost live) feed. One of my PhD students, Earo, has a new type of plot called a calendar plot; you make your data plots into a calendar format so that you can study things relative to holidays, and put it really on a human pattern basis.

Describe a typical day at work at Monash. We have a lot of meetings with students, so I would meet up with two or three students – PhD students or postdocs or research assistants – on projects that we’re working with, and meet up with other faculty. On some days I’m teaching data science classes to around 200 students. We often just go for a coffee with colleagues. We also play ping-pong on the conference table! I’ve got a good group of colleagues who play tennis, so we play tennis together.

It sounds very collegial. You’re a prominent woman in data science, and the field seems to appeal to women as a career path. Do you have any thoughts on that? I haven’t really looked at those numbers … but honestly, I think there’s too big of an emphasis on gender differences, and they’re not real when you look at the metrics. It’s just a perception. But one of the things I notice with the women that I work with is that they are interested in solving problems, and having an outcome of their work that makes life better for others. And that’s one thing that data science offers that pure statistics research is a bit removed from.

Do you have a family? I have one son. I moved to Monash after he graduated high school. He went off to college in the US, while I moved halfway across the globe, which he was quite happy about. He visits during the holidays, and last American summer found an internship at Monash University.

When he was small, how did you navigate work and life? It’s really difficult. I can’t imagine how single women do that – you need to have some sort of support mechanism. Day-care is amazing – and however much you spend on day-care, it’s worth it. And also partly because I think young kids early on really get a huge amount of benefit from being in the social mix of other kids the same age. He was in day-care from three months, part-time, and even at five months, if we were away for a week, when he’d get back, the other babies were over the moon – they recognised each other. I hadn’t realised how early on that socialisation happens.

So you weren’t concerned about day-care at all. Some women get tied up in knots about putting their kids in day-care. I know – there’s this thing about guilt. It is actually the best environment – they [pre-school educators] can do a much better job than me. If my time pressure is relieved by not having to have every moment dealing with all the stuff you have to deal with young kids … he’s come out as being a very sociable child and that he learnt from early on. Guaranteed when you’ve got the most important meeting, and your husband has a most important meeting at exactly the same time, that’ll be the time your kid gets sick. So you have to have a backup.

So what advice do you give other academic mums? Don’t stress – there are ways around. And the meeting you think is most important doesn’t have to be the most important. You just juggle everything you have as well as you can, and there are ways around any hurdle or hiccup. Just keep out there. It’s really important for other younger women to see women in senior roles.

Are universities doing the necessary to help women make the most of their talents in data science? I think it’s still a struggle. I think there’s been bureaucratic pushes for gender equality, which is really how I actually got an academic position in the first place in the US.

How so? Equal opportunity. Many statistics departments had no women, and it was a cultural shift in the early 1990s that many university administrations were forcing departments to hire women … or otherwise they couldn’t hire … if they [universities] were doing it well, they were not putting women in that situation of thinking, “Oh I was only hired because I was a woman”. They were doing it in the sense of making sure that women realised that they were talented, and wanted for their  talents, not just because of the administration push. But that wasn’t universal.

I thought things have been solved, but it’s not. Time and time again women are evaluated differently at promotion, and in classroom evaluations, they are not on average [rated to be] as good as the men, and that’s been shown again and again and again. So the thing is, don’t get put off by that; you will sometimes need to fight for your promotions and have people willing to fight for you.

Systemically, things are still not weighted fairly between men and women. It’s not. I’ve just finished studying some of the research-grant rates in Australia and the number given to women faculty are pitiful, from both the Australian Research Council and the National Health and Medical Research Council, which is the health sciences. That impacts whether women can get through to those higher ranks. That’s my next fight.

Would you see yourself as a crusader? How do you define yourself in exposing these inequalities? We’ve seen a lot of things [around sex, privilege and power discussed] in public in these last few months, with the sex scandals in Hollywood.  I’ve seen that all through my career in academia. I think we, hopefully, are on a cusp where the playing field for recognising talent among women becomes more level … I had advantages early on, and I feel like I need to pay that back.

I wouldn’t say I’m a crusader; I’m saying I see where we’ve come from, in terms of generations of women in my family, and where we are now, and we’ve come a long, long way. I’ve had so many more opportunities than my mum and my grandmother … I feel like I’ve got a responsibility to those generations to keep it moving in the right direction.

What advice would you give young women looking at a career in data science? What skills and attributes do they need to develop? Get onto the publicly available software – free software like R and Python – and get to know them. These are hugely powerful, and they give you power. There’s a number of courses you can do for free to help learn how to work with data.

Any particular courses that you would recommend? There’s DataCamp and Coursera and Software Carpentry, among others. Work with data. Play. Extract somebody’s tweets and analyse the text – there are really good resources for that. Pull data from the government web pages – they have lots of information. The New Zealand Herald has lots of data available. Just get comfortable finding data, making plots of it, and seeing whether it matches up with what the media is reporting about a problem. This is the sort of power you can get over your life if you can make decisions yourself, rather than being fed decisions.

Read more about Di Cook:

Her academic page

Wikipedia

Another Q & A

February 2, 2017

CensusAtSchool kicks off next Tuesday

As many of you may already know, the Department of Statistics runs the magnificent, biennial CensusAtSchool TataurangaKiTeKura, a national statistics literacy programme in schools supported by the Ministry of Education and Statistics New Zealand. Students aged 9 to 18 (Year 5 to Year 13) use digital devices to answer 35 online questions in English or te reo Māori about their lives and opinions. The aim is to turn them into data detectives – and turn them on to the value of statistics in everyday life.

Pakuranga College visit by Minister of Statistics and local MP Maurice Williamson, to see Census At School 2013 in action with teacher Priscilla Allan's Year 9 digital maths class, along with co-directors of the programme from The University of Auckland, on Monday 6 May 2013, Auckland, New Zealand.  Photo: Stephen Barker/Barker Photography. ©The University of Auckland.


The latest edition of CAS starts next Tuesday, February 7, after the Waitangi Day holiday, and we’re hoping to get more than 50,000 Kiwi students taking part, which would be a record since CAS started in Aotearoa in 2003. Registrations have been open for a few weeks and are piling in, and I can see that so far we have 780 teachers from 507 Māori-language and English-medium schools registered – and there’s also a school from the Cook Islands, Tereora College. Check out if your local school is involved here.

CAS started as a pilot programme here, in 1990, run by Sharleen Forbes. As an international educational project, it started in the UK in 2000, and now runs in the UK, New Zealand, Ireland, Australia, Canada, South Africa, Japan, and the US. Good ole NZ, still punching above its weight in stats education.

There are questions common to all the censuses so comparisons can be made, but there are locally-specific questions as well – you can see the list of questions here. This year, we’re asking students about topics such as whether they get pocket money, and how much; whether there is a limit on their screen time after school; and if anything in their lunchbox that day had been grown at home. In each census, students also carry out practical activities such as weighing the laptops and tablets they take to school and measuring each other’s heights, as in the picture of these Pakuranga College students. From mid-June, the data will be released for teachers to use in the classroom.

As this census is the only national picture of how kids are feeling, what they’re thinking and what they’re doing, journalists love the stories that flow from the results. The publicity isn’t only fascinating – it helps raise awareness of the value of statistics to everyday life. With any luck, some of the kids who do this year’s census will end up being our statisticians of tomorrow.

November 23, 2016

Indigenous data – why is it important?

In a data-driven world, indigenous peoples are becoming increasingly concerned about who owns and represents statistics about indigenous people: that is, who has access to the data, its cultural integrity, and how people’s privacy and autonomy are protected.

Not only do governments collect data about their citizens, but so, too, do indigenous peoples about themselves – just think of the data that iwi need to collect about their own people in this post-settlement era. As an example, I’m a registered member of Waikato-Tainui. The central administration knows six or so generations of my whakapapa because becoming registered means putting your links on paper that a kaumatua then signs off. It knows my home marae and all sorts of personal details such as where I live and my birth date. As I have been the privileged recipient of educational scholarships from the iwi, it also knows my academic record and quite a lot of personal stuff about my goals and aspirations.

So why is this important? Indigenous people have historically had a problematic relationship with researchers, academics and other data collectors. Researcher Andrew Sporle, pictured at right (Rangitāne, Ngāti Apa, Te Rārawa) recently told me that “From a Māori perspective, we were all too often the researched, not the researchers, and Māori realities were often portrayed as a strange and inferior ‘other’. Indigenous peoples are asserting the right to govern and protect the data that are so important to our development. We cannot afford to lose control of data about us.”

Data, he added, is a “highly valuable strategic asset” for Māori development. “In the age of big data, Māori want access to data to support our decision‐making and to be involved when big data is used to make decisions about us.”

In this field, things have been moving fast of late, and New Zealand statisticians are among the leaders. Andrew and Tahu Kukutai, pictured left (Ngāti Maniapoto, Te Aupōuri), Associate Professor at the Institute of Demographic and Economic Analysis, University of Waikato, are among the founding members of Te Mana Raraunga (the Māori Data Sovereignty Network), which was set up last year to assert Māori rights and interests in relation to data.

The group’s guiding motto is “He whenua hou, Te Ao Raraunga; Te Ao Raraunga, He whenua hou”, or “Data is a new world, a world of opportunity.”  It advocates “for the development of capacity and capability across the Māori data ecosystem, including data rights and interests, data governance, data storage and security, and data access and control”.

Andrew and Tahu attended last month’s Indigenous Open Data Summit in Madrid, Spain, alongside independent statisticians Kirikowhai Mikaere (Tūhourangi, Ngāti Whakaue) and James Hudson (Ngāti Pukeko, Ngāti Awa, Ngāi Tai, Tūhoe), a researcher for Auckland Council’s Independent Māori Statutory Board. The summit, a first of its kind, provided a forum to discuss what action was being taken to protect the use of data about indigenous peoples.

Tahu and John Taylor, Emeritus Professor at the Centre for Aboriginal Economic Policy Research at the Australian National University, have edited the just-released first book on indigenous data, titled Indigenous Data Sovereignty – Towards an Agenda, published by ANU Press.

It’s free to download and provides a comprehensive overview of why indigenous oversight of data is important, focusing largely on Australasia. It’s an interesting read and provides a perspective on data that has been missing for too long.

The local contributors include Darin Bishop (Ngāruahine, Taranaki), team leader of organisational knowledge at Te Puni Kōkiri, the Ministry of Māori Development; Dickie Farrar (Whakatōhea, Te Whānau ā Apanui, Te Aitanga ā Mahaki), CEO of the Whakatōhea Māori Trust Board; James Hudson, mentioned above; Maui Hudson (Ngāruahine, Te Mahurehure, Whakatōhea), Associate Professor in the Faculty of Māori and Indigenous Studies at the University of Waikato; GP Rawiri Jansen (Ngāti Hinerangi); Lesley McLean (Whakatōhea, Te Whānau ā Apanui), tribal database coordinator for the Whakatōhea Māori Trust Board; and leading demographer Ian Pool, Emeritus Professor at Waikato University.

August 23, 2016

So where did the word ‘statistics’ come from?

Yes, the book about the history of statistics has been written, in case you were wondering. A History of Statistics in New Zealand was published in 1999, with funding from the New Zealand Statistical Association and the Lotteries Commission of New Zealand. H S (Stan) Roberts edited the history, and wrote substantial sections. It’s now available for free download here – the usual caveats about attribution apply. And it opens by tracing the history and usage of the word statistics:

“Statistics”, like most words, is continually changing its meaning. In order to find the meaning of a word we tend to reach for a dictionary, but dictionaries do not so much “define” the meanings of words, but rather give their current usages, together with examples. Following are examples relating to statistics taken from the 1933 Oxford English Dictionary (13 Vols). Note that in each entry the date indicates the first usage found.

Statism: Subservience to political expediency in religious matters. 1609 – “Religion turned into Statisme will soon prooue Atheisme.”

Statist: One skilled in state affairs, one having political knowledge, power, or influence; a politician, statesman. Very common in 17th c. 1584 – “When he plais the Statist, wringing veri unlukkili some of Machiavels Avioxmes to serve his Purpos then indeed; then he tryumphes.”

Statistical: 1. Of, or pertaining to statistics, consisting or founded on collections of numerical facts, esp. with reference to economic, sanitary, and vital conditions. 1787 “The work (by Zimmerman) before us is properly statistical. It consists of different tables, containing a general comparative view of the forces, the government, the extent and population of the different kingdoms of Europe.” 2: Of a writer, etc: Dealing with statistics. 1787 – “Some respectable statistical writers.”

Statistician: One versed or engaged in collecting and tabulating statistics. 1825 – “The object of the statistician is to describe the condition of a particular country at a particular period.”

Statistics: In early use, that branch of political science, dealing with the collection, classification, and discussion of facts (especially of a numerical kind), bearing on the condition of a state or community. In recent use, the department of study that has for its object the collection and arrangement of numerical facts or data, whether relating to human affairs or to natural phenomena. 1787 – Zimmerman – “This science distinguished by the newly-coined name of Statistics, is become a favourite in Germany.”

Statistic: The earliest known occurrence of the word seems to be in the title of the satirical work “Microscopium Statisticum”, by Helenus Politanus, Frankfort (1672). Here the sense is prob. “pertaining to statists or to statecraft”.

The Concise Oxford Dictionary (1976) gives us two modern usages.

Statistics: 1. Numerical facts systematically collected 2: Science of collecting, classifying and using statistics.

The first verse of a poem composed in 1799 by William Wordsworth, and entitled, “A Poet’s Epitaph”, successfully clarifies this difficult matter.

Art thou a Statist in the van
Of public conflicts trained and bred?
First learn to love one living man;
Then may’st thou think upon the dead.

August 15, 2016

New Zealand at the top of the (per-capita) table

On a medals-per-capita basis, New Zealand now ranks at the top of the table with two gold medals and six silver at the Rio Olympics, Statistics NZ said today.

With eight medals overall at the halfway stage at Rio, New Zealand is the highest performing country, with the equivalent of 1.77 medals for every one million people.

Slovenia is second on 1.45 medals for every one million people. Hungary and Denmark are third and fourth respectively, with Fiji coming in fifth based on its one gold for the men’s rugby sevens win.

However, on a per-capita basis for gold medals alone, Fiji tops the table, with its one gold for a population of just under 900,000. On that basis, New Zealand’s two gold medals leave it in sixth place, with a population of more than 4.5 million.
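For readers who want to check the arithmetic, here is a minimal Python sketch of the per-capita calculation. The medal counts come from the figures above; the exact population values are illustrative assumptions chosen to match "more than 4.5 million" and "just under 900,000", not official Statistics NZ estimates, and the function name is made up for this example.

```python
def medals_per_million(medals: int, population: int) -> float:
    """Return the number of medals won per one million residents."""
    return medals / population * 1_000_000


# Assumed populations (illustrative only, not official estimates).
NZ_POPULATION = 4_520_000    # "more than 4.5 million"
FIJI_POPULATION = 890_000    # "just under 900,000"

# Medal counts as reported at the halfway stage of the Games.
nz_all = medals_per_million(8, NZ_POPULATION)      # two gold + six silver
nz_gold = medals_per_million(2, NZ_POPULATION)     # gold medals only
fiji_gold = medals_per_million(1, FIJI_POPULATION) # one gold, rugby sevens

print(f"NZ, all medals:  {nz_all:.2f} per million")
print(f"NZ, gold only:   {nz_gold:.2f} per million")
print(f"Fiji, gold only: {fiji_gold:.2f} per million")
```

Under these assumed populations, New Zealand's eight medals work out to about 1.77 per million people, while on gold medals alone Fiji's single gold gives it a higher rate than New Zealand's two, which is why the ranking changes depending on which medals are counted.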

During the weekend, Mahe Drysdale’s single sculls gold medal was the high point for the New Zealand team.

On Saturday, New Zealand won two silver medals: one for shot-putter Valerie Adams, and one at the rowing, where Genevieve Behrent and Rebecca Scown took silver in the pair.

See the SNZ data here: http://www.stats.govt.nz/browse_for_stats/population/estimates_and_projections/olympics-2016.aspx#tables