Posts written by Atakohu Middleton (125)

Atakohu Middleton is an Auckland journalist with a keen interest in the way the media uses/abuses data. She happens to be married to a statistician.

November 12, 2019

The Bird of the Year race as a data visualisation!

The Bird of the Year contest, run by wildlife advocate Forest & Bird, asks the public to vote for their favourite New Zealand native bird. People get very excited by this, with campaigns coalescing around particular birds and much trash-talk between rival camps.

This year, the race was between the kākāpō and the hoiho or yellow-eyed penguin. Find out which bird won and what voting looked like on the way there in a neat little data visualisation by Yvan Richard of Dragonfly Data Science here. Read a story about this year’s race here. #BirdoftheYear

 

March 29, 2019

Ihaka Lectures – videos for your viewing pleasure

 

We’re three weeks into the month-long Ihaka Lecture Series, and it has been well received – thank you to those who have turned up in person and online.

Our final speaker, Robert Tibshirani, right, is up on Weds April 3 at the University of Auckland (details here). Robert is Professor of Statistics and Biomedical Data Science at Stanford University.

He is best known for proposing the ‘lasso’, a sparse regression estimator, and describing its relationship to the idea of boosting in supervised classification. He will talk about modern sparse supervised learning approaches that extend the lasso.
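For readers who have not met the lasso before, here is a rough sketch of what "sparse" means in practice. It assumes the common formulation in which the lasso minimises (1/2n) times the squared error plus lam times the sum of absolute coefficients, and it solves that by coordinate descent with soft-thresholding. The function names and the toy data are ours for illustration and are not taken from Robert's talk or software.

```python
def soft_threshold(z, lam):
    """Shrink z towards zero by lam; values within lam of zero become exactly zero."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0


def lasso(X, y, lam, sweeps=100):
    """Coordinate-descent sketch for (1/2n)*||y - X b||^2 + lam*sum(|b_j|)."""
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    for _ in range(sweeps):
        for j in range(p):
            rho = 0.0   # correlation of feature j with the residual from the other features
            norm = 0.0  # squared length of feature j
            for i in range(n):
                others = sum(X[i][k] * beta[k] for k in range(p) if k != j)
                rho += X[i][j] * (y[i] - others)
                norm += X[i][j] ** 2
            beta[j] = soft_threshold(rho / n, lam) / (norm / n) if norm > 0 else 0.0
    return beta


# Toy data (hypothetical): y depends strongly on the first column, barely on the second.
X = [[1, 1], [-1, 1], [1, -1], [-1, -1]]
y = [2.1, -1.9, 1.9, -2.1]
coefficients = lasso(X, y, lam=0.5)
```

With this penalty the weak second coefficient is set to exactly zero while the strong first one is only shrunk, which is the behaviour that makes the lasso useful for picking out a few important variables.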

 

In the meantime, you might like to check out the films of the last three speakers. First up, on March 13, was Bernhard Pfahringer, left, who is Professor of Computer Science at the University of Waikato.

He is a member of the Weka project, New Zealand’s other famous open-source data science contribution, and here talks about the design and development of Weka and more recent projects.

 

 

 

Next was supposed to be JJ Allaire, the founder and CEO of RStudio, and the author of R interfaces to Tensorflow and Keras. However, ill-health prevented him coming, and our very own Professor Thomas Lumley stepped in.

Thomas talked entertainingly about deep learning, in particular how deep convolutional nets are structured and how they can be remarkably effective, but can also fail, as he puts it, “in remarkably alien ways”.

 

Following was Dr Kristian Lum, Lead Statistician at the Human Rights Data Analysis Group. Her research has concretely demonstrated the potential for machine learning-based predictive policing models to reinforce and, in some cases, amplify historical racial biases in law enforcement.

She talked about algorithmic fairness, and about ways in which policy, rather than data science, influences the development of these models and their choice over non-algorithmic approaches.

 

 

March 27, 2019

The dangers in a world built on data about men

Women all know about the toilet queue in the intermission at concerts – same-sized bathrooms for men and women do not equal efficiency. Women who have ever stood and waited in a long line for the loo while the men come and go with speed – and I think I can say that this is about, roughly, give or take, 100% of us – roll our eyes and laugh about this as we wait. But the anecdote reveals an uncomfortable truth, says Caroline Criado Perez in her book Invisible Women: Exposing Data Bias in a World Designed for Men. Design and services that take the average male or the needs of the average male as the norm – as is the case with car-crash test dummies and stab-proof vests, among other things – are potentially deadly. The Guardian has excerpted a section of her book and it’s a sobering read. Recommended.

And while we are on the subject of a world designed for data about men, NASA has cancelled the first all-women spacewalk due to a spacesuit size issue.

February 8, 2019

Meet Summer Scholar Larisa Morales Soto

Every summer, the Department of Statistics offers scholarships to high-achieving students so they can work with staff on real-world projects. Larisa Morales Soto, below, is working with Dr Beatrix Jones on a project exploring how the dietary patterns of New Zealand children change during their childhood and the transition to adolescence. 

Unlike most of the Department of Statistics’ summer scholars, Larisa isn’t studying locally. Summer scholarships are open to any tertiary student who is appropriately qualified, and Larisa, who is in her third year studying genomic science at the National Autonomous University of Mexico at Morelos in south central Mexico, leapt at the chance to gain new experiences overseas.

“What first motivated my search is that this time of the year would be winter in the northern hemisphere, and research internship programs are not very common in this season,” says Larisa, whose study combines biology, computer science and statistics to answer questions in life sciences.

“What finally brought me to New Zealand was the high academic quality and international presence of The University of Auckland. Also, the summer programme would give me the chance to visit the country and get completely immersed in the culture, something I wouldn’t have been able to do without the scholarship.”

The work she is doing looks at how the overall dietary patterns of New Zealand kids change during their childhood and the transition to adolescence. “During early life stages, children’s parents determine their food intake,” she explains, “but as they grow up, they start making decisions on which foods they want to eat, and their previous diet and other external factors can influence this decision-making process.”

The research hopes to shed light on the complex relationship between diet, health and disease during an individual’s lifespan, understanding how different factors help to establish dietary patterns.

Being in New Zealand has brought Larisa personal and academic benefits: “This experience is having a huge impact on my professional training. But also, I feel that it’s making me grow on the personal level as well, because being alone and very far from my home country is a big step. Being here has changed the perspective I had of New Zealand – I’ve been able to see the greatness of the country in terms of natural resources, social culture, economy and politics.”

In her down time, Larisa has been using the university sports and recreation centre and the library, as well as visiting parks, museums and other attractions in the city and Hauraki Gulf. She also fitted in a quick trip to the South Island with a friend she made here.

This won’t be the last we see of her, says Larisa: “I am definitely coming back in the future.”

  • For general information on University of Auckland summer scholarships, click here.

 

 

February 7, 2019

Meet Summer Scholar Yongshi Deng

Every summer, the Department of Statistics offers scholarships to high-achieving students so they can work with staff on real-world projects. Yongshi Deng, below, is working with communications company Vodafone New Zealand on a project analysing and predicting customer behaviour. 

Yongshi, who has a BSc in Mathematics and Statistics and an Honours in Statistics from the University of Auckland, is spending seven weeks working within Vodafone’s Big Data and Analytics team.

Her research focuses on analysing and modelling opt-out behaviour among Vodafone’s fixed line customers – those who have purchased broadband, landline telephone and television services. She is using a combination of data on what customers do when they use these services, network data and information from Vodafone’s call centre.

It’s a big job, Yongshi says, but she is relishing it: “I enjoy this project a lot, as I can put my knowledge I learn from my degrees into practice.” There are plenty of problems to solve: “The major challenge is data cleaning, since big, real-world datasets can be very messy. There are millions of missing values that need to be handled.”

Another challenge is variable selection. “The dataset I am currently working on has more than 120 variables, so this makes dimension reduction indispensable,” Yongshi explains. It’s critical that she chooses a good combination of variables to build models that can generalise well for unseen data. This step, she says, is based not only on statistical tests but also on domain knowledge, which she gets from her colleagues at Vodafone.

Yongshi’s supervisor at Vodafone, Neel Sengupta, says that having students in-house brings benefits to both parties. “They get to see what business data looks like and the sheer scale of it. The benefit for us is that we get to see the advancements coming out of universities that we might not necessarily see in a commercial set-up.”

Yongshi’s supervisor in the Department of Statistics, Ciprian Giurcaneanu, agrees that the biggest benefit to students of such work experience is that “they get in contact with the real world. This allows them to see how useful in practice are the techniques that they have learned in our department.” They also have to fend for themselves: “The lecturer ‘who knows everything’ is not there, and the students have to find their own answers to their questions.”

Yongshi is originally from Dongguan, China. This year, she plans to pursue a PhD at the University of Auckland, and already has a good idea of the field she wants to research: “I am particularly interested in applying machine learning techniques to solve real-world problems.”

Yongshi says that statistics is a “fantastic subject” that not only helps her explore the world, but keeps her motivated and engaged. She particularly appreciates the R programming skills that she has learned in the Department of Statistics. “The department provides a wide range of statistical courses and R is integrated in most courses, which has equipped me with the skill and knowledge I’ll need for further study.”

  • For general information on University of Auckland summer scholarships, click here.

 

February 2, 2019

Meet Simon Goodwin, Statistics summer scholar

Every summer, the Department of Statistics offers scholarships to high-achieving students so they can work with staff on real-world projects. Simon Goodwin, below, is working with Dr Jesse Goodman on random graph dynamics and hitting times.

Simon’s summer scholarship is related to the study of random graphs, looking at how to investigate networks that look as if they are random or pseudo-random, like social networks, family trees or the global flight network.  His task in particular is looking at the nodes in these structures that are hard to reach by moving randomly, and what this means for the structure of the graph as a whole.

You can conceptualise it like this: Produce a random graph by connecting pairs of vertices uniformly at random. Then run a random walk on this random graph: at each step, move to a uniformly chosen neighbour of the current position. The hitting time is the number of steps needed to reach a particular target vertex, and it varies in a particular way depending on the size of the random graph.

Simon’s work looks at the effect of changing the random graph. Between each random walk step, he might “rewire” some edges: pick a fraction of edges, disconnect the vertices on either side, then randomly reconnect those vertices to see if these graph dynamics make it faster (or slower) to reach the target vertex.
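If you would like to play with the idea yourself, here is a minimal simulation sketch. It is not Simon's code, and all the function names are hypothetical. It also simplifies the rewiring step: rather than reconnecting the same disconnected vertices, it drops a fraction of edges and adds the same number of fresh uniformly random edges. It assumes the start and target vertices are different and that the number of edges asked for is no more than the number of possible pairs.

```python
import random


def random_graph(n, m, rng):
    """Pick m distinct edges uniformly at random among n vertices (no self-loops)."""
    edges = set()
    while len(edges) < m:
        u, v = rng.sample(range(n), 2)  # two different vertices
        edges.add((min(u, v), max(u, v)))
    return edges


def adjacency(edges, n):
    """Build neighbour lists from a set of undirected edges."""
    adj = [[] for _ in range(n)]
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)
    return adj


def rewire(edges, n, fraction, rng):
    """Simplified rewiring: drop a random fraction of edges, add as many fresh ones."""
    kept = set(edges)
    for edge in rng.sample(sorted(kept), int(fraction * len(kept))):
        kept.remove(edge)
    while len(kept) < len(edges):
        u, v = rng.sample(range(n), 2)
        kept.add((min(u, v), max(u, v)))
    return kept


def hitting_time(n, m, start, target, fraction, rng, max_steps=100_000):
    """Steps until the walk first reaches target, or None if max_steps runs out."""
    edges = random_graph(n, m, rng)
    adj = adjacency(edges, n)
    position = start
    for step in range(1, max_steps + 1):
        if adj[position]:  # an isolated vertex has nowhere to go, so the walker waits
            position = rng.choice(adj[position])
        if position == target:
            return step
        if fraction > 0:  # graph dynamics: rewire between walk steps
            edges = rewire(edges, n, fraction, rng)
            adj = adjacency(edges, n)
    return None
```

Calling hitting_time many times with fraction set to 0.0 and again with, say, 0.1, then comparing the averages, gives a rough feel for whether rewiring speeds up or slows down the walk on graphs of a given size.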

“Looking at the structure of these random theoretical objects we can learn about vast real-world networks that have no clearly apparent structure,” Simon explains. “The results I am trying to find would also have theoretical applications in the study of random graphs.”

Simon is about to start his third year studying maths and statistics: “My main interest is in pure maths, but I am also very interested in theoretical statistics, mainly in probability. I am intrigued by all things random.”

In fact, he dropped physics for statistics last year, “and I haven’t regretted it for one moment – sorry physics! I am mainly interested in probability but I have also enjoyed learning about data analysis and I have an interest in statistical computing.”

He adds, “Probability is such an interesting field, as it has a strong theoretical backing while also having many obvious applications such as games with dice and cards, as well as many less obvious applications, from financial-market analysis to quantum physics.”

Simon is hoping to become an academic: “I hope to continue into postgraduate study and then spend the rest of my life studying and teaching what I love.” When he’s not studying, Simon loves playing video games and roleplaying games like Dungeons and Dragons, as well as walking around the scenic spots of Auckland.

  • For general information on University of Auckland summer scholarships, click here.

February 1, 2019

Meet Yiwen He, Statistics Summer Scholar

Every summer, the Department of Statistics offers scholarships to high-achieving students so they can work with staff on real-world projects. Yiwen He, below, is working with Professor Chris Wild on iNZight, the free data visualisation and analysis software he developed.

Yiwen is doing a conjoint BSc and BCom majoring in Statistics, Mathematics and Finance at the University of Auckland. She’s from China, and moved here seven years ago.

Yiwen is working on the Department of Statistics-based data analysis package iNZight.

This is a free, R-based environment started by statistics education expert Professor Chris Wild to help high-school students quickly and easily explore data and understand some statistical ideas.

However, iNZight has grown, and now extends to multivariable graphics, time series, and generalised linear modelling, including modelling of data from complex surveys. It is available in web and desktop versions.

As iNZight has expanded, it has needed tweaking and tidying, and Yiwen is working on how it copes with incoming data that has date and time fields telling us when something happened. “These data are most likely to be in non-standard form, meaning our computing software cannot recognise and get useful information from it,” she explains.

Yiwen has been working with the iNZight team to develop functions to convert raw dates and times data to a standard format that iNZight can recognise, and extract desired components from a dates-and-times variable. “If we are able to automate how dates and times are handled by our computing software, we can plot dates and times together with our observations.”
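To give a flavour of the problem, here is a small sketch of the general approach: try a list of known layouts until one fits, convert to a standard form, then pull out the parts you want. iNZight itself is written in R and this is not its code. The sketch is in Python, and the function names and the list of candidate formats are hypothetical.

```python
from datetime import datetime

# Hypothetical layouts that raw data might arrive in; a real tool would need many more.
CANDIDATE_FORMATS = [
    "%Y-%m-%d %H:%M:%S",  # 2019-11-12 09:30:00
    "%d/%m/%Y %H:%M",     # 12/11/2019 09:30 (day first)
    "%Y%m%d",             # 20190207
]


def parse_datetime(raw, formats=CANDIDATE_FORMATS):
    """Return a datetime for the first format that fits, or None if none do."""
    text = raw.strip()
    for fmt in formats:
        try:
            return datetime.strptime(text, fmt)
        except ValueError:
            continue  # this layout did not match, so try the next one
    return None


def components(moment):
    """Extract pieces that are handy for plotting or grouping."""
    return {
        "standard": moment.isoformat(sep=" "),
        "year": moment.year,
        "month": moment.month,
        "weekday": moment.isoweekday(),  # Monday is 1, Sunday is 7
        "hour": moment.hour,
    }
```

The order of the candidate list matters when a value could fit more than one layout, such as 03/04/2019, so a real tool would need a rule for which reading wins.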

Yiwen is finding the work stimulating and fun, “since we get to do things that are more practical, and it is exciting to see how the functions you build actually work on various data sets. And since we are given plenty of time in the project, it really encourages you to explore what is out there and extend your knowledge to more advanced coding stuff.”

High-achieving students like Yiwen are a critical part of the development of iNZight, says Chris Wild. “It’s a student-driven project, so most of the big-scale changes occur over the New Zealand summer period. At other times, we mostly work on small changes and bug fixes.”

  • For general information on University of Auckland summer scholarships, click here.

 

January 31, 2019

Meet Statistics summer scholar Grace Namuhan

Every summer, the Department of Statistics offers scholarships to high-achieving students so they can work with staff on real-world projects. Grace Namuhan, below, is working with Professional Teaching Fellow Anna Fergusson on the design of interactive data visualisation tools for large classes.

Stage one Statistics courses are enormously popular at the University of Auckland – there are more than 2,000 students per semester, and single lectures may contain up to 600 students. Anna Fergusson, who is part of the stage one teaching team, is a keen developer of in-class web apps to engage these students. For example, you might get students to respond to questions via their own devices, with the data collected to a Google sheet that can then be analysed in class. Working alongside Anna, Grace has been exploring the principles of designing such data visualisation interactives for large-scale learning.

In particular, she is working on an interactive to collect finer-grained data on how students carry out a hypothesis test, specifically a Chi-square test for independence. This particular app is not for live analysis – rather, she is tracking every point, click, and selection students make as they work through the interactive.

She’s had to work out what data to collect and how to store it, and also develop a plan to analyse this very rich and complex data set – even this one app involves thousands and thousands of rows of data. She also has to consider what an educator would want to know from the data.

Grace, a third-year Bachelor of Science undergraduate majoring in Data Science, says the project is exercising what she has learned so far, “which are my programming skills for creating the interactive and statistical skills for analysing the information extracted from the interactive”.

However, Grace didn’t start out her undergraduate studies in statistics – she did a year of biomedical science “but I didn’t really enjoy it. Data science just came out as a new major when I wanted to change my major – it involves half statistics courses and half computer science courses, so I thought it would be a really suitable major for me.”

Statistics appeals to Grace as she is “quite a practical person; turning what might look meaningless data into something useful is really fascinating. There are a lot of invisible data around us in our daily lives; being a data interpreter makes me feel like I am useful”.

  • For general information on University of Auckland summer scholarships, click here.
  • To find out more about Anna’s work in developing resources for large-class teaching, click here.

January 29, 2019

Meet summer scholar Monica Merchant

Every summer, the Department of Statistics offers scholarships to high-achieving students so they can work with staff on real-world projects. Monica Merchant, right, is working with Professor Chris Wild on iNZight, the free data visualisation and analysis software he developed.

Monica, a BCom (Hons)/BSc student, is working on developing the predictive analytics module of the iNZight software, a toolbox that allows users to build their own predictive model from a real-world dataset of their choice.

The module – whose interface is menu-driven and doesn’t require any knowledge of R, the environment in which iNZight is developed – guides the user through the model-building process, from data pre-processing and model training to tuning and validation.

Most importantly, says Monica, the module goes beyond traditional modelling methods by giving the user access to the full suite of machine learning algorithms available in R. Users can apply multiple algorithms to the data to explore differences in fit, predictive performance and generalisability.

This project is useful, says Monica, because it gives us another way to make sense of the data around us: “There is a lot of it and not all of it is created equal. We need ways to intelligently convert these large volumes of data into meaningful insights and actionable knowledge.”

She adds, “This is where machine learning comes in – the basic idea is to let the machine iteratively learn from the data to uncover underlying relationships and patterns or predict outcomes.”

Monica points out that while machine learning as a concept isn’t new – much of the theoretical groundwork behind many of its algorithms was laid in the mid-to-late 20th century – it has been only in recent years that advances in computational power have enabled us to make large-scale use of these algorithms in the real world.

Today, these algorithms are used everywhere, from bioinformatics and medical diagnosing to software engineering, financial markets, agriculture, astronomy and self-driving cars – but as Monica says, “this barely scratches the surface – check out Google Brain and DeepMind”.

Monica started her university career studying a Bachelor of Commerce in Finance, Accounting and Economics. Her Honours dissertation looked at the predictive power of option-implied risk-neutral densities, which sparked an interest in statistical computing. She added a BSc in Statistics.

Asked why statistics appeals, she says, “a degree in statistics is powerful since it offers a diverse and nearly limitless range of applications. I don’t have to limit myself to any one industry.” Monica describes herself as inquisitive by nature, “so using data to solve real-world problems is always very rewarding”.

  • For general information on University of Auckland summer scholarships, click here.

 

 

 

January 23, 2019

Meet Statistics Summer Scholar Xin Qian

Every summer, the Department of Statistics offers scholarships to high-achieving students so they can work with staff on real-world projects. Xin Qian, in the picture, is working with Dr Ben Stevenson, an expert in statistical methods for estimating animal populations.

How can you work out how many creatures inhabit a space when they are elusive, small and have lots of places to hide? Sitting in the bush for months and trying to count what you hear won’t be accurate – and it’s probably not a good use of time.

Another way to estimate animal abundance is through acoustic surveys, which use microphone arrays to record animal chirps and calls; statistical techniques are then used to estimate the population. This is called spatial capture-recapture (SCR), and at present we have several ways of analysing the data.

That’s where summer student Xin Qian comes in. He is working with SCR expert Dr Ben Stevenson on a simulation project that compares two ways of analysing acoustic data. They are using statistics gathered from surveys of the rare moss frog, which exists only on South Africa’s Cape Peninsula.  

“We want to find out which is the best method for providing an accurate and stable estimation of frog density, factoring in the time each method takes,” says Xin. The existing method, he explains, requires that you go and collect independent data about how often individual frogs chirp in order to estimate animal density, which takes time.

However, the new method, developed by Ben Stevenson’s former MSc student Callum Young, promises to estimate call rates, and therefore animal density, from the main survey alone. Says Ben: “This can save time, but may possibly leave you with a less accurate answer. What we are hoping to do is resolve the trade-off. How is the precision of our estimates affected if we switch to the new method? My guess is that it will be worse. Is this sacrifice worth the saving in fieldwork time?”
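Why does the call rate matter so much? A microphone array counts calls, not frogs. The sketch below assumes the usual ratio relationship, in which animal density is call density divided by the average call rate per animal. The function name is hypothetical, the numbers are made up for illustration and are not from the moss frog surveys, and this is not the project's own code.

```python
def animal_density(call_density, call_rate):
    """Convert calls per hectare per minute into animals per hectare.

    call_density: estimated calls per hectare per minute
    call_rate: average calls per animal per minute
    """
    if call_rate <= 0:
        raise ValueError("call_rate must be positive")
    return call_density / call_rate


# Made-up example: 600 calls per hectare per minute, 4 calls per frog per minute.
frogs_per_hectare = animal_density(call_density=600.0, call_rate=4.0)
```

Any error in the call rate feeds straight into the density estimate, which is why the source of that number, whether separate fieldwork or the main survey itself, affects how precise the final answer is.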

For this work they are using R, a programming language for statistical computing and graphics developed in the Department of Statistics in the mid-1990s and now used all over the world.

The project is ideal for Xin, a third-year University of Auckland BSc student majoring in Statistics and Information Systems. “It is always interesting to get information from data; it makes me feel like I am having some secret conversation with data that people can’t hear,” he says. “I normally won’t get bored dealing with numbers, and I prefer things having a logic or a reason behind them.”

Xin was born and raised in China, in the small east-coast city of Jiaxing near Shanghai. After finishing secondary school in China, he moved to New Zealand to pursue tertiary studies, starting his degree in 2016.

The University of Auckland appealed to him “because of its good reputation and ranking.” Although education rather than environment drew him to this country, he says that “New Zealand is a beautiful place with splendid natural views, and most people here are nice and welcoming; I have made lots of friends here. I have also become more outgoing and willing to try various outdoor activities that I wouldn’t get a chance to try if staying in my hometown.”

  • For general information on University of Auckland summer scholarships, click here.