Posts from January 2019 (14)

January 31, 2019

It’s warm out there

We’re seeing a lot of international news stories about cold weather in the US, and here in NZ we’re also seeing a lot of stories about hot weather locally and in Australia. You might think from the news coverage that the northern hemisphere is currently colder than usual and the southern hemisphere is currently warmer than usual.

This map (from) shows ‘temperature anomaly’, that is, the difference between the temperature today and the 1979-2000 average for the time of year.

There are some cold spots on the map: the north-east of North America and parts of northern Russia are much colder than usual. There are also hot spots, in Alaska and in the Arctic Sea.   And as the summaries under the map show, the northern hemisphere is more unusually hot (on average) than the southern hemisphere.

Weather is what matters to us day to day: especially the weather around us and the weather in places with English-speaking television stations.  That can give a very misleading view of the state and trend of global climate.

 

Meet Statistics summer scholar Grace Namuhan

Every summer, the Department of Statistics offers scholarships to high-achieving students so they can work with staff on real-world projects. Grace Namuhan, below, is working with Professional Teaching Fellow Anna Fergusson on the design of interactive data visualisation tools for large classes.

Stage one Statistics courses are enormously popular at the University of Auckland – there are more than 2,000 students per semester, and single lectures may contain up to 600 students. Anna Fergusson, who is part of the stage one teaching team, is a keen developer of in-class web apps to engage these students. For example, you might get students to respond to questions via their own devices, with the data collected in a Google Sheet that can then be analysed in class. Working alongside Anna, Grace has been exploring the principles of designing such data visualisation interactives for large-scale learning.

In particular, she is working on an interactive that collects finer-grained data on how students carry out a hypothesis test, specifically a Chi-square test for independence. This app is not for live analysis; rather, she is tracking every point, click, and selection students make as they work through the interactive.

She’s had to work out what data to collect and how to store it, and also develop a plan to analyse this very rich and complex data set – even this one app involves thousands and thousands of rows of data. She also has to consider what an educator would want to know from the data.

Grace, a third-year Bachelor of Science student majoring in Data Science, says the project is exercising what she has learned so far, “which are my programming skills for creating the interactive and statistical skills for analysing the information extracted from the interactive”.

However, Grace didn’t start out her undergraduate studies in statistics – she did a year of biomedical science “but I didn’t really enjoy it. Data science just came out as a new major when I wanted to change my major – it involves half statistics courses and half computer science courses, so I thought it would be a really suitable major for me.”

Statistics appeals to Grace as she is “quite a practical person; turning what might look meaningless data into something useful is really fascinating. There are a lot of invisible data around us in our daily lives; being a data interpreter makes me feel like I am useful”.

  • For general information on University of Auckland summer scholarships, click here.
  • To find out more about Anna’s work in developing resources for large-class teaching, click here.
January 29, 2019

Pro14 Predictions for Round 12 Delayed Match

Team Ratings for Round 12 Delayed Match

The basic method is described on my Department home page.
Here are the team ratings prior to this week’s games, along with the ratings at the start of the season.

Team                Current Rating  Rating at Season Start  Difference
Leinster                     12.65                    9.80        2.90
Munster                       9.86                    8.08        1.80
Glasgow Warriors              7.30                    8.55       -1.20
Scarlets                      3.31                    6.39       -3.10
Connacht                      2.70                    0.01        2.70
Ospreys                       1.30                   -0.86        2.20
Cardiff Blues                 0.57                    0.24        0.30
Edinburgh                     0.17                   -0.64        0.80
Ulster                       -0.70                    2.07       -2.80
Cheetahs                     -1.53                   -0.83       -0.70
Treviso                      -2.79                   -5.19        2.40
Dragons                      -8.03                   -8.59        0.60
Southern Kings              -10.30                   -7.91       -2.40
Zebre                       -13.96                  -10.57       -3.40

 

Performance So Far

So far there have been 97 matches played, 76 of which were correctly predicted, a success rate of 78.4%.
Here are the predictions for last week’s games.

   Game                          Date    Score    Prediction  Correct
1  Glasgow Warriors vs. Ospreys  Jan 26  9 – 3         11.40  TRUE
2  Leinster vs. Scarlets         Jan 26  22 – 17       14.70  TRUE
3  Ulster vs. Treviso            Jan 26  17 – 17        7.80  FALSE
4  Cheetahs vs. Zebre            Jan 27  61 – 28       15.60  TRUE
5  Dragons vs. Munster           Jan 27  7 – 8        -14.50  TRUE
6  Southern Kings vs. Edinburgh  Jan 27  25 – 21       -6.90  FALSE
7  Cardiff Blues vs. Connacht    Jan 27  8 – 7          2.60  TRUE
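
The tally above can be reproduced from the table with a few lines of Python. This is only a sketch: the scoring rule (a prediction counts as correct when the predicted and actual margins have the same sign, so a draw counts as a miss) is an assumption inferred from the TRUE/FALSE column, not a description of the actual prediction code. The starting totals of 71 correct from 90 matches are the ones quoted in the Round 14 post.

```python
# Last week's games, copied from the table above:
# (home, away, home_score, away_score, predicted_margin)
games = [
    ("Glasgow Warriors", "Ospreys", 9, 3, 11.40),
    ("Leinster", "Scarlets", 22, 17, 14.70),
    ("Ulster", "Treviso", 17, 17, 7.80),
    ("Cheetahs", "Zebre", 61, 28, 15.60),
    ("Dragons", "Munster", 7, 8, -14.50),
    ("Southern Kings", "Edinburgh", 25, 21, -6.90),
    ("Cardiff Blues", "Connacht", 8, 7, 2.60),
]


def prediction_correct(home_score, away_score, predicted_margin):
    """Assumed rule: correct when the actual and predicted margins
    (home minus away) have the same sign; a draw is a miss."""
    actual_margin = home_score - away_score
    return actual_margin * predicted_margin > 0


round_correct = sum(prediction_correct(h, a, p) for _, _, h, a, p in games)

# Totals before this round, as quoted in the Round 14 post: 71 of 90.
total_correct = 71 + round_correct
total_played = 90 + len(games)
success_rate = 100 * total_correct / total_played

print(f"{total_correct} of {total_played} correct ({success_rate:.1f}%)")
```

Under that assumed rule the seven games add five correct predictions, which matches the 76 from 97 reported above.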

 

Predictions for Round 12 Delayed Match

Here are the predictions for Round 12 Delayed Match. The prediction is my estimated expected points difference with a positive margin being a win to the home team, and a negative margin a win to the away team.

   Game                          Date    Winner            Prediction
1  Cheetahs vs. Southern Kings   Feb 03  Cheetahs               13.30

 

Meet summer scholar Monica Merchant

Every summer, the Department of Statistics offers scholarships to high-achieving students so they can work with staff on real-world projects. Monica Merchant, right, is working with Professor Chris Wild on iNZight, the free data visualisation and analysis software he developed.

Monica, a BCom (Hons)/BSc student, is working on developing the predictive analytics module of the iNZight software, a toolbox that allows users to build their own predictive model from a real-world dataset of their choice.

The module – whose interface is menu-driven and doesn’t require any knowledge of R, the environment in which iNZight is developed – guides the user through the model-building process, from data pre-processing and model training to tuning and validation.

Most importantly, says Monica, the module goes beyond traditional modelling methods by giving the user access to the full suite of machine learning algorithms available in R. Users can apply multiple algorithms to the data to explore differences in fit, predictive performance and generalisability.

This project is useful, says Monica, because it gives us another way to make sense of the data around us: “There is a lot of it and not all of it is created equal. We need ways to intelligently convert these large volumes of data into meaningful insights and actionable knowledge.”

She adds, “This is where machine learning comes in – the basic idea is to let the machine iteratively learn from the data to uncover underlying relationships and patterns or predict outcomes.”

Monica points out that while machine learning as a concept isn’t new – much of the theoretical groundwork behind many of its algorithms was laid in the mid-to-late 20th century – it has been only in recent years that advances in computational power have enabled us to make large-scale use of these algorithms in the real world.

Today, these algorithms are used everywhere, from bioinformatics and medical diagnosing to software engineering, financial markets, agriculture, astronomy and self-driving cars – but as Monica says, “this barely scratches the surface – check out Google Brain and DeepMind”.

Monica started her university career studying a Bachelor of Commerce in Finance, Accounting and Economics. Her Honours dissertation looked at the predictive power of option-implied risk-neutral densities, which sparked an interest in statistical computing. She added a BSc in Statistics.

Asked why statistics appeals, she says, “a degree in statistics is powerful since it offers a diverse and nearly limitless range of applications. I don’t have to limit myself to any one industry.” Monica describes herself as inquisitive by nature, “so using data to solve real-world problems is always very rewarding”.

  • For general information on University of Auckland summer scholarships, click here.

 

 

 

January 24, 2019

Pro14 Predictions for Round 14

Team Ratings for Round 14

The basic method is described on my Department home page.
Here are the team ratings prior to this week’s games, along with the ratings at the start of the season.

Team                Current Rating  Rating at Season Start  Difference
Leinster                     13.09                    9.80        3.30
Munster                      10.43                    8.08        2.40
Glasgow Warriors              7.73                    8.55       -0.80
Scarlets                      2.87                    6.39       -3.50
Connacht                      2.57                    0.01        2.60
Ospreys                       0.87                   -0.86        1.70
Cardiff Blues                 0.70                    0.24        0.50
Edinburgh                     0.65                   -0.64        1.30
Ulster                       -0.07                    2.07       -2.10
Cheetahs                     -2.21                   -0.83       -1.40
Treviso                      -3.42                   -5.19        1.80
Dragons                      -8.59                   -8.59        0.00
Southern Kings              -10.78                   -7.91       -2.90
Zebre                       -13.28                  -10.57       -2.70

 

Performance So Far

So far there have been 90 matches played, 71 of which were correctly predicted, a success rate of 78.9%.
Here are the predictions for last week’s games.

   Game                          Date    Score    Prediction  Correct
1  Southern Kings vs. Cheetahs   Jan 20  17 – 24       -3.50  TRUE

 

Predictions for Round 14

Here are the predictions for Round 14. The prediction is my estimated expected points difference with a positive margin being a win to the home team, and a negative margin a win to the away team.

   Game                          Date    Winner            Prediction
1  Glasgow Warriors vs. Ospreys  Jan 26  Glasgow Warriors       11.40
2  Leinster vs. Scarlets         Jan 26  Leinster               14.70
3  Ulster vs. Treviso            Jan 26  Ulster                  7.80
4  Cheetahs vs. Zebre            Jan 27  Cheetahs               15.60
5  Dragons vs. Munster           Jan 27  Munster               -14.50
6  Southern Kings vs. Edinburgh  Jan 27  Edinburgh              -6.90
7  Cardiff Blues vs. Connacht    Jan 27  Cardiff Blues           2.60

 

January 23, 2019

Meet Statistics Summer Scholar Xin Qian

Every summer, the Department of Statistics offers scholarships to high-achieving students so they can work with staff on real-world projects. Xin Qian, in the picture, is working with Dr Ben Stevenson, an expert in statistical methods for estimating animal populations.

How can you work out how many creatures inhabit a space when they are elusive, small and have lots of places to hide? Sitting in the bush for months and trying to count what you hear won’t be accurate – and it’s probably not a good use of time.

Another way to estimate animal abundance is through acoustic surveys, which use microphone arrays to record animal chirps and calls; statistical techniques are then used to estimate the population. This is called spatial capture-recapture (SCR), and at present we have several ways of analysing the data.

That’s where summer student Xin Qian comes in. He is working with SCR expert Dr Ben Stevenson on a simulation project that compares two ways of analysing acoustic data. They are using statistics gathered from surveys of the rare moss frog, which exists only on South Africa’s Cape Peninsula.  

“We want to find out which is the best method for providing an accurate and stable estimation of frog density, factoring in the time each method takes,” says Xin. The existing method, he explains, requires that you go and collect independent data about how often individual frogs chirp in order to estimate animal density, which takes time.

However, the new method, developed by Ben Stevenson’s former MSc student Callum Young, promises to estimate call rates, and therefore animal density, from the main survey alone. Says Ben: “This can save time, but may possibly leave you with a less accurate answer. What we are hoping to do is resolve the trade-off. How is the precision of our estimates affected if we switch to the new method? My guess is that it will be worse. Is this sacrifice worth the saving in fieldwork time?”

For this work they are using R, a programming language for statistical computing and graphics developed in the Department of Statistics in the mid-1990s and now used all over the world.

The project is ideal for Xin, a third-year University of Auckland BSc student majoring in Statistics and Information Systems. “It is always interesting to get information from data; it makes me feel like I am having some secret conversation with data that people can’t hear,” he says. “I normally won’t get bored dealing with numbers, and I prefer things having a logic or a reason behind them.”

Xin was born and raised in China, in the small east-coast city of Jiaxing near Shanghai. After finishing secondary school in China, he moved to New Zealand to pursue tertiary studies, starting his degree in 2016.

The University of Auckland appealed to him “because of its good reputation and ranking.” Although education rather than environment drew him to this country, he says that “New Zealand is a beautiful place with splendid natural views, and most people here are nice and welcoming; I have made lots of friends here. I have also became more outgoing and willing to try various outdoor activities that I wouldn’t get a chance to try if staying in my hometown.”  

  • For general information on University of Auckland summer scholarships, click here.

 

January 22, 2019

Meet Lushi Cai, Statistics summer scholar

Every summer, the Department of Statistics offers scholarships to high-achieving students so they can work with staff on real-world projects. Lushi Cai is working with Professor Chris Wild on iNZight, a free data visualisation and analysis software he developed.

There’s a Chinese saying that goes “Travel ten thousand miles, read ten thousand books.” And that’s just what summer scholar Lushi Cai, in the picture, is doing.

Originally from Guangzhou in southern China, Lushi had done a year of undergraduate study in China before she moved with her family to New Zealand three years ago. She embarked on a Bachelor of Science majoring in computer science and statistics at the University of Auckland, finishing her degree last year. This year, Lushi will be on an honours programme.

As a summer scholar, Lushi is working on the Department of Statistics’ data analysis package, iNZight. This is a free, R-based environment started by statistics education expert Professor Chris Wild to help high-school students quickly and easily explore data and understand some statistical ideas. However, iNZight has grown, and now extends to multivariable graphics, time series, and generalised linear modelling, including modelling of data from complex surveys. It is available in web and desktop versions.

Lushi’s summer scholarship involves implementing interactive web graphs for R-generated statistics plots and enhancing the web version of iNZight by adding an interactive plot function. “Users tell iNZight what to do and what analysis output they want using iNZight’s GUI (graphical user interface),” she explains. “They don’t need to know how to write code.

“However, key modules also provide users with the R code that iNZight used to produce their output. This is great for learning how to do things in R, and it also makes iNZight analyses reproducible by others.”

But improvements are needed, she adds: “Unfortunately, the R code automatically generated by iNZight is not easy for humans to read. So I’m writing an auto-formatter that converts messy R code into tidy R code that’s easy to read.”

Students are a critical part of the development of iNZight, says Chris Wild. “It’s a student-driven project, so most of the big-scale changes occur over the New Zealand summer period. At other times, we mostly work on small changes and bug fixes.”

Lushi enjoys problem-solving, so this sort of project is a natural fit. In addition, “my interest is analysing huge data and producing a direct way, such as tables and graphs, to explore the features. I believe this is a powerful skill and can be applied to every field in the real world”.

  • For general information on University of Auckland summer scholarships, click here.

 

January 15, 2019

eScooter costs

There’s a story at Stuff saying the ACC have paid out more than $200,000 across 655 e-scooter related injuries.  If you don’t regularly work with ACC data it’s hard to get a feel for whether that’s a lot or not.

Two comparisons I saw on Twitter, and a bonus one

  • From Stuff: in the 2016/7 year, “[s]ome 3517 horse-injury claims added up to $6,867,869”,  a decrease from previous years
  • From the Herald: “The number of injuries involving avocados has increased over the past three years, with the nutritious fruit costing ACC just over $800,000.”
  • From the Herald: “Last year more than 4000 New Zealanders were injured on Christmas Day alone, accounting for $3,628,574 worth of ACC claims.”

First we need to look at time frames: the e-scooter data are over three months; the horse and Christmas data over a year; the avocados over three years. Annualised, we’d have $0.6 million/year for scooters, $3.6 million/year for Christmas, $6.9 million/year for horses, and $0.26 million/year for avocados.   Avocados aren’t in the running, but horses are looking strong.

There are a lot of horses in New Zealand, though. Apparently, over 100,000! They won’t each be ridden with the same frequency as the average Lime e-scooter, but it wouldn’t take that high a usage rate for scooters to be more dangerous than horses.  What we can see, though, is that horse injuries are more severe on average: the headline statistic gave five times as many injuries from horses and over 30 times the cost.   (Avocados look even safer if you count number of injuries rather than cost).
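
The severity comparison can be checked directly from the quoted figures. The sketch below uses the raw headline numbers, as the paragraph above does, without adjusting for the different time periods. The e-scooter cost is "more than $200,000", so it is treated here as a lower bound, and the variable names are just for illustration.

```python
# Headline figures quoted above (not annualised).
scooter_claims = 655
scooter_cost = 200_000   # "more than $200,000": treated as a lower bound
horse_claims = 3517
horse_cost = 6_867_869

# How many times more claims, and how many times more cost, for horses?
claims_ratio = horse_claims / scooter_claims
cost_ratio = horse_cost / scooter_cost

# Average cost per claim is a rough measure of severity.
cost_per_scooter_claim = scooter_cost / scooter_claims
cost_per_horse_claim = horse_cost / horse_claims

print(f"Horses: {claims_ratio:.1f}x the claims, {cost_ratio:.1f}x the cost")
print(f"Average per claim: scooters ${cost_per_scooter_claim:,.0f}, "
      f"horses ${cost_per_horse_claim:,.0f}")
```

On these figures the average horse claim costs roughly six times as much as the average e-scooter claim.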

Since e-scooters are new and only cost a few dollars to try, there will be a lot of inexperienced users right now.  You’d assume that over time the typical user will become more experienced and probably more risk-averse, and so the risks should go down a bit. Also, there will probably be an increasing number of people who have their own scooter and are a bit more careful with it.

The obvious comparison, though, isn’t horses or avocados or Christmas: it’s cars.  ACC paid out $264 million for driving-related injuries in 2017-18. Spread out over 3 million cars or 4 million motor vehicles that’s still less than half the cost per vehicle that we’re seeing for e-scooters (assuming most e-scooter use is the Lime rentals). However, the ACC figure doesn’t attempt to count the cost of 378 road deaths last year.  At the Ministry of Transport’s cost-of-life valuation of just over $4 million, that’s another $1.5 billion.
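
The car side of that comparison is simple arithmetic, sketched below with the figures quoted in the paragraph. The value of a life is "just over $4 million", so $4 million is used as a round lower bound. The e-scooter cost per vehicle is not computed because the size of the scooter fleet is not given here.

```python
# Figures quoted above.
acc_driving_payout = 264_000_000   # ACC driving-related injuries, 2017-18
cars = 3_000_000
motor_vehicles = 4_000_000

cost_per_car = acc_driving_payout / cars
cost_per_vehicle = acc_driving_payout / motor_vehicles

road_deaths = 378
value_per_life = 4_000_000         # "just over $4 million": lower bound
deaths_cost = road_deaths * value_per_life

print(f"ACC cost per car: ${cost_per_car:.0f}")
print(f"ACC cost per motor vehicle: ${cost_per_vehicle:.0f}")
print(f"Valued cost of road deaths: ${deaths_cost / 1e9:.1f} billion")
```

Adding the valued cost of deaths to the ACC payout raises the per-car total from about $88 to several hundred dollars.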

Cars, um, win?

 

 

January 14, 2019

Briefly

  • Data definition drift: the impact of interventions to reduce hospital readmissions has been overestimated because of changes in how admissions are coded.  When systems change to allow more than 10 diagnoses to be entered, more than 10 are entered for a lot of people (Twitter thread)
  • Public transport data visualisation, from Twitter.  Sara Weber says (in translation) “My mother is a commuter in the Munich area. And avid knitter. She knitted a “rail delay scarf” in 2018. Two rows per day: Grey at under 5 minutes, pink at 5 to 30 minutes delay, red if delayed on both trips or once over 30 minutes.”
  • Pew Research, whose fault “millennial” is, remind us that they define millennials as born 1981-1996. The youngest millennials are now 22.  (via @drob)
  • Some years ago, there were stories about a young woman who was outed as pregnant by Target’s targeted advertising before she’d decided to tell people.  There’s a new ad for a firm called Zulily that is pushing this as a good thing. (via Amie Stepanovich)
  • There’s a story on Stuff about lead in eggs from backyard chickens.  The story ends with a quote from someone who keeps chickens: “you’re breathing in more lead living next to a busy road more than the chickens going to lay in its lifetime.” That might have been true forty years ago, but it isn’t true now: getting rid of leaded petrol has resulted in lead air pollution from traffic largely going away.   Lead is the big success story of pollution reduction. Here’s a graph of average lead concentrations in the air across US air pollution monitoring stations (the trend would be similar in NZ)

Reeferendum polling

The Herald reports 60 per cent support for legal cannabis – new poll. There’s going to be a lot of this over the next couple of years, so here are some points to consider

  1. As the Herald says, the poll found a substantially higher number of daily cannabis users than other research: about three times higher than the NZ Drug Use Survey from the Ministry of Health and four or five times higher than a 2010 survey sponsored by NORML.  This has got to reduce our confidence in the results: either because it indicates the sample is unrepresentative or because it indicates that surveys on drugs are intrinsically unreliable.
  2. We don’t know what the question on the referendum will be, so the survey obviously wasn’t asking that question. I hope the actual question will be a choice between a specific proposed set of legislation and the status quo, though the Government will have to move quickly to get the legislation drafted, released for public comment, and revised in time. In any case, you’d expect (as with Brexit) more support for a generic ‘change’ proposal as in this survey than for any concrete and specific proposal.  Some people will support private growing but oppose commercialisation; others will argue that you can only get rid of the illegal market if the legal market is fairly open.  And so on.
  3. The poll results were weighted to agree demographically with the 2013 Census population.  That’s a standard thing to do with surveys, but in this case it would be more useful to weight them to look like the 2017 voting population.  The age groups who support legalisation more strongly are also historically less likely to vote.
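
To illustrate the third point, here is a minimal sketch of demographic weighting (post-stratification) in Python. Every number in it is hypothetical and chosen only to show the mechanics: the age groups, sample counts, support rates, and both sets of population shares are assumptions, not figures from the poll, the Census, or the 2017 election.

```python
def poststratified_estimate(sample_counts, support_rates, target_shares):
    """Weight each group so the sample matches the target population's
    shares, then return the weighted proportion in support."""
    n = sum(sample_counts.values())
    weighted_yes = 0.0
    weighted_n = 0.0
    for group, count in sample_counts.items():
        # Weight = share in the target population / share in the sample.
        weight = target_shares[group] / (count / n)
        weighted_yes += weight * count * support_rates[group]
        weighted_n += weight * count
    return weighted_yes / weighted_n


# HYPOTHETICAL sample: respondents per age group and support within each.
sample_counts = {"18-39": 300, "40-64": 400, "65+": 300}
support_rates = {"18-39": 0.70, "40-64": 0.60, "65+": 0.40}

# HYPOTHETICAL targets: the adult population versus the people who vote.
census_shares = {"18-39": 0.40, "40-64": 0.42, "65+": 0.18}
voter_shares = {"18-39": 0.32, "40-64": 0.45, "65+": 0.23}

census_est = poststratified_estimate(sample_counts, support_rates, census_shares)
voter_est = poststratified_estimate(sample_counts, support_rates, voter_shares)

print(f"Weighted to census shares: {census_est:.1%}")
print(f"Weighted to voter shares:  {voter_est:.1%}")
```

With these made-up inputs, weighting to the voting population gives lower support than weighting to the census population, because the groups that support legalisation most are a smaller share of voters.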