Posts filed under Just look it up (283)

March 28, 2018

Cycling for work or play

Auckland Transport publish data from cycle counters on various bike paths. They’re most interested in trends over time (increasing) and perhaps in seasonal variation (more in summer).

Here’s a look at weekday vs weekend counts using data from the start of 2016 to now.

There are some paths that are clearly used primarily by commuters, with more than twice the average traffic on a weekday vs weekend. There are also some that are mostly used at the weekend, such as Matakana, Upper Harbour, and Mangere Bridge.  And some, like the Lightpath, that get used all the time.

Note: while it’s great that Auckland Transport publishes these data, the data would be easier to reuse if the names they used for each counter were consistent over time (eg “Tamaki Dr” vs “Tamaki Drive”, or “Nelson Street Lightpath Counter Cyclists” vs “Nelson Street Lightpath Cyclists”).
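One way to work around the inconsistent labels is to map every raw name to a canonical spelling before grouping. The following is only a sketch, not the code behind the graph: the function name and the alias table are hypothetical, and the two aliases are just the examples quoted above. A real table would have to be built by inspecting the published files.

```python
import re

# Hypothetical alias table: only the two pairs mentioned in the post.
ALIASES = {
    "tamaki dr": "tamaki drive",
    "nelson street lightpath counter": "nelson street lightpath",
}


def normalise_counter_name(raw: str) -> str:
    """Map a raw counter label to one canonical lower-case spelling."""
    name = raw.strip().lower()
    name = re.sub(r"\s+", " ", name)          # collapse repeated whitespace
    name = re.sub(r"\s*cyclists$", "", name)  # drop a trailing 'Cyclists' suffix
    return ALIASES.get(name, name)            # apply known aliases, else keep as is


labels = [
    "Tamaki Dr",
    "Tamaki Drive",
    "Nelson Street Lightpath Counter Cyclists",
    "Nelson Street Lightpath Cyclists",
]
canonical = sorted({normalise_counter_name(label) for label in labels})
print(canonical)
```

With the four labels above, the set collapses to two canonical names, so counts from different years can be summed per counter.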

March 26, 2018

The data speak for themselves?

This graph was on Twitter this morning. There’s nothing wrong with the graph: good data, clear presentation, but it does provide a nice illustration of the difficulties in official statistics — you have to decide what categories to use, and it makes a difference.

The second leading cause, motor vehicles, is straightforward enough.  The first, firearms, is more complicated. A majority of the firearm deaths are suicides, and it’s controversial whether firearm access increases the suicide rate or just affects the method.  Poisoning is also complicated: you might well want to treat both suicide and accidental recreational-drug overdose separately. And so on.

Sometimes you want to break down the data by intent, sometimes by physical cause, sometimes by medical type of injury or damage. You can’t define the ‘correct’ answer in the absence of a question.

February 17, 2018

Read me first?

There’s a viral story that viral stories are shared by people who don’t actually read them. I saw it again today in a tweet from Newseum Institute.

If you search for the study it doesn’t take long to start suspecting that the majority of news sources sharing this study didn’t read it first.  One that at least links is from the Independent, in June 2016.

The research paper is here. The money quote looks like this, from section 3.3

First, 59% of the shared URLs are never clicked or, as we call them, silent.

We can expand this quotation slightly

First, 59% of the shared URLs are never clicked or, as we call them, silent. Note that we merged URLs pointing to the same article, so out of 10 articles mentioned on Twitter, 6 typically on niche topics are never clicked

That’s starting to sound a bit different. And more complicated.

What the researchers did was to look at bit.ly URLs to news stories from five major sources, and see if they had ever been clicked. They divided the links into two groups: primary URLs tweeted by the media source itself (eg @NYTimes), and secondary URLs tweeted by anyone else. The primary URLs were always clicked at least once — you’d expect that just for checking purposes.  The secondary URLs, as you’d expect, averaged fewer clicks per tweet; 59% were not clicked at all.

That’s being interpreted as if 59% of retweets didn’t involve any clicks. But that isn’t what it says. It’s quite likely that most of these links were never retweeted.  And there’s nothing in the data about whether the person who first tweeted the link read the story: there certainly isn’t any suggestion that person didn’t read the story.

So, if I read some annoying story about near-Earth asteroids on the Herald and tweeted a bit.ly URL, there’s a chance no-one would click on it. And, looking at my Twitter analytics, I can see that does sometimes happen. When it happens, people usually don’t retweet the link either, and it definitely doesn’t go viral.

If I retweeted the official @NZHerald link about the story, then it would almost certainly have been clicked by someone. The research would say nothing whatsoever about the chance that I (or any of the other retweeters) had read it.

February 13, 2018

Opinions about immigrants

Ipsos MORI do a nice set of surveys about public misperceptions: ask a sample of people for their estimate of a number and compare it to the actual value.

The newest set includes a question about the proportion of the prison population that are immigrants. Here’s (a redrawing of) their graph, with NZ in all black.

People think more than a quarter of NZ prisoners are immigrants; it’s actually less than 2%. I actually prefer this as a ratio

The ratio would be better on a logarithmic scale, but I don’t feel like doing that today since it doesn’t affect the main point of this post.

A couple of years ago, though, the question was about what proportion of the overall population were immigrants. That time people also overestimated a lot.  We can ask how much of the overestimation for the prison question can be explained by people just thinking there are more immigrants than there really are.

Here’s the ratio of the estimated proportion of immigrants among the prison population and the total population

The bar for New Zealand is to the left; New Zealand recognises that immigrants are less likely to be in prison than people born here. Well, the surveys taken two years apart are consistent with us recognising that, at least.

That’s just a ratio of two estimates. We can also compare to the reality. If we divide this ratio by the true ratio we find out how much more likely people think an individual immigrant is to end up in prison compared to how likely they really are.

It seems strange that NZ is suddenly at the top. What’s going on?

New Zealand has a lot of immigrants, and we only overestimate the actual number by about a half (we said 37%; it was 25% in 2017). But we overestimate the proportion among prisoners by a lot. That is, we get this year’s survey question badly wrong, but without even the excuse of being seriously deluded about how many immigrants there are.
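The chain of ratios above can be written out as a short calculation. This is a sketch with partly assumed inputs: the 37% and 25% population figures are from the text, but the survey values for prisoners are only given as "more than a quarter" and "less than 2%", so the 0.26 and 0.02 below are placeholders consistent with those bounds, not the published numbers. The variable names are mine.

```python
# Guessed and actual share of immigrants in the NZ population (from the post).
guessed_population_share = 0.37
actual_population_share = 0.25

# Placeholders only: the post says "more than a quarter" and "less than 2%".
guessed_prison_share = 0.26
actual_prison_share = 0.02

# How over-represented people think immigrants are in prison.
perceived_relative_rate = guessed_prison_share / guessed_population_share

# How over- or under-represented immigrants really are.
true_relative_rate = actual_prison_share / actual_population_share

# How much more likely people think an individual immigrant is to be in
# prison, compared with how likely they really are.
overestimate_factor = perceived_relative_rate / true_relative_rate

print(round(perceived_relative_rate, 2))
print(round(true_relative_rate, 2))
print(round(overestimate_factor, 1))
```

With these placeholder inputs the perceived relative rate is below 1 (immigrants are thought to be under-represented in prison), but the true relative rate is much lower still, which is why the final factor is large. Since the actual prison share is stated only as an upper bound, the real factor would be at least this big.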

January 8, 2018

Long tail of baby names

The Dept of Internal Affairs has released the most common baby names of 2017 (NZ is, I think, the first country each year to do this), and Radio NZ has a story.  A lot of names popular last year were also popular in the past; a few (eg Arlo) are changing fast.

If you look at the sixty-odd years of data available, there’s a dramatic trend. In 1954, ‘John’ was the top boy’s name, with 1389 uses. In 2017 the top was ‘Oliver’, but with only 314 uses — not enough to make 1954’s top twenty. According to the government, there were nearly 13,000 different names given last year, so the mean number of babies per name is under 5; the most popular names are still much more popular than average. But less so than in the past.

Here’s the trend in the number of babies given the top name

and the top ten names

and the top hundred names

That decrease is despite an increase in the total population: here’s the top 10 names as a percentage of all babies (assuming 53% of babies are boys)

and the top 100 names

The proportion with any of the top 100 names has been going down consistently, and also becoming less different between boys and girls.

November 19, 2017

Hyperbole or innumeracy?

From the Herald (and also from NewstalkZB, apparently originally at South Africa’s The Citizen)

He is also said to own a custom-built Mercedes Benz s600L that is able to withstand AK-47 bullets, landmines and grenades. It also features a CD and DVD player, internet access and anti-bugging devices. The Citizen reported that Mugabe – who is a trained teacher – also owns a Rolls-Royce Phantom IV: a colonial-era British luxury car so exclusive, only 18 were ever manufactured. The vintage black car is estimated to be worth more than Zimbabwe’s entire GDP. (emphasis added)

Several people on Twitter, starting with Richard Easther, had the same reaction: that this doesn’t look remotely plausible.  It’s like the claims that Labour’s water levies would make cabbages cost $18 and a bottle of wine $75 — extraordinary claims demand, if not extraordinary evidence, at least some evidence.

So, how is it that you’d decide this number was implausible? Well, in one direction, you might try to guess the GDP of Zimbabwe.  If Zimbabwe had a smaller population than NZ you’d probably know it was a small country, so we can say there’s at least 5 million people.  So, if the per-capita GDP was only $1, it would still add up to $5 million, and that’s a very expensive car.  Since you’d expect the population to be more than 5 million and the per-capita GDP to be a lot more than $1, the figure is looking implausible.

In the other direction, you might look up the current GDP of Zimbabwe — $16 billion — or the lowest it’s been in recent years — $4.4 billion in 2008 — and note that you could buy several wide-body jets for that much.

That’s enough to know something is strange. If you wanted more detail you could search for prices of Rolls-Royce Phantom IVs or of the most expensive cars ever sold, and find that, yes, there’s three or four orders of magnitude missing.

Or, you could look at the first line of the story

Zimbabwe embattled president Robert Mugabe is reportedly worth more than $1 billion despite his country being one of the poorest in the world.

Or the last line

Rolls Royce Phantoms cost a minimum of just under $698,000, but custom-built versions are sold for as much as $1.74 million. Media in South Africa reported the combined cost of the cars was about $6.98 million.

and again, there’s no way the claim about the car vs the GDP could be true — a used one couldn’t be worth thousands of times more than a new one.
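The size of the gap can be checked with a couple of logarithms. This is only a rough sketch: it uses the GDP figures and the new-car prices quoted above, ignores any difference in currency between the two sets of numbers, and uses new Phantom prices as a stand-in because the post gives no price for a Phantom IV. The helper name is mine.

```python
import math


def orders_of_magnitude(larger: float, smaller: float) -> float:
    """Base-10 log of the ratio: 3.0 means 'a thousand times bigger'."""
    return math.log10(larger / smaller)


gdp_lowest = 4.4e9    # lowest recent GDP quoted in the post (2008)
gdp_current = 16e9    # current GDP quoted in the post
car_cheapest = 698_000    # "just under $698,000"
car_dearest = 1.74e6      # custom-built version

# Most favourable case for the claim: smallest GDP, dearest car.
smallest_gap = orders_of_magnitude(gdp_lowest, car_dearest)
# Least favourable case: current GDP, cheapest car.
largest_gap = orders_of_magnitude(gdp_current, car_cheapest)

print(round(smallest_gap, 1), round(largest_gap, 1))
```

Both gaps fall between three and five orders of magnitude, in line with the "three or four orders of magnitude missing" estimate.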

So, where could it have come from?  My guess is that the claim was originally hyperbole: that someone did say “his car’s worth more than the Zimbabwe GDP” but they didn’t mean it literally. Over repetitions, the rhetorical figure turned into an “estimate”, and was quoted without any real thought.

What’s harder to understand is someone thinking a CD and DVD player is the height of motoring luxury.

October 10, 2017

Graphic of the week

From the world’s third-largest news agency:

(graphic from AFP)

  1. The Nationalist Party?
  2. National got 56 seats, not 58 — the graph seems to have the National results from the provisional count but the Labour and Green results from the final count
  3. NZ First doesn’t use yellow
  4. ACT, on the other hand, does.
  5. But ACT is relatively unlikely to enter a left-wing coalition with Labour and the Greens

August 11, 2017

Different sorts of graphs

This bar chart from Figure.NZ was in Stuff today, with the lead

Working-age people receiving benefits are mostly in the prime of our working life – the ages of 25 to 54.

The numbers are correct, but the way the graph appears to support the story is a bit misleading.  The main reason the two bars in the middle are higher is that they are 15-year age groups, whereas the first bar is a 7-year group and the last is a 10-year group.

Another way to show the data is to scale the bar widths proportional to the number of years and then scale the height so that the bar area matches the count of people. The bar height is now counts of people per year of age

This is harder to read for people who aren’t used to it, but arguably more informative. It suggests the 25-54 year groups may be the largest just because the groups are wider.
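The rescaling is just a division by the width of each age group. Here is a minimal sketch; the age bands are the ones used in the chart, but the counts are invented purely to illustrate the arithmetic and are not the Figure.NZ numbers.

```python
# Age bands from the chart, as inclusive (youngest, oldest) pairs.
AGE_GROUPS = [(18, 24), (25, 39), (40, 54), (55, 64)]


def per_year_heights(counts):
    """Bar heights in people per year of age, so that area equals the count."""
    heights = []
    for (youngest, oldest), count in zip(AGE_GROUPS, counts):
        width_in_years = oldest - youngest + 1   # 7, 15, 15 and 10 years
        heights.append(count / width_in_years)
    return heights


# Invented counts, NOT the real data: the middle groups have the biggest totals.
made_up_counts = [35_000, 90_000, 90_000, 70_000]
heights = per_year_heights(made_up_counts)
print(heights)
```

In this made-up example the two middle groups have the largest totals, but the last group has the tallest bar once the width is accounted for, which is the kind of reversal the rescaled chart is designed to reveal.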

We really need population size data, since the number of people in NZ also varies by age group.  Showing the percentage receiving benefits in each age group gives a different picture again

It looks as though

  • “working age” people 25-39 and 40-54 make up a larger fraction of those receiving benefits than people 18-24 or 55-64
  • a person receiving benefits is more likely to be, say, 20 or 60 than 35 or 45.
  • the proportion of people receiving benefits increases with age

These can all be true; they’re subtly different questions. Part of the job of a statistician is to help you think about which one you wanted to ask.

August 1, 2017

Holiday travel trends

The Herald has a story and video graphic, and a nice interactive graphic on international travel by Kiwis since 1979.  The story is basically good (and even quotes a price corrected for inflation).

Here’s one frame of the video graphic

First, a lot of the world isn’t coloured. There are New Zealanders who have visited, say, Germany or Turkey or Egypt, even though these countries never make it into the 1-24,999 colour category. It looks as if the video picks a set of 16 countries and follows just those forward in time: we’re not told how these were picked.

Second, there’s the usual map problem of big things looking big (exacerbated by the Mercator projection). In 1999, more people went to Fiji than the US; more to Samoa than France. A map isn’t good at making these differences visually obvious, though the animation helps. And, tangentially, if you’re going to use almost a third of the map real estate on the region north of 60°, you should notice that Alaska is part of the USA.

The other, more important, issue that’s common to the whole presentation (and which I understand is being updated at the moment) is what the country data actually mean. It seems that it really is holiday data, excluding both business and visiting friends/relatives (comparing the video to this from Figure.NZ), but it’s by “country of main destination”.  If you go to more than one country, only one is counted.  That’s why the interactive shows zero Kiwis travelling to the Vatican City, and it may help explain numbers like 300 for Belgium.

Official statistics usually measure something fairly precise, but it’s not always the thing that you want them to measure.

May 22, 2017

How rich do you feel

From Scott Macleod, in a Stat of the Week nomination

The NZ Herald claims that a person earning the median NZ salary of USD $33,500 (equivalent) is the 55 millionth richest person in the world by income.

However, this must be wrong.

There are 300 million people in the USA alone, and their median income is higher than ours. This means that the average New Zealander wouldn’t even be the 55 millionth richest person in the USA, let alone the world.

Basically, yes, but it’s not quite as simple as that.  That median NZ salary looks like what you get if you multiply the NZ median “weekly personal income from salary and wages among those receiving salary and wages” (eg here) by 52, which would be appropriate for people receiving salary or wage income 52 weeks per year. The median personal income for NZ will be quite a lot lower, and the median personal income for the US is also lower: about USD30,240.

Even so, there are about 250 million adults (by the definition used) in the US, and nearly half of them have higher personal income than USD33,500, so that still comes to over 100 million people. And that’s without counting Germany or the UK — or cities such as Beijing and Shanghai that have more people with incomes that high than New Zealand does.  And that’s also assuming the web page doesn’t do currency conversions — which it looks from the code as if it’s trying to.
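The sanity check here is simple enough to write down. This is a sketch only: the 250 million adults and the 55 million rank come from the text, while the 0.45 used for "nearly half" is my assumption, chosen just to make the arithmetic concrete.

```python
us_adults = 250e6       # adults in the US, from the post
claimed_rank = 55e6     # the calculator's rank for the NZ median salary

# Fraction of US adults who would need to earn more than USD33,500
# for the US alone to push the NZ median earner past the claimed rank.
fraction_needed = claimed_rank / us_adults

# Assumption: "nearly half" taken as 45%.
assumed_fraction_above = 0.45
us_people_above = us_adults * assumed_fraction_above

print(fraction_needed)
print(us_people_above)
```

Only 22% of US adults would need to be above the threshold to contradict the claimed rank, so the conclusion does not depend on exactly what "nearly half" means.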

The CARE calculator must indeed be wrong, or using an unusual definition of income, or something. Unfortunately, the code for how it does the calculation is hidden; they say “After calculating the distribution of income, we then use a statistical model to estimate your rank.” 

As a cross-check, Pew Global also has a web page based on World Bank data.  It doesn’t let you put in your own cutpoints, but it says 7% of the world’s population had more than $50/day to live on in 2011.  The CARE web page thinks it’s more like 4.7% now.  The agreement does seem to be better at lower incomes, too — the estimates will be more accurate for people who aren’t going to use the calculator than for people who are.