Posts written by Thomas Lumley (2534)


Thomas Lumley (@tslumley) is Professor of Biostatistics at the University of Auckland. His research interests include semiparametric models, survey sampling, statistical computing, foundations of statistics, and whatever methodological problems his medical collaborators come up with. He also blogs at Biased and Inefficient.

November 18, 2022

How many Covid cases?

From Hannah Martin at Stuff: Only 35% of Covid cases being reported, ministry says, after earlier saying it was 75%

The ministry’s latest Trends and Insights report, released on Monday, said “approximately three quarters of infections are being reported as cases”, based on wastewater testing.

However, it has since said that “based on updated wastewater methodology”, about 35% of infections were reported as cases as of the week to November 2.

This is a straightforward loss during communication:  the 75% was an estimate of how much the reporting had changed since the first Omicron peak (that is, the reporting rate was about three-quarters of what it had been at the peak), but it got into the Trends and Insights report as an absolute rate.  Dion O’Neale is quoted further down the story explaining this.

For future reference, it’s worth looking at what we can and can’t estimate well from various sources of information we might have.

The wastewater data has the advantage of including everyone in a set of cities and towns, adding up to the majority of the country; everybody poops. It has the disadvantage of not directly measuring cases or infections.  The wastewater data tells us how many Covid viral fragments are in the wastewater.  How that relates to infections depends on how many viral fragments each person sheds and how many of these make it intact to the collection point.  This isn’t known — it’s probably different for different people, and might depend on vaccination and previous infections and which variant you have and age and who knows what else. However, the population average probably changes slowly over time, so if the number of viral fragments is going up this week, the number of active infections is probably going up, and  if the number of viral fragments is going down, the number of active infections is probably going down.

Using the wastewater data, we can see that the ratio of reported cases to wastewater viral fragments has been going down slowly since the first Omicron peak.  We’ve got a lot of other reasons to think testing and reporting are going down, so that’s a good explanation. It’s especially a good explanation because most of the other reasons for a change (eg, less viral shedding in second infections) would make the ratio go up instead.  So, with the ratio of reported cases to viral fragments going down by 25%, it makes sense to estimate that the ratio of reported cases to infections has gone down 25%.
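
To make the ratio idea concrete, here’s a minimal R sketch using invented weekly numbers (the real series live on the ESR dashboard): a ratio of cases to viral load that drifts downward suggests falling reporting rather than falling infections.

    # Hypothetical weekly figures: reported cases and wastewater viral load
    # (genome copies per person per day). These numbers are invented for
    # illustration; the real series are on the ESR dashboard.
    weeks <- as.Date("2022-09-05") + 7 * (0:8)
    cases <- c(11200, 10400, 9800, 9900, 10800, 12500, 14800, 17600, 20300)
    viral_load <- c(3.1e6, 3.0e6, 2.9e6, 3.0e6, 3.4e6, 4.1e6, 5.2e6, 6.6e6, 7.9e6)

    # Ratio of reported cases to viral load: if this drifts downwards over time,
    # reporting is probably falling relative to true infections.
    ratio <- cases / viral_load
    plot(weeks, ratio / ratio[1], type = "b", xlab = "Week",
         ylab = "Cases per unit of viral load (relative to first week)")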

Now all we need is to know what the reporting rate was at the peak. Which we don’t know. It couldn’t have been much higher than 60%, because some infections won’t have been symptomatic and some tests will have been false negatives.  If it was 60%, it’s down to roughly 40% now.  If it was lower than that at the peak, it’s lower than 40% now.  You could likely get somewhat better guesses by combining the epidemic models and the wastewater data, but it’s always going to be difficult.

You might think that hospitalisation and death data are less subject to under-reporting. This is true, but the proportion of infections leading to hospitalisation is (happily) going down due to vaccination and prior infection, and the proportion leading to death is (happily) going down even more due to better treatment.  On top of those changes, hospitalisation and death lag infection by quite a long time. The hospitalisation rate and death rate are directly important for policy, but they aren’t good indicators of current infections.

So, we’re a bit stuck. We can detect increases or decreases in infections fairly reliably with wastewater data, but absolute numbers are hard.  This is even more true for other diseases — in the future, there will hopefully be wastewater monitoring for influenza and maybe RSV, where we expect the case reporting rate to be massively lower than it is for Covid.

To get good absolute numbers we need a measurement of the actual infection rate in a random sample of people. That’s planned — originally for July 2022, but the timetable keeps slipping. A prevalence survey is a valuable complement to the wastewater data; it gives absolute numbers that can be used to calibrate the more precise and geographically detailed relative numbers from the wastewater.  Until we have a prevalence survey, the ESR dashboard is a good way to get a feeling for whether Covid infections are going up or down, and how fast.
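
For what the calibration would look like, here’s a rough sketch with invented numbers: a one-off prevalence estimate pins down the reporting rate for that week, and the wastewater trend carries the anchor forward.

    # Hypothetical calibration: suppose a random-sample survey found 2.5% of
    # people currently infected in a week when reported cases implied 0.9% of
    # the population. (Both numbers are invented for illustration.)
    survey_prevalence   <- 0.025
    reported_prevalence <- 0.009
    reporting_rate <- reported_prevalence / survey_prevalence   # about 0.36

    # With that anchor, and assuming the reporting rate changes only slowly,
    # reported cases in a nearby week can be scaled up to estimated infections.
    cases_this_week <- 15000
    cases_this_week / reporting_rate   # roughly 42,000 infections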

November 16, 2022

Is Roy Morgan weird yet?

Some years ago, at the behest of Kiwi Nerd Twitter, I looked at whether the Roy Morgan poll results varied more than those from other organisations, and concluded that they didn’t. It was just that Roy Morgan published polls more often. They had a larger number of surprising results because they had a larger number of results.  Kiwi Nerd Twitter has come back, asking for a repeat.

I’m going to do analyses of two ways of measuring weirdness, for the major and semi-major parties. All the data comes from Wikipedia’s “Opinion polling for the next NZ Election”, so it runs from the last election to now.  First, I’ll look at National.

The first analysis is to look at departures from the general trend.  The general trend for National (from a spline smoother, fitted in R’s mgcv package, in a model that also has organisation effects) looks like this:

Support was low; it went up.

I subtracted off the trend, and scaled the departures by the margin of error (not the maximum margin of error). Here they are, split up by polling organisation

The other analysis I did was to look at poll-to-poll changes, without any modelling of trend. The units for these are just percentage points.
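
In case you want to play along at home, here’s roughly what those two analyses look like in R, with a small made-up data set standing in for the Wikipedia table (the organisation names are real; the dates, percentages, and sample sizes are invented):

    library(mgcv)

    # A small made-up example in the same shape as the Wikipedia table: one row
    # per poll, with date, org (polling organisation), pct (party support, %),
    # and n (sample size). The real numbers would be scraped from Wikipedia.
    set.seed(1)
    polls <- data.frame(
      date = rep(seq(as.Date("2020-11-01"), by = "month", length.out = 24), 3),
      org  = rep(c("Roy Morgan", "Kantar", "Curia"), each = 24),
      n    = 1000
    )
    days <- as.numeric(polls$date - min(polls$date))
    polls$pct <- 25 + 12 * days / max(days) + rnorm(nrow(polls), sd = 2)

    # Analysis 1: departures from the general trend, from a spline smoother
    # for time plus additive organisation effects.
    polls$t <- as.numeric(polls$date)
    fit <- gam(pct ~ s(t) + org, data = polls)
    departure <- residuals(fit)

    # Scale each departure by that poll's own margin of error at its observed
    # support level (not the maximum margin of error at 50%).
    moe <- 100 * 1.96 * sqrt((polls$pct / 100) * (1 - polls$pct / 100) / polls$n)
    boxplot(departure / moe ~ polls$org,
            ylab = "Departure from trend / margin of error")

    # Analysis 2: poll-to-poll changes within each organisation, in points.
    polls <- polls[order(polls$org, polls$date), ]
    change <- ave(polls$pct, polls$org, FUN = function(x) c(NA, diff(x)))
    boxplot(change ~ polls$org, ylab = "Change from previous poll (points)")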

Next, the same things for Green Party support: departures from their overall trend

And poll-to-poll differences

For ACT:

And finally for Labour

So, it’s complicated. The differences are mostly not huge, but for the Greens and Labour there does seem to be more variability in the Roy Morgan results. For National there isn’t, and probably not for ACT.  The Curia polls are also more variable for the Greens but not for Labour.  I think this makes Roy Morgan less weird than people usually say, but there does seem to be something there.

As an additional note, the trend models also confirm that the variance of poll results is about twice what you’d expect from a simple sampling model. This means the margin of error will be about 1.4 times what the pollers traditionally claim: about 4.5% near 50% and about 2% near the MMP threshold of 5%.
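
Those numbers are easy to reproduce, assuming a typical sample size of about 1000:

    # Margin of error under simple random sampling, then inflated by sqrt(2)
    # because the observed variance is about twice the simple sampling variance.
    moe <- function(p, n = 1000) 1.96 * sqrt(p * (1 - p) / n)

    moe(0.50)             # ~0.031: the 'maximum margin of error' usually quoted
    sqrt(2) * moe(0.50)   # ~0.044: call it 4.5 percentage points
    moe(0.05)             # ~0.013
    sqrt(2) * moe(0.05)   # ~0.019: about 2 points near the 5% threshold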

November 5, 2022

Winston First?

An ongoing theme of StatsChat is that single political polls aren’t a great source of information, and that you need to combine them. A case in point: this piece at Stuff describing a new Horizon poll.  The headline is Winston Peters returns to kingmaker position in new political poll, and the poll has NZ First on 6.75%.  My second-favourite NZ poll aggregator, Wikipedia, shows other recent polls, where the public results from Curia, Roy Morgan, and Kantar were 2.1%, 1%, and 3% and a leaked result from Talbot Mills was 4%.  It’s possible that this shows a real and massive jump over the past couple of weeks. Stranger things do happen in politics — but not much stranger and not all that often. It’s quite likely that it’s just some sort of blip and doesn’t mean much.

Stuff does add “The poll had a margin of error of 3.2%, meaning NZ First’s crossing the 5% threshold was within the margin of error,”  but that’s the wrong caveat.   The 3.2% margin of error is more strictly called the ‘maximum margin of error’, because it’s the margin of error for proportions near 50%, which is larger than at, say, 5%.  I’ve written before about calculating the corresponding margin of error for minor parties.

In this case, under the pure mathematical sampling approximations used to get 3.2%, a 95% uncertainty interval for NZ First’s true support would go from 5.2% to 8.5%. If we only worried about sampling error, NZ First would be fairly clearly above the 5% threshold.  The problem is that the mathematical sampling error  is typically an underestimate of total survey error — and when you get a very surprising result, it’s sensible to consider that you might possibly be out on the fringes of the total survey error.  Or not. We will find out soon.
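
For the arithmetic: the quoted maximum margin of error pins down the effective sample size, and from that you can get the margin of error at 6.75% (the exact endpoints depend slightly on which interval formula you use):

    # Back out the effective sample size from the quoted maximum margin of error.
    max_moe <- 0.032
    n <- 0.25 * (1.96 / max_moe)^2          # about 940

    # Margin of error at NZ First's reported support.
    p <- 0.0675
    moe <- 1.96 * sqrt(p * (1 - p) / n)     # about 1.6 percentage points
    c(lower = p - moe, upper = p + moe)     # roughly 5.1% to 8.4%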

October 20, 2022

Bus cancellations

The friendly StatsChat busbots have been tracking cancellations as well as delays: for the past month in Auckland and longer in Wellington.  Here’s a summary of the Auckland cancellations

They seem to be up again, which might not be entirely unconnected with the current increase in Covid cases. Perhaps more to the point, that’s a lot of missing buses.

September 19, 2022

After the Great Resignation

This story is a month old now and I didn’t write about it at the time, but Stuff served it up to me again. The headline is Workers are discovering the ‘Great Regret’ of quitting their jobs, and the key data-based line is

In the United States, a survey of more than 15,000 workers by job-search platform Joblist found 26 percent of workers who quit during the Great Resignation regret their decision.

I’ve edited that quote so that the link works, which it doesn’t on Stuff or Newsroom. If you follow the link, you find that this is basically what Joblist says in its writeup of the survey.  If you are naturally suspicious or nerdy enough to read the Methodology section at the bottom of the page, though, it looks a bit different

We surveyed 628 job seekers who quit their previous job about why they quit and whether they regret their decision.

So, the 26% is of 628 people, not more than 15,000.  More specifically, it’s 628 job-seekers. The target population doesn’t appear to include people who already had a new job or who weren’t looking for one — two groups that you’d expect to have fewer regrets.

The Methodology section doesn’t say how they chose the people to survey, though it does say “This data has not been weighted, and it comes with some limitations”. At best, that suggests a survey with no attempt to compensate for non-response. At worst, it could be a bogus self-selected straw poll.

September 17, 2022

Briefly

  • From Radio NZ’s series on the lotto: “When contacted by RNZ, Lotto said it would now no longer claim that there were lucky stores. “We recently carried out a piece of work looking at the use of the word ‘luck’ in relation to our products and as a result of this work have decided we will not in the future put ‘lucky stores’ or ‘lucky regions’ in press releases,” head of corporate communications Lucy Fullarton said.”  As StatsChat readers will remember, we’ve been attacking this use of “lucky” for a while.
  • On the other hand, Lotto doesn’t need to specifically make these claims any more, since they’re already  well known. For example, see today’s Herald, “Kiwis are rushing into lucky Lotto stores ahead of tonight’s $20 million jackpot draw”, naming the same Hastings pharmacy as the Radio NZ story did.
  • Good article by Jamie Morton in the Herald on interesting clinical trials in New Zealand.
  • The UK Office for National Statistics has a life expectancy calculator — if, to pick an example almost at random, you wanted to find out how long a 73-year-old man would be expected to live.
  • There’s a claim out there that the median book only sells twelve copies.  As you’d expect, it’s more complicated than that.

September 12, 2022

New euphemism for ‘bogus poll’

Stuff has a headline Tory Whanau clear leader in straw poll for Wellington mayoralty.

This is bad. Election polling is valuable because it gives people some feeling for what the rest of the electorate thinks, solving the Pauline Kael problem

 “I live in a rather special world. I only know one person who voted for Nixon. Where they are I don’t know. They’re outside my ken. But sometimes when I’m in a theater I can feel them.” — Pauline Kael, film critic for The New Yorker

Polling also gives newspapers something to write about that’s novel and at least a bit informative.

With good polling, people have an accurate idea of public opinion; with poor polling they have an inaccurate idea. With self-selected bogus polling they have no idea at all.  The Dominion Post even tweeted about this story

The poll, while unashamedly unscientific, points to Wellington’s next mayor being relative unknown Tory Whanau less than a week from voting papers going out.

The poll is incapable of pointing to anything, so it doesn’t point to Tory Whanau being the next mayor, however desirable that might be.

Back in the early days of StatsChat, when John Key debated David Cunliffe, we showed the results of three useless self-selected bogus polls about who had done better: Newstalk ZB was 63% in favour of Cunliffe; TVNZ was 61% in favour of Key; the Herald was a tie.

If a poll like this gets the right answer, it’s basically an accident. There’s no excuse for headlining the results as if they meant something.

September 2, 2022

Showing MPs’ expenses

MPs’ expenses paid by the Parliamentary Service were published recently for the quarter ending June 30. Here are the averages by party (click to embiggen)

This is a terrible presentation of the data, though not the worst one I saw.  Simple bar charts like this are sometimes useful, but can be very misleading. Here’s a dotplot, showing the individual data points

In this version, you should immediately notice something. There are a lot of Labour MPs who aren’t getting any expenses! This might then prompt  you to go back to the website and read the “Understanding members’ expenses” page, and find out that these numbers don’t include Ministers’ expenses.  Most of the ministers are in the Labour Party (for some reason) and that will pull down the average.

You might then remove the ministers and compare averages again, but it’s not obvious that’s a good comparison. The outlying blue dot at the top there is Chris Luxon. It seems reasonable that the leader of the opposition could have more expenses than a random MP.  This then raises the question of whether comparing averages for non-Minister MPs between the government and opposition is sensible in general.  There are other questions: for example,  travel expenses will depend on where you live as well as where you go.  While we’re looking at outliers, I should also note the outlying blue dot at the bottom. That’s Sam Uffindell. He was elected in a by-election on June 18, so he didn’t have much time to rack up expenses.
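
If you want to draw this sort of graph yourself, here’s a minimal sketch, with made-up numbers standing in for the published spreadsheet:

    # Made-up illustrative data in the same shape as the published figures:
    # one row per MP, with party and total expenses for the quarter.
    set.seed(1)
    expenses <- data.frame(
      party = rep(c("Labour", "National", "Green", "ACT"), c(65, 33, 10, 10)),
      total = round(runif(118, 0, 20000))
    )
    # Mimic the Ministers, whose expenses are reported separately as zero here.
    expenses$total[sample(which(expenses$party == "Labour"), 20)] <- 0

    # One point per MP, grouped by party, with horizontal jitter so the
    # zero-expense points don't all sit on top of each other.
    stripchart(total ~ party, data = expenses, vertical = TRUE,
               method = "jitter", jitter = 0.15, pch = 19,
               ylab = "Expenses for the quarter ($)")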

August 25, 2022

Less creepy than it sounds

News.com.au has the headline New report from Westpac reveals ‘turbocharged’ spending and increased dependence on antidepressant drug, and follows it up with “A new report looking at Australia’s spending on health services shows one drug entered the ‘Top 10’ list of prescription medicines for the first time in 14 years.”  The article then goes on about using Westpac card data and a survey of pharmacies. I saw this on Twitter, where someone was outraged that Westpac was using individual drug purchase data this way. They’re not.

There’s literally nothing in the article, or the press release it’s based on, or the report that the press release is based on, to support “increased dependence on antidepressant drug”, and not much to support increased use. The information on sertraline prescriptions didn’t come from the Westpac survey. The news about sertraline entering the top ten prescribed medicines is from the Pharmaceutical Benefits Scheme (Australia’s version of Pharmac), who (like Pharmac) release this sort of data every year. In the year 2019-2020, sertraline was one of the top ten; it hadn’t been in previous years. This isn’t a new report; it came out in late 2020 and we already have the data for 2020-21.  Moving into the top ten doesn’t necessarily imply higher use, and we can’t tell from these data whether the prescriptions for antidepressants in general increased or whether the relative use of different antidepressants changed.

The actual Westpac survey information is probably interesting if you run a pharmacy in Australia, but it doesn’t have much headline potential.

August 18, 2022

Pay gap confusion

From 1News on Twitter — with the text “Do these stats make you mad? Pay attention – for every $100 a man earns in NZ, a woman doing the same job earns $8 less.”

As Graeme Edgeler says, the linked article does not substantiate the claim in the text of the tweet at all. The $8 number isn’t referenced in the article and the 9.2% number doesn’t measure earnings differences for two people doing the same job.  It’s also worth considering a second article about the new income statistics linked from the first article. It gives yet another pair of income numbers for men and women, one where the gap is 25% and is slightly smaller than last year.  On top of that,  the 10% and 9.2% claims in the image seem to be about two different income measures, though it’s not really clear where the 10% is sourced.  I’ll also link the new StatsNZ figures, which 1News doesn’t bother to do.

Let’s start with the gender pay gap.  Stats New Zealand’s indicator for gender pay differences is the percentage difference in median hourly earnings.  They have a nice document from 2015 explaining why they define it this way. It’s not a perfect summary — it’s not going to be; it’s a single number — but it’s what they use.  A gender pay gap of 9.2% means that if median hourly earnings for men were $100 (sadly, no) they would be $90.80 for women.  That’s not $8 less; it’s $9.20 less.  The $8 figure doesn’t seem to come from anything in the article, and it’s an understatement of the gap, which is a bit unusual.

Not only is the figure of $8 apparently wrong, the interpretation is wrong.  The gender pay gap isn’t about comparing income for  the same job. That’s good, because a significant part of gender discrimination in pay comes from women’s jobs being paid less.  Looking at the exact same job would understate the pay gap, and would get you into endless arguments over, eg, whether “captain of the national rugby union team” was or was not the same job for Sam Cane and Ruahei Demant.  The interpretation is also wrong because another reason women earn less money than men is they tend to work fewer hours, and that’s not included in the pay gap.

The second linked article discusses median weekly earnings. It has figures of $1320 for men and $1055 for women: men’s median is about 25% higher than women’s, or a gap of about 20% if you measure it relative to men’s earnings, as StatsNZ does for the hourly figure.  The increase in median weekly earnings for the population as a whole was 8.8%, which is the only whole-population percentage increase in either article, so it seems like it should be the basis for “we’re all earning nearly 10% more than last year”.  If you dismiss the ‘all’ as journalistic license, and rephrase as “a typical person” then I suppose 8.8% is nearly 10%, but it’s further from 10% than 9.2% is.
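
Just to pin down the arithmetic for the two sets of figures quoted (the numbers are from the articles; the code is mine):

    # Hourly medians: the 9.2% gap is defined relative to men's earnings.
    gap_hourly <- 0.092
    men_hourly <- 100                           # stylised $100, as in the tweet
    men_hourly * (1 - gap_hourly)               # $90.80 for women: $9.20 less

    # Weekly medians from the second article.
    men_weekly   <- 1320
    women_weekly <- 1055
    (men_weekly - women_weekly) / men_weekly    # ~0.20: women's median ~20% lower
    (men_weekly - women_weekly) / women_weekly  # ~0.25: men's median ~25% higher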

Given how different the gender pay gap is when you compute it in different ways, you might  wonder if international comparisons use StatsNZ’s preferred definition. They don’t.  The OECD restricts to people working full time:  hourly and weekly earnings will have the same gaps, but you’ll miss out the pay gap due to more women working in part time jobs.  By the OECD definition, the New Zealand gender pay gap is 6.7% for employees and 30.8% for self-employed people.

The article linked to the tweet is about a campaign to make large businesses disclose their gender pay gap.  I’m fairly generally in favour of this sort of transparency, but when major news outlets seem to not understand or not care what the number means, publicising it might not be as effective as you’d like.