Posts written by Thomas Lumley (2566)


Thomas Lumley (@tslumley) is Professor of Biostatistics at the University of Auckland. His research interests include semiparametric models, survey sampling, statistical computing, foundations of statistics, and whatever methodological problems his medical collaborators come up with. He also blogs at Biased and Inefficient.

December 6, 2022

Briefly

  • I’ve often complained about misleading bar graphs in reporting electoral opinion polls. 1News just punted on the whole issue with this:
  • The cost of the Meola Road rebuild, $47.5 million, has been inaccurately portrayed as the cost of the bike lane that’s a minor component of it. Twitter user @ArcCyclist got the actual breakdown from the Council:

    While I’m at it, I do want to note one way it’s a bad table: the cycleway number is given to whole dollars, with everything else given in cents, so it looks even smaller than it really is. You usually don’t want to delete trailing zeroes in a table.
  • The ESR Covid wastewater dashboard is now at poops.nz. Yes, really.
  • There’s a new “technical report for future UK Chief Medical Officers, Government Chief Scientific Advisers, National Medical Directors and public health leaders in a pandemic” from the UK. Even if you aren’t among that exalted company, some of the information may be useful to ordinary citizens as well.
  • The Ministry of Health is seeking public comment on something it wrote about ‘precision health’. There might be StatsChat readers who have reckons.
  • Eric Crampton notes that cost-benefit ratios for transport projects are defined in an idiosyncratic way that makes them hard to compare either with each other or with non-transport projects.
  • The first drug to convincingly delay Type I diabetes onset has been approved. The average benefit is about two years, and the treatment will be marketed at US$200,000.  Cost-effectiveness research suggests this is way more than it’s worth for most people, even in the US where insulin for Type I diabetes is very expensive.
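On the Meola Road table: a minimal Python sketch of giving every amount in a column the same number of decimal places, so a whole-dollar entry doesn't look shorter (and smaller) than the rest. The item names and amounts are hypothetical placeholders, not the Council's actual breakdown.

```python
# Hypothetical line items and amounts, for illustration only.
items = {
    "Item A": 12345678.90,
    "Item B": 2500000,      # a whole-dollar figure, the kind that gets shortened
    "Item C": 987654.32,
}

def format_column(amounts):
    """Format every amount with thousands separators and exactly two decimals,
    keeping trailing zeroes so no entry looks artificially short."""
    texts = [f"${x:,.2f}" for x in amounts]
    width = max(len(t) for t in texts)
    # Right-align so the decimal points line up in a fixed-width display.
    return [t.rjust(width) for t in texts]

for name, text in zip(items, format_column(items.values())):
    print(f"{name:<8} {text}")
```

The whole-dollar entry comes out as `$2,500,000.00`, the same shape as its neighbours.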
November 28, 2022

99.44% pure

From the Guardian: Computer says there is a 80.58% probability painting is a real Renoir. The story goes on to say Dr Carina Popovici, Art Recognition’s CEO, believes that this ability to put a number on the degree of uncertainty is important.

It’s definitely valuable to put a number on the degree of uncertainty. What’s much less clear is that it’s valuable to put a number on the uncertainty to four-digit precision.   Let’s think about what it would take to be that precise.

If the 80.58% number was estimated from a proportion of observed data in some sense, quoting it to four digits would only make sense if the uncertainty was less than about 0.05%. A standard error of 0.05% would need a sample size of more than five hundred thousand.
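The sample-size arithmetic, as a small sketch: it assumes the simplest case of a proportion from independent observations, with standard error sqrt(p(1 − p)/n), and solves for n.

```python
def n_for_standard_error(p, se):
    """Sample size n at which sqrt(p * (1 - p) / n) equals the target standard error.
    p and se are proportions (0.8058 and 0.0005, not 80.58% and 0.05%)."""
    return p * (1 - p) / se**2

n = n_for_standard_error(0.8058, 0.0005)
print(f"{n:,.0f}")
```

This gives about 626,000 observations just to get the standard error down to 0.05 percentage points.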

Another way you can get an estimate with high precision is by including subjective expert opinion, which would be entirely appropriate in a context like this. There’s no limit to how precise this can be for the person whose opinion it is (you believe exactly what you believe), but there are very strong limits on how precise it can realistically be as a guide to others. If the computer isn’t the one buying the Renoir, other people probably shouldn’t care about its opinion to more than one or two digits of accuracy.

Sometimes when you come up with an estimate you want to quote it to higher precision than is directly useful — lots of statistical software, including some I write, quotes four or more digits in the default output. This allows rounding to happen closer to the point of use, such as before it’s in a headline in the mainstream media.

November 18, 2022

How many Covid cases?

From Hannah Martin at Stuff: Only 35% of Covid cases being reported, ministry says, after earlier saying it was 75%

The ministry’s latest Trends and Insights report, released on Monday, said “approximately three quarters of infections are being reported as cases”, based on wastewater testing.

However, it has since said that “based on updated wastewater methodology”, about 35% of infections were reported as cases as of the week to November 2.

This is a straightforward loss during communication:  the 75% was an estimate of how much the reporting had changed since the first Omicron peak, but it got into the Trends and Insights report as an absolute rate.  Dion O’Neale is quoted further down the story explaining this.

For future reference, it’s worth looking at what we can and can’t estimate well from various sources of information we might have.

The wastewater data has the advantage of including everyone in a set of cities and towns, adding up to the majority of the country; everybody poops. It has the disadvantage of not directly measuring cases or infections.  The wastewater data tells us how many Covid viral fragments are in the wastewater.  How that relates to infections depends on how many viral fragments each person sheds and how many of these make it intact to the collection point.  This isn’t known — it’s probably different for different people, and might depend on vaccination and previous infections and which variant you have and age and who knows what else. However, the population average probably changes slowly over time, so if the number of viral fragments is going up this week, the number of active infections is probably going up, and  if the number of viral fragments is going down, the number of active infections is probably going down.

Using the wastewater data, we can see that the ratio of reported cases to wastewater viral fragments has been going down slowly since the first Omicron peak.  We’ve got a lot of other reasons to think testing and reporting is going down, so that’s a good explanation. It’s especially a good explanation because most of the other reasons for a change (eg, less viral shedding in second infections) would make the ratio go up instead.  So, with the ratio of reported cases to viral fragments going down by 25% it makes sense to estimate that the ratio of reported cases to infections has gone down 25%.

Now all we need is to know what the reporting rate was at the peak, which we don’t know. It couldn’t have been much higher than 60%, because some infections won’t have been symptomatic and some tests will have been false negatives. If it was 60%, a 25% fall takes it down to roughly 45% now. If it was lower than that at the peak, it’s lower than 45% now. You could likely get somewhat better guesses by combining the epidemic models and the wastewater data, but it’s always going to be difficult.
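The reasoning above in a few lines of Python, as a sketch: the 25% fall in the ratio of reported cases to wastewater signal is read as a 25% fall in the reporting rate, and the current rate is just the unknown peak rate times 0.75. The 60% ceiling comes from the post; the lower peak values are illustrative.

```python
# Ratio of reported cases to wastewater signal now, relative to the first Omicron peak.
# Assumes infections per unit of wastewater signal haven't changed, so this is
# also the relative change in the reporting rate.
RELATIVE_CHANGE = 0.75

def current_reporting_rate(peak_rate, relative_change=RELATIVE_CHANGE):
    """Reporting rate now, given an assumed reporting rate at the peak."""
    return peak_rate * relative_change

for peak in (0.60, 0.50, 0.40):
    print(f"peak {peak:.0%} -> now {current_reporting_rate(peak):.0%}")
```

Whatever the peak rate was, the wastewater only pins down the ratio; the absolute level has to come from somewhere else.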

You might think that hospitalisation and death data are less subject to under-reporting. This is true, but the proportion of infections leading to hospitalisation is (happily) going down due to vaccination and prior infection, and the proportion leading to death is (happily) going down even more due to better treatment.  On top of those changes, hospitalisation and death lag infection by quite a long time. The hospitalisation rate and death rate are directly important for policy, but they aren’t good indicators of current infections.

So, we’re a bit stuck. We can detect increases or decreases in infections fairly reliably with wastewater data, but absolute numbers are hard.  This is even more true for other diseases — in the future, there will hopefully be wastewater monitoring for influenza and maybe RSV, where we expect the case reporting rate to be massively lower than it is for Covid.

To get good absolute numbers we need a measurement of the actual infection rate in a random sample of people. That’s planned — originally for July 2022, but the timetable keeps slipping. A prevalence survey is a valuable complement to the wastewater data; it gives absolute numbers that can be used to calibrate the more precise and geographically detailed relative numbers from the wastewater.  Until we have a prevalence survey, the ESR dashboard is a good way to get a feeling for whether Covid infections are going up or down, and how fast.
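A toy sketch of what "calibrate" could mean here, under a loudly simplifying assumption: prevalence is taken to be proportional to the wastewater signal, with one scale factor estimated from the week the prevalence survey ran. All the numbers and the `calibrate` helper are made up for illustration. Since average shedding per person drifts over time, in practice you'd re-estimate the scale factor with each survey round rather than treat it as fixed.

```python
def calibrate(wastewater_signal, survey_week, survey_prevalence):
    """Turn a relative wastewater series into estimated prevalence.

    Hypothetical model: prevalence = scale * signal, with a single constant
    'scale' estimated from the survey week.
    """
    scale = survey_prevalence / wastewater_signal[survey_week]
    return [scale * s for s in wastewater_signal]

# Made-up weekly wastewater values (arbitrary units) and a made-up survey result.
signal = [100.0, 120.0, 150.0, 130.0]
estimates = calibrate(signal, survey_week=2, survey_prevalence=0.03)
print([f"{e:.1%}" for e in estimates])
```

The survey supplies the level; the wastewater supplies the week-to-week (and place-to-place) shape.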

November 16, 2022

Is Roy Morgan weird yet?

Some years ago, at the behest of Kiwi Nerd Twitter, I looked at whether the Roy Morgan poll results varied more than those from other organisations, and concluded that they didn’t. It was just that Roy Morgan published polls more often. They had a larger number of surprising results because they had a larger number of results.  Kiwi Nerd Twitter has come back, asking for a repeat.
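Why more polls means more surprises, as a quick sketch: if every poll is well-behaved, about 5% of them should still land outside their 95% margin of error. The poll counts below are hypothetical, and the "at least one" calculation assumes independent polls.

```python
def expected_surprises(n_polls, prob_outside=0.05):
    """Expected number of polls outside their 95% margin of error, even if all are fine."""
    return n_polls * prob_outside

def prob_at_least_one(n_polls, prob_outside=0.05):
    """Chance that at least one poll is outside its margin of error (independent polls)."""
    return 1 - (1 - prob_outside) ** n_polls

for n in (10, 40):
    print(n, expected_surprises(n), round(prob_at_least_one(n), 2))
```

An organisation publishing four times as many polls should produce about four times as many surprising results without being any weirder.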

I’m going to do analyses of two ways of measuring weirdness, for the major and semi-major parties. All the data comes from Wikipedia’s “Opinion polling for the next NZ Election”, so it runs from the last election to now. First, I’ll look at National.

The first analysis is to look at departures from the general trend.  The general trend for National (from a spline smoother, fitted in R’s mgcv package, in a model that also has organisation effects) looks like this:

Support was low; it went up.

I subtracted off the trend, and scaled the departures by the margin of error (not the maximum margin of error). Here they are, split up by polling organisation
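A minimal sketch of the scaling step. It assumes the trend value for each poll's date is already available (the mgcv spline fit isn't reproduced here) and uses the usual 95% simple-random-sampling margin of error. The post doesn't say whether the margin was computed at the poll's value or the trend's; computing it at the trend value is an assumption of this sketch.

```python
import math

def margin_of_error(p, n, z=1.96):
    """95% margin of error for a proportion p from a simple random sample of size n."""
    return z * math.sqrt(p * (1 - p) / n)

def scaled_departure(poll_share, trend_share, n):
    """Departure of a poll from the trend, in units of its own margin of error
    (evaluated at the trend value, not the 50% 'maximum margin of error')."""
    return (poll_share - trend_share) / margin_of_error(trend_share, n)

# A poll 3 points above a 37% trend, with 1000 respondents: about one margin of error.
print(round(scaled_departure(0.40, 0.37, 1000), 2))
```

Values much beyond ±1 more often than about 5% of the time would suggest extra variability.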

The other analysis I did was to look at poll-to-poll changes, without any modelling of trend. The units for these are just percentage points.
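A sketch of the poll-to-poll changes, assuming (as the by-organisation plots suggest) that changes are taken between consecutive polls from the same organisation. The poll records are hypothetical.

```python
from collections import defaultdict

def poll_to_poll_changes(polls):
    """Differences (percentage points) between consecutive polls from the same organisation.

    polls: iterable of (date, organisation, share_in_percent); dates must sort
    chronologically, e.g. ISO 'YYYY-MM-DD' strings.
    """
    by_org = defaultdict(list)
    for date, org, share in sorted(polls):
        by_org[org].append(share)
    return {org: [b - a for a, b in zip(shares, shares[1:])]
            for org, shares in by_org.items()}

polls = [
    ("2022-01-10", "Org A", 40.0),
    ("2022-01-01", "Org A", 38.0),
    ("2022-01-05", "Org B", 35.0),
    ("2022-02-01", "Org B", 36.5),
]
print(poll_to_poll_changes(polls))
```

No trend model is involved, so this check doesn't depend on how smooth you think public opinion is; but an organisation that polls more often will show smaller changes simply because less time passes between its polls.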

Next, the same things for Green Party support: departures from their overall trend

And poll-to-poll differences

For ACT:

And finally for Labour

 

So, it’s complicated. The differences are mostly not huge, but for the Greens and Labour there does seem to be more variability in the Roy Morgan results. For National there isn’t, and probably not for ACT. The Curia polls are also more variable for the Greens but not for Labour. I think this makes Roy Morgan less weird than people usually say, but there does seem to be something there.

As an additional note, the trend models also confirm that the variance of poll results is about twice what you’d expect from a simple sampling model. Since the margin of error scales with the square root of the variance, it will be about 1.4 times what the pollers traditionally claim: about 4.5% near 50% and about 2% near the MMP threshold of 5%.
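The variance-doubling arithmetic as a sketch, assuming a typical poll size of about 1000 (the post doesn't give one). Doubling the variance multiplies the margin of error by √2 ≈ 1.41.

```python
import math

def margin_of_error(p, n, z=1.96, variance_inflation=1.0):
    """95% margin of error for proportion p and sample size n.
    variance_inflation multiplies the sampling variance; the margin scales with its square root."""
    return z * math.sqrt(variance_inflation * p * (1 - p) / n)

n = 1000  # assumed typical poll size
for p in (0.50, 0.05):
    simple = margin_of_error(p, n)
    inflated = margin_of_error(p, n, variance_inflation=2.0)
    print(f"p={p:.0%}: simple {simple:.1%}, doubled variance {inflated:.1%}")
```

With these assumptions the doubled-variance margins come out near 4.4% at 50% and 1.9% at 5%.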

November 5, 2022

Winston First?

An ongoing theme of StatsChat is that single political polls aren’t a great source of information, and that you need to combine them. A case in point: this piece at Stuff describing a new Horizon poll.  The headline is Winston Peters returns to kingmaker position in new political poll, and the poll has NZ First on 6.75%.  My second-favourite NZ poll aggregator, Wikipedia, shows other recent polls, where the public results from Curia, Roy Morgan, and Kantar were 2.1%, 1%, and 3% and a leaked result from Talbot Mills was 4%.  It’s possible that this shows a real and massive jump over the past couple of weeks. Stranger things do happen in politics — but not much stranger and not all that often. It’s quite likely that it’s just some sort of blip and doesn’t mean much.

Stuff does add “The poll had a margin of error of 3.2%, meaning NZ First’s crossing the 5% threshold was within the margin of error,”  but that’s the wrong caveat.   The 3.2% margin of error is more strictly called the ‘maximum margin of error’, because it’s the margin of error for proportions near 50%, which is larger than at, say, 5%.  I’ve written before about calculating the corresponding margin of error for minor parties.
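Rescaling a published maximum margin of error to a minor party's level, as a sketch under simple random sampling: the margin is proportional to sqrt(p(1 − p)), which equals 0.5 at p = 50%. The helper name `moe_at` is just for illustration.

```python
import math

def moe_at(p, max_moe):
    """Margin of error at proportion p, given the 'maximum' margin quoted for p = 0.5."""
    return max_moe * math.sqrt(p * (1 - p)) / 0.5

for p in (0.05, 0.0675):
    print(f"{p:.2%}: ±{moe_at(p, 0.032):.1%}")
```

A 3.2% maximum margin shrinks to about 1.4 points at 5% support and about 1.6 points at 6.75%.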

In this case, under the pure mathematical sampling approximations used to get 3.2%, a 95% uncertainty interval for NZ First’s true support would go from 5.2% to 8.5%. If we only worried about sampling error, NZ First would be fairly clearly above the 5% threshold.  The problem is that the mathematical sampling error  is typically an underestimate of total survey error — and when you get a very surprising result, it’s sensible to consider that you might possibly be out on the fringes of the total survey error.  Or not. We will find out soon.
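The post doesn't say exactly which sampling approximation gave 5.2% to 8.5%. This sketch backs the sample size out of the 3.2% maximum margin and uses a Wilson score interval, ignoring any weighting or design effects (an assumption); it lands close to the quoted interval, at roughly 5.3% to 8.5%.

```python
import math

def implied_n(max_moe, z=1.96):
    """Sample size implied by a maximum margin of error at p = 0.5."""
    return (z * 0.5 / max_moe) ** 2

def wilson_interval(p, n, z=1.96):
    """Wilson score interval for a proportion p observed among n respondents."""
    denom = 1 + z**2 / n
    centre = (p + z**2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return centre - half, centre + half

n = implied_n(0.032)  # roughly 940 respondents
low, high = wilson_interval(0.0675, n)
print(f"{low:.1%} to {high:.1%}")
```

Either way, sampling error alone puts NZ First above 5%; the point of the post is that sampling error isn't the only error.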

 

 

 

 

 

October 20, 2022

Bus cancellations

The friendly StatsChat busbots have been tracking cancellations as well as delays: for the past month in Auckland and longer in Wellington.  Here’s a summary of the Auckland cancellations

They seem to be up again, which might not be entirely unconnected with the current increase in Covid cases. Perhaps more to the point, that’s a lot of missing buses.

September 19, 2022

After the Great Resignation

This story is a month old now, but Stuff served it up to me again, and I didn’t write about it earlier. The headline is Workers are discovering the ‘Great Regret’ of quitting their jobs, and the key data-based line is

In the United States, a survey of more than 15,000 workers by job-search platform Joblist found 26 percent of workers who quit during the Great Resignation regret their decision.

I’ve edited that quote so that the link works, which it doesn’t on Stuff or Newsroom. If you follow the link, you find that this is basically what Joblist says in its writeup of the survey. If you are naturally suspicious or nerdy enough to read the Methodology section at the bottom of the page, though, it looks a bit different:

We surveyed 628 job seekers who quit their previous job about why they quit and whether they regret their decision.

So, the 26% is of 628 people, not more than 15,000.  More specifically, it’s 628 job-seekers. The target population doesn’t appear to include people who already had a new job or who weren’t looking for one — two groups that you’d expect to have fewer regrets.

The Methodology section doesn’t say how they chose the people to survey, though it does say “This data has not been weighted, and it comes with some limitations.” At best, that suggests a survey with no attempt to compensate for non-response. At worst, it could be a bogus self-selected straw poll.

September 17, 2022

Briefly

  • From Radio NZ’s series on the lotto: “When contacted by RNZ, Lotto said it would now no longer claim that there were lucky stores. ‘We recently carried out a piece of work looking at the use of the word “luck” in relation to our products and as a result of this work have decided we will not in the future put “lucky stores” or “lucky regions” in press releases,’ head of corporate communications Lucy Fullarton said.” As StatsChat readers will remember, we’ve been attacking this use of “lucky” for a while.
  • On the other hand, Lotto doesn’t need to specifically make these claims any more, since they’re already well known. For example, see today’s Herald, “Kiwis are rushing into lucky Lotto stores ahead of tonight’s $20 million jackpot draw”, naming the same Hastings pharmacy as the Radio NZ story did.
  • Good article by Jamie Morton in the Herald on interesting clinical trials in New Zealand.
  • The UK Office for National Statistics has a life expectancy calculator, in case, to pick an example almost at random, you wanted to find out how long a 73-year-old man would be expected to live.
  • There’s a claim out there that the median book only sells twelve copies. As you’d expect, it’s more complicated than that.
September 12, 2022

New euphemism for ‘bogus poll’

Stuff has a headline Tory Whanau clear leader in straw poll for Wellington mayoralty.

This is bad. Election polling is valuable because it gives people some feeling for what the rest of the electorate thinks, solving the Pauline Kael problem:

 “I live in a rather special world. I only know one person who voted for Nixon. Where they are I don’t know. They’re outside my ken. But sometimes when I’m in a theater I can feel them.” — Pauline Kael, NYT film critic

Polling also gives newspapers something to write about that’s novel and at least a bit informative.

With good polling, people have an accurate idea of public opinion; with poor polling they have an inaccurate idea. With self-selected bogus polling they have no idea at all. The Dominion Post even tweeted about this story:

The poll, while unashamedly unscientific, points to Wellington’s next mayor being relative unknown Tory Whanau less than a week from voting papers going out.

The poll is incapable of pointing to anything, so it doesn’t point to Tory Whanau being the next mayor, however desirable that might be.

Back in the early days of StatsChat, when John Key debated David Cunliffe, we showed the results of three useless self-selected bogus polls about who had done better: Newstalk ZB was 63% in favour of Cunliffe; TVNZ was 61% in favour of Key; the Herald was a tie.

If a poll like this gets the right answer, it’s basically an accident. There’s no excuse for headlining the results as if they meant something.

September 2, 2022

Showing MPs expenses

MPs’ expenses paid by the Parliamentary Service were published recently for the quarter ending June 30. Here are the averages by party:

This is a terrible presentation of the data, though not the worst one I saw.  Simple bar charts like this are sometimes useful, but can be very misleading. Here’s a dotplot, showing the individual data points

In this version, you should immediately notice something. There are a lot of Labour MPs who aren’t getting any expenses! This might then prompt you to go back to the website and read the “Understanding members’ expenses” page, and find out that these numbers don’t include Ministers’ expenses. Most of the ministers are in the Labour Party (for some reason) and that will pull down the average.
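How much a block of zeros can drag a party average down, in a small sketch with made-up numbers (not the Parliamentary Service figures): here half the MPs are hypothetical Ministers whose expenses are reported elsewhere and so show up as zero.

```python
from statistics import mean

# Hypothetical per-MP expenses for one party; the zeros stand in for Ministers
# whose expenses aren't included in this release.
expenses = [0, 0, 0, 9000, 11000, 10000]

with_zeros = mean(expenses)
without_zeros = mean(x for x in expenses if x > 0)
print(with_zeros, without_zeros)
```

The bar chart would show only the first number; a dotplot shows the zeros that produce it. Dropping the zeros isn't automatically the right comparison either, since it also drops anyone who genuinely had no expenses.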

You might then remove the ministers and compare averages again, but it’s not obvious that’s a good comparison. The outlying blue dot at the top there is Chris Luxon. It seems reasonable that the leader of the opposition could have more expenses than a random MP.  This then raises the question of whether comparing averages for non-Minister MPs between the government and opposition is sensible in general.  There are other questions: for example,  travel expenses will depend on where you live as well as where you go.  While we’re looking at outliers, I should also note the outlying blue dot at the bottom. That’s Sam Uffindell. He was elected in a by-election on June 18, so he didn’t have much time to rack up expenses.