June 23, 2014

Undecided?

My attention was drawn on Twitter to this post at The Political Scientist arguing that election poll reporting is misleading because it doesn't report the results for the relatively popular "Undecided" party. The post is making a good point, but there are two things I want to comment on. Actually, three things. The zeroth thing is that the post contains the numbers, but only as screenshots, not in any usable form.

The first point is that the post uses correlation coefficients to do everything, and these really aren't fit for purpose. The value of correlation coefficients is that they summarise the (linear part of the) relationship between two variables in a way that doesn't involve the units of measurement or the direction of effect (if any). Those are bugs, not features, in this analysis. The question is how the other party preferences have changed with changes in the 'Undecided' preference — how many extra respondents picked Labour, say, for each extra respondent who gave a preference. That sort of question is answered (to a straight-line approximation) by regression coefficients, not correlation coefficients.
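A toy numeric sketch of the distinction (the numbers here are made up for illustration, not the actual poll series): the correlation coefficient is unit-free and the same whichever variable you treat as the outcome, while the regression slope answers the directional question in poll points.

```python
# Synthetic 'Undecided' and Labour shares (percentage points) over nine polls;
# illustrative numbers only, not the real data.
undecided = [10.0, 11.5, 12.0, 13.5, 14.0, 15.5, 16.0, 17.5, 18.0]
labour    = [34.0, 33.0, 32.5, 31.0, 30.5, 29.5, 29.0, 27.5, 27.0]

n = len(undecided)
mx = sum(undecided) / n
my = sum(labour) / n
sxy = sum((x - mx) * (y - my) for x, y in zip(undecided, labour))
sxx = sum((x - mx) ** 2 for x in undecided)
syy = sum((y - my) ** 2 for y in labour)

# Correlation: dimensionless, symmetric in x and y -- it can't tell you
# "how many points of Labour per point of Undecided".
r = sxy / (sxx * syy) ** 0.5

# Regression slope of Labour on Undecided: answers exactly that question,
# in poll points per poll point.
slope = sxy / sxx

print(f"correlation r = {r:.3f}, slope = {slope:.3f} Labour points per Undecided point")
```

The two summaries are related (the slope is r times the ratio of standard deviations), but only the slope carries the units the question is asked in.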

When I do a set of linear regressions, I estimate that changes in the Undecided vote over the past couple of years have split approximately 70:20:3.5:6.5 between Labour:National:Greens:NZFirst. That confirms the general conclusion in the post: most of the change in Undecided seems to have come from Labour. You can do the regressions the other way around and ask where (net) voters leaving Labour have gone, and find that they overwhelmingly seem to have gone to Undecided.
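One way to set up this kind of analysis is to regress each party's share on the Undecided share and read the split off the (negated) slopes. Here's a sketch on synthetic poll shares deliberately constructed to split 70:20:3.5:6.5; the real nine-poll data aren't reproduced here.

```python
# Synthetic shares: each party's support is an exact linear function of the
# Undecided share, built so the five shares always sum to 100.
undecided = [10.0, 11.0, 12.5, 13.0, 14.5, 15.0, 16.5, 17.0, 18.5]
parties = {
    "Labour":   [45 - 0.700 * u for u in undecided],
    "National": [38 - 0.200 * u for u in undecided],
    "Greens":   [10 - 0.035 * u for u in undecided],
    "NZFirst":  [ 7 - 0.065 * u for u in undecided],
}

def slope(x, y):
    """Least-squares slope of y on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx

# Each slope is negative (a party drops as Undecided rises); negating gives
# the share of each extra Undecided point attributed to that party.
splits = {name: -slope(undecided, share) for name, share in parties.items()}
for name, s in splits.items():
    print(name, round(s, 3))
```

Because the shares sum to 100, the four slopes must sum to about 1: whatever Undecided gains has to come from somewhere. On this constructed data the recovered split is exactly 0.70, 0.20, 0.035, 0.065.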

What can we conclude from this? The conclusion is pretty limited because of the small number of polls (9) and the fact that we don’t actually have data on switching for any individuals. You could fit the data just as well by saying that Labour voters have switched to National and National voters have switched to Undecided by the same amount — this produces the same counts, but has different political implications. Since the trends have basically been a straight line over this period it’s fairly easy to get alternative explanations — if there had been more polls and more up-and-down variation the alternative explanations would be more strained.

The other limitation is illustrated by the post's own conclusion:

There’s a very clear story in these two correlations: Put simply, as the decided vote goes up so does the reported percentage vote for the Labour Party.

Conversely, as the decided vote goes up, the reported percentage vote for the National party tends to go down.

The closer the election draws the more likely it is that people will make a decision.

But then there’s one more step – getting people to put that decision into action and actually vote.

We simply don’t have data on what happens when the decided vote goes up — it has been going down over this period — so that can’t be the story. Even if we did have data on the decided vote going up, and even if we stipulated that people are more likely to come to a decision near the election, we still wouldn’t have a clear story. If it’s true that people tend to come to a decision near the election, this means the reason for changes in the undecided vote will be different near an election than far from an election. If the reasons for the changes are different, we can’t have much faith that the relationships between the changes will stay the same.

The data provide weak evidence that Labour has lost support to ‘Undecided’ rather than to National over the past couple of years, which should be encouraging to them. In the current form, the data don’t really provide any evidence for extrapolation to the election.


[here’s the re-typed count of preferences data, rounded to the nearest integer]


Thomas Lumley (@tslumley) is Professor of Biostatistics at the University of Auckland. His research interests include semiparametric models, survey sampling, statistical computing, foundations of statistics, and whatever methodological problems his medical collaborators come up with. He also blogs at Biased and Inefficient.

Comments

  • Megan Pledger

    Andrew Robertson said to me here: "Don't forget that the results are also filtered on likelihood to vote, which differs by Party Support."

    So it could be that the Labour people are not becoming undecided but deciding not to vote. It could be a feedback loop – as Labour goes down in the polls, more Labour supporters choose not to vote next time, so Labour goes down further in the polls.

    10 years ago

  • Megan Pledger

    It’s not clear if the “undecided” count is the raw count or the weighted count of undecideds/not voting.

    I presume it's the raw count – if the raw count of undecideds goes up, then it may not actually represent an increase in undecideds at the population level. E.g. if one younger man becomes decided and two older women become undecided, then the raw count of people who are undecided goes up, but the number of people they represent in the population may mean the proportion of decideds remains unchanged, i.e. maybe the young male has a (normalised) weight of 2 and the older females have a (normalised) weight of 1.

    Realistically, I don't see big changes in the demographics of the undecideds, although the advent of I/M/IM could shake that.

    10 years ago

    • Thomas Lumley

      No, it’s not clear.

      I don’t think it matters for this particular analysis, but it would be nice to know.

      10 years ago

  •

    In the CB poll it’s the weighted percentage. Pretty much all results are weighted except for the reported bases.

    10 years ago