Posts filed under Polls (132)

November 27, 2011

It’s over

November 23, 2011

Voting method simulator

Since the election is rapidly approaching, it’s probably a good idea to repost the link to the voting simulator.  Created by Geoffrey Pritchard and Mark Wilson, this allows you to put in proportions of votes and estimate the likely makeup of Parliament under MMP and the four alternatives that are being proposed. There’s also some discussion and a Frequently Asked Questions list.

 

November 8, 2011

Political poll with sample size of 47 makes headlines

David Farrar of Kiwiblog criticises a story in the Herald which says:

John Banks has some support in the wealthy suburb of Remuera, but is less popular on the liberal fringes of the Epsom electorate, according to a Herald street survey.

A poll of 47 Epsom voters yesterday found the National candidate ahead of Act’s Mr Banks by 22 votes to 20.

Farrar correctly points out that the poll is in no way random (i.e. is not scientific), and goes on to say:

But even if you overlook the fact it is a street poll, the sample size is ridiculously low. The margin of error is 14.7%! I generally regard 300 as the minimum acceptable for an electorate poll. That gives a 5.8% margin of error. A sample of 47 is close to useless.
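
As a rough check on those figures, the usual back-of-the-envelope approximation for the maximum margin of error (at 95% confidence, for support near 50%) is 1/√n. Here is a minimal sketch in Python; small differences from Farrar's quoted numbers come down to which exact formula and rounding a pollster uses.

```python
import math

def max_margin_of_error(n):
    """Rough maximum 95% margin of error for a proportion, using 1/sqrt(n)."""
    return 1 / math.sqrt(n)

for n in (47, 300):
    print(f"n = {n}: MOE about {max_margin_of_error(n):.1%}")
# n = 47:  MOE about 14.6%  (Farrar quotes 14.7%)
# n = 300: MOE about 5.8%
```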

November 4, 2011

Election polls: filling in the blank

It’s a familiar phenomenon that a party leader can be more popular, or less popular, than the party they represent. For example, Labour is currently more popular than Phil Goff.

The problem is especially difficult to handle in the US elections. At the moment, we pretty much know that Barack Obama will be the Democrats’ presidential candidate next year.  We don’t know who the Republicans will pick.  You could run a poll asking “Would you vote for Obama or for a Republican opponent”, or you could pick one of the current candidates for the Republican nomination and ask “Obama vs Cain” or “Obama vs Romney”.  It turns out to matter.

In the current polls, President Obama loses to a Generic Republican Opponent by about 3%, but beats everyone in the current Republican field. The only actual Republican who comes close in current support to Fill In The Blank is Mitt Romney, who is about 2% behind Obama. We’ll have to wait until February to see what happens when the Republican nominee is chosen.

 

November 3, 2011

For whom the belle polls

TV3 on Tuesday reported that “early results in a poll suggest Labour’s Phil Goff won the debate with Prime Minister John Key last night.” The poll was by RadioLIVE and Horizon.

The TV piece concluded by lambasting a recent One News text poll, saying: “A One News text poll giving the debate to Mr Key 61-39 has been widely discredited, since it cost 75c to vote.”

This text poll should be lambasted if it is used to make inference about the opinions of the population of eligible willing voters. Self-selection is the major problem here: those who can be bothered have selected themselves.

The problem is that there is no way of establishing that this sample is representative of willing voters. Only the interested and motivated, who text, have answered, and the pollsters clearly have no information on the less interested and less motivated people who did not text.

The industry standard is to randomly select from all eligible willing voters and to adjust for non-response. The initial selection is random and non-response is reduced as much as possible. This is to ensure the sample is as representative of the population as possible. The sample is collected via a sampling frame which, hopefully, is a comprehensive list of the population you wish to talk to. For CATI (computer-assisted telephone interviewing) polls, the sampling frame is domestic landline (traditional) phone numbers.

With election polls this has not been much of a problem, because landlineless voters have tended to be in lower socio-economic groups, which also tend to have lower voter participation. The people we wish to talk to are eligible willing voters, so the polls have not been unduly biased by excluding these landlineless people.

However, as people move away from landlines to mobile phones, CATI interviewing has been increasingly criticised. Hence alternatives have been developed, such as panel polls and prediction markets like iPredict – the latter will be the subject of a post for another day.

But let’s go back to the Horizon panel poll mentioned above. It claims that it’s to be trusted as it has sampled from a large population of potential panellists who have been recruited and can win prizes for participation. The Horizon poll adjusts for any biases by reweighting the sample so that it’s more like the underlying New Zealand adult population – which is good practice in general.

However, the trouble is that this large sampling frame of potential panellists is self-selected. So who do these panellists represent?

To illustrate, it’s hard to imagine people from more affluent areas feeling the need to get rewards for being on a panel. Also, you enrol via the internet, which clearly biases the panel towards IT-savvy people. Here the sampling frame is biased, with little or no known way to adjust for the biases brought about by this self-selection. The panellists may be weighted to look like the population, but they may be fundamentally different in their political outlook.
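
To make the reweighting step concrete, here is a minimal sketch (in Python, with invented numbers used purely for illustration) of the kind of post-stratification weighting described above: each respondent counts in proportion to the ratio of their group’s population share to that group’s share of the sample. The weights can force the panel to match the population on measured characteristics such as age, but they cannot correct for a difference in political outlook among people who chose to join the panel, because that difference is never measured.

```python
# A minimal sketch of post-stratification weighting, with invented numbers.
# Groups could be age bands, regions, genders, etc.; the population shares
# would come from census data.

population_share = {"18-34": 0.30, "35-54": 0.35, "55+": 0.35}  # assumed population shares
sample_share     = {"18-34": 0.15, "35-54": 0.35, "55+": 0.50}  # shares observed in the panel

# Each group's weight: how much a respondent in that group should count so the
# weighted sample matches the population on this characteristic.
weights = {group: population_share[group] / sample_share[group] for group in population_share}

for group, w in weights.items():
    print(f"{group}: weight {w:.2f}")
# 18-34: weight 2.00, 35-54: weight 1.00, 55+: weight 0.70
# The weighted panel now "looks like" the population by age, but if panellists
# within each age band hold different political views from non-panellists,
# no amount of weighting on age can fix that.
```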

Panel polls are being increasingly used by market researchers and polling companies. With online panel polls it’s easier to obtain samples, collect information and transfer it, without all the bother involved in traditional polling techniques like CATI.

I believe the industry has been seduced by these features at the expense of representativeness – the bedrock of all inference. Until such time as we can ensure representativeness, I remain sceptical about any claims from panel polls.

I believe the much-maligned telephone (CATI) interviewing, which is by no means perfect, still remains the best of a bad lot.

November 1, 2011

The easy way out.

There are many ways that sophisticated pollsters can tilt the result of a survey: biased samples, carefully framed sets of questions, creative subsets of the data.

Based on a survey of protesters at Occupy Wall Street, pollster Doug Schoen wrote that “it comprises an unrepresentative segment of the electorate that believes in radical redistribution of wealth, civil disobedience and, in some instances, violence…”.

The actual data from that very survey:

What would you like to see the Occupy Wall Street movement achieve? {Open Ended}
35% Influence the Democratic Party the way the Tea Party has influenced the GOP
4% Radical redistribution of wealth
5% Overhaul of tax system: replace income tax with flat tax
7% Direct Democracy
9% Engage & mobilize Progressives
9% Promote a national conversation
11% Break the two-party duopoly
4% Dissolution of our representative democracy/capitalist system
4% Single payer health care
4% Pull out of Afghanistan immediately
8% Not sure

Andrew Gelman says about this poll: “as I like to remind students, the simplest way to lie with statistics is to just lie!”

October 25, 2011

All about election polls

November 26 is Election Day, and from now on, you’ll be getting election polls from all directions. So which ones can you trust?  The easy answer is: none of them.  However, some polls are worth more than others.

Assess their worth by asking these questions:

  • Who is commissioning the poll? Is this done by an objective organisation or is it done by those who have a vested interest? Have they been clear about any conflict of interest?
  • How have they collected the opinions of a representative sample of eligible voters? One of the cardinal sins of polls is to get people to select themselves (self-selection bias) to volunteer their opinions, like those ‘polls’ you see on newspaper websites. Here, you have no guarantee that the sample is representative of voters. “None of my mates down at the RSA vote that way, so all the polls are wrong” is a classic example of how self-selection  manifests itself.
  • How did they collect their sample? Any worthy pollster will have attempted to contact a random sample of voters via some mechanism that ensures that they have no idea, beforehand, who they will be able to contact. One of the easiest ways is via computer-assisted telephone interviewing (CATI) of random household telephone numbers (landlines), typically sampled in proportion to geographical regions with a rural/urban split (usually called a stratified random sample). A random eligible voter needs to be selected from that household – and it won’t necessarily be the person who most often answers the phone! A random eligible voter is usually found by asking which of the household’s eligible voters had the most recent birthday and talking to that person. The fact that not all households have landlines is an increasing concern with CATI interviewing. However, in the absence of any substantiated better technique, CATI interviewing remains the industry standard.
  • What about people who refuse to cooperate? This is called non-response. Any pollster should try to reduce this as much as possible by re-contacting households that did not answer the phone the first time around or, if the first call found that the person with the most recent birthday wasn’t home, by trying to get hold of them. If the voter still refuses, they become a ‘non-respondent’, and attempts should be made to re-weight the data so that this non-response effect is diminished. The catch is that the data is adjusted on the assumption that the respondents selected represent the opinions of the non-respondents, about whom, by definition, we have no information. This is a big assumption that rarely gets verified. Any worthy polling company will mention non-response and discuss how they attempt to adjust for it. Don’t trust any outfits that are not willing to discuss this!
  • Has the polling company asked reasonable, unambiguous questions? If the voters are confused by the question, their answers will be too. The pollsters need to state what questions have been asked and why. Any fool can ask questions – asking the right question is one of the most important skills in polling. Pollsters should openly supply detail on what they ask and how they ask it.
  • How can a sample of, say, 1000 randomly-selected voters represent the opinions of 3 million potential voters? This is one of the truly remarkable aspects of random sampling. The thing to realise is that whilst this is a very small sub-sample of voters, provided they have been randomly selected, the precision of the estimate is determined by the amount of information you have collected, not by the proportion of the total population sampled (provided this sampling fraction is quite small, e.g. 1000 out of 3 million).
  • What is the margin of error (MOE)? It’s a measure of precision: the price paid for not taking a complete census, which happens once every three years on Election Day and gives what we call, in statistical terms, a population result. The MOE is based on the behaviour of all similar possible poll results we could have obtained (for a given level of confidence, usually taken to be 95%). Once we know what that behaviour is (via probability theory and suitable approximations), we can use the data that has been collected to make inference about the population that interests us. We know that 95% of all possible poll results, plus or minus their MOE, include the true unknown population value. Hence we say we are 95% confident that the interval around a poll result contains the population value.
  • When we see a quoted MOE of 3.1% (from a random sample of n = 1000 eligible voters), how has it been calculated? It is, in fact, the maximum margin of error that could have been obtained for any political party. It is only really valid for parties that are close to the 50% mark (National and Labour are okay here, but it is irrelevant for, say, NZ First, whose support is closer to 5%). So if National is quoted as having a party vote of 56%, we are 95% confident that the true population value for National support is anywhere between 56% plus or minus 3.1%, or about 53% to 59% – in this case, indicating a majority.
  • Saying that a party is below the margin of error is saying it has few people supporting it, and not much else. Its MOE will be much lower than the maximum MOE. For back-of-the-envelope calculations, the maximum MOE for a party is approximately 1/√n; e.g. if n = 1000 random voters are sampled, then MOE ≈ 1/√1000 = 1/31.62 ≈ 3.1%.
  • Comparing parties becomes somewhat more complicated. If National is up, then no doubt Labour will be down, so to see whether National has a lead over Labour we have to adjust for this negative correlation. A rough rule of thumb for comparing parties sitting around 50% is to see whether they differ by more than 2 × MOE. So if Labour has 43% of the party vote and National 53% (with MOE = 3.1% from n = 1000), this 10% lead is greater than 2 × 3.1% = 6.2% – evidence to believe that National’s lead is ‘real’, or statistically significant. (A worked sketch of these calculations follows this list.)
  • Note that any poll result only represents the opinions of those sampled at that place and time. A week is a long time in politics, and most polls become obsolete very quickly. Note also that today’s poll results can affect tomorrow’s, so these results are fluid, not fixed.
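
To pull together the arithmetic in the last few bullet points, here is a minimal sketch in Python. The party shares are the illustrative figures used above, not real poll results, and the exact formula a pollster uses may differ slightly; this is just the standard simple-random-sampling approximation.

```python
import math

def margin_of_error(p, n, z=1.96):
    """95% margin of error for a proportion p estimated from a simple random
    sample of size n. Note it depends on n, not on the population size."""
    return z * math.sqrt(p * (1 - p) / n)

n = 1000

# Maximum MOE (worst case, p = 0.5) and the 1/sqrt(n) back-of-envelope shortcut
print(f"max MOE:   {margin_of_error(0.5, n):.1%}")   # about 3.1%
print(f"1/sqrt(n): {1 / math.sqrt(n):.1%}")          # about 3.2%, the rough version

# 95% confidence interval for a party polling at 56% (illustrative figure above)
p = 0.56
moe = margin_of_error(p, n)
print(f"56% +/- {moe:.1%}, i.e. roughly {p - moe:.0%} to {p + moe:.0%}")

# A small party near 5% support has a much smaller MOE than the quoted maximum
print(f"MOE at 5%: {margin_of_error(0.05, n):.1%}")  # about 1.4%

# Rough rule for comparing two large parties: the lead should exceed 2 x max MOE
national, labour = 0.53, 0.43
lead = national - labour
threshold = 2 * margin_of_error(0.5, n)
print(f"lead {lead:.0%} vs 2 x max MOE {threshold:.1%}:",
      "evidence of a real lead" if lead > threshold else "within sampling noise")
```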

If you’re reading, watching or listening to poll results, be aware of their limitations. But note that although polls are fraught with difficulties, they remain useful. Any pollster who is open about the limitations of his or her methods is to be trusted over those who peddle certainty based on uncertain or biased information.

October 3, 2011

“Unexpected results of a new poll”?

Kiwiblog’s David Farrar has nominated 3 News for the most misleading story of the week in their reporting of a political poll because their story does not mention that the poll was only a sample of Maori voters, not a sample of all voters:

“Labour most popular party in new poll…

Labour leader Phil Goff will be clinging to the unexpected results of a new poll in which his party has picked up twice as much support as National.

But he is well behind John Key in the preferred prime minister stakes, according to the TVNZ Marae Investigates Digipoll, released today.

Labour’s on 38.4 percent support in the poll, followed by the Maori Party on 22.2 percent, while National’s on just 16.4 percent. That is in stark contrast to other media polls, which put National above 50 percent support, with Labour rating at 30 percent or less, and the Maori Party on around one percent support.

…The TVNZ poll interviewed 1002 respondents between August 19 and September 20, and has a margin of error of +/- 3.1 percent.”

The original press release from TVNZ does state this very clearly:

Full release of Digipoll Maori Voter Survey… The TVNZ Marae Investigates Digipoll is one the most established voter polls in NZ and often the only one to survey Maori voters in an election year.

In a further 3 News article they discuss a different poll and say that “the poll differs greatly to one released by TVNZ’s Marae Investigates earlier today” without explanation for the difference.

UPDATE: 3 News have now updated the headline to: “Labour most popular party among Maori” and added “The TVNZ Marae Investigates Digipoll surveyed Maori listed on both the general and Maori electoral rolls.”

3 News’ Chief Editor James Murray apologised on Kiwiblog:

“Got to put our hands up to a genuine mistake there. This was a story from our wire service, and we didn’t do our due diligence in fact-checking it.

We absolutely understand the importance of getting this right, and the story has now been corrected. My team have been told to be extra vigilant on poll stories in future and NZN have been informed of the error.

Apologies for anyone who may have been misled by this mistake.”

August 2, 2011

Question-wording effects in surveys

David Farrar at Kiwiblog provides some new local examples and discussion of question-wording effects in surveys, including the gender/pay issue and same sex marriage.

“With poll questions there is rarely a clearly “right” or “wrong” question. There can be a dozen different ways to ask a question. The important thing is that the poll results make it very clear the exact question that was asked, and that reporting of the results does the same.”

Read the post »

Casual inference

From the NZ Herald:

“The survey found almost 65 per cent of women believed they were paid less because of their gender. Just under 43 per cent of men agreed but 47 per cent didn’t.”

Unfortunately the Herald doesn’t tell us what the actual question was. Were people asked whether they, personally, were paid differently because of their gender, or whether women, on average, were paid less because of their gender?   In either case, my sympathies are with Women’s Affairs Minister Hekia Parata, who refused to offer her own answer to the “simplistic” poll question.

There are two statistical problems here. The first is what we mean by “because of their gender”.  After that’s settled, we have the problem of finding data to answer the question.  Inference about cause and effect from non-experimental data will always be hard because of both problems, but that’s what we’re here for.

Usually, when we say that income or health is worse because of some factor, we mean that if you could experimentally change the factor you would change health or income. We say high blood pressure causes strokes, and we mean that if you lower the blood pressure of a bunch of people, fewer of them will get strokes.  This isn’t possible for gender — not only can we not assign gender at random, we can’t even say what it would mean to do that.  Would a female Dan Carter be his sister Sarah, or Irene van Dyk? (more…)