Posts filed under Probability (66)

December 13, 2011

Below the margin of error

“a policy which recognises that individuals are the owners of their own lives, and which probably has the potential to win broad support at a time when they’re polling below the margin of error” – NZ Classic Liberal

“Radio Rhema, Inferno, Solid Gold, Radio Live, Sunday News,  and Herald On Sunday all rate below the margin of error” – Greater Queenstown/Arrowtown Media Survey

“Whether our party does well, or remains mired below the margin of error, there is little doubt that libertarian ideas are slowly diffusing into the public consciousness.” – Sean Fitzpatrick, Libertarianz

“Given that ACT was last polling below the margin of error, their opinions, flattering or otherwise, hardly seem likely to sway the result.” – The Northland Age.

“Look at Huntsman running below 2 percent. He is running below the margin of error. That’s how bad he’s doing. He may actually have zero or owe somebody votes.” – Dean Obeidallah

Dean Obeidallah is a professional comedian, and he knows it’s a joke.  The others seem to be serious, though they may just mean “very small” rather than anything more precise.

Careful pollsters refer to the uncertainty margin they routinely quote as the “maximum margin of error”.  Unfortunately the first word gets left off by most people who quote the results.   The maximum margin of error in a poll is the margin of error for an estimate of 50%.   That’s fine for the major parties, but if you want to know how many people support Winston Peters, or how many believe they have been abducted by aliens, you need a different formula.

Since proportions can’t be negative, and since any non-zero percentage in a poll implies a non-zero percentage in the population, the uncertainty must be smaller for percentages near zero or one hundred.  The uncertainty range must also be asymmetric: a 1% result can’t overestimate the truth by more than 1%, but it could underestimate the truth by more than 1%.

The graph shows the upper (blue) and lower (orange) margins of error for percentages from 0 to 100% in a poll of 1000 people, the size that Colmar Brunton typically uses.  Over the range from about 20% to 80% the curve is pretty flat, and using the maximum margin of error is a good approximation.  For values less than 10% or more than 90% we need a better rule of thumb.

Some rough approximations that might be useful:

  • At 10%, the margin of error is about two-thirds of the maximum
  • At 5%, the crucial MMP threshold, the margin of error is about half the maximum
  • For percentages greater than zero but less than the maximum margin of error, the relative margin of error is roughly 50%
  • If the percentage is zero there isn’t any margin of error downwards, but the upper margin of error is 3 divided by the sample size (eg 3/1000=0.3% for a sample of 1000).

The first two of these rules of thumb come from the formula for the variance of a proportion p, which is p(1-p)/n for a sample size of n. The maximum margin of error is approximately the square root of 1/n, so given the quoted maximum margin of error we can work out n easily.

The last rule is the famous rule of three: if you see none of something, the upper bound for the proportion is the same as the estimate if you had seen three of them.

The third rule is a rough approximation based on looking at some numbers, and is less accurate than the others.
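These rules of thumb are easy to check against the usual normal-approximation formula. A minimal sketch in Python; the 1.96 multiplier and the sample size of 1000 are the standard textbook assumptions, not anything specific to the polls quoted above:

```python
import math

def margin_of_error(p, n=1000, z=1.96):
    """Normal-approximation margin of error for an estimated proportion p."""
    return z * math.sqrt(p * (1 - p) / n)

max_moe = margin_of_error(0.5)            # maximum margin of error, about 3.1%

print(margin_of_error(0.10) / max_moe)    # 0.6: about two-thirds of the maximum
print(margin_of_error(0.05) / max_moe)    # about 0.44: roughly half the maximum
print(3 / 1000)                           # rule of three: upper bound for a 0% result
```

The ratio to the maximum is just the square root of p(1-p)/0.25, which conveniently doesn’t depend on the sample size.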

November 23, 2011

Voting method simulator

Since the election is rapidly approaching, it’s probably a good idea to repost the link to the voting simulator.  Created by Geoffrey Pritchard and Mark Wilson, this allows you to put in proportions of votes and estimate the likely makeup of Parliament under MMP and the four alternatives that are being proposed. There’s also some discussion and a Frequently Asked Questions list.


November 10, 2011

Non-amazing twin coincidence (updated)

Stuff is reporting, in their Oddstuff section, a story about twins who will turn 11 tomorrow, 11-11-11.  How odd is this?

There are about 64000 births per year in New Zealand, or about 175 per day.  The rate of twin births is somewhere between 1 in 60 and 1 in 100, so on an average day, such as 11-11-00, there will be about two pairs of twins born.  So we’d expect two pairs of Kiwi twins to turn 11 on 11-11-11. You might wonder what the other pair was doing.

If you read the story, though, you see it’s actually from the United States: Madison, Wisconsin.  There were more than 50,000 pairs of  twins born in the US in 2000, so we’d expect about 136 pairs turning 11 tomorrow.
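The back-of-the-envelope arithmetic is easy to reproduce. A sketch using the round figures quoted above (the twin rate is only known to a range):

```python
# Figures quoted in the post; the twin rate is somewhere between 1/100 and 1/60.
nz_births_per_day = 64000 / 365          # about 175 births per day
low_rate, high_rate = 1 / 100, 1 / 60    # twin births as a fraction of all births

print(nz_births_per_day * low_rate)      # about 1.8 twin pairs per day
print(nz_births_per_day * high_rate)     # about 2.9 twin pairs per day

us_twin_pairs_per_day = 50000 / 365      # about 137 US pairs sharing any given birthday
```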

There must be lots of community papers across the world reporting on a pair of local twins turning 11 tomorrow.  The interesting question  is how the Overman twins got their story on to the Associated Press wire and into 181 (and counting) newspapers around the world, and how the same mechanisms are used on stories that aren’t just harmless fluff.

Updated:  Now there’s a local pair of twins, one of  whom is quoted as saying “There are probably only one or two [sets of twins] in the world turning 11 on that date.” 

Updated again: You’d expect there to be several sets of birthday triplets out there somewhere in the industrialized world, and one set has shown up in Windsor (Canada). Unfortunately, the story also spends a lot of time with Uri Geller, trying to get Deep Significance out of the date.

October 20, 2011

The use of Bayes’ Theorem in jeopardy in the United Kingdom?

A number of my colleagues have sent me this link from British newspaper The Guardian, and asked me to comment. In some sense I have done this. I am a signatory to an editorial published in the journal Science and Justice which protests the law lords’ ruling.

The Guardian article refers to a Court of Appeal ruling in the United Kingdom referred to as R v T. The original charge against Mr. T. is that of murder and, given the successful appeal, his name is suppressed. The nature of the appeal relates to whether an expert is permitted to use likelihood ratios in provision of evaluative opinion, whether an evaluative opinion based on an expert’s experience is permissible, and whether it is necessary for an expert to set out in a report the factors on which an evaluative opinion is based.

It is worthwhile noting before we proceed that to judge a case solely on one aspect of the whole trial is dangerous. Most trials are complex affairs with many pieces of evidence, and much more testimony than the small aspects we concentrate on here.

The issue of concern to members of the forensic community is the following part of the ruling:

In the light of the strong criticism by this court in the 1990s of using Bayes theorem before the jury in cases where there was no reliable statistical evidence, the practice of using a Bayesian approach and likelihood ratios to formulate opinions placed before a jury without that process being disclosed and debated in court is contrary to principles of open justice.

The practice of using likelihood ratios was justified as producing “balance, logic, robustness and transparency”, as we have set out at [54]. In our view, their use in this case was plainly not transparent. Although it was Mr Ryder’s evidence (which we accept), that he arrived at his opinion through experience, it would be difficult to see how an opinion of footwear marks arrived at through the application of a formula could be described as “logical”, or “balanced” or “robust”, when the data are as uncertain as we have set out and could produce such different results.

A Bayesian, or likelihood ratio (LR), approach to evidence interpretation is a mathematical embodiment of three principles of evidence interpretation given by Ian Evett and Bruce Weir in their book Interpreting DNA Evidence: Statistical Genetics for Forensic Scientists (Sinauer, Sunderland, MA, 1998). These principles are

  1. To evaluate the uncertainty of any given proposition it is necessary to consider at least one alternative proposition
  2. Scientific interpretation is based on questions of the kind “What is the probability of the evidence given the proposition?”
  3. Scientific interpretation is conditioned not only by the competing propositions, but also by the framework of circumstances within which they are to be evaluated

The likelihood ratio is the central part of the odds form of Bayes’ Theorem. That is,

Pr(Hp | E) / Pr(Hd | E)  =  [ Pr(E | Hp) / Pr(E | Hd) ]  ×  [ Pr(Hp) / Pr(Hd) ]

where E is the evidence, Hp is the prosecution hypothesis and Hd is the defense hypothesis; the middle term is the likelihood ratio.

The likelihood ratio gives the ratio of the probability of the evidence given the prosecution hypothesis to the probability of the evidence given the defense hypothesis. It is favoured by members of my community because it allows the expert to comment solely on the evidence, which is all the court has asked her or him to do.

The basis for the appeal in R v T was that the forensic scientist, Mr Ryder, computed a likelihood ratio at the original trial but did not explicitly tell the court he had done so. There was also criticism that the data needed to evaluate the LR were not available.

Mr Ryder considered four factors in his evaluation of the evidence. These were the pattern, the size, the wear and the damage.

The sole pattern is usually the most obvious feature of a shoe mark or impression. Patterns are generally distinct between manufacturers and, to a lesser extent, between different shoes that a manufacturer makes. Mr Ryder considered the probability of the evidence (the fact that the shoe impression “matches” the impression left by the defendant’s shoe) if it was indeed his shoe that left it. It is reasonable to assume that this probability is one, or close to one. If the defendant’s shoe did not leave the mark, then we need a way of evaluating the probability of an “adventitious” match. That is, what is the chance that the defendant’s shoe just happened to match by sheer bad luck alone? A reasonable estimate of this probability is the frequency of the pattern in the relevant population.

Mr Ryder used a database of shoe pattern impressions found at crime scenes. Given that this mark was found at a crime scene, this seems a reasonable population to consider. In this database the pattern was very common, with a frequency of 0.2. The defense made much of the fact that the database represented only a tiny fraction of the shoes produced in the UK in a year (0.00006 per cent), and was therefore not comprehensive enough for the evaluation. In fact, the defense had done its own calculation, which was much more damning for their client. Using the 0.2 frequency gives an LR of 5. That is, the evidence is 5 times more likely if Mr T.’s shoe left the mark than if a shoe of a random member of the population did.

The shoe size is also a commonly used feature in footwear examination. The shoe impression was judged to be size 11. Again, the probability of the evidence if Mr T.’s shoe left the mark was judged to be one. It is hard to work out exactly what Mr Ryder did from the ruling, because a ruling is the judges’ recollection of proceedings, which is not necessarily an accurate record of what may, or may not, have been said. According to the ruling, Mr Ryder used a different database to assess the frequency of size, estimating it to be 3%. The judges incorrectly equate this to 0.333, instead of 0.03, which would lead to an LR of 33.3. Mr Ryder used a “more conservative” figure of 0.1, to reflect some uncertainty in size determination, giving an LR of 10.

Wear on shoes can differ between people. Take a look at the soles of your shoes and those of a friend: they will probably be different. To evaluate the LR, Mr Ryder considered the wear on the trainers. He felt he could exclude half of the trainers of this pattern type and approximate size/configuration, and he therefore calculated the likelihood ratio for wear as 1/0.5 = 2. Note here that Mr Ryder appears to have calculated the probability of wear given pattern and size.

Finally, Mr Ryder considered the damage to the shoes. Little nicks and cuts accumulate on shoes over time and can be quite distinctive. Mr Ryder felt he could exclude very few pairs of shoes that could not previously have been excluded by the other factors. That is, the defendant’s shoes were no more, or less, likely to have left the mark than any other pair in the database with the same pattern, size and wear features. He therefore calculated the likelihood ratio for damage as 1.

The overall LR was calculated by multiplying the four LRs together. This is acceptable if either the features were independent, or the appropriate conditional probabilities were considered. This multiplication gave an LR of 100, and that figure was converted using a “verbal scale” into the statement “the evidence provides moderate support for the proposition that the defendant’s shoe left the mark.” Verbal scales are used by many forensic agencies who employ an LR approach because they are “more easily understood” by the jury and the court.
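The combination of the four factors is just multiplication. A sketch of the calculation; the verbal-scale boundaries shown are one commonly used scale, assumed here for illustration rather than taken from the ruling:

```python
import math

# Likelihood ratios for the four factors, as described in the ruling
lrs = {"pattern": 5, "size": 10, "wear": 2, "damage": 1}

overall = math.prod(lrs.values())
print(overall)  # 100

def verbal_support(lr):
    """Map an LR to a verbal statement (boundaries assumed for illustration)."""
    if lr <= 1:
        return "no support"
    if lr <= 10:
        return "limited support"
    if lr <= 100:
        return "moderate support"
    if lr <= 1000:
        return "moderately strong support"
    return "strong support"

print(verbal_support(overall))  # "moderate support"
```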

The appeal judges ruled that this statement, without the explicit inclusion of information explaining that it was based on an LR, was misleading. Furthermore, they ruled that the data used to calculate the LR was insufficient. I, and many of my colleagues, disagree with this conclusion.

So what are the consequences of this ruling? It remains to be seen. In the first instance, I think it will be an opening shot for many defense cases, in the same way that defense lawyers try to take down the LR because it is “based on biased Bayesian reasoning.” I do think that it will force forensic agencies to be more open about their calculations, but I might add that Mr Ryder didn’t seek to conceal anything from the court. He was simply following the guidelines set out by the Association of Footwear, Tool marks, and Firearms Examiners.

It would be very foolish of the courts to dismiss the Bayesian approach. After all, Bayes’ Theorem simply says (in mathematical notation) that you should update your belief about the hypotheses based on the evidence. No judge would argue against that.

September 22, 2011

Harder to win big in the lottery

The ‘Big Wednesday’ lottery has moved from 6 balls out of 45 to 6 balls out of 50, which reduces even further the chance of getting the division 1 prize.  To win division 1, you need 6 balls correct out of 6, plus a correct coin toss.  There are 8,145,060 ways to choose 6 balls out of 45, and about twice as many ways, 15,890,700, to choose 6 balls out of 50.  Adding in the coin toss halves the chance of winning: the chance of winning per ‘line’ used to be 1 in 16,290,120 and is now 1 in 31,781,400.   For a minimum $4 ticket, which gives 4 ‘lines’, the chance of a division 1 prize was 1 in 4,072,530 and is now 1 in 7,945,350.

One back-of-the-envelope way to get roughly the correct impact of the change is to note that the chance of matching a given ball has gone down about 10%: from 1/45 to 1/50.  Multiplying 90% by itself six times says that the chance of winning is 53% of what it was, a very good approximation to the actual ratio, which is 51%.  The Dominion Post had the correct change, but the computations they report seem to have the effect of the coin toss backwards, so all their probabilities are overly optimistic by a factor of four.
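Both the exact probabilities and the rough approximation are easy to check, for instance with Python’s `math.comb`:

```python
from math import comb

ways_45 = comb(45, 6)          # 8,145,060 ways to choose 6 balls from 45
ways_50 = comb(50, 6)          # 15,890,700 ways to choose 6 balls from 50

# The coin toss halves the chance of division 1
per_line_old = 2 * ways_45     # 1 in 16,290,120 per line
per_line_new = 2 * ways_50     # 1 in 31,781,400 per line

# A minimum $4 ticket gives 4 lines
print(per_line_new // 4)       # 7,945,350

# Rough approximation: matching each ball is about 10% harder
print((45 / 50) ** 6)          # about 0.53
print(ways_45 / ways_50)       # exact ratio, about 0.51
```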

Of course, the other way to look at it is that your chance of not winning division 1 with a $4 ticket has gone from 99.99998% to 99.99999%. Hardly seems worth mentioning.

September 5, 2011

Was Paul the Octopus Lucky or Skilful (and how about Richie McCow)?

Guest post by Tony Cooper

Paul The Octopus is famous for picking the winner in 8 out of 8 games at the FIFA World Cup in 2010. How did he achieve this amazing feat? Was he skilled or was he lucky?

To get 8 out of 8 games right, where each game is a 50-50 guess, the probability is 0.5 × 0.5 × 0.5 × 0.5 × 0.5 × 0.5 × 0.5 × 0.5 = 0.0039 (or one chance in 256). This seems too incredible. An octopus can’t be that good.
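The arithmetic, for anyone who wants to check it:

```python
p = 0.5 ** 8
print(p)          # 0.00390625, i.e. one chance in 256
```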

Where is the flaw in this probabilistic reasoning?

The answer is that the probability that Paul got a game right was not 0.5 but more. Much more. In fact the probability was much closer to one. Here is the explanation:

Paul, a German octopus, was only used to pick German games, usually picked the German team, and the German team usually won. So the chance that Paul picked the winner was more than 0.5 for each game. That accounts for 5 of the 8 games.

What about the games that Germany lost and the game where Germany didn’t play? Let’s take a guess.

This tournament was not Paul’s first time playing the game. He had learned previously that there was always food under the German flag. Paul – with some ability to distinguish the flags of different countries – usually went for the German black, red, and yellow. He may also have had monochrome eyesight, which would explain why he picked Serbia as the winner in the game against Germany: in monochrome the Serbian and German flags look similar.

Germany wasn’t playing in the final, so Paul went for the most German-looking flag – that of Spain (red, yellow, red). He seems to have had a preference for that flag, since he also predicted Spain’s win over Germany in the semi-finals. All the flags Paul picked had horizontal stripes.

So Paul was lucky but not as lucky as the 1 in 256 chance suggests. His main luck was that his team was one of the best teams in the tournament and that some of the other good teams had similar flags.

Will Richie McCow be successful at picking the winner of the All Black games in the Rugby World Cup? Possibly – as long as he keeps picking the All Blacks and the All Blacks keep winning. Will the All Blacks keep winning? That’s a story for a later article.

Tony Cooper, formerly of the Applied Mathematics Division of the DSIR, is a Quantitative Analyst with Double-Digit Numerics of Auckland. He consults mainly in the investment, finance, and electricity industries. His research interests include risk and volatility prediction, alpha generation, data mining, statistical learning, and time series analysis.