Posts filed under Evidence (90)

July 3, 2012

Finding out if policies work

The UK Cabinet Office (with the help of Ben Goldacre and David Torgerson) has come out in favour of finding out whether new policies actually work:

‘Test, Learn, Adapt’ is a paper which the Behavioural Insights Team is publishing in collaboration with Ben Goldacre, author of Bad Science, and David Torgerson, Director of the University of York Trials Unit. The paper argues that Randomised Controlled Trials (RCTs), which are now widely used in medicine, international development, and internet-based businesses, should be used much more extensively in public policy.

As we have pointed out before, lots of people come up with potentially good ideas for dietary interventions, crime prevention, reductions in drug use, and improved education, to name just a few targets. The experience from medical research is that plausible, theoretically-sound, carefully thought-out treatment ideas mostly don’t work. In other fields we don’t know, because we haven’t looked.

April 9, 2012

‘Causal’ is not enough

Yesterday’s post about crime rates and liquor stores was tagged ‘correlation vs causation’, but it’s more complicated than that.  It’s not even clear what sort of causation is at stake.

I think we can all agree that being drunk, like being young and male, is a causal factor in violent crime. But that’s not the question.  There are two possible causal stories behind higher crime rates near liquor stores, or, more precisely, alcohol licenses.   These are truly causal alternatives to the skeptical argument that it’s actually (demand for) drinking that leads to alcohol licenses.

The weaker causal story is that people get drunk, and when they do, they are more likely to do it nearer to alcohol licenses.  That’s certainly the case for pubs and restaurants — if you buy beer from a pub, you are going to be drinking it at the pub  — and could be true for liquor stores as well.   This story would say that if you moved an alcohol license the crime would move, and if you shut down one place, the drunkenness and crime would relocate among the available options.  If this is true, it’s useful to local community groups wanting to improve local conditions, but it’s pretty much useless from a public health and safety viewpoint.

The stronger story is that people won’t drink if they have to go further to get alcohol, so that reducing the number of licenses will reduce drinking.   On this theory, reducing licenses could have a health and safety impact beyond just local redistribution of crime.

It’s not possible to distinguish these using the available data.  There’s good evidence that something like the first story holds for CCTV installation — it pushes crime out of the surveillance zone but doesn’t stop it.  And there’s some evidence that something like the second explanation works for stopping kids from smoking — adding inconvenience and cost has much more of an impact on them than on adults.

February 12, 2012

Thresholds and tolerances

The post on road deaths sparked off a bit of discussion in comments about whether there should be a ‘tolerance’ for prosecution for speeding. Part of this is a statistical issue that’s even more important when it comes to setting environmental standards, but speeding is a familiar place to start.

A speed limit of 100km/h seems like a simple concept, but there are actually three numbers involved: the speed the car is actually going, the car’s speedometer reading, and a Doppler radar reading in a speed camera or radar gun. If these numbers were all the same there would be no problem, but they aren’t. Worse still, the motorist knows the second number, the police know the third number, and no-one knows the actual speed.

So, what basis should the police use to prosecute a driver:

  • the radar reading was above 100km/h, ignoring all the sources of uncertainty?
  • their true speed was definitely above 100km/h, accounting for uncertainty in the radar?
  • their true speed might have been above 100km/h, accounting for uncertainty in the radar?
  • we can be reasonably sure their speedometer registered above 100km/h, accounting for both uncertainties?
  • their true speed was definitely above 100km/h, accounting for uncertainty in the radar, and it’s likely that their speedometer registered above 100km/h, accounting for both uncertainties?
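To make these alternatives concrete, here is a minimal simulation sketch in Python. The error standard deviations (2 km/h for the radar, 4 km/h for the speedometer) and the distribution of true speeds are illustrative assumptions, not calibrated figures for any real equipment:

    import numpy as np

    rng = np.random.default_rng(1)
    n = 100_000

    # Hypothetical population of true speeds near the limit (an assumed distribution)
    true_speed = rng.normal(98, 5, size=n)

    # Illustrative measurement errors: the SDs are assumptions, not calibrated values
    radar = true_speed + rng.normal(0, 2, size=n)    # police radar reading
    speedo = true_speed + rng.normal(0, 4, size=n)   # driver's speedometer

    flagged = radar > 100   # drivers the radar says were speeding
    print("P(true speed > 100 | radar > 100):  ",
          round(float((true_speed[flagged] > 100).mean()), 3))
    print("P(speedo read > 100 | radar > 100): ",
          round(float((speedo[flagged] > 100).mean()), 3))

Even in this toy version, some drivers flagged by the radar were not actually over the limit, and fewer still would have seen it on their own speedometer, which is why the choice among the criteria above matters.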


January 21, 2012

Eggs for breakfast

Earlier in the week I complained that the Egg Foundation and the Herald were over-interpreting a lab study of mouse brain cells.  The study was a perfectly reasonable, and probably technically difficult, piece of basic biological research.  It’s the sort of research that answers the question “By what mechanisms might different foods affect brain function differently?”.   It doesn’t answer the question “What’s for breakfast?”.

If you wanted to know whether a high-protein breakfast such as eggs really increases alertness there are at least two ways to set up a relevant study.    The first would be an open-label randomized comparison of eggs and something else; the second would be a double-blind study of high-protein and high-carbohydrate versions of the same breakfast.  In both cases, you recruit people and randomly allocate them to higher-protein breakfasts on some days and lower-protein on other days.
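As an illustration of the allocation step common to both designs, here is a minimal sketch; the participant count, study length, and labels are made-up parameters for the example:

    import numpy as np

    rng = np.random.default_rng(42)
    n_participants, n_days = 30, 14   # made-up study size

    # Each participant gets the higher-protein breakfast on a random half
    # of the study days and the lower-protein breakfast on the rest
    allocation = np.array([
        rng.permutation(["high"] * (n_days // 2) + ["low"] * (n_days // 2))
        for _ in range(n_participants)
    ])
    print(allocation[0])   # day-by-day assignment for the first participant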

In an open-label study you have to be careful to minimise response bias, so you would tell participants, truthfully, that some people think protein for breakfast is better and others think complex carbohydrates are better. You would have to be careful not to indicate what you believed, and it would be a good idea to measure some additional information beyond alertness, such as hunger or what people ended up eating for lunch. There’s always some potential for bias, and one strategy is to ask participants about something that you don’t expect to be affected, like headaches. This strategy was used in the home heating randomized trial that underlies the government’s ‘warm home’ advertising, which found that asthma was reduced by better heating, but twisted ankles were not.

In a blinded version of the study, you might recruit muesli eaters and, perhaps with the help of a cereal manufacturer, randomize them to higher-protein and lower-protein versions of breakfast.  This would be a bit more expensive, but perfectly feasible.  There would be less risk of reporting bias, since neither the participant nor the people recording the data would know whether the meals were higher or lower in protein on a particular day.  At the end of the study, you unmask the breakfasts and compare alertness.   The main disadvantage of this approach is the same as its main advantage — you learn about higher-protein vs lower-protein muesli, and have to make some assumptions to generalize this to eggs vs cereal or toast.

If it really mattered whether eggs for breakfast increased alertness, these studies would be worth doing.  But the Egg Foundation is unlikely to be interested, since it wouldn’t benefit from knowing the facts.  The mouse brain study is enough of a fig-leaf to let the claim stand up in public, and they don’t want to risk finding out that it doesn’t have any clothes.


December 16, 2011

Freakonomics: what went wrong

Andrew Gelman and Kaiser Fung have an article in American Scientist

As the authors of statistics-themed books for general audiences, we can attest that Levitt and Dubner’s success is not easily attained. And as teachers of statistics, we recognize the challenge of creating interest in the subject without resorting to clichéd examples such as baseball averages, movie grosses and political polls. The other side of this challenge, though, is presenting ideas in interesting ways without oversimplifying them or misleading readers. We and others have noted a discouraging tendency in the Freakonomics body of work to present speculative or even erroneous claims with an air of certainty. Considering such problems yields useful lessons for those who wish to popularize statistical ideas.

October 20, 2011

The use of Bayes’ Theorem in jeopardy in the United Kingdom?

A number of my colleagues have sent me this link from British newspaper The Guardian, and asked me to comment. In some sense I have done this. I am a signatory to an editorial published in the journal Science and Justice which protests the law lords’ ruling.

The Guardian article refers to a Court of Appeal ruling in the United Kingdom referred to as R v T. The original charge against Mr. T. is that of murder and, given the successful appeal, his name is suppressed. The appeal turned on whether an expert is permitted to use likelihood ratios in the provision of evaluative opinion, whether an evaluative opinion based on an expert’s experience is permissible, and whether it is necessary for an expert to set out in a report the factors on which that evaluative opinion is based.

It is worthwhile noting before we proceed that to judge a case solely on one aspect of the whole trial is dangerous. Most trials are complex affairs with many pieces of evidence, and much more testimony than the small aspects we concentrate on here.

The issue of concern to members of the forensic community is the following part of the ruling:

In the light of the strong criticism by this court in the 1990s of using Bayes theorem before the jury in cases where there was no reliable statistical evidence, the practice of using a Bayesian approach and likelihood ratios to formulate opinions placed before a jury without that process being disclosed and debated in court is contrary to principles of open justice.

The practice of using likelihood ratios was justified as producing “balance, logic, robustness and transparency”, as we have set out at [54]. In our view, their use in this case was plainly not transparent. Although it was Mr Ryder’s evidence (which we accept), that he arrived at his opinion through experience, it would be difficult to see how an opinion of footwear marks arrived at through the application of a formula could be described as “logical”, or “balanced” or “robust”, when the data are as uncertain as we have set out and could produce such different results.

A Bayesian, or likelihood ratio (LR), approach to evidence interpretation is a mathematical embodiment of three principles of evidence interpretation given by Ian Evett and Bruce Weir in their book Interpreting DNA Evidence: Statistical Genetics for Forensic Scientists (Sinauer, Sunderland, MA, 1998). These principles are:

  1. To evaluate the uncertainty of any given proposition it is necessary to consider at least one alternative proposition
  2. Scientific interpretation is based on questions of the kind “What is the probability of the evidence given the proposition?”
  3. Scientific interpretation is conditioned not only by the competing propositions, but also by the framework of circumstances within which they are to be evaluated

The likelihood ratio is the central part of the odds form of Bayes’ Theorem. That is,

\[
\underbrace{\frac{P(H_p \mid E)}{P(H_d \mid E)}}_{\text{posterior odds}}
= \underbrace{\frac{P(E \mid H_p)}{P(E \mid H_d)}}_{\text{likelihood ratio}}
\times \underbrace{\frac{P(H_p)}{P(H_d)}}_{\text{prior odds}}
\]

where $E$ is the evidence, $H_p$ the prosecution hypothesis, and $H_d$ the defense hypothesis.

The likelihood ratio gives the ratio of the probability of the evidence given the prosecution hypothesis to the probability of the evidence given the defense hypothesis. It is favoured by members of my community because it allows the expert to comment solely on the evidence, which is all the court has asked her or him to do.

The basis for the appeal in R v T was, firstly, that the forensic scientist, Mr Ryder, computed a likelihood ratio but did not explicitly tell the court he had done so; secondly, there was criticism that the data needed to evaluate the LR were not available.

Mr Ryder considered four factors in his evaluation of the evidence. These were the pattern, the size, the wear and the damage.

The sole pattern is usually the most obvious feature of a shoe mark or impression. Patterns are generally distinct between manufacturers and, to a lesser extent, between different shoes that a manufacturer makes. Mr Ryder considered the probability of the evidence (the fact that the shoe impression “matches” the impression left by the defendant’s shoe) if it was indeed his shoe that left it. It is reasonable to assume that this probability is one, or close to one. If the defendant’s shoe did not leave the mark, then we need a way of evaluating the probability of an “adventitious” match. That is, what is the chance that the defendant’s shoe just happened to match by sheer bad luck alone? A reasonable estimate of this probability is the frequency of the pattern in the relevant population. Mr Ryder used a database of shoe pattern impressions found at crime scenes. Given that this mark was found at a crime scene, this seems a reasonable population to consider. In this database the pattern was very common, with a frequency of 0.2. The defense made much of the fact that the database represented only a tiny fraction of the shoes produced in the UK in a year (0.00006 per cent), and was therefore not comprehensive enough to support the evaluation. In fact, the defense had done its own calculation, which was much more damning for their client. Using the 0.2 frequency gives an LR of 5. That is, the evidence is 5 times more likely if Mr T.’s shoe left the mark than if a shoe from a random member of the population did.
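In symbols, using the notation from the odds form above and taking the probability of the match to be one if the defendant’s shoe left the mark, the pattern LR is just the reciprocal of the pattern frequency:

\[
\mathrm{LR}_{\text{pattern}} = \frac{P(E \mid H_p)}{P(E \mid H_d)} = \frac{1}{0.2} = 5
\]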

Shoe size is also a commonly used feature in footwear examination. The shoe impression was judged to be size 11. Again, the probability of the evidence if Mr T.’s shoe left the mark was judged to be one. It is hard to work out exactly what Mr Ryder did from the ruling, because a ruling is the judges’ recollection of proceedings, which is not necessarily an accurate record of what was, or was not, said. According to the ruling, Mr Ryder used a different database to assess the frequency of size. He estimated this to be 3%. The judges incorrectly equated this to 0.333 instead of 0.03, which would lead to an LR of 33.3. Mr Ryder instead used a “more conservative” figure of 0.1, to reflect some uncertainty in size determination, giving an LR of 10.
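To spell out the arithmetic: a frequency of 3% is 0.03, and the reciprocals give

\[
\frac{1}{0.03} \approx 33.3, \qquad \frac{1}{0.1} = 10,
\]

so the judges’ figure of 0.333 matches neither the stated frequency nor the LR of 10 that Mr Ryder actually reported.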

Wear on shoes can differ between people. Take a look at the soles of your shoes and those of a friend: they will probably be different. To evaluate the LR, Mr Ryder considered the wear on the trainers. He felt he could exclude half of the trainers of this pattern type and approximate size/configuration. He therefore calculated the likelihood ratio for wear as 1/0.5, or 2. Note that Mr Ryder appears to have calculated the probability of wear given pattern and size.

Finally, Mr Ryder considered the damage to the shoes. Little nicks and cuts accumulate on shoes over time and can be quite distinctive. Mr Ryder felt he could exclude very few pairs of shoes that had not already been excluded by the other factors. That is, the defendant’s shoes were no more, or less, likely to have left the mark than any other pair in the database with the same pattern, size, and wear features. He therefore calculated the likelihood ratio for damage as 1.

The overall LR was calculated by multiplying the four LRs together. This is acceptable if either the features were independent, or the appropriate conditional probabilities were considered. This multiplication gave an LR of 100, and that figure was converted using a “verbal scale” into the statement “the evidence provides moderate support for the proposition that the defendant’s shoe left the mark.” Verbal scales are used by many forensic agencies who employ an LR approach because they are “more easily understood” by the jury and the court.
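As a minimal sketch of the whole calculation, here it is in Python. The numbers are those reported in the ruling; the verbal-scale bands are an illustrative assumption, since published scales vary between agencies:

    from math import prod

    # Likelihood ratios from the ruling: pattern, size, wear, damage
    lrs = {"pattern": 5, "size": 10, "wear": 2, "damage": 1}
    overall = prod(lrs.values())   # 5 * 10 * 2 * 1 = 100

    def verbal(lr):
        # Illustrative bands (an assumption; agencies' scales differ):
        # 1-10 limited, 10-100 moderate, 100-1000 moderately strong
        if lr <= 10:
            return "limited support"
        if lr <= 100:
            return "moderate support"
        if lr <= 1000:
            return "moderately strong support"
        return "strong support"

    print(overall, "->", verbal(overall))   # prints: 100 -> moderate support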

The appeal judges ruled that this statement, without the explicit inclusion of information explaining that it was based on an LR, was misleading. Furthermore, they ruled that the data used to calculate the LR was insufficient. I, and many of my colleagues, disagree with this conclusion.

So what are the consequences of this ruling? It remains to be seen. In the first instance, I think it will be an opening shot for many defense cases, in the same way that defense teams already try to take down the LR because it is “based on biased Bayesian reasoning.” I do think it will force forensic agencies to be more open about their calculations, but I might add that Mr Ryder didn’t seek to conceal anything from the court. He was simply following the guidelines set out by the Association of Footwear, Tool marks, and Firearms Examiners.

It would be very foolish of the courts to dismiss the Bayesian approach. After all, Bayes’ Theorem simply says (in mathematical notation) that you should update your belief about the hypotheses based on the evidence. No judge would argue against that.

October 3, 2011

Noted for the record

One of the reasons statistics is difficult is the ‘availability heuristic’. That is, we estimate probabilities based on things we can remember, and it’s a lot easier to remember dramatic events than boring ones.  It’s not just that correlation doesn’t imply causation; our perception of correlation doesn’t even imply correlation.

To help with availability, I’d like to make two boring and predictable observations about recent events.

1.  This winter, despite the Icy Polar Blast™, was slightly warmer than the historical average, as forecast.

2. There wasn’t a major earthquake in ChCh in the last week of September, despite the position of the moon or the alignment of Uranus (or anything else round and irrelevant).

July 22, 2011

Breastfeeding and the risk of SIDS

Stats.org has published an excellent article on the research surrounding risk factors for Sudden Infant Death Syndrome (SIDS), in particular breastfeeding:

Let there be no doubt: not breastfeeding and SIDS are correlated. The problem is that breastfeeding is correlated with many other factors as well, any of which could be the “cause” (or “causes”) behind an increased SIDS rate among people who use formula instead of mothers’ milk. These include a variety of social and cultural differences, differences in care, differences in other feeding patterns, differences in sleeping patterns, differences in genetic makeup, differences in home environment, differences in medical care, etc. The question is whether the evidence points to breastfeeding (or mother’s milk) as a preventative factor by itself and independent of all the other factors with which breastfeeding tends to go hand in hand.

It continues to unravel the statistics and evidence and concludes by saying:

Without the science, the claims of cost due to not breastfeeding – 447 babies and almost 5 billion dollars in economic loss — are like an empty bottle: wanting for real substance.


June 28, 2011

Interesting read

Here is an interesting blog post on the heated debate about the link between soda consumption and obesity in the US.

June 19, 2011

The misuse of DNA statistics

From the NZ Herald:

CIA personnel there compared it “with a comprehensive DNA profile derived from DNA collected from multiple members of bin Laden’s family,” the statement said. “The possibility of a mistaken identification is approximately one in 11.8 quadrillion.”

This is a common misreporting of DNA statistics, and it highlights the confusion regarding evidence interpretation. The figure of 1 in 11.8 quadrillion quoted in the CIA statement is known as a random match probability. It answers a specific question. In this case the question is, “What is the probability that someone else has this profile, given what we know about the alleged victim’s (bin Laden’s) DNA profile and the profiles of his extended family?” Note that this is a very different question from “What is the probability that this DNA comes from someone other than Mr bin Laden?”

This is a very common mistake, so common in fact that it has a name, the Prosecutor’s fallacy. The fallacy usually relates to a misunderstanding regarding conditional probability.

In this case it is far more likely that the DNA analyst calculated a likelihood ratio. The likelihood ratio compares the probability of the evidence under two competing hypotheses. Here, sensible hypotheses might be Hp: the body is Mr bin Laden, and Hd: the body is someone unrelated to Mr bin Laden. The correct statement would be “The (DNA) evidence is 11.8 quadrillion times more likely if the body is Mr bin Laden’s than if it belongs to someone who is unrelated to Mr bin Laden.” This is a statement about the evidence, not about the hypotheses.

It is possible to give a statement regarding the hypotheses, but in order to do this we have to have some prior probabilities associated with them before we consider the evidence. The statistical formula that allows us to reverse the probability statements is known as Bayes’ Theorem.
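To see how this works numerically, take the odds form of Bayes’ Theorem. The prior odds below are purely illustrative (say, one in a million before considering the DNA evidence); the LR is the reported figure of 11.8 quadrillion, i.e. $1.18 \times 10^{16}$:

\[
\frac{P(H_p \mid E)}{P(H_d \mid E)}
= \underbrace{1.18 \times 10^{16}}_{\text{LR}}
\times \underbrace{10^{-6}}_{\text{prior odds (illustrative)}}
= 1.18 \times 10^{10}
\]

Even starting from a very sceptical prior, the posterior odds overwhelmingly favour the identification; the point is that the prior is a separate ingredient, and it is not the analyst’s to supply.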

Do I think the body belongs to someone other than Mr bin Laden? No, but I do think there is an obligation to use statistics correctly.