Posts filed under Probability (66)

August 22, 2012

Non-awful lottery story

Stuff has a story on today’s Big Wednesday lotto that doesn’t say anything obviously untrue or misleading.  I’m sure this isn’t a first, but it is rare enough to be notable.

The story doesn’t say anything about the odds, but in a sense that’s not the point: you don’t play the lottery to win, you play it to imagine winning.  To quote another statistician:

The benefit to playing the lottery comes entirely between buying the ticket, and when the winner is revealed. During this interval, someone who has bought the ticket can entertain the idea that they might win, and pleasantly imagine how much better their life could be with the money, what they would do with it, etc. … If a $1 lottery ticket licenses even one hour of imagining a different life, I don’t see how people who spend $12 for two or three hours of such imagining at a movie theater, or $25 for ten hours at a bookstore, are in any position to talk.


August 21, 2012

Queueing theory and practice

There’s an interesting story in the New York Times about queueing, a subject dear to the hearts of some of my colleagues.  In one sense queueing is a topic in probability theory, where you work out how long people might have to wait under various circumstances, leading to surprising but useful techniques such as metered on-ramps to motorways.  But it’s also a topic in applied psychology: if you get off your plane ten minutes before your bags arrive at the carousel, you’ll notice the wait less if you spend most of it walking. So that’s what the airports make you do.

August 16, 2012

Probabilistic weather forecasts

For the Olympics, the UK Met Office was producing animated probabilistic forecast maps, showing the estimated probability of various amounts of rain or strengths of wind at a fine grid of locations over Britain.  These are a great improvement over the usual much more vague and holistic predictions, and they were made possible by a new and experimental high-resolution ensemble forecasting system.  (via)

I will quibble slightly about the probabilities in the forecast, though.  The Met Office generates a set of predictions spanning a reasonable range of weather models and input uncertainties, and then says “80% chance of rain” if 80% of the predictions have rain at that location.  That is, 80% means an 80% chance that a randomly chosen prediction will say “rain”; it doesn’t necessarily mean that, out of the locations and hours with an 80% forecast probability, 80% of them will actually get rain.

It’s possible to improve the calibration of the probabilities by feeding the ensemble of predictions into a statistical model, and researchers at the University of Washington have been working on this.  Their ProbCast page gives probabilistic rain and temperature forecasts for the state of Washington that are based on a statistical model for the relationship between actual weather and the ensemble of forecasts, and this does give more accurate uncertainty numbers.
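Calibration in this sense can be checked empirically: bin the issued forecast probabilities and compare each bin’s nominal rate with how often it actually rained. A minimal Python sketch (the data below are invented purely for illustration):

```python
from collections import defaultdict

def calibration_table(forecast_probs, rained):
    """Bin forecast probabilities to one decimal place and return
    the observed fraction of wet outcomes in each bin."""
    bins = defaultdict(lambda: [0, 0])          # bin -> [total, wet]
    for p, wet in zip(forecast_probs, rained):
        b = round(p, 1)
        bins[b][0] += 1
        bins[b][1] += int(wet)
    return {b: wet / total for b, (total, wet) in sorted(bins.items())}

# Well calibrated: of the hours forecast at 80%, about 80% should be wet.
table = calibration_table([0.8] * 10 + [0.2] * 10,
                          [True] * 8 + [False] * 2 + [True] * 2 + [False] * 8)
```

With forecasts like these, a well-calibrated system puts roughly 0.8 in the 0.8 bin; a raw ensemble fraction often doesn’t, which is exactly what the statistical post-processing corrects.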

August 2, 2012

Is Lotto rigged?

Stuff seems to think so (via a Stat of the Week nomination)

If you plan to celebrate 25 years of Lotto with a ticket for tonight’s Big Wednesday draw, some outlets offer better odds than others.

No, they don’t.  They offer the same odds, roughly zero. Some outlets have sold more winning tickets than others in the past, that’s all.

Many people, even statisticians, enjoy playing Lotto. If you want to buy tickets at places where other people have won in the past, there’s nothing wrong with that and it won’t hurt your chances.   Since many people enjoy playing, it makes some sense for newspapers to write about Lotto from time to time.  But there’s no excuse for leading with a blatantly untrue statement.


July 23, 2012

Road toll stable

From the Herald this morning

More people have died in fewer car smashes since January 1 than at this time last year, prompting a Government reminder about the responsibility drivers hold over others’ lives.

“The message for drivers is clear,” Associate Transport Minister Simon Bridges said yesterday of a spate of multi-fatality crashes that have boosted the road toll to 161.

The number of fatal crashes is 133, compared to 144 last year at this time, and the number of deaths is 161, compared to 155 last year.

How do we calculate how much random variation would be expected in counts such as these?  It’s not sampling error in the sense of opinion polls, since these really are all the crashes in New Zealand.  We need a mathematical model for how much the numbers would vary if nothing much had changed.

The simplest mathematical model for counts is the Poisson process.  If dying in a car crash is independent for any two people in NZ, and the chance is small for any one person (but not necessarily the same for different people), then the number of deaths over any specified time period will follow a Poisson distribution.  The model cannot be exactly right — multiple fatalities would be much rarer if it were — but it is a good approximation, and any more detailed model would lead to more random variation in the road toll than the Poisson process does.

There’s a simple trick to calculate a 95% confidence interval for a Poisson distribution, analogous to the margin of error in opinion polls.  Take the square root of the count, add and subtract 1 to get upper and lower bounds, and square them: a count of 144 (square root 12) is consistent with underlying average rates from 121 to 169.  And, as with opinion polls, when you look at differences between two years the range of random variation is about 1.4 times larger.
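The square-root trick takes only a few lines of Python, applied here to the crash counts from the story:

```python
import math

def poisson_ci(count):
    """Approximate 95% confidence interval for a Poisson mean:
    take the square root, add and subtract 1, then square."""
    root = math.sqrt(count)
    return ((root - 1) ** 2, (root + 1) ** 2)

low, high = poisson_ci(144)    # last year's count of fatal crashes
# (121.0, 169.0): this year's 133 crashes sit well inside the range

# For the *difference* between two years, random variation is about
# math.sqrt(2), roughly 1.4, times larger than for a single count.
```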

Last year we had an unusually low road toll, well below what could be attributed to random variation.  It still isn’t clear why, not that anyone’s complaining.  The numbers this year look about as different from last year’s as you would expect purely by chance.  If the message for drivers is clear, it’s only because the basic message is always the same:

yellow road sign: You're in a box on wheels hurtling along several times faster than evolution could have prepared you to go

July 9, 2012

Book review: Thinking, Fast and Slow

Daniel Kahneman and Amos Tversky made huge contributions to our understanding of why we are so bad at prediction.  Kahneman won a Nobel Prize[*] for this in 2002 (Tversky failed to satisfy the secondary requirement of still being alive).  Kahneman has now written a book, Thinking, Fast and Slow, about their research.  Unlike some of his previous writing, this book is designed to be shelved in the Business/Management section of bookshops and read by people who might otherwise be looking for their cheese.

The “Fast” and “Slow” of the title are two systems of thought: the rapid preconscious judgement that we use for most of our decision-making, and the conscious and deliberate evaluation of alternatives and probabilities that we like to believe we use.   The “Fast” system relies very heavily on stereotyping — finding the best match for a situation in a library of stories — and so is subject to predictable and exploitable biases.  The “Slow” system can be trained to do much better, but only if we can force it to be used.

A dramatic example of the sort of mischief the “fast” system can get up to is anchoring bias.  Suppose you ask a bunch of people how many UN-member countries are in Africa.  You will get a range of guesses, probably not very accurate, and perhaps a few people who actually know the answer.  Suppose you had first asked people to write down the last two digits of their telephone number, or to spin a roulette wheel and write down the number that is chosen, and then to guess how many countries there are in Africa.  Empirically, across a range of situations like this, there is a strong correlation between the obviously irrelevant first number and the guess.   This is an outrageous finding, but it is very well confirmed.   It’s one of the reasons that bogus polls are harmful even if you know they are bogus.

Kahneman gives many other examples of cognitive illusions generated by the ‘fast’ system of the mind.  As with optical illusions, they don’t lose their intuitive force when you understand them, but you can learn not to trust your intuition in situations where it’s going to be biased.

One minor omission of the book is that there’s not much explanation of why we are so stupid: Kahneman points out, and documents, that thinking uses up blood sugar and is biologically expensive, but that doesn’t explain why the mistakes we make are so simple.  Research in computer science and philosophy, by people actually trying to implement thinking, gives one possibility, under the general name of “the frame problem”.  We know an enormous number of facts and relationships between them, and we cannot afford to investigate the logical consequences of all these facts when trying to make a decision.  The price of tea in China really is irrelevant to most decisions, but not to decisions about tea purchases, or about souvenir purchases when in Beijing, or to living-wage levels in Fujian.  We need some way of ignoring the price of tea in China, and millions of other facts, except very occasionally when they are relevant, without having to deduce their irrelevance each time.  Not surprisingly, this filtering sometimes misfires and treats information as important when it is actually irrelevant.

Read this book.  It might help you think better, and at least will give you better excuses for your mistakes.


* to quote Daniel Davies: “blah blah blah Sveriges Riksbank. Nobody cares, you know.”

July 4, 2012

Physicists using statistics

Traditionally, physics was one of the disciplines whose attitude was “If you need statistics, you should have designed a better experiment”.  If you look at the CERN webcast about the Higgs Boson, though, you see that it’s full of statistics: improved multivariate signal processing, boosted decision trees, random variations in the background, etc, etc.

Increasingly, physicists have found, like molecular biologists before them, and physicians before that, that sometimes you can’t afford to do a better experiment. When your experiment costs billions of dollars, you really have to extract the maximum possible information from your data.

As you have probably heard by now, CERN is reporting that they have basically found the Higgs boson: the excess production of certain sets of particles deviates from a non-Higgs model by 5 times the statistical uncertainty: 5σ.  Unfortunately, a few other sets of particles don’t quite match, so combining all the data they have 4.9σ, just below their preferred threshold.

So what does that mean?  Any decision procedure requires some threshold for making a decision.  For drug approval in the US, you need two trials that each show the drug is more effective than placebo by twice the statistical uncertainty: ie, two replications of 2σ, which works out to be a combined exceedance by 2.8 times the statistical uncertainty: 2.8σ.  This threshold is based on a tradeoff between the risk of missing a treatment that could be useful and the risk of approving a useless drug.  In the context of drug development this works well — drugs get withdrawn from the market for safety, or because the effect on a biological marker doesn’t translate into an effect on actual health, but it’s very unusual for a drug to be approved when it just doesn’t work.
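The 2.8σ figure follows from how independent results combine: z-scores add, while the uncertainty of the sum grows only as the square root of the number of studies. A quick check:

```python
import math

# Two independent trials, each exceeding placebo by 2 standard errors.
# Combining them (Stouffer's method): the z-scores add, while the
# combined standard error grows by sqrt(2).
z1, z2 = 2.0, 2.0
z_combined = (z1 + z2) / math.sqrt(2)   # about 2.83
```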

In the case of particle physics, false positives could influence research for many years, so once you’ve gone to the expense of building the Large Hadron Collider, you might as well be really sure of the results.  Particle physics uses a 5σ threshold, which means that in the absence of any signal there is only about a 1 in 3.5 million chance per analysis of deciding they have found a Higgs boson.  Despite what some of the media say, that’s not quite the same as a 1 in 3.5 million chance of being wrong: if nature hasn’t provided us with a 125 GeV Higgs boson, an analysis that finds one has a 100% chance of being wrong; if there is one, it has a 0% chance of being wrong.
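These thresholds correspond to upper-tail probabilities of the standard normal distribution, which the Python standard library can compute via the complementary error function:

```python
import math

def one_sided_p(z):
    """Probability that a standard normal variable exceeds z."""
    return 0.5 * math.erfc(z / math.sqrt(2))

p_drug = one_sided_p(2.8)       # about 0.0026
p_physics = one_sided_p(5.0)    # about 2.9e-7, roughly 1 in 3.5 million
```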


June 21, 2012

Why Nigeria?

A fairly large fraction of the spam advertising the chance to capitalise on stolen riches is sent by Nigerian criminals.  What should be more surprising is the fact that a lot of the spam actually says it’s from Nigeria (or other West African nations).  Since everyone knows about Nigerian scams, why don’t the spammers claim to be from somewhere else? It’s not as if they have an aversion to lying about other things.

A new paper from Cormac Herley at Microsoft Research has a statistically-interesting explanation:  the largest cost in spam operations is in dealing with the people who respond to the first email.  Some of these people later realise what’s going on and drop out without paying; from the spammer’s point of view these are false positives — they cost time and money to handle, but don’t end up paying off.   A spammer ideally wants only to engage with the most gullible potential victims; the fact that ‘Nigeria’ will spark suspicions in many people is actually a feature, not a bug.
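Herley’s point is economic, and a toy calculation makes it concrete. All the numbers below are invented purely for illustration; the paper’s actual models are more elaborate:

```python
COST_PER_REPLY = 20        # effort spent on everyone who responds
PAYOUT_PER_VICTIM = 2000   # average take from someone who pays up

def scammer_profit(replies, fraction_who_pay):
    """Net return: payouts from victims minus the cost of handling
    every reply, including the ones who wise up and drop out."""
    victims = replies * fraction_who_pay
    return victims * PAYOUT_PER_VICTIM - replies * COST_PER_REPLY

# A plausible-sounding email draws many replies but few payers;
# an obviously 'Nigerian' one draws fewer, more gullible replies.
plausible = scammer_profit(replies=1000, fraction_who_pay=0.005)
obvious = scammer_profit(replies=100, fraction_who_pay=0.20)
# plausible loses money; obvious turns a profit
```

With these made-up figures, scaring off everyone but the most credulous is exactly the right filter for the scammer.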


March 30, 2012

Powerball and the Kelly criterion

A popular decision rule for investment and other forms of gambling is the Kelly criterion, named after mathematician (and successful investor) John Kelly.  In the long run, following this rule maximises the growth rate of your wealth.

If we assume that the story in Stuff about Wairarapa bettors having had a 2:1 return in the past year can be applied to tomorrow’s Powerball (which it can’t), we can look at what that would imply about rational betting.

The Kelly criterion specifies what fraction of your total wealth you should spend on an investment opportunity.  The fraction is always less than your probability of winning.  With a 2:1 expected payoff and long odds, the recommended fraction is about half the probability of winning.

The chance of the top Powerball prize (since this isn’t a ‘must win’ week) is 1 in 38 million for a $1 bet, so you should bet less than 1 dollar for each 76 million dollars of your current disposable wealth.   For most of us, that’s less than one dollar.
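Under the standard Kelly formula f* = p − q/b (p the win probability, q = 1 − p, b the net odds paid on a win), the numbers in the post work out like this:

```python
p = 1 / 38_000_000        # chance of the top Powerball prize per $1 bet
q = 1 - p
b = 2 / p - 1             # net odds that make the expected return 2:1
f = p - q / b             # Kelly fraction, about p / 2
wealth_per_dollar_bet = 1 / f   # about 76 million
```

That is, even taking the 2:1 return at face value, Kelly says to stake a dollar only for every $76 million or so of disposable wealth.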

It’s worth noting that while not everyone supports the Kelly criterion, most of the critics suggest that you should bet less than the criterion recommends, not more.

(via a commenter at Cornell physics blog The Virtuosi)

January 26, 2012

Unfaithful to the data, too.

When I were young, the Serious News Outlets  probably wouldn’t have admitted the existence of extra-marital affairs by non-celebrities, let alone written an article that’s basically advertising from an infidelity website press release.

In some ways the data are better-quality than most advertorials, because the website has complete data on its NZ members.  They have even gone as far as using population sizes for NZ cities to estimate their, um, market penetration, which varied across the five main cities by as much as 0.06%.  No, that doesn’t exceed the margin of error.

The Herald’s article starts off

If your partner supports National, has a PC, drinks Coke, eats meat, has a tattoo, smokes and is a Christian, be warned – they could be a cheater.

Leaving aside the gaping logical chasm in identifying website members as representative of all ‘cheaters’, what the data actually say is that more members support National, not that more National supporters are members.  As you may recall, we determined not so long ago that more New Zealanders of all descriptions support National than any other party, so that’s what you would expect for members of the website.  The proportion of National supporters in the election was 47%; among website members it’s 33%, so National supporters are substantially less likely to be members of the website than supporters of other parties.

The proportion identifying as Christian among website members is very similar to the proportion in the 2006 census.  79% of website users are on PCs (vs Macs).  Again, that’s a lower proportion of PCs than in the population of NZ computers (the Herald said 10% were Macs in July 2010, and for Aus+NZ combined, IDC now says 15%), but one explanation is that Macs have more of the home market than the business market.  More members drinking Coke vs Pepsi is also not surprising — I couldn’t find population figures, but Coke dominates the NZ cola market.
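The base-rate reversal in the lead sentence is just conditional probability: the survey reports P(National | member), while the headline implies something about P(member | National). Using only the percentages quoted above:

```python
p_national_given_member = 0.33   # share of website members backing National
p_national = 0.47                # National's share of the election vote

# Per-capita odds of membership, National supporters vs. everyone else
relative_rate = ((p_national_given_member / p_national)
                 / ((1 - p_national_given_member) / (1 - p_national)))
# about 0.56: a National supporter is roughly half as likely to be
# a member of the site as a supporter of some other party
```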

The story doesn’t say, but we can also be pretty confident that the website members are more likely to be Pakeha than Maori, more likely to be accountants than statisticians, and more likely to have a pet cat than a pet camel.