Posts written by Thomas Lumley (2534)


Thomas Lumley (@tslumley) is Professor of Biostatistics at the University of Auckland. His research interests include semiparametric models, survey sampling, statistical computing, foundations of statistics, and whatever methodological problems his medical collaborators come up with. He also blogs at Biased and Inefficient.

May 21, 2021

Briefly

  • 92% of people think they’re better than average at complying with Covid-19 rules (UK study).  It could be that people who comply with Covid rules are also more likely to respond to surveys, but it also wouldn’t be surprising if most people just think they’re better than average.
  • A (flashing) map of lighthouses
  • “Did 4 per cent of Americans really drink bleach last year?” — a good piece at the NZ Herald, from the Harvard Business Review. Spoiler: no, probably not: it’s the sort of claim that surveys are bad at supporting.
  • “Every recorded battle in history” on a map (click to embiggen).

    “recorded” means recorded by Wikipedia. The European bias is pretty clear, but note that it even misses battles in the Waikato and Taranaki wars where British soldiers were awarded the Victoria Cross, so the Wikipedia completeness bias is more complicated than just ‘European soldiers’.

But they’re all good dogs

Q: Did you see dogs are better than lateral flow tests for detecting Covid?

A: No

Q: The Guardian says: Dogs can better detect Covid in humans than lateral flow tests, finds study. With 97% accuracy!!

A: That’s the detection rate in people with Covid. The rate of “detecting” Covid in people who don’t have it (the false positive rate) is 9%, which is a lot higher than you’d like.

Q: Where did they find the people?

A: It was people turning up to a testing centre in Paris, though since 109 out of 335 had Covid, it can’t really have been representative. The test positivity rate in France as a whole is only 4.5% at the moment and peaked at 15.9%.

Q: Is 335 people enough?

A: Potentially, though the study initially planned to get 2000 people

Q: The story says lateral flow tests correctly identify on average 72% of people infected with the virus who have symptoms, and 58% who do not. That sounds really bad. Why does anyone use them?

A: There’s a lot of variability between tests: some of them are better.  Also, they have much, much lower false positive rates than the dogs — around half a percent.  Since there’s a tradeoff between being conservative (giving false negatives) and being sensitive (giving false positives), you can’t just compare the sensitivity of two tests that have a ten-fold difference in false positive rate.
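To see why that matters, here’s a rough back-of-envelope calculation (a sketch, not anything from the study) of what each test would do when screening 100,000 people, using the sensitivities and false positive rates quoted above, at the two prevalences mentioned: France’s overall 4.5% positivity and the study sample’s 109/335.

```python
# Rough screening comparison: dogs (97% sensitivity, 9% false positives)
# vs lateral flow (72% sensitivity, ~0.5% false positives), as quoted above.
# The screening scenario itself is illustrative, not from the study.

def screen(prevalence, sensitivity, fpr, n=100_000):
    infected = n * prevalence
    true_pos = infected * sensitivity
    false_pos = (n - infected) * fpr
    ppv = true_pos / (true_pos + false_pos)  # chance a positive result is real
    return true_pos, false_pos, ppv

for prev in (0.045, 109 / 335):
    for name, sens, fpr in (("dogs", 0.97, 0.09), ("lateral flow", 0.72, 0.005)):
        tp, fp, ppv = screen(prev, sens, fpr)
        print(f"prevalence {prev:.1%}, {name:>12}: {tp:7.0f} true positives, "
              f"{fp:6.0f} false positives, PPV {ppv:.0%}")
```

At the nationwide positivity rate, roughly two in three of the dogs’ positive calls would be false alarms; the dogs do catch more of the real cases, but that’s the tradeoff the 97% headline figure glosses over.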

Q: Still, dogs would be quicker, and you could just use the real test in people the dogs picked out

A: That has potential, but dogs don’t scale all that well. You need to train and test each dog; they can’t just be mass-produced, boxed up, and mailed around the country.  And dogs aren’t that much quicker — this isn’t walk-past sniffing like the beagles looking for smuggled food at Auckland airport; you need to stick a pad in your armpit for a couple of minutes.

Q: How much of the spin is from the research paper and how much is coming from the newspaper?

A: The researchers are reported in the French source: “Ces résultats confirment scientifiquement la capacité des chiens à détecter une signature olfactive de la Covid-19“, souligne l’AP-HP, précisant que cette étude, pas encore publiée dans une revue médicale, est “la première de ce type réalisée au niveau international“.

Q: J’ai pas de clue que that means

A: “These results scientifically confirm the ability of dogs to detect an olfactory signature of Covid-19”, emphasises [the hospital], specifying that this study, not yet published in a medical journal, is “the first of this type carried out at the international level”.

Q: So it’s not just the Guardian

A: No.

May 7, 2021

Mind if we call you ‘Bruce’?

From news.com.au (via @LewSOS and @Economissive on Twitter) “The names of Australians most likely to win the lotto have been revealed, with the top three taking home more than a quarter of the prizes last year.”

What they actually have for ‘names’ is first initials. Apparently, more than a quarter of first-division prizes last year were won by people whose names started with “J”, “A”, or “D”.  Of course, people whose names start with these letters are not any more likely to win Lotto if they buy tickets. Either more people in this category bought tickets than average (in which case it would be truer to say they are more likely to lose Lotto), or the distribution of initials is pretty much the same as for the country as a whole.

The story does go on to say that name and age can’t affect your chance of winning, but not to explain why, given that, it’s news.

Anyway, since Rob Hyndman and the stats group at Monash have put together a database of frequencies of Australian names, we can see how representative the winners are.  Here is the proportion of Oz babies born each year (up to current 18-year-olds) whose names begin with “J”, “A”, or “D”. As you can see, it’s “more than a quarter” almost every year where we have data.
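If you want to reproduce the calculation, here’s a minimal sketch. It assumes you’ve exported the Monash name-frequency data to a CSV with columns name, year, and count; the file name and column layout are assumptions, so adapt them to however you’ve saved the data.

```python
import pandas as pd

# Hypothetical export of the Australian names database:
# one row per name/year, with a count of babies given that name.
names = pd.read_csv("ozbabynames.csv")  # columns assumed: name, year, count

names["initial"] = names["name"].str[0].str.upper()
total_by_year = names.groupby("year")["count"].sum()
jad_by_year = (names[names["initial"].isin(["J", "A", "D"])]
               .groupby("year")["count"].sum())

share = (jad_by_year / total_by_year).rename("share_JAD")
print(share.tail(18))  # proportion of babies with J/A/D initials, recent years
```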

Since you’re a StatsChat reader, you can probably think of reasons there might be a difference between Lotto name frequencies and baby name frequencies.  The baby names don’t include immigrants and do include emigrants.  There might be ethnic differences in propensity to play Lotto that happen to be correlated with first initial. There might be quite large chance differences because the lottery folks only looked at first-division winners, a very small (but random) sample of Lotto players. But it doesn’t look like we need to go there.

May 3, 2021

Briefly

  • Mediawatch took on bogus polls and interviewed me.
  • If you ask people whether the recent stories about blood clotting affect their views on Covid vaccines they will say ‘yes’. But if you ask people, before and after, what their views actually are, there’s a very slight change, towards being more pro-vaccine.
  • A study of 9806 blood donors found 8 had antibodies to the Covid virus.  Extrapolated crudely to all of NZ (the arithmetic is sketched below) that’s about 4000 undiagnosed cases. I’m not giving a statistical uncertainty interval because the non-sampling uncertainty is going to be larger — blood donors tend to be younger and so more likely to have been asymptomatic/weakly symptomatic and thus not tested, but might also be less likely to have been infected.  Still, the number is in the sort of range you’d expect.
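The crude extrapolation is just the sample proportion scaled up to the whole population; assuming roughly five million people in NZ:

```python
# Crude extrapolation of blood-donor seroprevalence to all of NZ.
# The population figure is an approximation, and no allowance is made
# for the ways donors differ from everyone else.
positives, donors = 8, 9806
nz_population = 5_000_000  # roughly, in 2021

print(round(positives / donors * nz_population))  # about 4000
```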
April 30, 2021

Physical punishment of children – reporting

There’s a new research paper out from the Christchurch Health and Development Study, which recruited a group of people when they were born, in mid-1977, and has been following them ever since.  Those of the participants who were parents have been asked about physical discipline of their children on four occasions: when they were 25, 30, 35 and 40.  Obviously, over time, the number who are parents has increased (from about 150 to over 600), and the children have (on average) gotten older — when the parents were 25, most of the kids would have been pre-school; the group now includes a few very young children but many who are teenagers.

The good feature of birth cohorts like the Christchurch study is that you get to see the same people throughout the course of their lives; the bad feature is that at any given time everyone is exactly the same age.  In statistician jargon, age is completely confounded with period: you are completely unable to distinguish effects of ageing from time trends. When you see that the proportion of parents reporting hitting their kids has gone down from 77% to 42% over the 15 years, you can’t tell, at all, whether this is an effect of these specific parents getting older and more experienced or an effect of parents in general being less likely to hit their kids.  It’s hard (though not impossible) even to tell if it’s an effect of the kids being older.
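The confounding here isn’t subtle; it’s exact. In a cohort all born in the same year, calendar period is just age plus a constant, so a model that tries to estimate both an age effect and a period effect has a rank-deficient design matrix. A minimal sketch of the problem (the numbers just encode the study design):

```python
import numpy as np

# Everyone in the cohort was born in 1977, so at each survey wave
# age and calendar year differ by the same constant.
year = np.array([2002, 2007, 2012, 2017])  # waves at ages 25, 30, 35, 40
age = year - 1977

# Design matrix with intercept, age effect, and period effect:
X = np.column_stack([np.ones_like(age), age, year])
print(np.linalg.matrix_rank(X))  # 2, not 3 — the two effects can't be
                                 # separated no matter how much data you have
```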

As you’d hope, the research paper, in the NZ Medical Journal (paywalled), is very clear on this:

…explanations include: increasing maturity of the parenting sample over time (less reactive, more experienced, older parents); a cultural shift towards the unacceptability of violence towards children over the period of the study; and the law change in 2007, which prohibited physical punishment and violence towards children. Given the nature of its design, it is not possible for the current study to distinguish between these explanations. However, it does not seem unreasonable to conjecture that all three processes are likely to have played a role.

And indeed it doesn’t seem unreasonable, as long as you recognise that the not seeming unreasonable isn’t a conclusion from the data and relies entirely on external plausibility.  The researchers do conclude that there’s still a lot of physical punishment going on, and that efforts are needed to stop it; the former is well-supported by the data and the latter is a policy response, not a scientific conclusion. That’s all good.

So let’s look at the reporting (some of this may have changed before or after I read it, of course):

  • Radio NZ: Number of parents smacking children drops by half in 15 years. No caveat about the study design meaning this conclusion is basically unsupported. Gets the journal name wrong.
  • 1 News: More than 40% of parents still use physical discipline years after law change, latest data shows. The story is better than the headline, and the Children’s Commissioner is quoted as saying “It’s representative of one cohort born in 1977, one group in one year in one generation, but there has been a discernible drop over the years.” I’d be happier if it was clearer from the beginning that this doesn’t claim to be representative of NZ in general over time.
  • NZ Herald. Parents’ physical punishment of children decreasing, but still common – report. Slightly better headline; much clearer in the story. “…the rate of physical punishment against children was higher when parents were younger, and then decreased with age… because of the way the study was designed, it couldn’t pinpoint how much the rates reduced because of the law change.”
  • Otago Daily Times. Parents still smacking, study finds. Good. “The authors warned that their method of studying a cohort of people over time meant they could not gauge what the attitude of new parents in 2021 might be to physical punishment. However, the research did suggest rates of smacking or hitting children were high enough to be a public health concern.”
  • Stuff. Physical punishment of children still ‘fairly common’, despite anti-smacking law change – study.  There’s no caveat about the study design, and the story says “New research, published in the New Zealand Medical Journal on Friday, examined how the prevalence of child physical punishment changed in the 15-year period between 2002 and 2017 – before and after the legislation came into force.”, which isn’t true. And that’s not a link to the research paper.
  • Newshub. Who’s most likely to use physical discipline against their kids revealed. The headline’s a bit dodgy given the non-representative group of parents, but the caveats are good: “Because the study followed a cohort of parents who aged 15 years over the course of the study, ‘it is unclear what rates of physical punishment of children would be in studies of contemporary young parents’.”


April 21, 2021

Knowing what to leave out

The epidemic modelling group at Te Pūnaha Matatini (who work a few floors above me) won the Prime Minister’s Science Prize for their work on modelling the Covid epidemic in New Zealand.   There have been some descriptions in the media of their models, but not so much of what it is that mathematical modelling involves.

A good mathematical model captures some aspect of the way the real process works, but leaves out enough of the detail that it’s feasible to study and learn about the model more easily.  The limits to detail might be the data available, or computer time, or mathematical complexity, or just not understanding part of the way the process works.  Weather models, for example, have improved over the years by using more powerful computers and more detailed input data, enabling them to take into account more features of the real weather system and more details of Earth’s actual geography.

The simplest epidemic models are the SIR and SEIR families.  These generate the familiar epidemic curves that we’ve all seen so often: exponential on the way up, then falling away more slowly. They are also responsible for the reproduction number “R”, the average number of people each case infects.  The simple models have no randomness in them, and they know nothing about the New Zealand population except its size.  There’s a rate at which cases come into contact with new people, and a rate at which contacts lead to new infections, and that’s all the model knows.  These models are described by simple differential equations; they can be projected into the future very easily, and the unknown rates can be estimated from data.  If you want a quick estimate of how many people are likely to be in hospital at the epidemic peak, and how soon, you can run this model and gaze in horror at the output.  In fact, many of the properties of the epidemic curve can be worked out just by straightforward maths, without requiring sophisticated computer simulation.

The SEIR models, however, are completely unable to model Covid elimination — they represent the epidemic by continuously varying quantities, not whole numbers with uncertainty.  If you put a lockdown on and then take it off, the SEIR model will always think there’s some tiny fraction of a case lurking somewhere to act as a seed for a new wave.  In fact, there’s a notorious example of a mathematical model for rabies elimination in the UK that predicted a new rabies wave from a modelled remnant of 10⁻¹⁸ infected foxes — a billion billionth of a fox, or one ‘attofox’.
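Here’s a minimal sketch of that failure mode: a deterministic SEIR model with a temporary lockdown. All the parameter values are illustrative assumptions, nothing to do with the Te Pūnaha Matatini models; the point is just that the infectious compartment dips to a minuscule fraction of a person and then rebounds.

```python
# Minimal deterministic SEIR with a temporary lockdown (Euler steps).
# All parameters are illustrative assumptions, not fitted values.
N = 5_000_000                        # population
beta0, sigma, gamma = 0.5, 1/3, 1/6  # transmission, incubation, recovery rates
S, E, I = N - 10.0, 0.0, 10.0        # start with 10 infectious people
dt = 0.1

for step in range(int(400 / dt)):
    t = step * dt
    beta = 0.05 if 30 <= t < 180 else beta0  # lockdown from day 30 to 180
    new_exposed = beta * S * I / N
    S += -new_exposed * dt
    E += (new_exposed - sigma * E) * dt
    I += (sigma * E - gamma * I) * dt
    if step % int(50 / dt) == 0:
        print(f"day {t:5.0f}: infectious = {I:.3g}")
```

The model never gets to exactly zero cases; whatever is left over at the end of the lockdown, however absurdly small, seeds a second wave once restrictions lift.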

The next step is models that treat people not precisely as individuals but at least as whole units, and acknowledge the randomness in the number of new infections for each existing case.  These models let you estimate how confident you are about elimination, since it’s not feasible to do enough community testing to prove elimination that way.   After elimination, these models also let you estimate how big a border incursion is likely to be by the time it’s detected, and how this depends on testing strategy, on vaccination, and on properties of new viral variants.  As a price, the models take more computer time and require more information — not just the average number of people infected by each case, but the way this number varies.
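A sketch of the simplest version of this kind of model: a branching process in which each case infects a negative-binomially distributed number of others. The mean R and dispersion k below are made-up values, chosen only to show the idea of superspreading.

```python
import numpy as np

rng = np.random.default_rng(2021)

def dies_out(R=1.5, k=0.5, max_cases=10_000):
    """Simulate one introduced case; True if the outbreak goes extinct."""
    cases, total = 1, 1
    while 0 < cases and total < max_cases:
        # offspring per case ~ NegBin with mean R and dispersion k;
        # numpy's parameterisation: n = k, p = k / (k + R)
        cases = rng.negative_binomial(k, k / (k + R), size=cases).sum()
        total += cases
    return total < max_cases

sims = 10_000
print(sum(dies_out() for _ in range(sims)) / sims)  # ~0.77 for these values
```

Even with R well above 1, most single introductions fizzle out on their own when transmission is this overdispersed, which is part of why these models can turn a run of days with no detected cases into a usable probability of elimination.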

None of the models so far capture anything about how people in different parts of New Zealand are different.  In some areas, people travel further to work or school, or for leisure. In some areas people live in large households; in others, small households. In some areas a lot of people work at the border; in others, very few do.  Decisions about local vs regional lockdowns need a model that knows how many people travel outside their local area, and to where.  A model with this sort of information can also inform vaccination policy: vaccinating border workers will prevent them getting sick, but what will it do to the range of plausible outbreaks in the future?  Models with this level of detail require a huge amount of data on the whole country, and serious computing resources; getting them designed and programmed correctly is also a major software effort.  The model has an entire imaginary New Zealand population wandering around inside the computer; you’re all individuals!
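To give a flavour of why travel data matters, here’s a toy two-region model in which a small fraction of each region’s contacts happen in the other region. It’s a deliberately crude sketch — two regions, made-up population and travel numbers, no households or individuals — nothing like the real models, but it shows how an outbreak seeded in one region leaks into the other at a rate set by the travel matrix.

```python
import numpy as np

# Toy two-region SIR with cross-region mixing. All numbers are made up.
N = np.array([1_600_000.0, 300_000.0])  # two regional populations
travel = np.array([[0.95, 0.05],        # row i: where region i's contacts occur
                   [0.10, 0.90]])
beta, gamma, dt = 0.4, 1/6, 0.1
I = np.array([10.0, 0.0])               # outbreak seeded in region 0
S = N - I

for step in range(int(120 / dt)):
    # infectious share at each *location*, mixing residents and visitors
    prev_at_location = (travel.T @ I) / (travel.T @ N)
    # infection rate experienced by each region's *residents*
    foi = beta * (travel @ prev_at_location)
    new_inf = foi * S
    S -= new_inf * dt
    I += (new_inf - gamma * I) * dt
    if step % int(20 / dt) == 0:
        print(f"day {step*dt:5.0f}: infectious by region = {np.round(I)}")
```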

A mathematical modelling effort on this scale involves working from both ends on the problem: what is the simplest model that will inform the policy question, and what is the most detailed model you have the time and resources and expertise to implement?  Usually, it also involves a more organised approach to funding and job security and so on, but this was an emergency.  As the Education Act points out, one reason we have universities is as a repository of knowledge and expertise; when we need the expertise, we tend to need it right now.

April 14, 2021

Why the concern about vaccine blood clotting?

The AstraZeneca vaccine causes an unusual blood clotting syndrome in about 10 out of a million recipients, and it’s not entirely clear whether (and at what frequency) the J&J vaccine also does.  Those are small numbers, compared to other risks. In particular, if you’re in a country with Covid, they are small compared to the risk of getting Covid and having some serious harm as a result. So why has there been so much concern?

There are a few components to the concern, but one underlying commonality: the clotting is unexpected and poorly understood.  Patients turn up with blood clots in unusual places and a shortage of platelets (which you’d normally think of as going with not enough clotting). Some obvious treatments — a standard anticlotting drug (heparin) or a transfusion of platelets — are likely to make things worse, so doctors need to know. There isn’t a really compelling model for how the vaccine causes the problem.

If the risk is 10 in a million, taking the vaccine would still be way safer than not taking it, but a lot of the concerns prompting further urgent investigation would have been whether it’s really only 10 in a million, since we don’t understand (in any detail) what’s going on:

  • have we missed a bunch of cases — remember that initially the risk was thought to be only about 1 in a million?
  • are these just the most serious cases, the tip of the iceberg, with many more milder, but still serious, cases that haven’t been noticed yet?
  • are these just the earliest-developing cases, with many more on the way?
  • is this a batch problem, with some batches of vaccine potentially having a much higher risk?
  • does the problem occur in an identifiable small group of people, who would thus be at much higher risk?

There’s been enough data and enough time now to start being confident that the answer to all these questions is ‘no’.  One might rationally prefer the mRNA vaccines, which don’t have this problem, but if you live somewhere with an active outbreak and the choice is the AZ vaccine now or the Moderna vaccine in a month or two, the clotting risk shouldn’t change your decision — and the fact that it wasn’t kept secret should be reassuring.
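On the ‘is it really only 10 in a million’ question: even setting aside missed cases, rare-event rates come with wide statistical uncertainty. A sketch with made-up counts (not the actual safety data), using the exact Poisson (Garwood) interval:

```python
from scipy.stats import chi2

# Exact (Garwood) 95% confidence interval for a Poisson rate.
# The counts are hypothetical, purely to show the width of the interval.
cases, doses = 20, 2_000_000

lower = chi2.ppf(0.025, 2 * cases) / 2 / doses
upper = chi2.ppf(0.975, 2 * (cases + 1)) / 2 / doses
print(f"{cases / doses * 1e6:.0f} per million "
      f"(95% CI {lower * 1e6:.1f} to {upper * 1e6:.1f} per million)")
```

With 20 hypothetical cases, an observed 10 per million is only pinned down to somewhere between about 6 and 15 per million, and under-ascertainment would push the whole interval up.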


April 13, 2021

The problem with journalists?

Q: Did you see that journalists drink too much, are bad at managing emotions, and operate at a lower level than average, according to a new study?

A: That sounds a bit exaggerated

Q: It’s the headline: Journalists drink too much, are bad at managing emotions, and operate at a lower level than average, according to a new study

A: What I said

Q: But “The results showed that journalists’ brains were operating at a lower level than the average population, particularly because of dehydration and the tendency of journalists to self-medicate with alcohol, caffeine, and high-sugar foods.”

A: How did they measure brain dehydration?

Q: Don’t I get to ask the leading questions?

A:

Q: How did they measure brain dehydration?

A: They didn’t. It just means they drank less than 8 glasses of water per day, per the usual recommendations

Q: Aren’t those recommendations basically an urban myth?

A: Yes, they seem to be

Q: How much caffeine was ‘too much’?

A: More than two cups of coffee per day

Q: Does that cause brain dehydration?

A: No, not really

Q: What is the daily recommended limit for coffee anyway?

A: There really isn’t one. The European Food Safety Authority looked at this in 2015, and they said basically that four cups a day seemed pretty safe but they didn’t have a basis for giving an upper limit.

Q: There’s a limit for alcohol, though?

A: Yes, “To keep health risks from alcohol to a low level, the UK Chief Medical Officers (CMOs) advise it is safest not to drink more than 14 units a week on a regular basis.” And the journalists drank slightly more than that on average.

Q: What’s the average for non-journalists?

A: Hard to tell, but the proportion drinking more than 14 units/week is about 1 in 3 for men and about 1 in 6 for women in the UK.

Q: So, a bit higher than average but not much higher.  How about these brain things? How big were the differences?

A: The report doesn’t say — it doesn’t give data, just conclusions

Q: How much evidence is there that they are even real, not just chance?

A: The report doesn’t say, though the Business Insider story says “it is not yet peer reviewed, and the sample size is small, so the results should not be taken necessarily as fact.”

Q: When will it be peer-reviewed?

A: Well, the story is from 2017 and there’s nothing on PubMed yet, so I’m not holding my breath.

April 12, 2021

Briefly

  • Henry Cooke for the Dominion Post “A routine report on the Government’s mental health services was delayed for over a year as officials battled behind the scenes over plans to dramatically reduce the amount of data in it”
  • A letter in response by Len Cook (former NZ and UK chief statistician): “Few important agency statistics are prepared in order to comply with a law; rather, they maintain public trust and inform practitioners in the field of progress and conditions across the populations of importance”
  • Kate Newton writes in the Sunday Star-Times about the impact of pre-departure Covid testing, “In the two-and-a-half months prior, the average (mean) case rate was 0.66 new cases per 1000 people in managed isolation and quarantine (MIQ). In the following two-and-a-half months, the daily rate has fallen — but only slightly — to 0.55.”
  • Derek Thompson in the Atlantic on vaccine misinformation “In a crowded field of wrongness, one person stands out: Alex Berenson.” 
  • Dan Bouk and danah boyd: The technopolitics of the U.S. census “Almost no one notices the processes that produce census data—unless something goes terribly wrong. Susan Leigh Star and Karen Ruhleder argue that this is a defining aspect of infrastructure: it “becomes visible upon breakdown.” In this paper, we unspool the stories of some technical disputes that have from time to time made visible the guts of the census infrastructure and consider some techniques that have been employed to maintain the illusion of a simple, certain count.”
April 1, 2021

Briefly

  • Eden Park is the world’s sexiest bald man. Or something like that.   The results are bogus for the obvious reason: Prince William and Eden Park both get a lot of internet coverage, so they will show up on what is basically a count of Google hits.  You might well get the same winners for ‘ugliest bald man’ and ‘least popular cricket ground’.  These reports are typically done in order to get some company’s name in the news, and since they typically don’t provide any real information about the numbers, it would be poetic justice to report the claims but just leave out the company name.  Or, better, ignore them.
  • Good news: there are clinical trial results for the Pfizer/BioNTech vaccine in children aged 12-15. These still need review based on more detail than just a press release, but it’s quite likely that we’ll be vaccinating this age group by the time we’d get around to them based on risk.  Trials in younger children are just starting; the end date will depend on how bad the pandemic is in the next few months, but might be around the end of the year.
  • Books: