February 20, 2026

Out of warranty?

A new medical study (reported here and here) used MRI to look at the shoulders of a reasonably representative sample of 602 people over 40 in Finland.  Rotator cuff abnormalities of varying apparent severity were seen in 595 of the people: that’s 99% to two-digit accuracy.  Of the 602 people, 18% reported shoulder pain and the other 82% claimed their shoulders were ok (apart from being over 40).

There wasn’t much difference between the people who noticed their shoulders hurting and the ones who didn’t. Here’s a graph comparing asymptomatic and symptomatic shoulders; the comparison is per shoulder, so someone with one bad shoulder and one over-40-but-otherwise-good shoulder is in both groups. The green at the bottom is “no abnormality”, then we progress through “tendinopathy”, “partial-thickness tear”, and “full-thickness tear”.

You can see the abnormalities are a bit more severe in the symptomatic group, but not enough to make a useful diagnostic test.  On top of that, the researchers showed that the difference largely goes away when you adjust for things a doctor would have measured before doing the MRI, so the MRI really isn’t providing much useful information.

We’ve seen this before. New medical-imaging tech gets used first on people who look like they need it. A lot of people with back pain were given CT scans. These showed that people with back pain had weird misshapen spines, and often led to referrals for surgery.  It was much later that people not reporting significant back pain had their backs scanned — and they, too, often had weird misshapen spines. Spines are just badly designed and implemented.

Medical imaging can be immensely valuable: simple X-rays, CT scans, MRI, PET, and so on. One of Marie Curie’s many claims to fame was designing, deploying, and driving mobile X-ray units in the Battle of the Marne.  But with each shiny new technology for subtler and more precise imaging there’s an increasing need for control data. Marie Curie could see a piece of lead in a soldier’s heart and be confident that it wasn’t normal.  The problems we’re looking for in shoulders and spines are more complicated and comparisons are important.

February 17, 2026

Super Rugby Predictions for Week 2

Team Ratings for Week 2

The basic method is described on my Department home page.
Here are the team ratings prior to this week’s games, along with the ratings at the start of the season.

Current Rating Rating at Season Start Difference
Chiefs 12.63 12.36 0.30
Blues 8.63 8.91 -0.30
Hurricanes 8.29 8.29 -0.00
Crusaders 8.03 8.41 -0.40
Brumbies 6.51 5.59 0.90
Reds 0.77 1.74 -1.00
Highlanders -2.67 -3.06 0.40
Waratahs -4.88 -5.84 1.00
Moana Pasifika -7.18 -7.88 0.70
Western Force -7.21 -6.29 -0.90
Fijian Drua -8.34 -7.64 -0.70

 

Performance So Far

So far there have been 5 matches played, 1 of which was correctly predicted, a success rate of 20%.
Here are the predictions for last week’s games.

Game Date Score Prediction Correct
1 Highlanders vs. Crusaders Feb 13 25 – 23 -6.50 FALSE
2 Waratahs vs. Reds Feb 13 36 – 12 -2.60 FALSE
3 Fijian Drua vs. Moana Pasifika Feb 14 26 – 40 3.70 FALSE
4 Blues vs. Chiefs Feb 14 15 – 19 1.60 FALSE
5 Western Force vs. Brumbies Feb 14 24 – 56 -6.90 TRUE

 

Predictions for Week 2

Here are the predictions for Week 2. The prediction is my estimated expected points difference with a positive margin being a win to the home team, and a negative margin a win to the away team.

Game Date Winner Prediction
1 Hurricanes vs. Moana Pasifika Feb 20 Hurricanes 20.50
2 Waratahs vs. Fijian Drua Feb 20 Waratahs 8.50
3 Highlanders vs. Chiefs Feb 21 Chiefs -10.30
4 Western Force vs. Blues Feb 21 Blues -12.30
5 Crusaders vs. Brumbies Feb 22 Crusaders 5.00
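
For anyone wanting to check the “Correct” column by hand, here is a minimal sketch in Python of how the sign convention can be scored. The function names are mine and are not part of the forecasting method itself, which is described elsewhere. Treating a drawn game as an incorrect prediction is an assumption, though it matches the one drawn game in the United Rugby Championship table further down. The scores and predictions are the five Week 1 games listed above.

```python
def predicted_winner(home, away, margin):
    """Positive predicted margin means a home win, negative an away win."""
    return home if margin > 0 else away


def is_correct(home_score, away_score, margin):
    """True when the predicted margin has the same sign as the actual margin.

    Assumption: a drawn game (actual margin of zero) counts as incorrect.
    """
    actual_margin = home_score - away_score
    return actual_margin * margin > 0


# (home, away, home score, away score, predicted margin) for Week 1
week1 = [
    ("Highlanders", "Crusaders", 25, 23, -6.50),
    ("Waratahs", "Reds", 36, 12, -2.60),
    ("Fijian Drua", "Moana Pasifika", 26, 40, 3.70),
    ("Blues", "Chiefs", 15, 19, 1.60),
    ("Western Force", "Brumbies", 24, 56, -6.90),
]

results = [is_correct(hs, aw, m) for _, _, hs, aw, m in week1]
success_rate = 100 * sum(results) / len(results)

for (home, away, hs, aw, m), ok in zip(week1, results):
    print(f"{home} vs. {away}: picked {predicted_winner(home, away, m)}, correct={ok}")
print(f"Success rate: {success_rate:.0f}%")
```

Only the Western Force vs. Brumbies game comes out as correct, which gives the 1 in 5 (20%) success rate quoted above.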

 

February 14, 2026

Cancer hates mornings too?

Via the pharmaceutical chemist Derek Lowe, and also various media outlets, there is a new cancer study that randomised patients with lung cancer to get their immunotherapy infusions in the morning or the afternoon/evening.  The motivation  will have been the various not-very-convincing correlational studies where patients getting morning treatment did better on average. In those studies the differences seen were large, but the studies were small enough that only large differences could have been seen.

The new study also saw a massive difference between morning and afternoon treatment, with the estimated rate of disease progression or death being 60% lower in the morning group. That difference was 5.5 standard errors away from zero: almost physics levels of statistical surprise.
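
To put “5.5 standard errors” on a more familiar scale, here is a rough sketch that converts it to a two-sided p-value. It assumes the estimate is approximately Normally distributed, which is my simplification rather than anything taken from the paper, and the function name is just for illustration.

```python
from math import erfc, sqrt


def two_sided_p(z):
    """Two-sided tail probability of a standard Normal beyond |z|."""
    return erfc(abs(z) / sqrt(2))


p = two_sided_p(5.5)
print(f"z = 5.5 gives a two-sided p-value of about {p:.1e}")
```

Under that assumption the p-value is a few in a hundred million, so chance alone is not a plausible explanation, and the checks below are about other ways a result can go wrong.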

So, what  do we check?

First: dropout. Maybe the healthy patients in the afternoon or the sick patients in the morning dropped out? No, according to the research paper everyone who was randomised was included in the final analysis.

Second: did they report what they said they would report? Up to a point, yes.  The clinical trial registry says they started out with overall survival and response rate (tumour shrinkage) as their measurements of success. They changed to progression-free survival as their headline measurement after the trial had been running for a while, which is potentially dodgy.  On the other hand, they did report overall survival, and the results are almost as good as for progression-free survival.  They also reported response rate, which had unimpressive favorable results, but which is a much less important measurement.  If things had gone the other way, with good response and bad survival data, I would have believed the survival data.

We should now consider whether the results make sense.  This is immunology — as Ed Yong described it for the Atlantic, “where intuition goes to die”.  Looking at the experts (Derek Lowe and the people quoted in the news stories) it seems they don’t completely believe it, but they are also unwilling to entirely disbelieve it.  The drug hangs around in the body for weeks, making a time-of-day effect surprising, but who knows?  The result agrees with past correlational research, but that past research is not very convincing. The worst that the experts quoted by Stat (a medical news site) were willing to say is that only half the eligible patients were randomised, which might mean problems in generalising the results. Fortunately, this trial will be relatively easy to replicate, directly in lung cancer, or in the range of other conditions such as melanoma or head and neck cancer where this specific antibody is used, or in the wider world of immune checkpoint inhibitors.

The possibility that’s not mentioned by any of the news stories is fraud: either faking data or faking the tidiness of the randomisation and completeness of the data. Fraud happens; it’s a definite possibility.  On the other hand, this doesn’t look like an especially attractive place to try it. Other researchers are bound to redo the experiment, and look into the details, and Big Pharma hasn’t worked out how to manufacture more than one morning per day.

I expect these results to fail to replicate, but I wouldn’t bet large amounts of money on it.

Olympics condoms

Every two years (since 1988) there is at least one round of stories about condoms at the Olympics (here’s a couple of past StatsChat posts).

Many athletes would have the funds and general executive function to be able to acquire condoms for themselves, and it’s clear that a big part of the point is safe-sex advertising. It’s relatively difficult to get a positive story about condom use into the world’s prestige press, and the Olympics are an opportunity.

Usually the story is about oversupply  (450,000 in Rio!, 200,000 Olympics-branded ones in Paris!). For a couple of Olympics the story was about social distancing (Tokyo had 110,000, but they were officially just souvenirs).

This year the story is undersupply: Milano/Cortina apparently had a mere 10,000 condoms, which ran out on day 3.  It’s possible that this is a planning failure, like the nearly-finished cable car, but it might also be that the whole advertising mission of the Olympic condoms is losing its urgency.

February 12, 2026

Briefly

  • The US FDA will not even review the application for Moderna’s new flu vaccine. The FDA is very careful to give itself the flexibility to do this: if they say supportive things about your trial design today there is no legal guarantee that they can’t change their minds six times before breakfast. However, they are usually reluctant to make radical changes in their advice and, for example, typically don’t require placebo controls when an existing treatment works and is already widely recommended.
  • Hayden Donnell at the Spinoff did a deeply felt post on the scale of the Moa Point sewage discharges, with comparisons to everyday life. I want to quote one: “If you started now, it would take you 2,535 years, 15 days, 13 hours, and 20 minutes of non-stop shitting to produce as much waste as the Moa Point plant is expelling onto schools of unsuspecting fish every day.” From this we can deduce with a little additional calculation that the roughly 200,000 people living in Wellington City would take about four and a half days to produce one day of the Moa Point poop.  Even allowing for politicians, that’s a lot of effort. The problem, of course, is dilution: a toilet flush is about 10 litres rather than the 0.1 litres the Spinoff is allowing for, and there may well be further dilution downstream.
  • A set of six posts about colour (or perhaps ‘color’) from NASA
  • The American Statistical Association is taking nominations for its “Excellence in Statistical Reporting” award, due March 1st.
  • “And it turned out that the previous gender discrimination policy had been nothing like discriminatory enough; women were much safer drivers, and hadn’t previously been getting anything like enough credit for it.” Dan Davies’s excellent “Back of Mind” newsletter.
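
The “little additional calculation” in the Moa Point item above can be sketched as follows. The 365.25 days per year is my assumption (the Spinoff does not say how it counted years), and the variable names are only for illustration. The 200,000 population and the 10-litre and 0.1-litre figures are the ones quoted in the item.

```python
# One person's non-stop output needed to match one day of plant discharge,
# converted to days. Assumption: 365.25 days per year.
person_days = 2535 * 365.25 + 15 + 13 / 24 + 20 / (24 * 60)

# Spread across the roughly 200,000 people in Wellington City.
population = 200_000
city_days = person_days / population
print(f"City-wide: about {city_days:.1f} days per day of plant output")

# Ratio of a roughly 10-litre flush to the 0.1 litres apparently allowed for.
dilution_ratio = 10 / 0.1
print(f"Flush volume is about {dilution_ratio:.0f} times the assumed volume")
```

The city-wide figure comes out a little over four and a half days, matching the number above. The flush is roughly a hundred times the volume the Spinoff allowed for, which is why dilution matters so much to the comparison.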

February 11, 2026

Super Rugby Predictions for Week 1

Team Ratings for Week 1

The basic method is described on my Department home page.
Here are the team ratings prior to this week’s games, along with the ratings at the start of the season.

Current Rating Rating at Season Start Difference
Chiefs 12.36 12.36 0.00
Blues 8.91 8.91 0.00
Crusaders 8.41 8.41 -0.00
Hurricanes 8.29 8.29 -0.00
Brumbies 5.59 5.59 0.00
Reds 1.74 1.74 0.00
Highlanders -3.06 -3.06 -0.00
Waratahs -5.84 -5.84 0.00
Western Force -6.29 -6.29 0.00
Fijian Drua -7.64 -7.64 0.00
Moana Pasifika -7.88 -7.88 -0.00

 

Predictions for Week 1

Here are the predictions for Week 1. The prediction is my estimated expected points difference with a positive margin being a win to the home team, and a negative margin a win to the away team.

Game Date Winner Prediction
1 Highlanders vs. Crusaders Feb 13 Crusaders -6.50
2 Waratahs vs. Reds Feb 13 Reds -2.60
3 Fijian Drua vs. Moana Pasifika Feb 14 Fijian Drua 3.70
4 Blues vs. Chiefs Feb 14 Blues 1.60
5 Western Force vs. Brumbies Feb 14 Brumbies -6.90

 

Coffee brain?

Various sources are telling us that coffee and tea consumption can lower the risk of dementia (the Independent is clickbaiting it to “the common drinks linked with reducing risk of dementia”, and 9News in Australia is even more extreme with “The everyday act that could reduce your risk of dementia, according to Harvard study”).  The subtext is definitely that caffeine is responsible for the decrease.

The research (paywalled, sadly) comes from two large studies of health professionals: the Nurses’ Health Study and the Health Professionals Follow-up Study.  You will have heard of them before; the participants have now been studied for 30-40 years and thousands of research papers have been written.  The rate of dementia was about 20% lower in people who drank above-average amounts of tea or caffeinated coffee, but this reduction was not seen in people who drank decaf coffee. Since about 1 in 10 of the participants ended up with dementia, a 20% lower rate would mean preventing about two cases per 100 people. That’s not huge, but it’s not trivial either.  If you’ve been following medical news, it’s about the same reduction in dementia claimed for the shingles vaccine.
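
The step from a 20% relative reduction to “about two cases per 100 people” is simple arithmetic, sketched here. It treats the overall 1-in-10 dementia rate as the baseline risk, which is a simplification on my part, and the variable names are only illustrative.

```python
baseline_risk = 1 / 10       # about 1 in 10 participants developed dementia
relative_reduction = 0.20    # rate about 20% lower in the higher-intake group

# Absolute reduction is the baseline risk times the relative reduction.
absolute_reduction = baseline_risk * relative_reduction
cases_prevented_per_100 = 100 * absolute_reduction
people_per_case_prevented = 1 / absolute_reduction

print(f"Cases prevented per 100 people: {cases_prevented_per_100:.0f}")
print(f"People per case prevented: {people_per_case_prevented:.0f}")
```

Put the other way round, that is roughly one case prevented for every 50 people, if the association were causal.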

Unlike the shingles vaccine finding, which took advantage of a change in the rules that approximates randomisation, the coffee finding is purely correlational. Should we believe it?

It helps that the study is quite large (so random noise is less likely to give big spurious differences) and that participants’ coffee and tea consumption was measured from early on in the study. This study,  unlike small studies, would probably have been published whatever its findings, especially as the lead researcher is a Harvard PhD student.  It also helps that we know coffee and tea are pretty safe — many people who are suspicious of drugs and/or fun have tried quite hard to find harmful effects, with remarkably little success.

One negative fact, at least for the caffeine explanation, is the finding for tea.  The estimated risk reduction for a group of people who drank an average of 1 cup of tea per day is about the same as for a group who drank an average of 2.5 cups of coffee per day — and 2.5 cups of coffee is a lot more caffeine than one cup of tea.

I don’t think the data are all that convincing — this is really below the limit of what can reliably be done with long-term diet data — but we are not going to get better correlational data on coffee, and a randomised trial is outside the range of plausibility. If you drink tea or caffeinated coffee, it’s nice to think that you might be protecting your brain. If you don’t, there’s probably some reason you don’t. I’m not sure these data should change your mind.

February 10, 2026

Medical chatbots: the questions or the answers

A story in the NYTimes and also an unpaywalled story at 404 Media report a study of chatbots for medical advice, saying they are Bad and Not Good.

The research study is published in Nature Medicine. It’s a randomised controlled experiment, where people pretending to be patients were given a set of symptoms and some background health and lifestyle information.  These people were randomly assigned to talk to one of three large language models or just to use whatever information they would normally use at home for a health problem.

The three chatbots were chosen because they were able to recognise the medical situation in nearly every case and typically give appropriate advice  when directly given the same information that the pretend patients had.   When used in chat by non-medical people, though, the bots did much less well. One highlighted example was a scenario of a severe, sudden-onset headache, with sensitivity to light and a stiff neck.   In this scenario, the sudden onset and the stiff neck are both signs of a very serious event — the scenario was based on subarachnoid haemorrhage, a type of stroke.  One pretend patient emphasised the suddenness of the headache and got the correct advice (Ambulance! Now!), another didn’t mention the onset and got advice for a migraine or a tension headache (“lie down in a dark room”).  The bots weren’t any worse than unaided lay people, but they weren’t any better either.

You might think it’s a bit unfair to the chatbot that it wasn’t given all the information, but an important part of the training of doctors (as with statisticians and lawyers and plumbers) is learning what questions to ask when dealing with a non-specialist member of the public.  Obviously, even if you think there’s a barrier in principle to statistical algorithms making great art, there’s no barrier in principle to statistical algorithms learning to take adequate medical histories. They aren’t there yet.

 

 

Who did the Superbowl half-time show?

Unless you have been living in a cave* you will probably be aware that the lead performer was one Benito Antonio Martínez Ocasio, a Puerto Rican rapper who performs as “Bad Bunny”.  The story is complicated a bit because of prediction markets.  The idea of prediction markets is that they can predict the future by letting experts get paid for integrating all the information about a question and betting correctly.

There are reasons to be somewhat skeptical.  The best way to make money out of a prediction market is to have inside information, but if that is too common then no-one sensible who doesn’t have inside information will bet and lose, so the incentives go away. It’s not clear how well they can work in practice.  On the other hand, two US companies, Kalshi and Polymarket, have discovered that gambling can be rebranded as a prediction market, with less regulation, lower minimum age for participants, and more favorable tax treatment.  It’s possible that sports gamblers will also help rescue prediction markets by providing uninformed money.

The other problem with prediction markets about complicated questions is deciding whether the event happened.  According to Business Insider, quite a number of people had bet on whether Cardi B would do the Superbowl half-time show. You and I and probably many of those people might have expected this binary yes/no question to be easy to resolve. In fact, Kalshi and Polymarket resolved it in opposite directions.  The complication is that Cardi B (along with various other well-known performers) was there on stage, so precise definitions are going to matter.

It’s possible that some fiendishly clever people foresaw this confusion, correctly predicted that Kalshi and Polymarket would split on the question, and extracted a big win. If so, go them! Otherwise, whether any hypothetical smart money won or lost would depend on the luck of which market it chose.

 

* “on Mars, with your eyes closed and your fingers in your ears” as the Simpsons’ Sideshow Cecil put it

February 3, 2026

United Rugby Championship Predictions for Delayed Games

Team Ratings for Delayed Games

The basic method is described on my Department home page.
Here are the team ratings prior to this week’s games, along with the ratings at the start of the season.

Current Rating Rating at Season Start Difference
Leinster 9.88 13.41 -3.50
Glasgow 8.15 6.18 2.00
Bulls 7.54 8.86 -1.30
Stormers 6.72 4.17 2.60
Munster 2.43 3.65 -1.20
Edinburgh 1.93 2.67 -0.70
Ulster 1.14 -3.24 4.40
Sharks 0.65 1.29 -0.60
Lions -0.93 -1.19 0.30
Connacht -1.71 -1.39 -0.30
Scarlets -2.53 -0.54 -2.00
Cardiff Rugby -2.73 -2.74 0.00
Ospreys -2.99 -2.15 -0.80
Benetton -5.21 -2.32 -2.90
Dragons -9.74 -15.66 5.90
Zebre -12.61 -11.02 -1.60

 

Performance So Far

So far there have been 84 matches played, 56 of which were correctly predicted, a success rate of 66.7%.
Here are the predictions for last week’s games.

Game Date Score Prediction Correct
1 Benetton vs. Scarlets Jan 31 20 – 20 5.50 FALSE
2 Glasgow vs. Munster Jan 31 31 – 22 13.80 TRUE
3 Lions vs. Bulls Feb 01 17 – 52 -3.60 TRUE
4 Sharks vs. Stormers Feb 01 36 – 24 -6.00 FALSE
5 Zebre vs. Connacht Feb 01 15 – 31 -2.30 TRUE
6 Leinster vs. Edinburgh Feb 01 28 – 20 16.00 TRUE
7 Ospreys vs. Dragons Feb 01 19 – 13 9.50 TRUE
8 Ulster vs. Cardiff Rugby Feb 01 21 – 14 12.00 TRUE

 

Predictions for Delayed Games

Here are the predictions for Delayed Games. The prediction is my estimated expected points difference with a positive margin being a win to the home team, and a negative margin a win to the away team.

Game Date Winner Prediction
1 Lions vs. Sharks Feb 22 Lions 0.40