Posts filed under General (2759)

January 24, 2025

When science news isn’t new

The Herald (from the Telegraph) says “People with divorced parents are at greater risk of strokes, study finds”

The study is here and the press release is here. It uses 2022 data from a massive annual telephone survey of health in the US, the Behavioral Risk Factor Surveillance System, “BRFSS” to its friends.

Using BRFSS means that the data are representative of the US population, which is useful.  On the other hand, you’re limited to variables that can be assessed over the phone.  That’s fine for age, and probably fine for parental divorce.  It’s known to be a bit biased for BMI and weight. The telephone survey doesn’t even try to collect blood pressure, cholesterol, or oral contraceptive use, all known to be risk factors for stroke.  And if you call people up on the phone and ask if they’ve ever had a stroke, you tend to miss the people whose strokes were fatal or incapacitating (about a quarter of people who have a stroke die immediately or within a year).

Still, the researchers collected some useful variables to try to adjust away the differences between people with and without divorced parents.  As usual, we have to worry about whether they went too far — for example, if the mechanism was via diabetes or depression, then adjusting for diabetes or depression would induce bias in the results.

This sort of research can be useful as a first step, to see if it’s worth doing an analysis using more helpful data from a study that followed people up over time — either a birth cohort study or a heart-disease cohort study.  It’s interesting as initial news that there’s a relationship — though you also might think adverse effects of divorce would get smaller in recent decades as divorce became less noteworthy.

All this is background for my main point.  While looking for links to published papers, I found that one of the same researchers had done the same sort of analysis with the BRFSS data from 2010 and published it in 2012. They found a stronger association twelve years ago than now.  I don’t know about you, but I would have appreciated this fact being in the press release and in the news story.

January 22, 2025

Housework reporting reporting

Listening to the Slate Money podcast, I heard about an interesting survey result

Elizabeth Spiers:  My number is 73 and that’s percent and that’s the number of men in a recent YouGov survey who say they do most of the chores in their household.

I found a Washington Post story

60 percent of women who live with a partner say they do all or most of the chores. But 73 percent of similarly situated men say that they do the most — or that they share chores equally.

Here’s the top few lines of the YouGov table

Breaking out advanced statistical software, 23%+19% is 42%, and 33%+30% is 63%. The figure for women matches the story, allowing for reasonable rounding. The figure for men doesn’t.

If we add in “shared equally”, which is given as 33% for men and 22% for women, we can get to 75% “all” or “most” or “equally” for men, but 83% “all” or “most” or “equally” for women.  And while the story is supposed to be that men say they are doing more and are delusional, the reported table has more work self-reported by women than men at all levels. It’s still possible, of course, that some of the 42% of men claiming to do all or most chores are not fully aware of the situation, but the message that makes the results headline-worthy is not in the data.
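Breaking out only slightly more advanced statistical software, the sums can be checked directly. The figures below are just the ones quoted from the YouGov table above:

```python
# Percentages as quoted from the YouGov table in the post
men = {"all": 23, "most": 19, "shared equally": 33}
women = {"all": 33, "most": 30, "shared equally": 22}

men_all_or_most = men["all"] + men["most"]          # 42%
women_all_or_most = women["all"] + women["most"]    # 63%

# Adding "shared equally" is what gets men to a headline-sized number
men_with_equal = men_all_or_most + men["shared equally"]  # 75%

print(men_all_or_most, women_all_or_most, men_with_equal)  # 42 63 75
```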

The Washington Post story is still well worth reading — it uses the YouGov poll as a hook to discuss much more detailed data from the American Time Use Survey, which has the advantage that people write down contemporaneously what they are doing for an actual two weeks rather than trying to guess at an average.

 

January 21, 2025

Briefly

  • Another example of asking people questions they can’t reasonably be expected to answer, at Newsroom. This is a survey from Forest and Bird, who asked about the proportion of NZ’s ocean that was, and that should be, in marine reserves.  The actual figure is 0.4%. People thought it was a lot more, and that it should be a lot more.  It’s possible the current general population estimate has been influenced by the Kermadec reserve that was proposed by the last government, which would have protected 15%, but I didn’t remember that figure so I don’t know.  The “Not Sure” figure for how much is currently protected is 23%, which is larger than you often see, but clearly still smaller than it should have been. Survey interviewers often push quite hard to get people to give a concrete answer, but I don’t know if that happened here.
  • A good post about the cost of false positives, in this case in detecting spammers on social media.
  • Useful tools for interpreting the news: Molly White writes about cryptocurrency market caps and what they mean and don’t mean.
  • Outside the usual range of StatsChat, but about the value of official statistics in policy debates.  Bret Devereaux (who you should read if you have any interest in ancient history or its depiction in movies and video games) is writing about the Gracchi, the land reformers of the 2nd century BCE.

Except notice the data points being used to come up with this story: the visible population of landless men in Rome and the Roman census returns. But, as we’ve discussed, the Roman census is self-reported, and the report of a bit of wealth like a small farm is what makes an individual liable for taxes and conscription.

In short the story we have above is an interpretation of the available data but not the only one and both our sources and Tiberius Gracchus simply lack the tools necessary to gather the information they’d need to sound out if their interpretation is correct.

January 20, 2025

Do I look like Wikipedia?

Ipsos, the polling firm, has again been asking people questions to which they can’t reasonably be expected to know the answer, and finding that they in fact don’t.  For example, this graph shows what happens when you ask people what proportion of their country are immigrants, ie, born in another country.  Everywhere except Singapore they overestimate the proportion, often by a lot.  New Zealand comes off fairly well here, with only slight overestimation. South Africa and Canada do quite badly. Indonesia, notably, has almost no immigrants but thinks it has 20%.

Some of this is almost certainly prejudice, but to be fair the only way you could know these numbers reliably would be if someone did a reliable national count and told you.  Just walking around Auckland you can’t accurately tell who is an immigrant, and you certainly can’t walk around Auckland and tell how many immigrants there are in Ashburton.  Specifically, while you might know from your own knowledge how many immigrants there were in your part of the country, it would be very unusual for you to know this for the country as a whole.  You might expect, then, that the best-case response to surveys such as these would be an average proportion of immigrants over areas in the country, weighted by the population of those areas. If the proportion of immigrants is correlated with population density, that will be higher than the true nationwide proportion.

That is to say, if people in Auckland accurately estimate the proportion of immigrants in Auckland, and people in Wellington accurately estimate the proportion in Wellington, and people in Huntly accurately estimate the proportion in Huntly, and people in Central Otago accurately estimate the proportion in Central Otago, you don’t get an accurate nationwide estimate if areas with more people have a higher proportion of immigrants. Which, in New Zealand, they do.  If we work with regions and Census data, the correlation between population and proportion born overseas is about 50%.  That’s enough for about a 5 percentage point bias: we would expect to overestimate the proportion of immigrants by about 5 percentage points if everyone based their survey response on the true proportion in their part of the country.

Fortunately, if the proportion of immigrants in your neighbourhood or in the country as a whole matters to you, you don’t need to guess. Official statistics are useful! Someone has done an accurate national count, and while they probably didn’t tell you, they did put the number somewhere on the web for you to look up.

January 17, 2025

Briefly

  • Official statistics agencies are very conservative about survey questions, because changing them causes problems.  Another example: in the last US census, the number of people reporting more than one ethnicity increased. The Census Bureau said

“These improvements reveal that the U.S. population is much more multiracial and diverse than what we measured in the past,” Census Bureau officials said at the time.

But does that mean there are more people now with the same sort of multiple heritage, or that the same people are newly identifying as multi-ethnic, or just that the question has changed? According to Associated Press, new research suggests it’s mostly measurement.

  • “And surveys are especially useless when respondents have the option of answering in a way that is both ‘respectable’ and self-flattering.”  Fred Clark, talking about a survey of ‘politics’ in religion in the US.
  • Greater Auckland on last year’s road deaths. It’s a good post, with breakdowns by subgroup and discussion of appropriate denominators and so on. I’d still ideally like to see random variability shown in these sorts of trend lines.  The simplest level of this, so-called Poisson variability, is fairly easy: you take a reported count, take the square root, add and subtract 1 to get limits, and square again. You don’t need to go to the lengths of full-on Bayesian modelling unless you want to make stronger claims.
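The square-root recipe above is a one-liner in code. A minimal sketch (the count of 289 is made up for illustration, not taken from the Greater Auckland post):

```python
import math

def poisson_interval(count):
    """Rough interval for a Poisson count via the square-root transform:
    take the square root, add and subtract 1 to get limits, square again."""
    root = math.sqrt(count)
    return max(root - 1, 0) ** 2, (root + 1) ** 2

# Illustrative count of 289: sqrt(289) = 17,
# so the limits are 16^2 = 256 and 18^2 = 324
print(poisson_interval(289))  # (256.0, 324.0)
```

This works because the square root of a Poisson count has a standard deviation of about 1/2 whatever the mean, so adding and subtracting 1 on the square-root scale is roughly a two-standard-deviation interval.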
January 14, 2025

Only a flesh wound

An article from ABC News in Adelaide, South Australia, describes incidents where fencing wire was strung across a bike path.  According to police

The riders were travelling about 35 kilometres per hour and fell from their bikes. Two suffered minor injuries, while the third was not injured.

Police said each of their bicycles were severely damaged.

That sounds at first like extraordinary good luck: if you come off a bike at 35 km/h and your bike was wrecked, you’d expect to be damaged too.  I think the problem, as with a lot of discussions of road crashes, is the official assessment metrics for injuries.  In South Australia, according to this and similar documents:

Serious Injury – A person who sustains injuries and is admitted to hospital for a duration of at least 24 hours as a result of a road crash and who does not die as a result of those injuries within 30 days of the crash.

Minor Injury – A person who sustains injuries requiring medical treatment, either by a doctor or in a hospital, as a result of a road crash and who does not die as a result of those injuries with 30 days of the crash.

A broken bone leading to substantial disability might easily not be a Serious Injury, and several square inches of road rash may well not be even a Minor Injury. (New Zealand has the same definition of a “serious” injury, one that gets you admitted to hospital for an overnight stay, but doesn’t have restrictive standards for minor injury.)

It’s not that these definitions are necessarily bad for collecting data — there’s a lot to be said for a definition that’s administratively checkable — but it does mean you might want to translate from officialese to ordinary language when reporting individual injuries or aggregated statistics to ordinary people.

 

Update: One of the cyclists in the first group has talked to the ABC. One of the “minor injuries” required five stitches.

January 10, 2025

All-day breakfast

Q: Did you see that coffee only works if you drink it in the morning?

A: Works?

Q: Reduces your risk of heart disease

A: That’s … not my top reason for drinking coffee. Maybe not even top three.

Q: But is it true?

A: It could be, but probably not

Q: Mice?

A: No, this is in people. Twenty years of survey responses from the big US health survey called NHANES. They divided people into three groups depending on their coffee consumption on the day of their dietary interview: no coffee, coffee only between 4am and 11:59am, and coffee throughout the day

Q: Couldn’t there be other differences between people who drink coffee in the morning and afternoon? Like, cultural differences or age or health or something?

A: Yes, the researchers aimed to control for these statistically: differences in age, sex, race and ethnicity, survey year, family income, education levels, body mass index, diabetes, hypertension, high cholesterol, smoking status, time of smoking cessation, physical activity, Alternative Healthy Eating Index, total calorie intake, caffeinated coffee intake, and decaffeinated coffee intake, tea intake, and caffeinated soda intake, short sleep duration, and difficulty sleeping

Q: Um. Wow?

A: I mean, it’s a good effort. One of the real benefits of NHANES is it measures so much stuff.  On the other hand a lot of these things aren’t measured all that precisely, and it’s not like you have a lot of people left when you chop up the sample that many ways.  And the evidence for any difference is pretty marginal

Q: What’s their theory about how the coffee is supposed to be working?

A: The Guardian and BBC versions of the story quote experts who think it’s all about coffee disrupting sleep

Q: That sounds kind of plausible — but didn’t you say they adjusted for sleep?

A: Yes, so if the adjustments work it isn’t sleep

Q: Should we be campaigning for cafes to close early, like the new Auckland alcohol regulations?

A: It’s much too early for that, and in any case there isn’t any real suggestion coffee is harmful after noon.  It might be worth someone repeating the research in a very different population from the US but where people still drink coffee. There are plenty of those.

Q: And what about advice to readers on their coffee consumption?

A: The standard StatsChat advice: if you’re drinking coffee in the morning primarily for the good of your heart, you may be doing it wrong.

 

January 9, 2025

Briefly

  • The top baby names from New Zealand last year are out.  As we’ve seen in the past, the most-common names keep getting less common. “Noah” came top for boys, with only 250 uses, and “Isla” for girls, with only 190 uses.
  • The Daily Mail (because of course) has something purporting to be a map of penis sizes around the world, credited to this site, which gives no sources for the data. Wikipedia points out that a lot of data on this topic is self-reported claims. Wikipedia (possibly NSFW) notes that “Measurements vary, with studies that rely on self-measurement reporting a significantly higher average than those with a health professional measuring.” Even when it’s measured, it tends to be on volunteer samples, and there isn’t good standardisation of measurement protocols across sites.
  • If you live in one of these Aussie suburbs buy a lottery ticket NOW, says the headline on MSN.com, from the Daily Mail (Australia version).  This is a much more extreme headline than the NZ versions I usually complain about, and the text is more measured. Of course, there are two reasons why a suburb will see more lottery wins. The first is just chance, which doesn’t project into the future like that. The second is that these are suburbs where more money is lost on the lottery. Those trends probably will continue, but lottery advertising stories never seem to print the amounts lost on lotto.
  • We’ve seen a number of times that salary/wage ranges generated from advertising at Seek are not very similar to those reported from actual payments by StatsNZ.  This is worse: via Carl Bergstrom and Eduardo Hebkost, on Bluesky, apparently ziprecruiter.com will (in the US at least; not in NZ) give you salaries for any job you ask about, if you just forge a URL pointing to where the graph should be.
January 8, 2025

United Rugby Championship Predictions for Week 10

Team Ratings for Week 10

The basic method is described on my Department home page.
Here are the team ratings prior to this week’s games, along with the ratings at the start of the season.

Team  Current Rating  Rating at Season Start  Difference
Leinster 14.87 12.09 2.80
Glasgow 9.56 9.39 0.20
Bulls 8.19 8.83 -0.60
Stormers 4.70 6.75 -2.10
Lions 4.43 6.73 -2.30
Munster 4.11 9.28 -5.20
Ulster 3.02 2.52 0.50
Edinburgh 2.79 0.09 2.70
Cheetahs 0.80 0.80 0.00
Sharks 0.03 -2.94 3.00
Connacht -1.95 -0.76 -1.20
Benetton -2.15 1.02 -3.20
Scarlets -4.38 -10.65 6.30
Ospreys -4.83 -2.51 -2.30
Cardiff Rugby -6.09 -2.55 -3.50
Southern Kings -6.52 -6.52 0.00
Zebre -12.64 -16.17 3.50
Dragons -13.94 -15.41 1.50

 

Performance So Far

So far there have been 69 matches played, 51 of which were correctly predicted, a success rate of 73.9%.
Here are the predictions for last week’s games.

Game Date Score Prediction Correct
1 Stormers vs. Sharks Dec 29 24 – 20 8.10 TRUE
2 Connacht vs. Ulster Dec 30 7 – 17 -0.40 TRUE
3 Munster vs. Leinster Dec 30 7 – 28 -6.10 TRUE
4 Edinburgh vs. Glasgow Dec 30 10 – 7 -6.30 FALSE
5 Zebre vs. Benetton Dec 30 12 – 24 -6.90 TRUE
6 Cardiff Rugby vs. Ospreys Jan 02 13 – 13 1.60 FALSE
7 Scarlets vs. Dragons Jan 02 32 – 15 10.70 TRUE
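The Correct column can be reproduced by checking whether the sign of the predicted home-team margin matches the sign of the actual margin. A minimal sketch (how draws are scored is my assumption, though it matches the table above):

```python
def prediction_correct(home_score, away_score, predicted_margin):
    """True when the prediction picks the actual winner: the predicted
    home-team margin has the same sign as the actual margin.
    Assumption: a drawn game counts against the prediction."""
    actual = home_score - away_score
    if actual == 0:
        return False
    return (actual > 0) == (predicted_margin > 0)

# Checked against rows of the table above:
print(prediction_correct(24, 20, 8.10))   # Stormers vs. Sharks: True
print(prediction_correct(10, 7, -6.30))   # Edinburgh vs. Glasgow: False
print(prediction_correct(13, 13, 1.60))   # Cardiff vs. Ospreys: False
```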

 

Predictions for Week 10

Here are the predictions for Week 10. The prediction is my estimated expected points difference, with a positive margin being a win to the home team and a negative margin a win to the away team.

Game Date Winner Prediction
1 Glasgow vs. Connacht Jan 25 Glasgow 17.00
2 Ospreys vs. Benetton Jan 25 Ospreys 2.80
3 Lions vs. Bulls Jan 26 Bulls -1.30
4 Scarlets vs. Edinburgh Jan 26 Edinburgh -1.70
5 Leinster vs. Stormers Jan 26 Leinster 15.70
6 Cardiff Rugby vs. Sharks Jan 26 Sharks -0.60
7 Dragons vs. Munster Jan 26 Munster -12.50
8 Ulster vs. Zebre Jan 27 Ulster 21.20

 

Rugby Premiership Predictions for Round 11

Team Ratings for Round 11

The basic method is described on my Department home page.
Here are the team ratings prior to this week’s games, along with the ratings at the start of the season.

Team  Current Rating  Rating at Season Start  Difference
Bath 13.34 5.55 7.80
Bristol 6.85 9.58 -2.70
Northampton Saints 6.25 7.50 -1.20
Sale Sharks 5.04 4.73 0.30
Saracens 2.18 9.68 -7.50
Leicester Tigers 1.83 3.27 -1.40
Gloucester 0.60 -9.04 9.60
Harlequins -0.57 -2.73 2.20
Exeter Chiefs -2.92 1.23 -4.10
Newcastle Falcons -21.84 -19.02 -2.80

 

Performance So Far

So far there have been 50 matches played, 32 of which were correctly predicted, a success rate of 64%.
Here are the predictions for last week’s games.

Game Date Score Prediction Correct
1 Newcastle Falcons vs. Harlequins Jan 04 14 – 38 -12.50 TRUE
2 Gloucester vs. Sale Sharks Jan 05 36 – 20 -1.40 FALSE
3 Leicester Tigers vs. Exeter Chiefs Jan 05 28 – 15 10.80 TRUE
4 Saracens vs. Bristol Jan 05 35 – 26 0.00 TRUE
5 Northampton Saints vs. Bath Jan 06 35 – 34 -1.00 FALSE

 

Predictions for Round 11

Here are the predictions for Round 11. The prediction is my estimated expected points difference, with a positive margin being a win to the home team and a negative margin a win to the away team.

Game Date Winner Prediction
1 Harlequins vs. Northampton Saints Jan 25 Northampton Saints -0.30
2 Exeter Chiefs vs. Saracens Jan 26 Exeter Chiefs 1.40
3 Gloucester vs. Leicester Tigers Jan 26 Gloucester 5.30
4 Bristol vs. Newcastle Falcons Jan 27 Bristol 35.20
5 Sale Sharks vs. Bath Jan 27 Bath -1.80