Posts written by Thomas Lumley (2548)

Thomas Lumley (@tslumley) is Professor of Biostatistics at the University of Auckland. His research interests include semiparametric models, survey sampling, statistical computing, foundations of statistics, and whatever methodological problems his medical collaborators come up with. He also blogs at Biased and Inefficient.

April 17, 2023

Looking for ChatGPT

Turnitin, a company that looks for similar word sequences in student assignments, says that it can now detect ChatGPT writing.  (Stuff, RNZ)

The company is 98% confident it can spot when students use ChatGPT and other AI writing tools in their work, Turnitin’s Asia Pacific vice president James Thorley said.

“We’re under 1% in terms of false positive rate,” he said.

It’s worth looking at what that 1% actually means.  It appears to mean that of material they tested that was genuinely written by students, only 1% was classified as being written by ChatGPT.  This sounds pretty good, and it’s a substantial achievement if it’s true. This doesn’t mean that only 1% of accusations from the system are wrong. The proportion of false accusations will depend on how many students are really using ChatGPT. If none of them are, 100% of accusations will be false; if all of them are, 100% of accusations will be true.
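To make the dependence on prevalence concrete, here is a minimal sketch in Python. It uses the quoted 1% false positive rate and, as an assumption, reads the "98%" figure as a detection rate; the news stories don't say that is what it means. The function name and the prevalence values are made up for illustration.

```python
def false_accusation_share(prevalence, sensitivity=0.98, false_positive_rate=0.01):
    """Share of flagged assignments that were genuinely written by the student.

    prevalence: fraction of assignments really written with ChatGPT.
    sensitivity: ASSUMED detection rate (not a confirmed Turnitin figure).
    false_positive_rate: fraction of genuine student work that gets flagged.
    """
    true_flags = prevalence * sensitivity
    false_flags = (1 - prevalence) * false_positive_rate
    total_flags = true_flags + false_flags
    if total_flags == 0:
        raise ValueError("no assignments would be flagged")
    return false_flags / total_flags


# Illustrative prevalence values only; nobody knows the real one.
for prevalence in (0.0, 0.01, 0.10, 0.50, 1.0):
    share = false_accusation_share(prevalence)
    print(f"{prevalence:.0%} using ChatGPT -> {share:.0%} of accusations false")
```

Under these assumed inputs, if only 1% of assignments involve ChatGPT then roughly half of all flags are false, even though the false positive rate is still 1%.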

What does the 1% rate mean for a typical student?  An average student might hand in 4 assignments per course, for 4 courses per semester, two semesters per year.  That’s nearly 100 assignments (4 × 4 × 2 × 3 = 96) in a three-year degree.  A false positive rate of 1 in 100 then means an average of about one false accusation for each innocent student over the degree, which doesn’t sound quite as satisfactory.
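The arithmetic can be written out as a short Python sketch. The assignment counts are the rough guesses from the paragraph above; treating flags as independent across assignments is an extra simplifying assumption made only for this illustration.

```python
# Rough guesses from the text: assignments per course, courses per semester,
# semesters per year, years in the degree.
assignments = 4 * 4 * 2 * 3
false_positive_rate = 0.01

# Average number of false flags for a student who never uses ChatGPT.
expected_false_flags = assignments * false_positive_rate

# ASSUMPTION: flags are independent across assignments, with the same rate
# for every assignment. Real students will not all share the same rate.
prob_at_least_one = 1 - (1 - false_positive_rate) ** assignments

print(f"assignments: {assignments}")
print(f"expected false accusations: {expected_false_flags:.2f}")
print(f"chance of at least one: {prob_at_least_one:.0%}")
```

If the independence assumption held, an average of about one false flag per student would correspond to roughly six in ten innocent students being flagged at least once, with some flagged more than once.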

The average is likely to be misleading, though.  Some people will be more likely than others to be accused.  In addition to knowing the overall false positive rate, we’d want to know the false positive rate for important groups of students.  Does using a translator app on text you wrote in another language make you more likely to get flagged? Using a grammar checker? Speaking Kiwi? Are people who use semicolons safe?

Turnitin emphasize, as they do with plagiarism, that they don’t want to be blamed for any mistakes — that all their tools do is raise questions.  For plagiarism, that’s a reasonable argument.  The tool shows you which words match, and you can then look at other evidence for or against copying. Maybe the words are largely boilerplate. Maybe they are properly attributed, so there is copying but not plagiarism.  In the other direction, maybe there are text similarities beyond exact word matches, or there are matching errors — both papers think Louis Armstrong was the first man on the moon, or something.  With ChatGPT there’s none of this. It’s hard to look for additional evidence in the text, since there is no real way to know whether something you see is additional or is part of the evidence that Turnitin already used.

April 13, 2023

Briefly

  • How to win at roulette. No, they haven’t repealed the martingale optional stopping theorem, but no mathematical model is a perfect description of reality
  • Farah Hancock has a good series of reports about Auckland and Wellington buses, for Radio NZ.
  • “At the outset, it’s important to note that the finding that “exercise is better than medicine” for depression is one that you cannot possibly make from this paper, because the authors *literally excluded papers that compared exercise to medication*” A Twitter thread on claims about a new study.
  • “Riding an electric bike drops heart and cancer risks, finds German study”. Except it doesn’t.  The published German study compares exercise levels and accident rates of a group of people riding electric and acoustic bikes. That’s all it does. Unsurprisingly, it finds that the e-bikers still get exercise, but not quite as much as the people using traditional bicycles.  There are a lot of claims of evidence of major health effects that the researchers are supposed to have described to Der Spiegel (I don’t subscribe, so I can’t check).  These are (a) unpublished, and (b) can’t be as described: the study included about 2000 reasonably healthy participants and ran for twelve months, so it can’t possibly have collected substantial evidence about prevention of cancer or Alzheimer’s or heart disease.  That’s before we even get to any issues about confounding: how much does your health affect cycling vs cycling affecting your health.  As a long-term e-cyclist, I’d love these claims to be based on convincing evidence, but they just aren’t.
  • Via David Rumsey on Twitter, the first ‘flow’ visualisation maps, from the 1838 Irish Railway Commission Atlas
  • From Kieran Healy, the relationship between height and number of points scored in the US professional basketball league. No, there isn’t a visually clear relationship. That’s because you’re selecting everyone for being very good at basketball, and the shorter guys need to be better in other ways to compensate.  Height obviously matters; other things matter too.  In the same way, using some sort of standardised test as a criterion for university admission will make it look as if the test isn’t related to performance at university.
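The selection effect in the basketball item can be illustrated with a toy simulation. This is a sketch under an invented model (ability is height plus other skills, scoring is ability plus luck, and only the top 1% by ability are observed); it does not use Kieran Healy’s data, and all names and numbers are assumptions.

```python
import random


def pearson(xs, ys):
    """Plain Pearson correlation, written out to avoid extra dependencies."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


rng = random.Random(2023)  # fixed seed so the sketch is repeatable

# Hypothetical population: height and other skills are independent,
# both on a standardised scale.
players = []
for _ in range(50_000):
    height = rng.gauss(0, 1)
    other_skills = rng.gauss(0, 1)
    ability = height + other_skills
    points = ability + rng.gauss(0, 1)  # scoring is ability plus luck
    players.append((height, ability, points))

# Only the top 1% by overall ability make the (imaginary) league.
players.sort(key=lambda p: p[1], reverse=True)
league = players[: len(players) // 100]

corr_everyone = pearson([p[0] for p in players], [p[2] for p in players])
corr_league = pearson([p[0] for p in league], [p[2] for p in league])

print(f"height vs points, everyone: {corr_everyone:.2f}")
print(f"height vs points, league only: {corr_league:.2f}")
```

In this made-up model height matters a lot across the whole population, but among those selected for high ability the correlation with scoring is much weaker, which is the pattern described above.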

March 3, 2023

Counting is hard

From Stuff: Japan just found 7000 islands it never knew existed

That’s not quite what happened, as the story goes on to say.  We’re not talking about Kupe discovering Aotearoa or even Abel Tasman ‘discovering’ New Zealand.  There are three main groups of ‘new’ islands:

  • Pairs or small groups of islands that were counted as a single island before and are now recognised as multiple separate islands.
  • Islands in rivers or lakes, which were known about before but not part of the previous count
  • Large sandbars, which were known about before, but used not to be included in the definition of ‘island’

There are also islands genuinely appearing and disappearing, because of volcanic activity and erosion, but that’s a tiny fraction of the discrepancy.

Counting things requires both a definition of the thing to be counted and good enough measurement to see and recognise them.  In other news, the New Zealand Census is on!

 

March 2, 2023

Sweet as?

There’s been a scary news story about an artificial sweetener that previously looked remarkably inoffensive, erythritol.  It’s produced by fermentation of various plant material, and it comes in ‘organic’ as well as normal versions; it’s got similar taste and feel to sugar, and you can even use it for baking.  So when someone reports that it increases the risk of heart attack by a lot, you’d hope they had investigated thoroughly.

The researchers didn’t particularly have it in for erythritol; they were looking at blood components to see if anything predicted heart attack by a useful amount, and found a dramatic correlation with the amount of erythritol in the blood — and I mean dramatic. Here’s the chart showing the percentage of people who didn’t have a heart attack over the three years after their blood measurement, divided into four equal groups based on erythritol in the blood:

It’s not quite as bad as it first looks — the y-axis only goes down to 80% — but it still suggests more than half of heart attacks are due to erythritol, making it almost as bad as smoking. And also that there’s a magic threshold between safe and dangerous doses. This is … hard to believe? Now, this was the group where they discovered the correlation, so you’d expect over-estimation.  Not being completely irresponsible, they did check in other groups of people. Here’s a separate US group

It’s not quite as dramatic, but we’re still looking at a doubling of risk in the highest group and no real difference between the other three. And we’re still looking at nearly half of heart attacks being due to this obscure artificial sweetener.

So is it credible?

One question to ask is whether there’s a mechanism for it to be dangerous — this isn’t a definitive criterion, because there’s a lot we don’t know, but it’s useful.  The researchers say that erythritol makes platelets more likely to clump together, triggering clots, which is a thing that can happen and would increase heart attack risk — that’s why aspirin, which disables platelets, has been recommended to prevent heart attacks.

Another question to ask is whether the high erythritol group got it from eating the most erythritol. If they didn’t, this isn’t going to give useful dietary advice.  The compound is made in the body to some extent, and it’s excreted via the kidneys. Could people at higher risk of heart attack be making more internally or excreting it less effectively?  The natural step here would be to feed healthy people some erythritol and see what happens to their platelets.  That study is apparently underway, though it’s small and has no control group.

You might also ask whether there has been a dramatic increase in heart attacks over the time that erythritol has become more popular in foods.  There hasn’t been, though a moderate increase might have been masked by all the other factors causing decreases in risk.

I would be surprised if the risk turns out to really be this big, though it’s entirely possible that there’s some risk. At least the platelet study should be reasonably informative, and it’s a pity it wasn’t done before the publicity.

 

 

Pie and anti-pie

There are other issues with this graph (from the ABC’s Dan Ziffer): these are components of inflation rather than causes, and why ‘beyond 3%’?  The big issue, though, is the pie, where the positive numbers add to 104% and then there’s the negative 4%.

You can’t have negative numbers in a pie chart; that isn’t how pies work.  If you combine 104% of a pie and 4% of an anti-pie, you’ll end up on this list

January 9, 2023

Briefly

  • “We were able to put together a relatively good data set of case numbers for all states, but we were explicitly forbidden to make the data publicly available, even though our data was more accurate than what was appearing in the media.” Rob Hyndman, quoted by the ABC
  • Yet another example that counting isn’t simply neutral, from the Wikipedia entry for the Bechdel Test, via depths of wikipedia: “What counts as a character or as a conversation is not defined. For example, the Sir Mix-a-Lot song “Baby Got Back” has been described as passing the Bechdel test, because it begins with a valley girl saying to another “oh my god, Becky, look at her butt”.”
  • From the Washington Post: is your name more common for dogs or people? (in the US, of course)
  • From the New York Times, estimated carbon emissions by neighbourhood across the USA.
  • From David Hood, using the Ministry of Health public data, our holiday Covid wave. Something different seems to have happened in Tairāwhiti, and it seems to have happened at roughly the same time as the Rhythm’N’Vines festival.

January 8, 2023

Murderous Kiwis

Newshub has a story Map: New Zealand’s murder hotspots revealed.

This is the map

The map (and the text) don’t say what these geographical units are. Based on the context and the presence of “Counties Manukau” as one of them, I would expect them to be police districts: this (just a map, no data) is from the NZ Police website

There are a few confusing things about the Newshub map, though.  We seem to be missing Wellington (in the text, too), along with Auckland City and Northland. The ‘Southern’, ‘Eastern’, and ‘Central’ police districts are under a label ‘Auckland’ at the top right, making them look as though they might be southern, eastern, and central Auckland.

As always, there’s the question of the appropriate denominator.  Police districts are large enough that the distinction between the location of the murder and the residence of the victim might not matter too much (in contrast to census area units and assault), and I’m going to assume that the data include homicides in private homes (in contrast to census area units and assault) because that would have been mentioned otherwise. So it seems reasonable to use a general population denominator. This is trickier than I would have expected; it seems quite hard to find the police district populations. If you’re putting in a police OIA request like this one you might want to ask them for populations as well.

Looking at maps, the police districts seem to (at least approximately) be combinations of DHBs*, so I used the populations of those DHBs. Here are the comparisons just by counts of homicides over nearly three years (we’re missing Wellington and Northland)

And here are the (approximated) rates per thousand people over those three years. You might worry about how well the three Auckland districts can be separated; it wouldn’t be hard to combine them.

Bay of Plenty looks higher and Canterbury, Counties, and Waitematā look lower when you account for the differences in numbers of people.  Comparisons like this usually want rates (how dangerous), not counts (how many), if a relevant denominator is available.
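The counts-versus-rates distinction is simple to express in code. The sketch below uses two invented districts with invented numbers (they are not the Newshub or DHB figures, which aren’t reproduced here), just to show how the ordering can flip once a population denominator is used.

```python
# HYPOTHETICAL counts and populations, invented purely for illustration.
districts = {
    "District A": {"homicides": 30, "population": 600_000},
    "District B": {"homicides": 20, "population": 250_000},
}


def rate_per_thousand(count, population):
    """Events per 1,000 people over whatever period the count covers."""
    return 1000 * count / population


rates = {
    name: rate_per_thousand(d["homicides"], d["population"])
    for name, d in districts.items()
}

# Sort by rate (how dangerous) rather than by count (how many).
for name, rate in sorted(rates.items(), key=lambda item: item[1], reverse=True):
    print(f"{name}: {districts[name]['homicides']} homicides, {rate:.3f} per 1,000")
```

In this made-up example District A has more homicides but District B has the higher rate, which is the kind of reversal a denominator can produce.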

Newshub does get points, though, for correctly saying all these numbers are pretty low by international standards.

 

* DHB: Deprecated Health Boundary

January 5, 2023

How common is long covid and why don’t we know?

You see widely varying estimates for the probability of getting long Covid and for the recovery prognosis. Some of this is because people are picking numbers to recirculate that match their prejudices, but some of it is because these are hard questions to answer.

For example, the Hamilton Spectator (other Hamilton, not ours) reports a Canadian study following 106 people for a year. The headline was initially “75 per cent of COVID ‘long haulers’ free of symptoms in 12 months: McMaster study”. It’s now “25 per cent of COVID patients become ‘long haulers’ after 12 months: Mac study”. Both are misleading, though the second is better.

This study started out with 106 people, with an average age of 57. They had substantially more severe Covid than average:

Twenty-six patients recovered from COVID19 at home, 35 were admitted to the ICU, and 45 were hospitalized but not ICU-admitted

For comparison, in New Zealand the hospitalisation rate has been about 1% of reported cases, with about 0.03% of reported cases admitted to the ICU. It’s not a representative sample, and this matters for estimating overall prevalence. On top of that, only half the study participants have 12-month data. That means the proportion known to have become ‘long-haulers’ is only about 12%; the 25% is a guess that the people who didn’t continue with the study were similar.
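The difference between 25% and about 12% comes from what is assumed about the people without 12-month data. Here is a rough Python sketch; the exact counts are assumptions reconstructed from “only half” and “25 per cent” (53 followed, 13 still symptomatic), not figures taken from the study.

```python
enrolled = 106
with_followup = enrolled // 2                     # "only half": ASSUMED to be 53
still_symptomatic = round(0.25 * with_followup)   # about a quarter of those followed

# The headline figure: share among people who have 12-month data.
share_of_followed = still_symptomatic / with_followup

# Extreme cases for the people who have no 12-month data.
missing = enrolled - with_followup
lowest_possible = still_symptomatic / enrolled                # none of them symptomatic
highest_possible = (still_symptomatic + missing) / enrolled   # all of them symptomatic

print(f"among those followed: {share_of_followed:.0%}")
print(f"known, as a share of everyone enrolled: {lowest_possible:.0%}")
print(f"upper bound if all the missing were symptomatic: {highest_possible:.0%}")
```

The 25% figure sits between the two bounds and is only right if the people who dropped out resemble those who stayed.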

A more general problem is that “long covid” isn’t an easily measurable thing. There are people who are still unwell in various ways a long time after they get Covid. There are multiple theories about what exactly is the mechanism, and it’s quite possible that more than one of these theories is true; we don’t even know that ‘long covid’ is just a single condition.  Because we aren’t sure about the mechanism or mechanisms, there isn’t a test for long Covid the way there is for Covid.  If you have symptoms plus a positive RAT or PCR test for the SARS-CoV-2 virus you have Covid; that’s what ‘having Covid’ means. There isn’t a simple, objective definition like that for long Covid.

Because there isn’t a simple, objective test for long covid, different studies define it in different ways: usually as having had Covid plus some set of symptoms later in time. Different studies use different symptoms. The larger the study, the more generic the symptom measurements tend to be, and so you’d expect higher rates of people to report having those symptoms.  If you simply ask about ‘fatigue’ you’ll pick up people with ordinary everyday <gestures-broadly-at-internet-and-world> as well as people with crushing post-Covid exhaustion, even though they’re very different.

There are also different time-frames in different studies: more people will have symptoms for three months than for twelve months just because twelve months is longer.  Twelve-month follow-up also implies the study must have started earlier; a study that followed people for twelve months after initial illness won’t include anyone who had Omicron and might include a lot of unvaccinated people.

The different definitions and different populations matter. The majority of people in New Zealand have had Covid. There’s no way that 25% of them have the sort of long Covid that someone like Jenene Crossan or Daniel Freeman did; it would be obvious in the basic functioning of society.  Some people do have disabling long Covid; some people have milder versions; some have annoying post-Covid symptoms; some people seem to recover ok (though they might be at higher risk of other disease in the future). We don’t have good numbers on the size of these groups, or ways to predict who is who, or treatments, and it’s partly because it’s difficult and partly because the pandemic keeps changing.

It’s also partly because we haven’t put enough resources into it.

Ok boomers?

A graph, which has been popular on the internets, in this instance via Matthew Yglesias

Another graph, showing the same thing per capita rather than as shares of the population, also via Matthew Yglesias. This one appears to have a very different message.

And a third graph, from the FRED system operated by the Federal Reserve Bank, showing US real per-capita GDP

So: Gen X have a much lower share of US wealth than the Baby Boomers did at the same age.  This is partly because we are a smaller fraction of the population than they were: per-capita wealth is similar.  But per-capita wealth being similar isn’t as good as it sounds, because the US as a whole is substantially richer now than when the Boomers were 50.

This isn’t a gotcha for either of the first two graphs — different questions are allowed to have different answers — but it might be useful context for the comparison

January 1, 2023

Briefly

  • The “Great Kiwi Christmas Survey” led to stories at Herald, Newshub, Farmers Weekly, and Radio NZ on what people were eating for their Christmas meal.  The respondents for the “Great Kiwi Christmas Survey” were variously described as “over 1000”, “over 1800”, and “over 3300” Kiwis, which seems a bit vague. According to newsroom, this was actually a bogus poll: “We promoted the survey through social media channels and sent the survey to those people who had signed up to receive information from us,” concedes Lisa Moloney, the promotions manager for Retail Meat NZ and Beef + Lamb NZ.  Headlines based on bogus polls aren’t ever ok — even when you don’t think the facts really matter. Newsroom argued that the results under-represented vegetarians, which is plausible, but you can’t really tell from the data presented on the number of vegetarians. Not all Christmas meals at which vegetarians are present will be centred around plant-based food, as any vegetarian can tell you.
  • Stuff, with the help of Auckland Transport, wrote about Auckland’s most prolific public transport user. Apparently, someone took 3400 trips over a year.  It’s surprising that’s even possible: nearly ten trips per day, every day,  and since the person is doing this on a gold card, starting no earlier than 9am on weekdays.  Assuming the numbers are correct — actually, whether the numbers are correct or not — it’s also a bit disturbing that this analysis was done.  The summaries of typical and top 100 users seem a lot more reasonable. The piece says “Stuff asked to interview the person, however Auckland Transport would not reveal their identity for privacy reasons.”, which is good, but you might want them not to be in a position to reveal it.
  • “Support for low-income housing followed a similar pattern, with broad approval for building it someplace in the country (82 percent) but much less for building it locally (65 percent)” at 538. There should be a word for this.
  • Interesting discussion on the Slate Money podcast about a data display, the “Fed Dot Plot”, which shows the best guesses of members of the Federal Reserve Open Market Committee as to what interest rates they will want in the future; each dot is one person.  The Fed is trying to de-emphasise this graph at the moment — partly because people tend to over-interpret it. Importantly, there’s no individual uncertainty shown, and there’s no way to tell how much of the difference between people is due to difference in what they think the economic situation will be and how much is due to differences in how they expect they will want to react to it.