June 18, 2023

Looking for ChatGPT again

When I wrote about detecting ChatGPT last time, I said that overall accuracy wasn't enough; you'd need to know the accuracy in relevant groups of people:

In addition to knowing the overall false positive rate, we’d want to know the false positive rate for important groups of students. Does using a translator app on text you wrote in another language make you more likely to get flagged? Using a grammar checker? Speaking Kiwi? Are people who use semicolons safe?

Some evidence is accumulating. The automated detectors can tell opinion pieces published in Science from ChatGPT imitations (of limited practical use, since Science has an actual list of things it has published).

More importantly, there’s a new preprint that claims the detectors do extremely poorly on material written by non-native English speakers. Specifically, on essays from the TOEFL (Test of English as a Foreign Language) exam, the false positive rate averaged over several detectors was over 50%. The preprint also claims that ChatGPT could be used to edit the TOEFL essays (“Enhance the word choices to sound more like that of a native speaker”) or its own creations (“Elevate the provided text by employing advanced technical language”) to reduce detection.

False positives for non-native speakers are an urgent problem with using the detectors in education. Non-native speakers may already fall under more suspicion, so being falsely flagged hits them even harder. However, it’s quite possible that future versions of the detectors can reduce this specific bias (and it will be important to verify that they do).

The ability to get around the detectors by editing is a longer-term problem. If you have a publicly available detector and the ability to modify your text, you can keep making changes until the detector no longer reports a problem. There’s fundamentally no way around this, and if the process can be automated it will be even easier. Having a detector that isn’t available to students would remove the ability to edit — but “this program we won’t let you see says you cheated” shouldn’t be an acceptable solution either.
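Concretely, the automated version of this attack is just a short loop. Here’s a minimal sketch in Python; detector_score and llm_rewrite are hypothetical stand-ins for whatever public detector and language model someone has access to, not real APIs:

    # Minimal sketch of the automated editing loop described above.
    # detector_score() and llm_rewrite() are hypothetical placeholders,
    # not calls to any real detector or language-model library.

    def detector_score(text: str) -> float:
        """Return the detector's estimated probability that the text is AI-written."""
        raise NotImplementedError("wrap the public detector here")

    def llm_rewrite(text: str) -> str:
        """Ask a language model to rephrase the text, e.g. with a prompt like
        'Elevate the provided text by employing advanced technical language'."""
        raise NotImplementedError("wrap the language model here")

    def evade(text: str, threshold: float = 0.5, max_rounds: int = 10) -> str:
        """Keep rewriting until the public detector no longer reports a problem."""
        for _ in range(max_rounds):
            if detector_score(text) < threshold:
                break  # detector no longer flags the text
            text = llm_rewrite(text)
        return text

The point of the sketch is that nothing here depends on the details of the detector: as long as it returns a score the attacker can see, repeated queries turn it into an editing guide.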


Thomas Lumley (@tslumley) is Professor of Biostatistics at the University of Auckland. His research interests include semiparametric models, survey sampling, statistical computing, foundations of statistics, and whatever methodological problems his medical collaborators come up with. He also blogs at Biased and Inefficient.

Comments

  • Chris Delaney

    “Having a detector that isn’t available to students would remove the ability to edit — but ‘this program we won’t let you see says you cheated’ shouldn’t be an acceptable solution either.”

    Not only does it sound like a police state (there is evidence but you can’t see it), but it doubles down on the discrimination problem. It takes only a little bit of benign neglect and unconscious bias to imagine a great deal of injustice happening. History is generally not kind to this approach.

    The ChatGPT thing is a real dilemma, especially for light rewriting. Unless you have it do the references. Then it becomes quite easy to spot (what an interesting-sounding paper — why haven’t I read it yet?)
