
An example of the failure of using AI to automate

July 10, 2023

A recent academic research paper came to the same conclusion as a similar paper from June 28th concerning plagiarism at universities in the age of large language models: software cannot reliably detect AI-generated content.

From the abstract:

The research covers 12 publicly available tools and two commercial systems (Turnitin and Plagiarism Check) that are widely used in the academic setting. The researchers conclude that the available detection tools are neither accurate nor reliable and have a main bias towards classifying the output as human-written rather than detecting AI-generated text.

That mostly summarizes the answers to the five research questions identified on page 3 of the paper (e.g., RQ1: Can detection tools for AI-generated text reliably detect human-written text?).

One important question isn’t on that list:

Can humans detect AI-generated text reliably?

Nor is its follow-up:

Can humans detect AI-generated text reliably when skillfully using AI tools?

The last question is the most important one – and the only one that has a chance of being a yes.

BTW, the study looked at 14 detection tools (listed on page 11), such as GPTZero, with $3.5M in funding, and of course Turnitin, acquired for $1.8B in 2019. (Sidenote: the study left out the least brittle plagiarism detection tool.)

That’s a lot of money being spent on some missed points. For one, would we even need these detection tools if we didn’t still assume that quantity of writing was an effective learning approach – or a reliable way to measure how well a student has learned something? And that’s probably the bigger question.

But here’s where I’m focused: some assume that generative AI tools are here to automate complex solutions soup-to-nuts; they’re not. They are here as aids to skillful, thinking mammals with big neocortexes. You whisper to the tools in the context of comprehensive approaches to complex problems.

(This was originally published on Art of Message – subscribe here)