College Student Advocates criticizes over-reliance on new AI-detection tools

frankvahid
May 7, 2023

Updated: May 27, 2023


College Student Advocates (CSA) is concerned that misuse of “AI-detection tools”, which claim to automatically detect essays written by artificial intelligence (AI), is causing severe and sometimes irreparable harm to students.

A key problem is professors putting undue trust in those tools’ results, such as “95% of this text has been determined to be generated by AI”. Such undue trust is leading to quick referrals to student conduct offices with little further investigation. In some cases that CSA has examined, the accusations are wildly inappropriate, with even the simplest investigation yielding clear evidence that the student wrote the essay themselves.

Such trust is unwarranted. AI-detection tools are new and commonly yield false positives [1,2,6]. Even reported rates like “less than [1%] false positive rate” [3] are too high to warrant such trust: in a 100-student class with 10 essays per student (1,000 essays in total), a 1% rate would produce about 10 false flags, meaning 5-10 of those 100 students might be falsely accused. And the validity of reported false positive rates as low as 1% is questionable; even the creator of ChatGPT (OpenAI) reports a 9% false positive rate for its own classifier [4]. Plus, those tools self-report that they may not work well on short essays of a few hundred words, yet many essays being checked by professors are indeed short.

In one case that CSA examined, a professor punished a student because a tool said their essay was “likely to be AI generated”, yet when the student fed the professor’s own text into the same tool, it returned the same “likely to be AI generated” result. In another case, a professor referred a student to student conduct, noting that turnitin.com claimed the student’s 180-word essay was “100% generated by AI”. CSA examined the essay and found that claim to be absurd: the essay was quite basic and of lower quality than the student’s other work (written in a hurry, replete with grammatical errors, and even missing an accidentally deleted paragraph), and the student had full word processor history showing their incremental work. CSA verified that turnitin.com nevertheless labeled the text as 100% generated by AI. Additionally, many professors are incorrectly using AI to detect AI, such as by asking ChatGPT “Did you write this?”, with wildly inaccurate results.
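The class-size estimate given earlier in this section (100 students, 10 essays each, a 1% false positive rate) can be checked with a short calculation. The sketch below is illustrative only and rests on assumptions not stated in the original: every student is honest, and each essay is flagged independently with the same probability. The function name `expected_falsely_flagged` is hypothetical.

```python
def expected_falsely_flagged(students: int, essays_per_student: int,
                             false_positive_rate: float) -> float:
    """Expected number of honest students flagged at least once.

    Assumes (for illustration) that all students are honest and that
    each essay is falsely flagged independently at the given rate.
    """
    # Probability that none of one student's essays is falsely flagged.
    p_never_flagged = (1 - false_positive_rate) ** essays_per_student
    # Probability that at least one essay is flagged, times class size.
    return students * (1 - p_never_flagged)


# Rates taken from the text: the vendor-reported 1% [3] and OpenAI's 9% [4].
for rate in (0.01, 0.09):
    n = expected_falsely_flagged(students=100, essays_per_student=10,
                                 false_positive_rate=rate)
    print(f"false positive rate {rate:.0%}: about {n:.1f} students flagged")
```

Under these assumptions, the 1% rate yields roughly 9.6 falsely flagged students out of 100, which falls within the 5-10 range cited, and the 9% rate yields roughly 61.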

Some professors use a tool’s potential-cheating reports, and nothing more, to refer cases to student conduct, figuring the rare innocent student can defend themselves in the appeals process, but not realizing how devastating such processes can be to innocent students – leading to disruption of academic performance, anxiety, depression, negative health consequences, dropout, and even suicide.

Fields like healthcare have the notion of “standard of care”, defined as “the degree of care a prudent and reasonable person would exercise under the circumstances” (NIH), also known as a “reasonable person” standard. College education does not have such a definition, but if it did, it is CSA’s belief that referring a student to a student conduct office for cheating based on an AI detector’s result, without other clear evidence or basis, is below the standard of care for a college professor. Such referral is not “reasonable” given AI detector inaccuracy, coupled with the devastating effects of false accusations.

Cheating today is rampant in universities, increasingly via AI generation of essays. Automated AI detection is thus an important tool. But, everyone involved must realize that those tools are currently limited in their capabilities. Turnitin.com warns that “[Professors] should use the indicator as a means to start a formative conversation with their student and/or use it to examine the submitted assignment in greater detail” and “the percentage is interpretive and should not be used as a definitive measure of misconduct or punitive tool” [5]. These warnings are appropriate – busy professors need tools to narrow in on potential cheating cases, and plagiarism and AI detector tools can be useful in that regard. But those tools only create a list of potential cases; professors must then perform proper investigation to find the real cheating cases.

But, the reality today is that most professors do not read the details of such warnings or discussions of accuracy, and given professors’ very limited available time, should not be expected to do so.

CSA makes two key recommendations:

1. CSA recommends that professors never use an AI detector’s report as the sole evidence of cheating and always investigate further using other means (which does NOT mean just using additional AI detectors). Professors are reminded that merely contacting a student about potential cheating can be devastating to an honest student, so we hope professors will start their investigations quietly (reading the essay themselves, comparing it to prior work, or even waiting for future essays from the student to build confidence one way or the other) and never take the “innocent students can defend themselves in the appeals process” approach. Furthermore, we call on universities to remind professors of their cheating policies, strengthen those policies where appropriate to explicitly address the use of AI-detection tools, and apply real sanctions against professors who operate below the “standard of care” by ignoring those policies to the honest student’s detriment.
2. CSA believes AI-detection tool makers, like turnitin.com, should more directly and strongly warn professors against using reported results by themselves, recognizing that professors are busy and will not read detailed explanations (or even short popup explanations). Tool makers should also be more “humble” in presenting reported numbers, because those numbers are still questionable, and should take more care not to exaggerate accuracy claims in their articles. They should clearly warn which essays are prone to false positives, such as short essays or those improved via grammar tools. Short-listing students for cheating is a potentially dangerous product, and all potentially dangerous products should be built with safety as a paramount consideration, without waiting for injuries to become so prevalent that action becomes impossible to resist. Unfortunately, many dangerous products are brought to market without such safeguards because doing so maximizes profit, but we encourage makers of educational products to focus more on the humanitarian side of their business.

Contact: Frank Vahid, Founder, College Student Advocates, frank.vahid@collegestudentadvocates.org.

Dr. Vahid is also a Professor of Computer Science at the University of California, Riverside. Phone: 951-827-4710

[1] https://www.washingtonpost.com/technology/2023/04/01/chatgpt-cheating-detection-turnitin/

[2] https://goldpenguin.org/blog/turnitin-ai-detection-concerns/

[3] https://www.turnitin.com/press/turnitin-announces-ai-writing-detector-and-ai-writing-resource-center-for-educators

[4] https://openai.com/blog/new-ai-classifier-for-indicating-ai-written-text

[5] https://www.turnitin.com/products/features/ai-writing-detection

[6] https://www.nbcnews.com/tech/chatgpt-texas-college-instructor-backlash-rcna84888
