Why I am not using AI in the classroom

Last week I spoke with some of my colleagues about the “challenges” posed by generative AI to our teaching. As anyone who follows me on Bluesky — or followed me on Twitter, prior to its nazification — will know, I am not a fan of generative AI. Its proponents and salesmen tend to dismiss critics like me as merely uninformed or, only slightly less patronizingly, afraid to learn. In fact, I have used generative AI in various guises (to generate images, to generate text, for “conversations” with automated imitations of historical figures, etc.), though not much in the last year or so. I have also read a bit about its workings and implications, though not as much as I might have. (Despite OpenAI’s best efforts, it is not yet my job to do so, and, despite the GOP’s, I still have a job.) But last week, in preparation for my little talk with colleagues, I took a look at about a dozen studies from the last couple of years on issues pertaining to AI’s much-touted uses in education. Not all came to quite the conclusions I would have guessed, though few painted an encouraging picture of its overall impact and nearly all noted significant negative effects of using AI for reading, for research (that is, information seeking), for writing, or for thinking, at least about topics where facts matter. (Most nevertheless concluded with tips on its classroom adoption. On this apparently inescapable techno-deterministic straitjacket, more another time.) Having gone to this much trouble, it seemed to me worthwhile to elaborate my thoughts a little more, somewhere, in writing. So here they are.

To begin with, I did not look at AI’s use in specialized research tools — translation or text recognition software, for example. This is not because I am in denial about its usefulness in any academic context, but because very little undergraduate teaching involves such specialized tools and because the scope of their classroom applications is and will probably remain minuscule next to both the uses and the pretensions of text-generators like ChatGPT. Likewise, I did not look into the widely discussed environmental harm of generative AI. This is an ethical matter that should shape decisions about its adoption, but it is not specific to AI’s uses in teaching or in higher education. Nor did I do more than mention the well-known problems of plagiarism (that AI is trained on people’s work without their prior knowledge or consent, and that much of what it produces on this basis — generally without acknowledgment — is thus stolen), lack of transparency (AI text generators do not usually cite their sources, nor is it clear that they are capable of doing so rigorously even when they do generate bibliographic references), hallucinations (AI makes some things up — or, rather, some of the things it makes up do not resemble reality), or its inherently exploitative nature (to use generative AI is to train it — to give your work, and very often others’, to AI companies gratis). I did not belabour the point that, despite its almost constant anthropomorphization in media coverage, generative AI does not “know” or “think” things about the world; it is a machine trained to produce things that look like the things it has seen in its data set. All these, too, are evidently pertinent to higher education, but not more so than to journalism or to any other field in which authorship, intention, originality and/or accuracy matter. And, more important: even if none of these problems existed, AI’s use in the classroom would still be harmful.

Much discussion of AI in education has focused on its potential for helping students write — or writing for them. As it is customary to note, AI massively increases productivity in writing. It secretes content on request. One fundamental problem with this as a selling point is that the generation of content is not the point of undergraduate teaching, at least in the humanities (and I daresay in the sciences, too). Student writing is a teaching method and an index of student learning, but to see a stack of papers as the point of an academic course rather than a by-product is to misunderstand what is being taught, which is knowledge of a subject, a disciplined habit of thought, and layers of reflection on both of these as well as on their mutual relationship. Substituting a text-generator for any aspect of this teaching does not help us achieve these goals more efficiently; it simply ends their pursuit, making students spectators rather than participants in the process of knowledge-making. Unlike Wikipedia, it does not give them a shortcut to facts and a list of authors and sources they can then wrestle with reading, interpreting, and writing about; it presents both facts (from unknown sources) and interpretations of them, too, in customizable, flash-frozen essays, like discursive Dippin’ Dots. Unlike databases such as Early English Books Online, it does not merely save students the cost of buying a book or flying to an archive; it obviates any sense that an archive — a layer of work behind the generated text, and layers of sources and interpretations, choices and arguments behind that work — is important to think about, much less work with, at all. It turns a complex process of making knowledge into a simple question of access to information. Or rather, to a machine that generates statements resembling information it has seen before.

To understand why this is a problem for teaching, it is vital to separate the question of how good a text is (whether we mean by this accurate, transparent, thorough, or euphonious) from what its purpose is. Critics of AI have been apt to emphasize the blandness or mediocrity of AI-generated text, the curious prominence of particular terms (“delve”), punctuation (em dashes), or sentence structures, as well as its routinely harmful and occasionally spectacular hallucinations or errors. (For the record, it appears to be true, for now, that AI-generated text differs in patterned ways from human writing, but not necessarily that it is more limited in its vocabulary.) The danger in focusing on such temporary tells is that better-quality AI writing, which is surely on its way, will be seen as solving a problem that it doesn’t. This is because the real problem with generative AI in teaching is not how good or even how human a mass of text can be made to look but rather why it was written in the first place. And the purpose of writing an essay, a research paper, a lab report, a poem, or a short story, in an undergraduate setting especially, is inseparable from who is doing the writing. If the point of writing is to learn how to write — not just how to spit words out in nice shapes on demand, but how to find, read, analyze, and interpret sources; how to formulate and express thoughts about them; how to arrange those thoughts on the page; and how to judge others’ efforts at all these things, as well as one’s own — then the capacity of machines to imitate the results of the process, or of any one step of the process, is beside the point. Its use in place of student effort prevents learning from taking place. It is not an aid to “productivity” but, quite literally, counterproductive; it’s a “tool” that makes the educator’s goal, which ought to be the student’s goal too, harder and more costly to accomplish. (Why students’ and teachers’ goals differ so in practice is a less flashy and less tractable problem than why ChatGPT can’t tell you how many “rs” are in “strawberry,” but it’s infinitely more germane to what’s at issue here.)

In this sense, the problem AI poses to education differs little in nature from the problem posed by paper mills. (This parallel used to be rejected on the grounds that generative AI is free and hence a democratic form of cheating — possibly even a revolutionary gesture against elitist gatekeeping, a spectre haunting the hidebound educational institutions of the West. Plagiarists of the world, unite! You have nothing to lose but your br- But then, predictably, came paid, premium AI services. Nature is healing.) One study found, for instance, that ChatGPT generates higher-quality “argumentative essays” — at least, short ones on “controversial topics” — than high-school students working on their own. On the other hand, another concludes that students using LLMs to look for evidence on a given question offer less relevant arguments than those using regular web searches. Other studies find that using generative AI makes university students less creative, less accurate, and less independent writers. If you are looking for a 200-word statement of different takes on a controversial question, you may prefer ChatGPT’s summary to the average sixteen-year-old’s. But if you are reading for the sake of informing yourself, why bother with either one? And if you are trying to teach the sixteen-year-old to argue, it’s not clear why the machine’s writing matters. If the idea is that the machine-made text furnishes the student with something exemplary to read — well, the only reason for that is that it has trained on arguments by actual people, which the student could also read, at a large ethical discount. Whether the machine does a good job or not, it’s doing the job we want the student to do, not for the sake of the 200 words but for the sake of what the student learns by doing it. The only case in which the machine’s output is of paramount interest is the case in which we are in fact training the machine.

The problem is much the same with students using the machine to read for them — that is, to generate summaries of texts, and often, depressingly, of textbooks, which are effectively summaries to begin with. As with other AI-generated essays, AI summaries are subject to errors and hallucinations, but, again, that is not really the key point. As with writing, so with research: studies conclude that there are some things LLMs do comparatively well, as against web searches and books (such as conveying knowledge of basic concepts to people with little prior familiarity), and some things they do badly (such as helping those people remember what they’ve supposedly learned). In a mediascape obsessed with the idea of echo chambers, it should not be reassuring that LLM-based searches tend to deepen rather than correct information-seekers’ prior biases. For historians concerned with teaching basic skills like source criticism, or anyone concerned with actually knowing things, it should probably be concerning that LLM users exhibit a “low willingness” to engage with the sources LLMs use even when these are available at a click. Of course, poor retention of knowledge, lack of motivation to investigate the basis of claims, loss of critical thinking skills and of confidence in their own judgment — an appropriate loss of confidence, given shallower engagement with their objects of study — are hardly good for students. They are disastrous for would-be scholars. But these are all downstream of the fundamental problem that you can’t have a machine read for you and still get the educational benefits of reading. Scanning an executive summary of key points is not commensurate with working to make sense of an idiosyncratic, multidimensional piece of writing, or working — alone or in conversation with others, in person or in writing of their own — through a recalcitrant and alien source. As far as learning goes, the only beneficiary of your feeding other people’s writing into a machine that reads it for you is the machine.

If we pan out from the individual student to the class as a whole, other dimensions of the problem come into view. A traditional way of thinking about technology is that it saves labour; generative AI, in the classroom, adds to it. Remarkably, this is true whether a professor adopts AI or not. If an instructor does not want students to use AI, then maintaining that position becomes a time-consuming and all-but-hopeless task: either students’ written work (and their spoken comments too, where these might draw on AI summaries of readings or AI-generated answers to questions) must be vetted, or else assignments must be redesigned so as to make the use of AI hard or unrewarding, if not altogether impossible. The increased workload AI imposes is one thing; its instant devaluation of rich but trust-dependent teaching methods, honed over long careers, is another; the adversarial relationship it posits between teacher and students is yet another, and probably the most damaging and least discussed of all. Teaching becomes a search for ways around the technology, ways (usually, AI-powered ways) of surveilling students’ use of it, and ways to trip up students who resort to it illicitly. But police aren’t teachers, and suspects can’t be learners. If, on the other hand, a prof bites the bullet and incorporates AI “responsibly” (for example, by inviting students to ask it questions and then edit or critique the answers), then time once given to teaching the subject of the course must now be spent getting students to use a tech industry product for the sake of evaluating its responses to questions they themselves can barely answer, on the basis of knowledge and skills they do not yet really have and will not develop as well as they might — because they are busy training the machine.

Despite the ritual assertion — popularized by AI salesmen — that AI is “the future,” that it is “everywhere,” and that it therefore “must” be used, the constant reply to criticism remains that generative AI is, after all, just a “tool.” (Twitter philosophers used to refer to this as a motte-and-bailey.) Talk of AI as “just a tool” is, I think, dishonest. A tool is for something. A solution is a solution to a problem. But it is simply a fact that generative AI has not entered the classroom as a tool of instruction because instructors found it useful for a specific task, or because it solved a teaching problem. To the contrary, it has entered classrooms because it has been marketed to people other than teachers, namely, students and administrators, and more and more because administrators have taken it upon themselves to supply students with it in every corner of campus digital life — naturally, leaving faculty to deal with any problems this might pose to academic ethics or standards. In the context of teaching, generative AI is a solution in search of problems, and teachers have been tasked by their non-teaching bosses with finding problems for it to solve (i.e., with finding new markets for AI products), whether or not doing so improves their teaching methods or works toward their teaching goals. Routinely, studies finding that generative AI “tools” make teaching harder, more labour-intensive, and less successful still conclude with suggestions for how teaching must change to fit the “new reality” of generative AI. That is not the adoption of a tool; it is the imposition of a program, from the top down, by non-teachers on people who teach.

In a university context, this program is just one of many manifestations of the collapse of faculty governance. In the “debate” on generative AI, as in so many other debates about the directions in which universities are moving, arguments made by academics on academic grounds are forced into an oppositional stance from the start, even within academic spaces, because their institutions are governed, even in their academic functions, by non-teaching, non-researching senior administrators and by non-academic boards, who have decided on everyone’s behalf that the salesmen are right and AI is the future. Whatever its value for teaching and learning, and whether teachers ultimately take it up or not, the way that generative AI has come to the university reflects a move away from academic oversight of academic work. Generative AI may be the future of undergraduate education, for a while. But that is a decision, not a prophecy. Using bad tools where better ones exist is a choice, not a fate. The rise of OpenAI is not God’s plan unfolding in secular time. But the fact that serious study of AI’s effects on learning has followed rather than preceded its adoption by educational institutions, and that its disastrous implications are taken as an agenda for its further entrenchment rather than grounds for rejecting its use, is certainly a sign.
