AI will destroy the world (through homework)
It’s 2026. Artificial Intelligence exists. Since ~last year, it has been capable of giving a reasonably good answer to pretty much any question I typically ask students in homework assignments.
So far, so good. Yay science!
But, there’s a teeny tiny problem: students also know this.
TL;DR
Don’t want to read this whole article, but directly want the key points instead? Fear not, I’ve got you covered.
(But if you disagree with any of the points below, I’m afraid you’ll have to read the full article, if only to realize how wrong you are.)
Thinking hard about challenging problems is to intellectual development what physical exercise is to building muscle: a necessary – if not always pleasant – endeavor.
Graded homework used to be a great way to ensure students did exactly this. However, recent progress in AI has turned it into a convenient shortcut to a good grade, requiring very little effort. Which is precisely the objective of students, according to the
Grand Axiom of Student Psychology (GASP): a student wants to get the best possible grade while minimizing effort spent.
This is a challenge for teachers, who pursue their own, somewhat conflicting objective:
Grand Objective of the Astute Lecturer (GOAL): maximize students’ learning while minimizing their own effort.
The “minimizing effort” part is not purely laziness: with 30+ students per teacher, there is only so much time a teacher can dedicate to each student. To make matters worse, accurately estimating whether AI was used for a given question may not be practically feasible, due to a recent update to Brandolini’s law:
Yayanini’s law: proving that a text is AI-generated is not just harder than generating it – it’s orders of magnitude harder.
The challenge is thus to design class rules that incentivize students not to use AI for homework – so that they keep learning – while keeping the teacher’s workload manageable.
Some obvious ideas can be eliminated right away:
- Completely removing homework assignments: this throws away a valuable opportunity to make students think long and hard, which is difficult to recreate in other settings.
- Penalizing detected AI use question by question: this would require far more time than the teacher has available. And the low risk of getting caught on any single question may still make AI use rational from a risk/reward perspective.
A seemingly more reasonable option: give the entire homework a grade of 0 upon any evidence of AI use.
However, this rule may be hard to enforce in practice: caught cheaters have a strong incentive to deny accusations and waste the teacher’s time, in the hope the teacher will eventually give up the fight. A viable set of rules must both deter students from cheating, and ensure that arguing in bad faith remains a losing strategy for caught cheaters.
As a result, this is the deterrence doctrine I’m currently experimenting with:
Evidence of AI use on any part of a homework assignment warrants a grade of 0 for the entire homework.
A repeated offense warrants a failing grade for the entire course.
Accusations of AI use may be challenged by students who believe they were wrongly accused. Doing so will trigger a re-evaluation with a strict application of the official policy.
Whenever possible, such rules should be backed by matching university-level policies, so that sanctions can actually be enforced and deterrence remains credible.
Homework assignments and the imminent collapse of civilization
As it turns out, the objective of the homework assignments I give was never to provide me with the solution since, in principle, I already know it. It’s not even really about evaluating the students either, although assignments are graded in my class. No, the real objective is to ensure students think hard about interesting* problems, and learn cool* things along the way.
(*Yes, interesting and cool indeed. Don’t be too quick to shout “neeerd”, as I gently remind you that you are currently reading a machine learning blog.)
To use a metaphor I’m fond of: working hard on a problem is to intellectual development what physical exercise is to muscle building. A coach may assign a 10km run to a student athlete, who later shows the GPS-tracked route as proof of completion. But if the route was covered by car, the exercise becomes entirely pointless: the goal was never to trace a path on a map.
Similarly, if current students don’t put in intellectual effort anymore, they won’t build intellectual muscles. We will then have an entire generation of intellectual weaklings, our leaders will be (even more) incompetent (than now), and human civilization will collapse.
I might be slightly exaggerating here, but honestly not that much.
So, as a (not so) humble teacher, what can I do to save the world? There are a couple of immediate possibilities I could consider to solve my AI problem:
- Completely remove these now obsolete homework assignments. This would be equivalent to cutting trainees’ physical exercise altogether: see the previous part about the collapse of civilization.
- Keep homework assignments, but stop grading them. Let’s be honest, in practice this would be equivalent to option #1. Civilization collapses again.
- Continue grading homework assignments as I used to. Then students continue using AI. Civilization goes boom. 🏛️💥
- Book special slots to have students work on the homework assignments while under my constant scrutiny. Well… while I should be concerned about civilizational collapse, I also have a life outside of work. So I’d rather not do that.
None of these solutions is great. To save the world, we must find a way to ensure students keep thinking hard about cool problems.
The cat-and-mouse problem
There’s an elephant in the (class)room that I didn’t address yet: how do I know students actually use AI?
There are a few pieces of evidence. First, judging by the stellar quality of last semester’s average homework, either I happened to have the brightest cohort ever by far, or many students used AI. Yet the average quality of the supervised, on-paper final exam was somewhere around “meh”, which tips the scales towards the latter hypothesis.
In addition, some students are really bad at covering their tracks. Subtle hints of AI use that I’ve seen in submissions include:
- Leaving the “Your Name” placeholder generated by ChatGPT/whatever.ai as-is instead of replacing it with their actual name.
- Keeping AI follow-up questions like “Would you like further clarification?” in the submission. No, I would not.
- Answering hallucinated questions that were never asked (although they were sometimes good questions, so thanks for the suggestions!)
Further evidence for the non-believers in the upcoming AI-pocalypse
You may skip this part and continue reading if you’re already convinced that students’ use of AI for homework is indeed a problem.
I teach graduate-level classes at arguably the best university in France. Even there, many if not most students are willing to take shortcuts to get a good grade if they think they can get away with it. So I think it’s reasonable to assume that this is true for almost all educational institutions, including middle and high schools.
One faint hope one might be tempted to cling to: my students are, after all, graduate students in the field of AI, and should thus have AI skills well above mere mortals’. Let me shatter your dreams right here and now: using AI to cheat on homework doesn’t require any skill more advanced than typing chatgpt.com in a web browser.
So this problem is unfortunately not restricted to my university or field of study. As a matter of fact, similar situations have been reported at or by Harvard, Cornell, Stanford, etc.
So, students’ AI use really is on the rise. All right.
Is it really that bad though? Or, does using AI to do homework actually impede learning?
I’m convinced this is true to some extent at least.
For a start, in my classes, there is an inverse correlation between the “AI-ness” of homework submissions and actual performance on the AI-free final exam. Quite paradoxically, the best final exam results now tend to come from students with the worst homework grades. This used to be the complete opposite.
I know that correlation is not causation, and there may be confounding factors like intrinsic motivation, which influence both homework dedication and final exam result. Still, I believe this counts as suggestive evidence that unchecked AI use actively impairs students’ learning.
Moreover, even though this debate is fairly recent, studies suggesting that this is indeed the case are slowly starting to emerge. A recent study in Nature argues that AI-induced “deskilling” affects not only students, but even established professionals.
So, there you go. Imminent civilizational collapse is coming.
The clues listed above were left even though, at the beginning of the semester, I humbly inform my students that I
- have a PhD in AI
- have been working with Transformers (the “T” in ChatGPT) since before GPT-1, itself released 4 years before ChatGPT’s first version
- am being paid (handsomely) to advise Fortune 500 companies on the latest AI developments
and that as such, maybe they ought to assume that I’m somewhat competent at detecting AI-generated content.
However, I have a confession to make: even for me, most situations are far from black and white.
There are many cases where I have strong suspicions that AI was used, without necessarily having conclusive evidence. There are cases where I’m virtually certain that AI was used but still don’t have evidence that would hold up in court. And there are many situations where it is practically impossible to tell AI from non-AI.
What about AI detectors? Short answer: they're not reliable.
Everything you need to know is stated in the title of this collapsible section.
You may now resume reading the main article.
…
Still here? Fine, I’ll elaborate.
Existing AI detectors produce both false positives and false negatives, as has been documented by a number of studies. But the strongest argument might be that OpenAI, the company behind ChatGPT, famously discontinued its own AI-detection tool because of its poor accuracy.
It’s not that everyone at OpenAI and all the referenced AI-detection companies are incompetent: creating a foolproof AI detector is fundamentally doomed to failure, for reasons outside the scope of this essay.
There are horror stories* of students facing severe academic consequences because of false flags from AI detectors, despite being willing to prove that no malpractice occurred.
*I have no way of knowing how accurate this specific story is, but I have no doubts that similar situations did and will occur.
So, the conclusion is: AI detectors alone should NOT be used as definitive evidence that a text was written by an AI or a human.
So, to sum up: students do tend to use AI for homework. In rare cases, we can get conclusive evidence that AI was used to write at least part of the solution. But in general, AI detection is unfortunately not practically feasible.
Are we cooked?
Incentives and the Grand Axiom of Student Psychology
To find out, I’d like to go on yet another metaphorical journey. Since I live in Paris, journeys typically start with the Métro. It’s not free to operate, so metro tickets were invented.
Great (allegorical) news: we have just been appointed as BOSS (Behavioral Optimization & Sanctions Supervisor) of the metro. We are in charge of ensuring passengers actually buy a ticket, as opposed to hopping on trains as free riders. We could simply ask people nicely to buy a ticket. Although, maybe… well… Finding the limits of such an approach is left as an exercise for the reader.
OK, new brilliantly original idea: we could randomly check passengers’ tickets, and impose fines on those without a valid ticket.
This is where we embark on an expedition into the fabulous world of rational agents and incentives! 🌈👮🥕🪵
Setting incentives like a BOSS
Let’s assume passengers want to minimize their average cost of using public transportation.
For a start, if the fine is simply the ticket price, buying a ticket is actually irrational: cheating will never cost more than being honest, and will cost less whenever one is not caught. So in such a situation, we can expect people to not buy a ticket.
On the existence of honest people, and economists
I am aware that the model above has its limits: for a start, it’s possible that not everyone is fully rational, or thinks in terms of expected cost.
In particular, this framework excludes any moral consideration. In real life, some people will always buy the ticket regardless of any cost/benefit analysis, simply because they believe it’s the right thing to do. Others will never buy a ticket for various reasons. Still, economists often use this type of framework as a useful simplification to reason about how people are likely to behave in different situations.
So, please don’t force me to make this post longer than it already is, and please address all complaints about forgetting (to account for) the existence of honesty to economists instead.
In general, given a ticket price $t_p$, a probability $p$ of getting checked and a fine amount $f_p$, it becomes rational to buy the ticket if its cost is less than the expected cost of cheating, i.e. if
\[t_p < p \cdot f_p\]
If we want to increase the number of people paying for their ticket, we have 3 levers:
- Decrease the price of the ticket $t_p$. This has the obvious disadvantage of also decreasing revenue, which may not be economically sustainable.
- Increase the probability $p$ of ticket checks. One thing to keep in mind is that this lever also comes at a cost: in extreme cases, stationing a controller between every pair of stations could end up costing more than what the tickets bring in, making the whole operation unviable.
- Increase the amount of the fine $f_p$. In practice, this tends to be the most convenient option. With a small nuance though: $p$ and $f_p$ must remain reasonably balanced for the system to be socially acceptable. Saving costs by checking a single passenger per year, i.e. setting $p$ to 0.0001% while increasing $f_p$ to \$5,000,000 to compensate, would probably receive some backlash.
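As a quick sanity check of the inequality above, here is a minimal Python sketch of the rational passenger's decision. The function names and the numbers (a 2.50 ticket, a 5% chance of being checked) are illustrative assumptions, not actual metro figures.

```python
def expected_cheating_cost(p_check: float, fine: float) -> float:
    """Average cost per ride for a passenger who never buys a ticket."""
    return p_check * fine


def buying_is_rational(ticket_price: float, p_check: float, fine: float) -> bool:
    """True when t_p < p * f_p, i.e. honesty is cheaper on average."""
    return ticket_price < expected_cheating_cost(p_check, fine)


def break_even_fine(ticket_price: float, p_check: float) -> float:
    """Fine at which cheating and buying cost the same on average."""
    return ticket_price / p_check


# Hypothetical numbers, for illustration only.
TICKET = 2.50
P_CHECK = 0.05

# A fine equal to the ticket price deters no one as long as p < 1.
print(buying_is_rational(TICKET, P_CHECK, fine=TICKET))
# The fine has to exceed t_p / p before buying becomes the rational choice.
print(break_even_fine(TICKET, P_CHECK))
print(buying_is_rational(TICKET, P_CHECK, fine=60.0))
```

With these made-up values, the fine only starts to deter once it exceeds roughly 20 times the ticket price, which is simply $1/p$.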
Right, my girlfriend just reminded me that I’m not the BOSS; I’m merely a teacher trying to save humanity from AI-induced brainrot. Back to reality then.
Introduction (and conclusion) to student psychology
So. The goal of the rational passenger in the previous example was to minimize the (expected) cost of using public transportation. Can we also model students’ goal to predict their behavior?
Behold, for after years of tireless field studies, I have at long last succeeded in distilling the entirety of student psychology into this elegant equation:
Grand Axiom of Student Psychology (GASP):
The objective of a student is to get the best possible grade $G$ while minimizing effort spent $E$.
"This feels like a slight oversimplification. Yolo 67." — a student
*sigh* Fine, let’s make this blog post even longer, after all why not?
So, a few additional comments about this objective.
Firstly. The most immediate remark I’d expect from any of my good students is that it’s a dual objective: we’re trying to maximize/minimize 2 things at the same time, namely grade $G$ and effort $E$. So we should probably specify a trade-off $\lambda$ between the two and write our objective as e.g.
\[\text{maximize}~~ G - \lambda E\]
Another way to look at this would be to state that given a maximum effort $E_{max}$ that a student is willing to dedicate to the class, they would like to maximize their grade:
\[\text{maximize}~~ G \quad \text{such that}~~ E \leq E_{max}\]
Or, that given a minimal grade $G_{min}$ that they would like to obtain (e.g. the one enabling them to pass the class or graduate), they would like to minimize their effort:
\[\text{minimize}~~ E \quad \text{such that}~~ G \geq G_{min}\]
All of these formulations are equivalent given the right choice of $\lambda$, $E_{max}$ or $G_{min}$ (see this post on constrained optimization if it’s not clear why). So if you agree with any of them, you agree with me. And if you don’t agree with any of them, i.e. you disagree with me, then you’re wrong – cf. disclaimer in the footer of the blog.
In addition, this trade-off varies from student to student, and is arguably one of the main parameters explaining grade variance among students, along with initial familiarity with adjacent material and innate abilities. So explicitly estimating this trade-off is not what I’m after here; I’m mostly interested in the general idea, and will thus omit the $\lambda$.
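For readers who would rather see the equivalence of the three formulations than take it on faith, here is a small numerical sketch. The grade curve is entirely made up (a concave curve with diminishing returns on a 0-20 scale), and so are the effort units; the point is only that, for matching values of $\lambda$, $E_{max}$ and $G_{min}$, the three problems pick the same effort level.

```python
import math


def grade(effort: float) -> float:
    """Hypothetical concave grade curve on a 0-20 scale (diminishing returns)."""
    return 20.0 * (1.0 - math.exp(-effort / 5.0))


# Candidate effort levels: 0.00, 0.01, ..., 20.00 (arbitrary units).
EFFORTS = [i / 100 for i in range(2001)]


def best_penalized(lam: float) -> float:
    """Effort maximizing G - lambda * E."""
    return max(EFFORTS, key=lambda e: grade(e) - lam * e)


def best_under_effort_cap(e_max: float) -> float:
    """Effort maximizing G subject to E <= E_max."""
    return max((e for e in EFFORTS if e <= e_max), key=grade)


def least_effort_for_grade(g_min: float) -> float:
    """Smallest effort such that G >= G_min."""
    return min(e for e in EFFORTS if grade(e) >= g_min)


E_MAX = 5.0
# Matching parameters: lambda is the slope of the grade curve at E_MAX,
# and G_min is the grade reached at E_MAX (minus a tiny float tolerance).
LAM = 4.0 * math.exp(-E_MAX / 5.0)
G_MIN = grade(E_MAX) - 1e-9

print(best_penalized(LAM), best_under_effort_cap(E_MAX), least_effort_for_grade(G_MIN))
```

All three calls should land on the same effort level. One caveat: the clean match relies on the grade curve being concave and increasing; with a less well-behaved curve, the penalized form may not be able to reproduce every constrained solution.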
Secondly. A much more minor remark could be that this might indeed be an oversimplification of student psychology. For instance, it doesn’t account for the possibility that some students are interested in learning for its own sake, not just as a proxy to get good grades. However, I believe the GASP hypothesis above is a better heuristic to predict students’ behavior than relying on an entirely hypothetical intrinsic motivation to learn.
At least, my hypothesis explains pretty well why most students used to do graded homework assignments, used to not do ungraded ones, and started using AI once this became possible.
Finally. As before, I invite you to address any remaining concerns to economists.
Under the GASP assumption above, using AI to do homework is entirely rational, as it enables students to both get a better grade $G$ and spend less effort $E$. And we shouldn’t put the blame entirely on this generation: as a notorious procrastinator myself, I cannot guarantee that I would never have been tempted to trade a dull-but-imminently-due essay for a few hours of video games.
But, as we’ve seen, this still leads to the collapse of civilization. Which is generally considered to be a bad thing, so we still ought to do something about this.
The teacher’s side of the equation
OK, not to make this about me, but: what about MY objective? Or more abstractly, what about The Teacher’s objective, for my entire being is merely reduced to a synecdochic incarnation of the whole teaching body in this essay.
Let me try to offer yet another brilliant distillation of human psychology:
Grand Objective of the Astute Lecturer (GOAL):
The objective of the teacher is to maximize students’ learning $L$ while minimizing their own effort $E$.
Note: in this essay, we focus mostly on the “make students think instead of using AI” component of the learning objective, although there are obviously many other aspects to it.
"Are you implying that I'm lazy?" — a teacher
Firstly. A more charitable phrasing of “while minimizing their own effort” could have been “while keeping their own involvement under a manageable threshold $E_{max}$” – as I hope everyone agrees that even teachers only have a finite amount of time to dedicate to their job. E.g.:
\[\text{maximize}~~ L \quad \text{such that}~~ E \leq E_{max}\]
But as discussed earlier, given the right trade-off $\lambda$ or threshold $E_{max}$, these 2 formulations are functionally equivalent.
Secondly. We could waste time making the same remarks as for the students’ objective above: teachers may have objectives other than making students learn.
Worse, though, is that some of these objectives may involve feeding their own family. Which may involve keeping their job, which may involve not going to war with the administration, who themselves may have yet more objectives such as e.g. keeping the university funded. Which may involve not failing half the students, which may involve not penalizing cheaters too harshly. So some of these objectives may actually be opposite to our stated GOAL, and it can get pretty complicated.
However, I believe a functional society should have rules that incentivize individuals to maximize the common good, which here would mean that one of society’s goals should be that teachers’ goal is to make students learn as much as possible.
But crafting rules for the entirety of society is slightly too ambitious for this blog post. So I will refrain from discussing meta-incentives further and try to focus on classroom rules instead.
Finally: talk to ~~the hand~~ economists.
There is one problem: the students’ objective somewhat conflicts with my own GOAL, as learning typically requires effort. Which they’d rather not put in.
However, as the teacher, I have one ace up my sleeve: I set the rules in my class.
Thus, I should define rules such that while trying to achieve their objective, students will actually achieve MY goal. That is, I should set incentives such that when trying to maximize their grade while minimizing effort, students will learn as much as possible – while also keeping my involvement at a manageable level.
Rules and laws
For now, let’s focus on incentivizing students to not use AI for their homework – in order to increase learning $L$ in case you haven’t been listening. We can capitalize on our past experience as BOSS: we still have 3 levers to increase the incentive to be honest.
- Decrease initial price. In our GOAL example, this would be equivalent to decreasing the difficulty of homework. Not an option.
- Increase probability of getting caught when cheating. As we’ve discussed, getting proof that AI was used for any specific question is often not feasible. However, we can sometimes get proof that AI was used for at least one question, in the same way that one failed ticket check conclusively proves that some fraud occurred. This way, even though the probability of getting (indisputably) caught for any individual question remains low, the probability of getting caught at some point becomes non-negligible.
- Increase the cheating penalty. One implicit justification for the fine is that if we have been caught cheating for one ride, we’ve probably been cheating for more rides. And we should make up for that. Plus, as we’ve seen, small fines make it irrational not to cheat. To deter effectively, the cost of the penalty must thus be (much) higher than the cost of being honest.
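To put a number on “non-negligible” in the second lever, here is a back-of-the-envelope sketch. It assumes, purely for illustration, that each AI-written answer independently has the same small chance of leaving conclusive proof behind; the 5% figure is made up.

```python
def p_caught_at_least_once(p_per_question: float, n_questions: int) -> float:
    """Chance that at least one of n AI-written answers yields conclusive proof.

    Assumes the answers are caught independently, each with the same probability.
    """
    return 1.0 - (1.0 - p_per_question) ** n_questions


# Hypothetical: each AI-written answer has a 5% chance of leaving conclusive proof.
for n in (1, 5, 10, 20):
    print(n, round(p_caught_at_least_once(0.05, n), 3))
```

Under these assumptions, ten AI-written answers already give roughly a 40% chance of at least one conclusive catch, even though each individual answer is very likely to slip through.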
As a direct consequence of points 2 and 3, I propose:
A first rule:
Evidence of AI use on any part of a homework assignment warrants a grade of 0 for the entire homework.
"That's too harsh! Skibidi." — a student again
Some students sometimes object to this rule for being too harsh – typically, right after getting caught.
From experience, such students will demand to only be penalized for questions with overwhelming evidence of AI use, swear they did not cheat on the more ambiguous ones, and argue that the penalty is totally unreasonable.
Exactly like caught fare-dodgers will swear this is the first time in their life they forgot to buy a ticket, offer to simply buy one now, and find the fine totally unreasonable.
This is to be expected, and is not a reason to comply: as discussed, asking a fare-dodger to simply buy a ticket is totally ineffective as a deterrent.
One important comment: while devising this rule, we mostly focused on the 1st component $L$ of the GOAL. But the 2nd component – keeping the teacher’s workload reasonable – is just as critical for our solution to be viable.
In particular, there is an important asymmetry that must be taken into account: students and teachers are not on equal footing when it comes to the effort component of their respective objectives. Before you call me lazy (again): let me explain.
Up until not so long ago, crafting a solution to a homework problem took way longer than grading it. This is why we can have classes of 30 students for 1 professor. However, conclusively proving, or even correctly estimating, whether a solution is AI-generated may now take far longer than generating it. This is obviously an issue: reversing the previous ratio and having 30 professors for 1 student is not realistic.
This is analogous to an idea called Brandolini’s law or the bullshit asymmetry principle: stating something false takes seconds. Proving that it is false can take hours, days or weeks. There is a severe asymmetry between liars and debunkers, or more generally between bad faith actors and good faith actors.
To account for the latest developments in the field of AI, I hereby suggest amending Brandolini’s law with:
Yayanini’s law:
Proving that a text is AI-generated is not just harder than generating it – it’s orders of magnitude harder.
Thus, it is paramount that the teacher-side effort to detect and penalize AI content remains tightly contained, lest the effort required becomes unsustainable, the probability of getting caught and penalized plummets, and civilization collapses.
Good news, for once: even taking this into account, the proposed first rule still holds up from a game-theoretic perspective.
To recap: the proposed class policy acts as a strong deterrent against AI use by making it unattractive from a risk-reward analysis (significant risk of getting caught, high cost to pay if so). This results in students hopefully doing homework the good old-fashioned way, namely, using their brain. Consequently, students learn more: $L \nearrow$. And, should the need arise, identifying a single piece of evidence of AI use should remain manageable from a teacher’s effort perspective. Consequently, $E \leq E_{max}$.
Therefore, this seems to be aligned with both components of the GOAL.
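The risk-reward argument can be sketched with a toy comparison between a per-question penalty and the zero-for-all rule. Everything here is a simplifying assumption: the student uses AI on every question, AI answers earn full marks unless conclusively proven, proof arises independently per question, and effort is left out of the picture entirely.

```python
def expected_grade_per_question(n_questions: int, p_proof: float) -> float:
    """Expected score of an all-AI submission when only proven answers get 0."""
    return n_questions * (1.0 - p_proof)


def expected_grade_zero_for_all(n_questions: int, p_proof: float) -> float:
    """Expected score of an all-AI submission when one proven answer zeroes everything."""
    return n_questions * (1.0 - p_proof) ** n_questions


# Hypothetical homework: 10 questions worth 1 point each, a 10% chance of
# conclusive proof per AI-written answer, and an honest student scoring 6/10.
N_QUESTIONS, P_PROOF, HONEST_SCORE = 10, 0.10, 6.0

per_question = expected_grade_per_question(N_QUESTIONS, P_PROOF)
zero_for_all = expected_grade_zero_for_all(N_QUESTIONS, P_PROOF)

print(f"per-question penalty: {per_question:.2f} vs honest {HONEST_SCORE:.2f}")
print(f"zero-for-all penalty: {zero_for_all:.2f} vs honest {HONEST_SCORE:.2f}")
```

With these made-up numbers, the per-question penalty leaves cheating comfortably ahead (about 9 against 6), while the zero-for-all rule puts it behind (about 3.5 against 6). Accounting for the effort saved by cheating would only make the per-question penalty look weaker.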
Now, this raises the question: did I really need to write all this simply to justify why I have a strict no-AI rule in my class?
Well, first, very few of the solutions that I have seen proposed actually manage to handle both the $L$ and the $E$ components of the GOAL. So getting there wasn’t that straightforward. But as it turns out, there’s still an insidious problem left. So this essay is not entirely finished yet. Sorry.
Dura lex sed lex, but sadly not really
As the person in command, the teacher is themself subject to a different first rule:
First rule of command:
Never give an order you know won’t be obeyed.
Or, to be more specific to our setting: never make a threat you can’t follow through on.
Here’s the thing: I can promise you from experience that whatever your initial warnings and however harsh the promised sanctions, at least some students will think they are smart enough to cheat and get away with it.
So, in theory, the previous rule deters students from using AI; in practice, some of them will still try their luck. And we’re now forced to consider a scenario that wasn’t even supposed to happen!
What went wrong with our model?
We made sure that the expected cost of cheating is higher than the cost of being honest. Are cheating students irrational? Was our GASP hypothesis wrong?
Well, respectively: 1. not necessarily (although I do wonder sometimes), and 2. certainly not.
In short: there is an actual probability of getting caught when cheating. And there is a perceived probability, which may differ. If the perceived probability is erroneously estimated to be lower than the actual probability, cheating might still look like the rational choice.
As it turns out, people in general are pretty bad at estimating probabilities (a post about that should be coming soon). And I’ve been told that students are supposedly people.
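The gap between actual and perceived probability can be illustrated with a break-even calculation. This is a sketch under loud assumptions: a caught cheater gets 0, an uncaught one gets the AI's grade, effort is ignored, and all the numbers are invented.

```python
def break_even_probability(honest_grade: float, ai_grade: float) -> float:
    """Detection probability above which cheating no longer pays.

    Cheating yields ai_grade if undetected and 0 if caught, so it stops
    being attractive once ai_grade * (1 - p) < honest_grade.
    """
    return 1.0 - honest_grade / ai_grade


def cheating_looks_rational(believed_p: float, honest_grade: float, ai_grade: float) -> bool:
    """The student decides with the probability they believe in, not the real one."""
    return believed_p < break_even_probability(honest_grade, ai_grade)


HONEST, AI = 12.0, 18.0             # hypothetical grades out of 20
ACTUAL_P, PERCEIVED_P = 0.50, 0.10  # hypothetical detection probabilities

print(break_even_probability(HONEST, AI))
print(cheating_looks_rational(ACTUAL_P, HONEST, AI))     # decision with the true odds
print(cheating_looks_rational(PERCEIVED_P, HONEST, AI))  # decision with the believed odds
```

In this toy setting the break-even point sits at one chance in three: a student who knew the true 50% risk would not cheat, but one who believes the risk is 10% would.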
By the way, one alternative strategy could be to increase deterrence by focusing on perceived probability, i.e. making students believe they will get caught if they cheat – even if that’s not necessarily the case. At least, this would be a sound strategy for someone who is not blogging about it. For my part, I’ll stick to the standard strategy to increase the perceived probability, which is to increase the actual probability.
Finally, let’s not be too proud of how much better we are than these students who fancied themselves smart enough to cheat. Because to be honest, maybe some of them are. We only know of students whose risk estimate was off: from our point of view, all students whom we know cheated were clumsy enough to get caught. But this is survivorship bias. We simply don’t know about the ones who cheated and did not get caught. For all we know, this could very well be all the other students – although I certainly hope not.
Why is this an issue? Well, we’ve seen that the effort component of the students’ and teachers’ conflicting objectives is heavily asymmetric and tipped against the teacher, because 1. there are more students than teachers and 2. proving misconduct may take longer than committing it. Yet, even facing the tyranny of Yayanini’s law (yes, that’s a thing now), we managed to keep the required teacher-side effort under control.
Until now. Because when a student is caught, this dual effort asymmetry is compounded by yet another asymmetry, this time in terms of personal stakes.
For the teacher, ensuring that an undeserving student fails the class is worthwhile, but not nearly as immediately and personally consequential as for the affected student, for whom the impact of crossing the pass/fail boundary is enormous. As a result, failed students have a huge incentive to spend time disputing accusations and challenging the validity of the (agreed-upon) rules, even in bad faith, if it has a chance to improve their grade.
Wasting the teacher’s time is not simply collateral damage here. It may be the entire point of this deliberate attrition war, based on the hope that the enemy (a.k.a. me in this instance) will eventually give up due to lack of energy.
It gets worse (of course it does).
As if the situation were not tricky enough already, this problem is further compounded by additional factors:
- The time required to handle disputes is evidently multiplied by the number of affected students: more disgruntled cheaters, more time wasted.
- Cheating is not necessarily binary: there are degrees to which students cheat. From personal experience, many students will cheat at least a little. Fighting 10 hours to fail 1 student whose submissions are 100% AI-generated is one thing – fighting 10 hours each to fail 20 students who cheated on 10% of the homework is another.
- Cheaters who have nothing to lose may try to drag the school administration into their war, particularly if they represent a non-negligible fraction of the class. This (infuriatingly effective) strategy may result in further time wasted having to justify decisions. Again, I speak from experience.
It gets even better though – and by “better”, I mean “significantly worse” of course. There are also psychological factors which may serve as tempting excuses to end the war.
- The cost of errors in AI detection is highly asymmetric: as a human with some degree of empathy, I tend to consider that unfairly punishing an honest student is worse than being somewhat lenient toward a cheater.
- There is often residual uncertainty about whether AI was actually used. This makes it tempting to convince oneself that we pass undeserving students because we cannot be absolutely sure they cheated, while the real reason is to minimize effort.
If enforcing the promised sanction becomes too costly, the teacher may decide it’s simply not worth it, and end up turning a blind eye. Just this one time. Pinky promise.
Which is of course bad for a number of reasons. In particular: knowing there is a reasonable chance of getting lenient treatment by being sufficiently annoying obviously incentivizes future cheaters to do just that. This further increases the time the teacher needs to dedicate to the issue, further increasing the future likelihood of giving up. Which in turn lowers the expected cost of cheating, encourages more students to cheat, and so on.
As a consequence: civilization 💥.
So, detecting and punishing cheaters may be initially cheap, but handling the consequences may not be. Which makes our meticulously crafted previous rule moot.
Si vis pacem para nuclear bellum
Skirmishes and nuclear war
I’ll let you in on a secret: in practice, here is what I do. If there is significant evidence of at least some AI-generated content, I grade based on my own estimate of what is AI and what is not. For example, the ~50% of the homework I estimate is AI-generated gets a 0, and the rest I grade normally albeit a bit more harshly than usual. In particular, all AI-suspected questions are marked as “AI-generated” by default.
To my current students: a friendly threat note
Dear Student,
You somehow managed to learn one of my secrets. Well done.
However, allow me to offer a friendly warning: it would be unwise to assume that I will continue to use this exact grading scheme. Nothing requires me to do so.
I may change my mind at any time. And when I do, I may not feel any particular need to update my blog. Should that happen, I am afraid there will be precisely nothing you can do about it.
Your safest bet is to keep assuming that a single trace of AI will earn you a nice, round zero.
With pedagogical affection,
Your Teacher
One advantage is that I don’t need to justify each deduction individually, or prove AI use for each question, which as we’ve seen isn’t feasible. So the effort component of the GOAL is safe. There is also a reasonable incentive for students not to use AI, as it is still likely to result in a bad grade.
However, in these aspects, it’s not really better than the previous rule. It’s even a bit worse in terms of incentives.
There is a catch, though. If there is any dispute about the grade, particularly about the extent of AI use, I roll back to the previous strict policy: mere suspicion of AI use is not penalized, but one indisputably proven AI sentence results in an automatic 0.
The whole idea is that arguing in bad faith is no longer a winning strategy for caught cheaters. On the contrary, it is likely to make things actively worse for them.
What about students who used AI for 100% of the assignment? (spoiler: new rule)
It is true that cheaters who, by my estimate, did 100% of an assignment with AI continue to receive a 0. As such, they may still be tempted to be a pain in the nether regions.
The same idea as before applies: ensure that even these students have something to lose if they try to push it. So that they hopefully don’t.
This can get tricky though: enabling grades of less than zero would be handing them the stick to beat you with if they decide to escalate to the school administration. A rule stating that one conclusively proven AI sentence warrants failing the entire course, as opposed to a single assignment, might seem more defensible. But they could call your bluff: why didn’t you fail all students with traces of AI in their submission? And why is there a difference in treatment if they’re the only ones affected? We’re back to square one.
Instead, I can suggest this fair middle ground: a repeated offense – i.e. more than one homework assignment positively containing AI content – warrants a failing grade for the entire course. This seems reasonable enough to be accepted by everyone, school administration included. And it ensures you have enough leverage to deal with any given student acting in bad faith at most once.
Anyway, in my experience, students who do all homework with AI tend to have such abysmal results on the supervised final exam that convincing anyone that they should fail the class is not really difficult. One alternative strategy I’ve used in the past for such students is to offer to skip the whole what-is-AI-and-what-is-not debate: that is, to base the course grade solely on what can confidently be attributed to them, namely the final exam. In practice, this has the same effect on the pass/fail decision anyway.
Here is a summary of the full updated policy:
Updated nuclear doctrine:
Evidence of AI use on any part of a homework assignment warrants a grade of 0 for the entire homework.
A repeated offense warrants a failing grade for the entire course.
Accusations of AI use may be challenged by students who believe they were wrongly accused. Doing so will trigger a re-evaluation with a strict application of the official policy.
I believe this set of rules is both reasonably fair and effective.
Incidentally, the fine print of the policy may not even need to be official. So far, no one has complained that I was too generous with their grade. But you never know.
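For readers who think better in code, the doctrine above can be sketched as a small function. This is only an illustration of how I read the rules, not an official grading tool: the names `Question` and `grade_homework` and all of their parameters are hypothetical, the "a bit more harshly than usual" adjustment is left out, and applying the course-failure rule only after a lost dispute is an assumption of this sketch.

```python
from dataclasses import dataclass


@dataclass
class Question:
    points: float       # points the answer would earn if graded normally
    suspected_ai: bool  # the teacher's own estimate, not a proof


def grade_homework(questions, disputed, ai_proven, prior_offenses=0):
    """Return (grade, fails_course) for one homework assignment.

    Hypothetical sketch: `disputed` means the student formally challenged
    the accusation, `ai_proven` means at least one AI sentence was
    conclusively established, `prior_offenses` counts earlier proven cases.
    """
    if not disputed:
        # Default, lenient mode: suspected-AI answers score 0, the rest counts.
        grade = sum(q.points for q in questions if not q.suspected_ai)
        return grade, False
    if ai_proven:
        # Strict official policy: proven AI zeroes the whole homework,
        # and a repeated proven offense fails the course (assumed reading).
        return 0.0, prior_offenses >= 1
    # Dispute won by the student: mere suspicion is not penalized.
    return sum(q.points for q in questions), False


homework = [Question(5.0, False), Question(5.0, True), Question(10.0, False)]
print(grade_homework(homework, disputed=False, ai_proven=False))
print(grade_homework(homework, disputed=True, ai_proven=True))
```

The point of the sketch is the asymmetry: for a student who did use AI, disputing can only move the grade from "partial credit" to 0, while an honest student who disputes recovers the full grade.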
The main advantage of this approach is that I don’t need to give a 0 to all students who cheated “just a little” in order to maintain a credible threat – which as discussed would be hard to enforce in practice. I have more gradual responses than a full-on nuclear war at my disposal: similarly to real life, my nuclear arsenal exists to hopefully never be used. However, if provoked, I still can order a swift and decisive strike, which is essential for deterrence to be effective.
As a reminder for international students and in all seriousness, French nuclear doctrine provides for the possibility of a limited nuclear strike as a warning shot. You’ve been forewarned.
Amendments, case law and the Geneva Convention
So, my deterrence metaphors went from fines in the metro to nuclear Armageddon. Boy, that escalated quickly.
Fortunately, we’re pretty much done. Before I conclude, here are just a few clarifications I would like to emphasize.
The applicable policy must be clearly announced.
Quite obviously, students need to know the rules from the start and accept them, even implicitly – taking the class may count as acceptance. After all, a deterrent is only effective if people know about it.
Having the rules plainly written also makes enforcing sanctions easier should the need arise. Trust me, I’ve been there.
Dispute of AI accusations must be possible.
As far as I know, I’ve never wrongly accused anyone of cheating: so far, a grand total of zero students I accused have claimed to be fully innocent. Even so, since no one is 100% infallible, I guess we should contemplate the possibility that I am no exception.
Thus, the possibility of appeal is essential. We just need to ensure that only unfair accusations are worth challenging, in order not to waste everyone’s time – and by everyone’s, I mean mine. I believe the proposed system is already fairly efficient in this respect.
Yeah, but what if there is an appeal though?
Right: what happens if there is an appeal? To avoid any unfair outcome or the re-emergence of misaligned incentives, it is imperative that the appeal determines whether AI was used as accurately as possible. As a result, it may be worth dedicating significantly more resources than usual. If the proposed policy is effective, this shouldn’t happen too often anyway.
If the evidence of AI is deemed already strong: I suggest asking for a second opinion from a neutral third party, briefed on the fact that one strong piece of evidence is all that’s needed.
If there was doubt from the start or the third party cannot conclusively estimate whether the submission contains AI, I suggest organizing an oral examination. From experience, if a student did not write their submission themselves, it is fairly easy to poke holes in their solution by asking them to clarify certain concepts that I suspect they may not be familiar with. For example, what do they mean by suggesting the use of Platt calibration? Or if they proved that a function was a kernel using Mercer’s theorem, can they briefly explain what this theorem states?
The specifics will of course vary by subject, but the general idea should hold.
Ideally, the student should have little notice about this oral exam. After all, they should already be familiar with the material they allegedly produced. To ensure the oral defense is as unbiased as possible, it can also be conducted by or in the presence of another teacher.
Although establishing guilt is the responsibility of the accuser (a.k.a. me), the student should also be given an opportunity to present evidence in their favor: for instance any draft, notes, or intermediate version worth considering, provided it couldn’t easily have been forged after the accusation.
If, after this due diligence, the use of at least some AI is established: the official policy is applied, resulting in a grade of 0, and possibly further sanctions.
If not: well, there may have been a mistake after all. This is a great opportunity to learn what went wrong with our assumptions, and to update our AI detection skills accordingly.
(But to reiterate, I’ve never been in this situation thus far.)
The proposed policy can be most effectively applied if there is at least one strong piece of evidence.
In other words, accumulating weak evidence of cheating may not be sufficient.
This is for two reasons. First, lacking conclusive evidence risks getting dragged into challenges one may not win, which would undermine the perceived risk of cheating. Second, it also risks unfairly penalizing an honest student, something I – surprisingly – do not wish to do.
Any use of AI for legitimate reasons must be strictly disclosed.
There may be cases where using AI is legitimate and does not hinder learning given the context of the class.
One example from my class would be plotting graphs with Matplotlib, which can be notoriously tedious, and doesn’t teach you much to be honest. Time saved on this menial task may be better spent on learning more relevant concepts.
Depending on specific class policy, partial AI use may thus be occasionally tolerated, provided that its scope is explicitly disclosed. Failure to disclose AI use and its exact scope must be interpreted as cheating. This measure is necessary to avoid leaving the door open to post-hoc excuses.
(As a token of good faith, I would like to disclose that I did use AI to proofread this article.)
However, allowing even some AI use may also be a double-edged sword. In particular, I would not advise allowing AI for formatting or rewording in general, since it would make it basically impossible to separate its use for form from its use for substance – and thus to prove its illicit use for the latter.
On-paper exams should be included whenever possible.
This one should go without saying.
Even though they are arguably out of the scope of this essay, I believe having supervised, on-paper examinations is more important now than ever, for several reasons:
- First and quite obviously, this remains the only reliable way to measure student performance. Ensuring students meet a minimum ability threshold (different from the ability to coast through thanks to their AI buddy) is essential if we want diplomas to mean anything.
  I’m not planning to stop bragging about my PhD anytime soon, so please make it retain its value. (Did I mention that I have a PhD?)
- Since cheating at a supervised exam remains much more difficult, one of the easiest ways to obtain a passing grade is to actually learn the material that will be needed for the exam. Which, as a reminder, is what we would like the students to do as per the GOAL.
- As a bonus, exams are useful to “recalibrate” our internal AI detectors: if a student consistently turns in perfect assignments that don’t look like AI, but dramatically blunders the final exam… it’s possible something fishy was going on after all. And vice versa. This is an opportunity for the teacher to find out what went wrong and learn from it.
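The "recalibration" idea in the last point can be sketched in a few lines of Python. Everything here is an assumption made for illustration: the function name `recalibration_cases`, the record layout, and the two thresholds (`gap` and `strong_exam`) are hypothetical and would need tuning for a real class. The output is a list of cases worth a second look, not evidence of anything.

```python
def recalibration_cases(records, gap=0.4, strong_exam=0.8):
    """Split students into cases worth a second look.

    `records` maps a student name to (homework, exam, suspected), where both
    scores are fractions in [0, 1] and `suspected` is the teacher's own call
    on the homework. Both thresholds are arbitrary illustrative defaults.
    """
    possibly_missed, possibly_wrong = [], []
    for name, (homework, exam, suspected) in sorted(records.items()):
        if not suspected and homework - exam >= gap:
            # Clean-looking homework, but the supervised exam collapsed.
            possibly_missed.append(name)
        elif suspected and exam >= strong_exam:
            # Homework looked like AI, yet the student clearly knows the material.
            possibly_wrong.append(name)
    return possibly_missed, possibly_wrong


records = {
    "alice": (0.90, 0.85, False),
    "bob": (0.95, 0.30, False),
    "carol": (0.90, 0.90, True),
    "dave": (0.90, 0.20, True),
}
missed, wrong = recalibration_cases(records)
print("possibly missed:", missed)
print("possibly wrongly suspected:", wrong)
```

A large homework-to-exam gap has plenty of innocent explanations (stress, a bad day), so this is only meant to point the teacher's attention, in both directions.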
If accusations of AI use have not been formally challenged, the evidence that led to the accusation need not be provided.
This hopefully should not discourage honest students from speaking up if they believe there has been a mistake.
But not knowing the extent of the evidence makes it difficult for dishonest students to estimate their chance of winning a bad faith argument with high stakes, ideally discouraging it. In a way, this is the fog of war working in our favor for once.
Also, the last thing I want to do is to help students get better at cheating by “learning from their mistakes”. I’m the only one allowed to learn from mistakes related to this topic.
The teacher must retain full liberty to apply the full range of sanctions if (even limited) use of AI is formally established.
This final one is probably the most important.
I won’t rehash why this matters from an incentive and effort perspective. I would however like to emphasize that unconditional support from the university is critical. Having to fight the administration to fail a student for (proven) AI use would undermine everything we’ve been trying to build here.
In particular, I believe that strict, official rules regarding academic integrity which explicitly cover AI content would be a valuable addition to existing university policies. Most American universities already had such rules about plagiarism when I was studying there: instances of proven plagiarism could go as far as getting you expelled. These rules had to be unconditionally accepted by all students. I think this idea should be extended to unauthorized AI use and rolled out to more universities or schools in general, including in France.
[Update: some universities such as Oxford or Stanford have started implementing such a policy.]
It is true that such rules may sometimes be genuinely painful to enforce: there is no denying that failing 20% of a cohort is extremely unpleasant for everyone involved. However, I believe the alternative (🏛️💥) is even worse.
Final dispatch
Many of the ideas from this essay were introduced with my own classes in mind, but should be applicable much more broadly.
Some adjustments to different classes are straightforward: you may e.g. easily replace “answer to a homework question” with “part of an essay” depending on the context. Other transpositions may not be so seamless, particularly everything related to AI detection. Here, I’m afraid I have significant advantages over almost all other contexts: I have a directly relevant background, and detecting AI in code is arguably much easier than in other submission formats. Unfortunately, I don’t yet have a universal solution for AI detection and sanction adaptation.
I’m working on it though
I can actually almost pretend that I’m making progress: I am currently experimenting with possible “AI traps” in future homework assignments. The idea is to design questions so that an AI, while answering, generates subtle telltale patterns, constituting irrefutable proof of AI use if found. This is still a work in progress for now, but if it bears fruit, I will update this post accordingly. Stay tuned!
Finally, please bear in mind that I am not saying that AI has no place at all at work or in universities, or that we should ban computers altogether and write with quills and parchment instead.
On the contrary, AI can be a fantastic tool for learning, allowing self-driven students to benefit from individual tutoring, tailored explanations and detailed feedback – things that traditional teachers unfortunately can’t offer to everyone, through no fault of their own.
I also believe that all students (or rather, all humans) would benefit from at least a basic understanding of how AI works and how to use it effectively.
However, there is a time and place for everything. Copy-paste is a tremendously useful invention – but using it to turn in a Wikipedia article as-is instead of an expected thorough reflection on a given topic is dishonest and educationally harmful. The same goes for AI.
Incidentally, I also see problems related to uncritical reliance on AI at work. Some people do use it fruitfully, while others use it to avoid a week’s work by presenting slides visibly copy-pasted from ChatGPT, with no critical thinking whatsoever. It might be argued that, since the latter group brings about as much value as a ChatGPT subscription but typically costs way more, there are big savings just waiting to be made.
Therefore, training students to belong to this group is not helping them.
Beyond that, keeping the ability to think, write, and perform other intellectual tasks by oneself is valuable in its own right. As a case in point, writing this article by hand has greatly helped me clarify my thoughts on this matter, something that would not have happened had I delegated the task to an AI.
So, let’s keep on thinking for ourselves!