18/01/2026

The Filter and the Fire: A Tale of Two Assessments

Introduction


The silence in my 8:00 AM "Business Management" class was deafening. Just forty-eight hours earlier, I had sat in my home office, scrolling through student submissions that were—to put it bluntly—miraculous. The prose was crisp, the data analysis was "McKinsey-tier," and the strategic recommendations were flawless.

But as I stood at the front of the lecture hall and asked a simple question about why a specific A/B testing metric was chosen, thirty-six pairs of eyes suddenly found the floor very interesting.


The "LLM-pocalypse" hadn't just arrived; it had moved in and started redecorating. We were stuck in a game of "Credential Theater," where students outsourced their thinking to an LLM, and I was expected to grade a machine’s homework.

I knew we had to fight fire with fire. But I also knew I didn't have 30 hours a week to grill every student individually. So, we built a two-stage gauntlet: The Filter and the Fire.



Phase One: The Filter (The MCQ)


We started with a "Safety Gate." Before anyone could claim their grade, they had to pass a 30-minute, proctored multiple-choice screening.

I spent hours crafting "Diagnostic Distractors"—incorrect answers that looked like the right answer if you’d only skimmed the textbook, but screamed "misconception" if you really understood the logic.

The Rule was simple:

  • Score ≥ 60%: You’ve proven the basics. You’re safe.

  • Score < 60%: You enter the "Fire."
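
Mechanically, the routing itself is trivial. Here is a minimal sketch in Python; the CSV layout, the file name, and the threshold constant are illustrative assumptions, not our actual grading pipeline:

    import csv

    # Illustrative threshold: students at or above it skip the oral exam.
    PASS_THRESHOLD = 0.60

    def route_students(scores_csv):
        """Split students into "safe" and "oral exam" groups by MCQ score.

        Assumes a CSV with "student_id" and "score" (0.0 to 1.0) columns.
        """
        safe, fire = [], []
        with open(scores_csv, newline="") as f:
            for row in csv.DictReader(f):
                if float(row["score"]) >= PASS_THRESHOLD:
                    safe.append(row["student_id"])
                else:
                    fire.append(row["student_id"])
        return safe, fire

    if __name__ == "__main__":
        passed, oral = route_students("mcq_scores.csv")
        print(f"{len(passed)} cleared the Filter; {len(oral)} proceed to the Fire.")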

It seemed efficient. It was scalable. But the moment the first results went out, the "Human Factor" hit back. One student, Sarah, emailed me within minutes. "I’ve never been good at standardized tests," she wrote. "Seeing that I 'failed' the filter felt like being sent to the principal's office. My heart was pounding before I even started the next part."

I realized then that my "efficient filter" had inadvertently created a class where some students felt like "second-class citizens." To them, the oral exam wasn't an opportunity; it was a punishment.


Phase Two: The Fire (The AI Voice)


For those who didn't clear the gate, "The Fire" awaited: a 20-minute conversation with a Voice AI agent, following the approach of Panos Ipeirotis, an NYU professor who introduced this low-cost system of automated oral exams (Ipeirotis, 2025). What follows is based on his report.

We used a professional, slightly stern voice—let’s call him "The Auditor." The Auditor didn't just ask questions; he listened. When a student gave a vague, hand-wavy answer about "optimizing for engagement," the Auditor would interject: "That’s a bit broad. How exactly would you guard against Goodhart’s Law in that scenario?"
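
For readers curious about the mechanics: the Auditor's persona lives almost entirely in its instructions. The sketch below is my own simplified reconstruction of that kind of prompt, held in a Python constant; it is not Ipeirotis's verbatim configuration, and the platform-specific wiring (voice, telephony) is omitted:

    # Simplified, illustrative instructions for the "Auditor" voice agent.
    # In practice, text like this is pasted into the voice-agent platform's
    # configuration; the wording here is an assumption, not a quoted prompt.

    AUDITOR_PROMPT = """\
    You are "The Auditor," a professional, slightly stern oral examiner
    for a Business Management course.

    Rules of engagement:
    - Ask exactly ONE question at a time; never stack questions.
    - Classify each answer as SPECIFIC or VAGUE.
    - If VAGUE (buzzwords, no mechanism, no metric), ask ONE follow-up
      that forces a concrete commitment, e.g. "How exactly would you
      guard against Goodhart's Law in that scenario?"
    - If SPECIFIC, move on to the next topic.
    - Wait up to 10 seconds of silence before gently prompting.
    - Do NOT grade or give feedback; you gather evidence, and grading
      happens later from the transcript.
    """

Keeping grading out of the live agent's job description is deliberate: it means one awkward exchange can always be reassessed later from the transcript.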

The Results were startling:

  • The Cheat-Proof Wall: You can’t copy-paste your way out of a live conversation. The gap between "I read the slides" and "I understand the material" became a canyon.

  • The Cost: It cost us about $0.42 per student. In the old days, this would have cost me my entire weekend and several pots of coffee.

But then came the feedback. Students described the AI as "intimidating" and "cold." Because the AI didn't nod, smile, or offer a "hmm, interesting," the silence between turns felt like an interrogation. One student said, "The AI shouted at me through its silence."


The Mirror in the Machine


The most humbling moment didn't come from the students, though—it came from the data.

When we looked at the final "Council of LLMs" grading report, one topic was a sea of red: Experimentation. Almost no one—even the students who did well—could explain the nuances of A/B testing.
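
Mechanically, spotting that sea of red is just an aggregation over the council's per-topic scores. A hedged sketch follows, assuming each grader report is a simple topic-to-score mapping between 0 and 1; the data shape and the 0.5 "red" threshold are my assumptions, not the actual report format:

    from statistics import mean

    # Each report: one grader model's per-topic scores for one student,
    # e.g. {"experimentation": 0.3, "metrics": 0.8}.
    def weak_topics(council_reports, threshold=0.5):
        """Average each topic across all reports; return the topics
        whose class-wide mean falls below the threshold, weakest first."""
        totals = {}
        for report in council_reports:
            for topic, score in report.items():
                totals.setdefault(topic, []).append(score)
        flagged = [(t, mean(s)) for t, s in totals.items() if mean(s) < threshold]
        return sorted(flagged, key=lambda pair: pair[1])

    # Example: experimentation surfaces as the class-wide weak spot.
    reports = [
        {"experimentation": 0.2, "metrics": 0.9},
        {"experimentation": 0.4, "metrics": 0.7},
    ]
    print(weak_topics(reports))  # [("experimentation", 0.300...)]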

The "Fire" had acted as a mirror. It wasn't just that the students hadn't learned it; it was that I hadn't taught it well enough. The AI had diagnosed my own teaching gaps with brutal, mathematical precision.

The Lesson Learned


We set out to build a system that was "uncheatable" and "scalable." We succeeded. But we also learned that assessment is more than a measurement—it’s an emotional experience.

Next year, the "Filter" will be a "Warm-up," and the "Fire" will have a friendlier voice. Because while we can use AI to scale the work of a professor, we can’t yet use it to replace the encouragement of one.

We’re fighting fire with fire, but we’re learning how to keep the students from getting burned.


References:

Ipeirotis, P. (2025). Fighting Fire with Fire: Scalable Personalized Oral Exams with an ElevenLabs Voice AI Agent. Behind-The-Enemy-Lines.com. https://www.behind-the-enemy-lines.com/2025/12/fighting-fire-with-fire-scalable-oral.html


Below are some documents that facilitate the introduction of this assessment mechanism:

Document 1: Student FAQ

Navigating Your Assessment: The Filter & The Fire

Q: Why can't we just do a traditional take-home exam? A: In the era of Generative AI, take-home exams have become "Credential Theater"—they often measure how well an AI can write, not how much you have learned. We want your grade to reflect your mastery. Oral exams are the gold standard for authentic assessment, and AI allows us to give this personalized experience to everyone.

Q: I scored below 60% on the MCQ. Does this mean I’m failing? A: Absolutely not. Think of the MCQ (The Filter) as a diagnostic tool. If you score below 60%, it simply means the system needs more evidence of your understanding. The Oral Exam (The Fire) is your "Safety Net." It’s a chance to explain your reasoning in your own words and recoup points that might have been lost due to tricky wording in the MCQ.

Q: Is the AI "judging" me during the conversation? A: The Voice Agent is a data-gatherer, not the final judge. It is programmed to probe your logic. The actual grading is done later by a "Council of Models" that reviews the transcript of your conversation. This ensures that one "bad turn" in a conversation doesn't ruin your grade.

Q: What if the AI voice sounds "intimidating" or "mean"? A: We’ve tuned the agent to be professional, but we know it can feel intense. Remember: the AI doesn't have feelings or "moods." If it asks a tough follow-up, it’s because it’s programmed to help you reach the "mastery" level of the rubric. It isn't disappointed in you!

Q: What happens if my Wi-Fi cuts out during the oral exam? A: Don't panic. You are required to record your session (webcam + audio) locally. If there is a technical glitch, you will simply submit your local recording as evidence, and we will ensure it is graded fairly.


Document 2: Preparation Guide

How to Succeed in the AI Oral Defense

The AI Oral Exam isn't about memorizing facts; it’s about defending your decisions. Use these strategies to perform your best.

1. Master the "Filter" (Stage 1)

  • Beware of "Diagnostic Distractors": Our MCQs include answers that look correct if you only have a superficial understanding. When studying, don't just ask "What is the answer?"—ask "Why are the other three options wrong?"

  • Scenario-Based Thinking: Expect questions that start with "Imagine you are a Product Manager at..." Practicing with real-world cases is better than flashcards.

2. Conversing with the Agent (Stage 2)

  • The "One-at-a-Time" Rule: The AI is instructed to ask one question at a time. If you feel it has "stacked" questions, just answer the first part and say, "Could you repeat the second part of that question?" It will comply verbatim.

  • Think Out Loud: Unlike a written test, the process of your thinking matters. If you are stuck, say: "I’m weighing two options here. On one hand, X might happen, but Y is a risk..." This gives the grading council evidence of your critical thinking.

  • Embrace the Pause: The AI is set to wait 10 seconds before prompting you. Don't rush. Take a breath, organize your thoughts, and then speak. Silence is not a failure; it’s reflection.

3. Technical Setup

  • Environment: Find a quiet room. The AI can sometimes be confused by background noise or multiple voices.

  • Audio Quality: Use a headset if possible. If the AI can hear you clearly, it is less likely to interrupt or ask you to repeat yourself.

  • Verification: Ensure your camera is on and your student ID is ready for the "Authentication Agent" at the start of the call.

4. The "Expert" Mindset

  • Treat the AI as a junior colleague who is skeptical of your plan. Your job is to convince them that your choices (in your project or the cases) are grounded in the course principles. Confident, direct answers keep the conversation short and focused.


 
