
How Law Firms Check AI Output for Errors

Lawyers are getting six-figure sanctions for fake AI citations. The check that prevents it takes a few minutes per citation.

Michael Pavlovskyi · 9 min read
[Image: A state supreme court and court of appeals building. The sign in front reads Supreme Court, Court of Appeals, State Law Library. Photo by Bace Agency.]

Key Takeaways

  • AI tools invent cases that look real. A Stanford Law study found that even purpose-built legal research tools hallucinate in more than one of every six queries.
  • The penalties are real and rising. An Oregon federal judge ordered $110,000 against two lawyers for AI errors in court filings.
  • The fix is not avoiding AI. It is a short verification check: confirm the case exists, confirm it says what you claim, and confirm it is still good law.
  • The check is a workflow, not a tool purchase. Any small firm in Winnetka or Highland Park can run it on every brief before it goes out.

A lawyer asks an AI tool for cases that support an argument. The tool returns three, complete with names, citations, and quoted holdings. They look right. They read right. Two of them do not exist.

That is not a rare failure. It is how these tools work when nobody checks them. The model is built to produce text that sounds correct, not text that is true. For a law firm in Winnetka or Lake Forest, the gap between those two things is the difference between a filed brief and a sanctions order.

The good news: catching the error is not hard. It does not require a new vendor or a six-month project. It requires a verification check that takes a few minutes per citation and a rule that no AI output reaches a client or a court without going through it.

  • 1 in 6: share of queries where leading legal AI research tools produced a hallucination, per Stanford Law's 2024 study. General chatbots were far worse.
  • $110K: penalty an Oregon federal judge ordered against two lawyers for AI errors in filings, per the ABA Journal.
  • $6K: sanction against one attorney who filed briefs citing nonexistent AI-generated cases, per Bloomberg Law.

Why AI Invents Cases That Look Real

A large language model does not look up the law. It predicts the next word based on patterns in its training data. When you ask for a case on a narrow point, the model knows what a real citation looks like: a case name, a reporter volume, a page number, a year, a holding. So it produces one in that exact shape. The format is perfect. The case is fiction.

This is why the fake citations are dangerous. They do not look like obvious errors. They look like the real thing. The Stanford study tested tools sold specifically for legal research, the ones with access to actual case databases, and still found hallucinations in more than 17 percent of queries. Those errors include both invented cases and a subtler problem: citing a real case that does not say what the brief claims it says.

The earliest high-profile example set the pattern. In 2023, two New York lawyers filed a brief in Mata v. Avianca built on cases ChatGPT had invented, then doubled down when the court asked them to produce the opinions. The court sanctioned them. Since then the pattern has repeated across the country, and the dollar figures have climbed. The $110,000 Oregon penalty is the current high-water mark, not the floor.

"The model is not lying to you. It does not know the difference between a real case and a plausible one. That is your job, and it always will be."

Michael Pavlovskyi, Bace Agency

Why This Matters More for Small North Shore Firms

A large downtown firm has a research department and a junior associate whose week can absorb a citation check. A three-attorney estate planning firm in Highland Park does not. The partner is the one drafting the brief, running the AI tool, and signing the filing. The same person who saves time with AI is the person who has to catch its mistakes.

That is the trap. The tool feels like a junior associate. It is fast, it is confident, and it hands you finished-looking work. But a junior associate knows when they are unsure. The model does not. It delivers the invented case with exactly the same confidence as the real one. So the time you saved on the first draft has to be partly spent on verification, or you have not saved time at all. You have moved the risk downstream to the moment a judge reads your brief.

None of this is an argument against using AI. Used with a review step, these tools genuinely speed up drafting, research framing, and document review. The point is narrower: the verification step is not optional, and it is cheap. The firms that get burned are the ones that treated AI output as finished work instead of a first draft. We wrote about the broader version of this question in our piece on how AI agents handle data for North Shore firms. The same principle applies here: a human signs off before anything leaves the building.

Check One: Confirm the Case Exists

Before you read a word of the holding, prove the citation is real.

This is the check that would have stopped almost every sanctions case to date. For every case the AI cites, pull it up independently in a real legal database or the court's own docket. Westlaw, Lexis, and Google Scholar are lookup tools here, not safeguards: they catch the error only if a person actually runs every citation through them. Do not ask the AI to confirm its own citation. It will often tell you the fake case is real, because it cannot tell the difference.

If you cannot find the case in a real database, it does not exist. Delete it. There is no version of this where a case you cannot locate turns out to be fine. The same goes for a citation that points to the wrong volume or a quote you cannot find on the page the AI gave you.

SAMPLE CLAUDE PROMPT

"Here is a draft brief. List every case, statute, and regulation it cites in a simple table: citation, the proposition it is cited for, and the exact quote if one is used. Do not assess whether they are real. Just extract them so I can verify each one independently in a real legal database."

That prompt does not ask the AI to check itself. It asks the AI to do the boring extraction work so a human can run each citation through a real database. That division of labor is the whole idea: let the tool draft and organize, let the lawyer verify.
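The extraction step does not have to go through an AI model at all. As a minimal sketch in Python, assuming the brief is available as plain text, a regular expression can pull reporter-style citations into a blank checklist. The pattern and function names below are hypothetical, and the pattern is deliberately narrow (a few federal reporters only), so treat it as a starting point and not a complete citation parser. It will miss citations, which is one more reason a person still reads the brief.

```python
import re

# Hypothetical, deliberately narrow pattern: volume, reporter, first page.
# Covers only U.S., S. Ct., F.2d/F.3d/F.4th, and F. Supp. (2d/3d).
REPORTER = r"(?:U\.S\.|S\. ?Ct\.|F\. ?Supp\.(?: ?[23]d)?|F\.(?:2d|3d|4th))"
CITATION_RE = re.compile(r"\b\d{1,4}\s+" + REPORTER + r"\s+\d{1,5}\b")


def extract_citations(brief_text: str) -> list[str]:
    """Return each distinct reporter citation, in order of first appearance."""
    found: list[str] = []
    for match in CITATION_RE.finditer(brief_text):
        # Collapse internal whitespace so line-wrapped citations compare equal.
        cite = " ".join(match.group(0).split())
        if cite not in found:
            found.append(cite)
    return found


def verification_checklist(brief_text: str) -> list[dict]:
    """Build one blank row per citation; a human fills in every answer."""
    return [
        {"citation": cite, "exists": None, "supports_claim": None, "good_law": None}
        for cite in extract_citations(brief_text)
    ]
```

Every answer starts as None on purpose. The script only lists what needs checking. It never marks a citation as real, because only a person who has pulled the case from a real database can do that.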

Check Two: Confirm It Says What You Claim

A real case cited for a holding it does not contain is still a problem.

The invented case is the obvious failure. The quieter one is a real case that the AI has mischaracterized. The citation checks out. The case is real. But it does not actually stand for the point in your brief, or it says the opposite, or the quoted language is stitched together from two different paragraphs.

This is why the second check matters. For every citation that survived check one, open the actual opinion and read the relevant passage. Confirm the holding matches the proposition in your brief. Confirm the quote appears, in context, on the page cited. This is ordinary lawyering. AI did not invent the need for it. AI just made it easier to skip, because the draft already looks polished.

SAMPLE CLAUDE PROMPT

"I am pasting the full text of a real court opinion and the sentence from my brief that cites it. Tell me whether the opinion actually supports that sentence. Quote the specific language that supports or undercuts it. If the opinion does not address the point, say so plainly."

Note what this prompt does. It gives the model the real opinion text, so it is reading an actual document instead of recalling from memory. That is the safer way to use these tools: feed them the source, then ask a narrow question, rather than asking them to produce the source from nothing.
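One narrow part of this check can be done mechanically: confirming that a quote appears word for word in the opinion. The sketch below is a minimal Python illustration, assuming you have the opinion text copied from a real database. The function names are hypothetical. It only tests for a contiguous match, so it will flag a quote stitched together from two paragraphs, but it says nothing about whether the holding supports your proposition. That part still requires reading the opinion.

```python
import re


def normalize(text: str) -> str:
    """Straighten curly quotes, collapse whitespace, and lowercase the text."""
    for curly, straight in (("\u201c", '"'), ("\u201d", '"'),
                            ("\u2018", "'"), ("\u2019", "'")):
        text = text.replace(curly, straight)
    # Collapsing whitespace keeps line breaks in the opinion from hiding a match.
    return re.sub(r"\s+", " ", text).strip().lower()


def quote_appears(quote: str, opinion_text: str) -> bool:
    """True only if the quote is one contiguous passage in the opinion."""
    needle = normalize(quote)
    return bool(needle) and needle in normalize(opinion_text)
```

A result of False means stop and read the page. A result of True means the words are there, and nothing more.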

Check Three: Confirm It Is Still Good Law

A case that was good in 2019 may have been overturned since.

The third check is the one AI is worst at. A model trained on older data has no reliable sense of what has happened since. A case it cites may have been reversed, vacated, or limited by a later decision. The AI will not flag that. It does not know.

So the last step is the same one you would run on any citation: check that it is still good law. Run it through a citator. This is not an AI task at all. It is the existing standard of care, applied to citations that happen to have come from a machine. The duty of competence under ABA Model Rule 1.1 already covers this: Comment 8 tells lawyers to keep abreast of the benefits and risks of relevant technology. A judge will not accept "the AI gave it to me" as a defense, any more than "the paralegal gave it to me."

Here is the part most firms miss. Buying a tool or relying on a database does not fix the hallucination risk. The fix is a customized, repeatable workflow built into your firm's daily operations, which is exactly what we design and automate at Bace. That means a fixed checklist, a required review step, and clear lines about which tasks touch privileged files and which do not, set up once in an AI automation engagement instead of living in one partner's memory.

How to Get Started

1. Write the rule down and make it non-negotiable

No AI-assisted citation reaches a court or a client until a named person has run all three checks. Put it in writing. A rule that lives only in one partner's head is the rule that gets skipped on a deadline.

2. Verify in a real database, never in the AI

Pull every case in a real legal database or the court docket. Read the holding. Run the citator. The AI extracts and drafts. The human and the database verify. Do not let the tool grade its own work.

3. Sort your AI tasks by risk before you scale

Drafting a client email is low risk. Filing a brief is not. Map which tasks touch court filings or privileged files and which do not. Start AI where a mistake is cheap, and keep the heaviest review on the work a judge will read.
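The written rule in step one can also be expressed as a simple gate. The sketch below is one hypothetical way to model it in Python, not a description of any particular product: each citation carries the result of the three checks plus the name of the person who ran them, and the filing is blocked until every citation is cleared. The class and field names are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class CitationCheck:
    citation: str
    exists: bool = False          # Check one: found in a real database
    supports_claim: bool = False  # Check two: opinion says what the brief claims
    good_law: bool = False        # Check three: citator shows it is still good law
    reviewer: str = ""            # Named person who ran the checks

    def cleared(self) -> bool:
        """A citation clears only with all three checks and a named reviewer."""
        return (self.exists and self.supports_claim and self.good_law
                and bool(self.reviewer.strip()))


def blocking_citations(checks: list[CitationCheck]) -> list[str]:
    """List every citation that still stops the filing."""
    return [c.citation for c in checks if not c.cleared()]


def ready_to_file(checks: list[CitationCheck]) -> bool:
    """True only when no citation is blocking."""
    return not blocking_citations(checks)
```

The defaults are all False and the reviewer is blank, so a citation nobody has looked at blocks the filing. Skipping the check is the failure the rule exists to prevent, so the gate treats an unreviewed citation the same as a failed one.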

If you want a sense of where your firm stands before committing to anything, the AI readiness quiz is a five-minute starting point. It will not write your verification policy for you, but it will tell you where the obvious gaps are.

What This Does Not Replace

The verification check protects you from filing fiction. It does not turn AI into a substitute for legal judgment. The model cannot tell you whether an argument is wise, whether a case is the strongest available authority, or whether a filing is the right move for the client. It can draft and organize. It cannot decide.

It also does not replace the citator, the database, or the lawyer's signature. Those exist for a reason, and AI does not retire any of them. Here is the catch most owners miss. Doing all three checks by hand, on every brief, burns the most expensive time in the building: a partner's billable hours. Manual review does not scale. The more you use AI, the more checking it creates, until the time you saved on drafting is gone.

For a small firm, the only reliable way to get the speed of AI without the sanctions risk is to stop running the checks by hand. Build them into a private, automated workflow: the extraction, the verification routing, the review sign-off, and the data rules, set up once and run on every filing. That is the work we do in a consulting engagement. The partner stops being the bottleneck, and nothing reaches a court unverified.

If your firm is already using AI without a written verification step, every brief you file is exposure. The fastest way to close that gap is a free 30-minute AI audit with me, in person on the North Shore or by video. We map your current review process, find where a fake citation could slip through, and lay out the workflow that fixes your internal data and review infrastructure. No pitch, no obligation.

Frequently Asked Questions

Can I just ask the AI to check its own citations?

No. The model cannot reliably tell the difference between a case it invented and a real one, so it will often confirm a fake citation as genuine. Verification has to happen in an independent legal database or the court's own docket. The AI can extract and organize the citations, but a human and a real database confirm them. The durable fix is a built-in workflow that forces that step on every filing, not a tool you hope someone remembers to use.

Are purpose-built legal AI tools safe to trust without checking?

No. A Stanford Law study found that tools sold specifically for legal research, with access to real case databases, still hallucinated in more than one of every six queries. They are better than general chatbots, but they are not error-free. The verification check applies to every tool.

What are the actual penalties for filing a fake AI citation?

They range from a few thousand dollars to six figures, plus reputational damage and possible referral to disciplinary authorities. An Oregon federal judge ordered $110,000 against two lawyers, and other courts have issued sanctions in the low thousands. The trend in 2025 was sharply upward in both frequency and dollar amount.

Does this mean small firms should avoid AI for legal work?

No. Used with a review step, AI speeds up drafting, research framing, and document review. The risk comes from treating AI output as finished work instead of a first draft. The three-part check is cheap and fast, and it lets a small firm get the speed benefit without the sanctions risk.

How long does the verification check actually take?

For a typical brief, a few minutes per citation: pull the case, read the relevant passage, run the citator. The first time a firm builds the habit it feels slow. Once it is a fixed step in the workflow, it adds little time and removes the largest single risk of using AI in filings.

About the author

Written by Michael Pavlovskyi, Founder, Bace Agency

Michael builds custom Claude and GPT workflows for insurance agencies, law firms, and PE firms on Chicago's North Shore. Speaker at Northwestern and Lake Forest College on practical AI adoption for professional services.

Want to see how AI fits in your firm?

Book a free 30-minute AI audit. No obligation, no pitch deck.
