
What Claude Opus 4.8 Means for a Lake Forest Tax Practice

Anthropic's new flagship model is a modest upgrade with one change that matters to fiduciaries: it is far more willing to admit what it does not know.

Michael Pavlovskyi · 7 min read

Key Takeaways

  • Anthropic released Claude Opus 4.8 on May 28, 2026, at the same price as the prior version: $5 per million tokens in, $25 per million out.
  • The change that matters for a tax practice is not the benchmarks. It is honesty: the model is about four times less likely to let an error in its own work pass unflagged.
  • It reads dense documents (returns, K-1s, IRS notices) more accurately and cheaply, and can hold an entire client file in memory at once.
  • The practical use is operations, not advice: intake, client letters, and practice finances. It extends a small team. It does not replace judgment.

Lake Forest, Ill., May 31, 2026. On May 28, Anthropic released Claude Opus 4.8, its most capable model to date, three days before this was written and forty-one days after the version it replaces. Most model releases are noise for a law firm. This one is mostly an incremental upgrade, with one exception that is genuinely relevant to anyone who signs returns for a living: the model is now markedly more willing to say it does not know.

That is the thread worth pulling for a tax practice. The rest of this is a short overview of what shipped, why that one change matters to a fiduciary, and three jobs the model can do on the operations side of a firm in Lake Forest or anywhere else.

  • $0 price increase: the same $5 / $25 per million tokens as Opus 4.7.
  • 4x less likely to let a flaw in its own work pass unflagged, versus Opus 4.7.
  • 1M-token context window: large enough to read a full client file at once.

What Changed

Opus 4.8 is Anthropic's top model, above the smaller and cheaper Claude models (Sonnet and Haiku). Firms already using Claude through its website, the desktop tool Claude Cowork, or a vendor integration are being moved onto it automatically over the coming weeks. Four things changed that bear on tax work.

First, honesty. The known failure of these tools, and the reason careful lawyers have kept their distance, is that they sound certain while being wrong. Anthropic's own testing shows Opus 4.8 is roughly four times less likely than the prior version to let a flaw in its work pass without flagging it. In practice it is more likely to write "I cannot confirm this figure, verify it" than to invent one.

Second, document reading. Tax is a document business, and the model is better at dense, messy source material: scanned returns, brokerage statements, K-1s (the form reporting a partner's share of partnership income), and IRS notices. One enterprise tester reported processing PDFs at 61% lower cost than before; another reported better "citation precision," meaning the model points to the right line rather than one nearby.

Third, memory. A one-million-token context window (a token is about three-quarters of a word) is enough to load a complete client file, prior returns, and correspondence at once, and reason across all of it without losing the start by the end.
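For planning, the rule of thumb above turns into simple arithmetic. A minimal sketch in Python, assuming the three-quarters-of-a-word ratio holds for the file and reserving an arbitrary 50,000 tokens of headroom for instructions and the reply. Scanned pages sent as images are not counted by a word-based estimate, so treat the result as rough.

```python
import math

# Rule of thumb from the article: one token is about three-quarters of a word.
WORDS_PER_TOKEN = 0.75
# Context window size quoted for Opus 4.8.
CONTEXT_WINDOW_TOKENS = 1_000_000


def estimated_tokens(word_count: int) -> int:
    """Rough token count for a given number of words, rounded up."""
    return math.ceil(word_count / WORDS_PER_TOKEN)


def fits_in_context(word_count: int, reserve_tokens: int = 50_000) -> bool:
    """True if the text plus a reserved margin fits in the window.

    reserve_tokens is an assumed allowance for instructions and the reply,
    not a documented figure.
    """
    return estimated_tokens(word_count) + reserve_tokens <= CONTEXT_WINDOW_TOKENS


# A hypothetical 300,000-word client file: about 400,000 tokens, so it fits.
print(estimated_tokens(300_000), fits_in_context(300_000))  # prints: 400000 True
```

By the same estimate, a file of roughly 750,000 words fills the whole window on its own, which is far larger than a typical individual client file.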

Fourth, effort. An effort control, available on every plan, lets a user set how hard the model works on a task: fast and cheap for reformatting, slow and thorough for analysis.

Opus 4.8's scores on coding, agentic, reasoning, and knowledge-work benchmarks, against the prior model and competitors. The gains are real but incremental. Source: Anthropic, "Introducing Claude Opus 4.8," May 28, 2026.

Pricing did not move: five dollars per million tokens in, twenty-five dollars per million out. A more careful model arrived at last month's price. The primary sources, for anyone who wants to check, are Anthropic's announcement and the developer release notes.
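The per-token prices make the cost of a run easy to estimate. A minimal sketch, using the list prices quoted above and hypothetical token counts; it ignores any discounts a plan or vendor may apply.

```python
# List prices quoted for Opus 4.8, in US dollars per million tokens.
INPUT_USD_PER_MTOK = 5.00
OUTPUT_USD_PER_MTOK = 25.00


def run_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Estimated cost of one request at list prices."""
    return (input_tokens * INPUT_USD_PER_MTOK
            + output_tokens * OUTPUT_USD_PER_MTOK) / 1_000_000


# Hypothetical intake run: a 200,000-token file in, a 4,000-token summary out.
print(f"${run_cost_usd(200_000, 4_000):.2f}")  # prints: $1.10
```

At these prices, reading a full one-million-token file once costs $5 in input alone, and every output token costs five times as much as an input token.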

Why the Honesty Gain Matters for Tax

The companies that build legal and tax software tested this model before launch and said unusual things on the record. The maker of one legal-research tool said it posted the highest score the company had recorded on its internal benchmark. The chief technology officer of Thomson Reuters, which makes the CoCounsel assistant many firms already pay for, singled out "legal and tax professionals" by name.

"As we build fiduciary-grade AI systems for legal and tax professionals, advances like these help raise the standard for trusted AI performance in real-world workflows."

Joel Hron, Chief Technology Officer, Thomson Reuters (CoCounsel)

The blocker for tax was never that the model could not write. It was that it could not be trusted to say "I do not know." A tool that confidently supplies a basis it cannot support (basis being the figure subtracted from a sale price to compute taxable gain) is worse than no tool, because it hides risk behind the appearance of work. The honesty gain is aimed at exactly that. It is the first version where the right answer for a small firm is "yes, carefully," rather than "not yet."

Anthropic's alignment testing: Opus 4.8's rate of misaligned behavior is lower than Opus 4.7's and approaches that of Claude Mythos Preview. This is the basis for the "more honest" claim, and the reason a fiduciary can consider the model at all. Source: Anthropic, May 28, 2026.

The three uses below are all operations. None asks the model to give advice, take a position, or sign anything. Each takes a task that eats attorney or paralegal hours and hands the routine first pass to the model, with a person reviewing before anything leaves the firm.

1. Intake: Sorting the Document Pile

Turning a client's pile of files into an organized matter, with a missing-items list, before an hour is billed.

Every matter starts with a pile: prior returns, brokerage statements, a K-1 or three, an IRS notice, and a note to "let me know if you need anything else." Finding out what is missing means a person reading every page. With the document gains and the large memory, the model can take the whole pile at once and report what is there and what is not.

SAMPLE CLAUDE PROMPT

"Attached is everything a new client sent for their 2025 return: three prior 1040s, brokerage statements, two K-1s, and an IRS CP2000 notice. Build an intake summary listing every document, the tax year it covers, and key figures. Then give me a missing-items checklist of anything I would normally need that is not here. For any figure you are not confident you read correctly, flag it and name the page to verify. Do not estimate anything you cannot see."

The last two sentences are the point, and they work only because of the honesty change. The valuable output is not the tidy summary; it is the short list of figures the model could not read cleanly and the documents it expected and did not find. Catching a missing K-1 at intake, rather than in October, is what prevents the deadline scramble.
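At bottom, the missing-items list compares what arrived against what the matter normally requires. A minimal sketch of that comparison, useful as a cross-check on the model's list; the checklist and document labels are hypothetical, and a real checklist depends on the client's situation. It tracks document types only, not counts, so "two K-1s" and "one K-1" look the same here.

```python
# Hypothetical standard checklist for an individual return; adjust per matter.
EXPECTED_DOCUMENTS = {
    "prior_1040",
    "brokerage_statement",
    "k1",
    "w2",
    "1099_int",
}


def missing_items(received: set[str], expected: set[str] = EXPECTED_DOCUMENTS) -> list[str]:
    """Documents on the checklist that are not in the received pile, sorted."""
    return sorted(expected - received)


def unexpected_items(received: set[str], expected: set[str] = EXPECTED_DOCUMENTS) -> list[str]:
    """Documents received that the checklist does not cover, sorted."""
    return sorted(received - expected)


# Labels for the pile described in the sample prompt.
received = {"prior_1040", "brokerage_statement", "k1", "cp2000_notice"}
print(missing_items(received))     # prints: ['1099_int', 'w2']
print(unexpected_items(received))  # prints: ['cp2000_notice']
```

The unexpected items matter too: an IRS notice that no checklist anticipated is often the most urgent page in the pile.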

2. Client Letters: Explaining Without Losing the Day

Plain-English client letters drafted in your voice, so you review and sign rather than write from scratch.

A surprising share of a tax attorney's day is explaining: what a notice means, why a refund shrank, the difference between an extension to file and an extension to pay. This writing is real work, hard to bill, and the basis on which clients judge the relationship. It is also what Opus 4.8 now does well, given examples of the firm's own voice.

SAMPLE CLAUDE PROMPT

"Here is a client's IRS CP2000 notice and my one-line note on how we plan to respond. Draft a short letter to the client in plain English: what the notice is, what it means for them, what we will do, and what we need from them. Calm and reassuring, no jargon. Match the voice in the three sample letters attached. Leave every dollar figure and deadline blank for me to fill in, since I verify those myself."

Two guardrails make this safe: feeding the model past letters so the draft sounds like the firm, and instructing it to leave numbers and dates blank, because a language model is a poor system of record for deadlines and dollar figures. The attorney reviews, fills in the verified figures, and signs. A letter that took twenty-five minutes to write from a blank page takes five to review. With Cowork's scheduled tasks, a solo practitioner could have weekly client status updates drafted automatically every Friday, ready to skim and send Monday.

3. Practice Finances: The Hire-or-Not Decision

Reading messy billing exports to show which matters make money and whether you can afford another person.

The question that keeps a small-firm owner up at night is whether to hire. The data needed to answer it sits trapped across a practice-management system, a billing export, and a QuickBooks file that do not talk to each other. Two terms decide the answer. Realization is the share of recorded time actually collected (log ten hours, bill eight, collect seven, and realization is 70%). Work in progress, or WIP, is time done but not yet billed. A firm with low realization and a growing pile of old WIP can look busy and quietly starve.
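Both measures are simple arithmetic once an export is in a usable shape. A minimal sketch, assuming each time entry has been mapped to a hypothetical record with a client, work date, hours, and billed flag; the 90-day cutoff matches the sample prompt and is a choice, not a rule. The entries are invented for illustration.

```python
from dataclasses import dataclass
from datetime import date


def realization(hours_recorded: float, hours_collected: float) -> float:
    """Share of recorded time actually collected, from 0.0 to 1.0."""
    if hours_recorded <= 0:
        raise ValueError("hours_recorded must be positive")
    return hours_collected / hours_recorded


@dataclass
class TimeEntry:
    # Hypothetical shape of one row from a billing export.
    client: str
    work_date: date
    hours: float
    billed: bool


def stale_wip(entries: list[TimeEntry], as_of: date, max_age_days: int = 90) -> list[TimeEntry]:
    """Unbilled entries older than max_age_days as of the given date."""
    return [
        e for e in entries
        if not e.billed and (as_of - e.work_date).days > max_age_days
    ]


# The article's example: log ten hours, bill eight, collect seven.
print(f"{realization(10, 7):.0%}")  # prints: 70%

# Invented entries; only Client A's is unbilled and older than 90 days.
entries = [
    TimeEntry("Client A", date(2026, 1, 15), 3.0, billed=False),
    TimeEntry("Client B", date(2026, 5, 1), 2.0, billed=False),
    TimeEntry("Client C", date(2025, 12, 1), 4.0, billed=True),
]
for e in stale_wip(entries, as_of=date(2026, 5, 31)):
    print(e.client, e.work_date, e.hours)  # prints: Client A 2026-01-15 3.0
```

Note that realization here is measured in hours collected against hours recorded, as the article defines it; billing rates that differ by matter would call for dollars instead.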

SAMPLE CLAUDE PROMPT

"Attached are exports from my practice-management and billing systems for the last 18 months. Calculate realization rate by client and matter type. Show which matter types are profitable once hours are counted, and which lose money. List work in progress older than 90 days, and flag any client consistently slow to pay. If a number looks wrong or an export is missing data, tell me before calculating. Do not guess."

The output resembles what a fractional finance chief would charge thousands to build, run monthly for the cost of a sandwich. It replaces a feeling with figures: whether the trust and estate work carries the firm while one-off returns lose money, or the reverse. That is the read worth having before raising a fee, adding a person, or letting go of a chronic slow-pay client. It informs the decision; it does not make it, and it is not the books of record.
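The by-matter-type view and the slow-payer flag can be sketched the same way. A minimal sketch, assuming the exports have already been reduced to hypothetical rows of matter type, hours recorded, and hours collected, plus invoices with issue and payment dates. All figures are invented, and the 60-day threshold for "slow" is an assumption each firm should set for itself. Realization alone does not show profit; that also needs the cost of each hour worked.

```python
from collections import defaultdict
from datetime import date

# Invented rows: (matter_type, hours_recorded, hours_collected).
TIME_ROWS = [
    ("trust_and_estate", 40.0, 36.0),
    ("trust_and_estate", 25.0, 24.0),
    ("one_off_return", 12.0, 6.0),
    ("one_off_return", 8.0, 5.0),
]

# Invented invoices: (client, date_issued, date_paid).
INVOICES = [
    ("Client A", date(2026, 1, 5), date(2026, 3, 20)),
    ("Client A", date(2026, 2, 5), date(2026, 4, 25)),
    ("Client B", date(2026, 1, 5), date(2026, 1, 30)),
    ("Client B", date(2026, 2, 5), date(2026, 5, 1)),
]


def realization_by_matter_type(rows):
    """Collected hours divided by recorded hours, summed per matter type."""
    recorded = defaultdict(float)
    collected = defaultdict(float)
    for matter_type, hours_recorded, hours_collected in rows:
        recorded[matter_type] += hours_recorded
        collected[matter_type] += hours_collected
    return {t: collected[t] / recorded[t] for t in recorded if recorded[t] > 0}


def consistently_slow_payers(invoices, threshold_days=60):
    """Clients whose every paid invoice took longer than threshold_days."""
    days_to_pay = defaultdict(list)
    for client, issued, paid in invoices:
        days_to_pay[client].append((paid - issued).days)
    return sorted(
        client for client, days in days_to_pay.items()
        if all(d > threshold_days for d in days)
    )


for matter_type, rate in realization_by_matter_type(TIME_ROWS).items():
    print(f"{matter_type}: {rate:.0%}")  # prints: trust_and_estate: 92%, then one_off_return: 55%
print(consistently_slow_payers(INVOICES))  # prints: ['Client A']
```

"Consistently" is read here as every invoice being late; Client B paid one invoice late and one on time, so it is not flagged. A firm that prefers an average would swap `all` for a mean.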

Where to Start, and What It Will Not Do

The sensible first step is one pilot on a real matter the firm already understands, so the output can be checked: run the intake prompt on a single messy file and judge whether the missing-items list saves the first read. If it does, build a voice file of past letters for the second use, and run the finance pass once after the deadline rush. Keep the toolset small. A law firm is not a software project.

The limits are firm. Opus 4.8 does not give tax advice, decide a position, sign a return, or speak to the IRS, and it should never be allowed to. It is a clerk, not the attorney. It is not the system of record for deadlines or dollar figures; those live in the calendar, the tax software, and the attorney's own verification. The practices that pull ahead this year are not the ones replacing staff with software. They are the ones letting a small, careful team handle more clients by giving the routine first pass to the model and keeping every judgment human.

If you want to work out which of these fits your firm first, book a free 30-minute AI audit: in person in Lake Forest or on video, no obligation, with a one-page plan to show for it.

Frequently Asked Questions

Is it safe to use Claude Opus 4.8 for tax work?

For operations (organizing documents, drafting letters, summarizing files, first-pass financial analysis), yes, with a human reviewing every output. For giving advice, deciding a position, or filing, no. The honesty gains make the operations use safer than before, but they do not change the rule that a person verifies the work.

What is the single most important change for a law firm?

Honesty. Opus 4.8 is about four times less likely than the prior version to let an error in its own work pass unflagged. For a field where a confidently wrong answer is malpractice, a model that says "check this" beats one that always sounds certain.

How much does it cost to run?

Unchanged from the prior version: $5 per million tokens of input and $25 per million of output (a token is about three-quarters of a word). A single client matter typically costs a few dollars to run. The real value is the review time saved.

Will my client data be safe?

That depends on which Claude plan you use and how it is configured, and it should be settled before any client document goes into any AI tool. Team and Enterprise plans offer stronger data-handling terms than a personal account. For a fiduciary practice, it is the first thing to nail down in a rollout.


About the author

Michael Pavlovskyi

Founder, Bace Agency

Michael builds custom Claude and GPT workflows for insurance agencies, law firms, and PE firms on Chicago's North Shore. Speaker at Northwestern and Lake Forest College on practical AI adoption for professional services.
