Why AI Keeps Getting Cheaper for You

Inference costs fell by a factor of more than 280 in two years. On-premise hardware just hit desktop price points. The math on waiting is worse than most firms think.

Michael Pavlovskyi · 6 min read
An NVIDIA campus sign with the company's green logo, two people walking past on a tree-lined path
Source: Editorial photo, NVIDIA campus

Key Takeaways

  • AI inference costs fell by a factor of more than 280 between 2022 and 2024. The price of running models is no longer a serious constraint for most firms.
  • AI is following the same path as electricity, broadband, and cloud computing. It is becoming infrastructure. The firms that treat it that way now will compound for years.
  • For the first time, serious AI can run on hardware you own, inside your office, without a cloud account. The on-premise option is real at a price professional services firms can consider.
  • The cost of waiting is not the price of the tool. It is the operational learning your competitors are banking while you research.

AI costs are falling in two directions at once. Cloud inference, the cost of running a model via API, dropped by a factor of more than 280 in two years. And this spring, NVIDIA announced hardware that runs capable AI workloads on a machine you own, inside your office, without a cloud account, starting around $1,800. If you are still evaluating whether to start, those two facts just changed the math. Not at the margins. Substantially.

  • 280x: drop in AI inference costs between 2022 and 2024. The rate of decline accelerated further after January 2024.
  • ~$1,800: estimated starting price for NVIDIA RTX Spark laptops this fall. On-premise AI at a price professional firms can consider.
  • 2 years: the compounding workflow advantage firms that started in 2024 now have. Operational knowledge cannot be purchased later.

AI Just Became Infrastructure

Electricity once required a generator on site. Then the grid made it infrastructure. Computing required your own servers. Then the cloud made it infrastructure. Each time a technology crossed that line, the question stopped being "whether" and became "how soon."

AI is in that transition now. From 2020 to 2024, serious AI meant a cloud account and someone else's server. Your data left your building. For any firm with privileged client files, that was a hard constraint. If you want to understand what agentic AI actually means for professional services, we covered what agents do for an advisory practice separately.

NVIDIA's Computex 2026 announcements moved the line. The RTX Spark is a chip going into laptops and compact Windows desktops this fall, running capable AI models locally without a cloud account, with estimated starting prices around $1,800. The DGX Station is a deskside machine that runs models up to a trillion parameters inside your office. Both run on hardware you own. Nothing crosses a third-party server.

The tech conversation focused on speed comparisons to cloud APIs. That is the wrong frame for a law firm or a family office. The relevant question is not which option runs fastest. It is which option keeps your client files inside your building. That question now has a real answer at a price professional firms can consider.

Epoch AI chart showing cumulative global AI compute capacity from 2022 to 2025 growing about 3.4 times per year, broken down by NVIDIA, Google, Amazon, AMD, and Huawei chips
Global AI computing capacity is doubling roughly every 7 months. Source: Epoch AI, cumulative compute capacity tracker.
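The two figures in the chart caption, about 3.4 times per year and a doubling roughly every 7 months, describe the same growth rate. A minimal sketch of the conversion, assuming growth is constant and compounding (the function name is illustrative, not something from Epoch AI):

```python
import math

def doubling_time_months(growth_per_year: float) -> float:
    """Months needed to double at a constant yearly growth multiple."""
    return 12 * math.log(2) / math.log(growth_per_year)

# 3.4x per year is the figure quoted in the chart description
months = doubling_time_months(3.4)
print(f"Doubling time: {months:.1f} months")  # just under 7 months
```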

"For forty years, you launched apps. Click. Type. With RTX Spark, you ask, and the PC does the work."

Jensen Huang, CEO of NVIDIA, Computex 2026

The Price Fell. The Cost of Waiting Did Not.

The standard argument for holding off goes like this: the tools are still improving and prices are still falling, so a better deal is coming. That argument was stronger in 2023. It is mostly wrong now.

Epoch AI's research on LLM inference pricing shows costs declined from roughly $20 per million tokens in late 2022 to about $0.07 by late 2024. The big drop already happened. The incremental savings from waiting another twelve months are real but modest. They are not what is holding any firm back.
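The arithmetic behind that claim is short enough to check yourself. The sketch below uses the two prices quoted above and assumes, for simplicity, that the span is exactly two years with a constant rate of decline. It also shows why further waiting buys little: once the price is $0.07, no future drop can save more than $0.07 per million tokens.

```python
start_price = 20.00  # USD per million tokens, late 2022 (figure quoted in the article)
end_price = 0.07     # USD per million tokens, late 2024 (figure quoted in the article)
years = 2.0          # assumption: treat the span as exactly two years

# Overall drop, expressed as a multiple
fold_drop = start_price / end_price

# Implied average yearly factor if the decline were constant
annual_factor = fold_drop ** (1 / years)

# Savings already realized versus the most that waiting could still save
saved_so_far = start_price - end_price
max_future_saving = end_price  # the price cannot fall below zero

print(f"Overall drop: about {fold_drop:.0f}x")
print(f"Implied yearly factor: about {annual_factor:.0f}x")
print(f"Already saved per million tokens: ${saved_so_far:.2f}")
print(f"Most that waiting can still save: ${max_future_saving:.2f}")
```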

What actually costs more every quarter is harder to put on an invoice. Firms that started building AI into their operations in 2024 now have two years of compounded knowledge: which tasks AI handles reliably in their specific context, how to structure the work, where the human review step matters most. Their staff has built the habits. None of that is available for purchase in 2027. You build it by running the work, and firms that waited are starting from zero on a foundation their competitors already poured.

I have seen security concerns kill more AI projects than actual security incidents ever have. Waiting for cheaper is the same pattern under a different name. It feels rational while the opportunity cost runs quietly in the background.

What This Means for North Shore Law Firms

For law firms, the hesitation has had a specific shape. ABA Model Rule 1.6 requires reasonable steps to protect client information. Sending privileged matter files to a third-party cloud server raises a question most attorneys would rather not answer. Courts are already fining attorneys for AI missteps, which makes the data question harder to defer.

On-premise hardware settles that by architecture. When the model runs on a machine in your office, the file is processed on site. Nothing crosses an outside server. The short version: the privilege concern disappears not because a vendor promised something, but because the data never moves.

The economics and the data obligation now point in the same direction. Both say start. North Shore law firms are already adopting AI faster than most expect.

SAMPLE CLAUDE PROMPT

"I run a [law firm / insurance agency / family office] on Chicago's North Shore. Here is a list of tasks I want AI to help with: [paste your list]. For each one, tell me whether it touches confidential client data. Sort them into two groups: tasks safe for a cloud AI tool, and tasks that should run on private hardware I control. Explain each call in one sentence."

Three Steps to Get Moving

1. Count the cost of not starting

Pick one task your team does repetitively. Estimate the hours per week, multiply by your billing rate or staff cost, and compare that to what an AI tool costs. For most professional services firms the tool wins by a wide margin. We broke down the real cost of an AI project in detail if you want the full picture.
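Here is one way to run that estimate. Every number in the example call is a placeholder, not a benchmark, and the function name is made up for illustration. The sketch also assumes the tool saves only part of the hours, since a human review step stays in the loop.

```python
def annual_net_benefit(hours_per_week, hourly_rate, tool_cost_per_month,
                       share_of_hours_saved=0.5, working_weeks=48):
    """Rough yearly comparison of labor saved against tool cost.

    share_of_hours_saved and working_weeks are assumptions; adjust to your firm.
    """
    labor_cost = hours_per_week * hourly_rate * working_weeks
    labor_saved = labor_cost * share_of_hours_saved
    tool_cost = tool_cost_per_month * 12
    return labor_saved - tool_cost

# Hypothetical inputs: 5 hours a week at $150 an hour, a $100-a-month tool
net = annual_net_benefit(hours_per_week=5, hourly_rate=150, tool_cost_per_month=100)
print(f"Estimated net benefit per year: ${net:,.0f}")
```

If the result is negative for a task, that task is not the one to start with.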

2. Sort your work into two piles

Not everything needs on-premise hardware. Research, drafting, and general admin often work fine on cloud tools without touching privileged files. Discovery review, matter analysis, and client document work belong behind your own wall. Client intake is one practical place to start on the cloud side.
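The sorting itself can be as simple as one yes-or-no flag per task. The sketch below is illustrative only: the task names come from this article, the flag values are a judgment call each firm has to make for itself, and the class and function names are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Task:
    name: str
    touches_client_data: bool  # your call, made task by task

def sort_into_piles(tasks):
    """Split tasks into a cloud pile and an on-premise pile."""
    cloud = [t.name for t in tasks if not t.touches_client_data]
    on_premise = [t.name for t in tasks if t.touches_client_data]
    return cloud, on_premise

tasks = [
    Task("General research", False),
    Task("Admin drafting", False),
    Task("Discovery review", True),
    Task("Matter analysis", True),
]

cloud, on_premise = sort_into_piles(tasks)
print("Cloud tools:", cloud)
print("Private hardware:", on_premise)
```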

3. Build one workflow, not a strategy

Start with the task your team finds most painful. Run it for a month with a human review step before any output goes anywhere. Refine, then expand. AI agents handle the workflow once you map it. Sequence matters more than ambition.

AI is not waiting to become affordable. It already is. The tool is the cheap part. What gets more expensive every quarter is the gap between what your operation could be doing and what it is actually doing. The firms that started early are not ahead because they had better technology. They are ahead because they compounded two years of workflow knowledge that does not transfer and is not for sale.

Frequently Asked Questions

How much have AI costs actually fallen?

According to Epoch AI research, the cost of running a model at GPT-4 performance has been falling roughly 40 times per year. Over the two years this article focuses on, prices went from about $20 per million tokens in late 2022 to about $0.07 by late 2024, a drop by a factor of more than 280. The big drop is already behind us.

Do I need to buy on-premise hardware to start?

No. For work that does not touch privileged client files, a cloud-based tool is practical and cheaper to start with. Sort which tasks touch confidential data first, then decide what infrastructure you need.

What is the right first workflow for a small law firm?

Discovery review, contract clause extraction, deposition summaries, and billing narrative drafts are the most common starting points. AI-powered client intake is another strong first step for firms that want to improve how new matters come in.

How does on-premise AI protect attorney-client privilege?

When the model runs on hardware inside your office, the file is processed on site and never crosses a third-party server. The privilege concern disappears not because of a data processing agreement, but because the data never moves.

About the author

Michael Pavlovskyi

Founder, Bace Agency

Michael builds custom Claude and GPT workflows for insurance agencies, law firms, and PE firms on Chicago's North Shore. Speaker at Northwestern and Lake Forest College on practical AI adoption for professional services.

Want to see how AI fits in your firm?

Book a free 30-minute AI audit. No obligation, no pitch deck.