AI & Security

Keep Client Data In House With Local AI

The real shift is not a faster chatbot. It is that the model can now run inside your walls, where your files never have to leave.

Michael Pavlovskyi Michael Pavlovskyi · · Updated · 8 min read
Keep Client Data In House With Local AI
Source: NVIDIA keynote
Share:

Key Takeaways

  • The shift that matters is not a smarter chatbot. It is that the model can now run on a machine you own, behind your own lock.
  • You are not buying the model, which is nearly free. You are buying the harness around it: the secure boundary that connects AI to your files while keeping it off the public internet.
  • Critics call desktop AI slow. For a firm, speed is secondary. Absolute data isolation is the point, and that is what these machines deliver.

For two years, the deal with powerful AI was simple and bad. You got a brilliant assistant, and in exchange you shipped your most sensitive files to a server you do not own. For a law firm, an insurance agency, or a family office, that was never a fair trade. It was a reason to wait.

That reason is ending. Not mainly because the models got smarter, though they did. Because the place where the model runs has moved. It can now sit on a machine inside your office, behind your own lock. That single change matters more than any benchmark, and most of the coverage has missed why.

The shift is from chatbots to agents you control

The first wave of AI was a chatbot. You typed a question, it typed an answer, and the whole exchange lived in someone else's browser tab. Useful, but shallow, and entirely outside your control.

The second wave is different in kind, not degree. The model is no longer just a thing that talks. It is the engine inside a system that can remember, look things up, follow steps, and act. The plainest way the field describes this is that an agent is a model plus a harness. The model supplies the reasoning. The harness is everything around it: the memory, the tools, the connection to your files, and the rules about what it may and may not touch.

That distinction is the whole story for a firm. The model is becoming a commodity. Strong open-weight models can be downloaded for nothing. What you cannot download is the harness built around your business. The harness is where your archives live, where your access rules sit, and where the line between inside and outside gets drawn. The model is rented muscle. The harness is the part that is yours.

Once you see it that way, the hardware question changes. You are not buying a desktop to own a model. You are buying it to own the harness, and to keep that harness disconnected from the public internet.

Why "in house" is the whole game

For a lawyer this is not a preference. It is a duty. ABA Model Rule 1.6 requires reasonable steps to protect client information. The moment you paste a client memo into a public tool, you are handing privileged text to a third party and trusting their word about what happens to it next. Many firms resolve that discomfort by banning AI outright, the same fear that kills more AI projects than actual breaches. That protects the files and forfeits the work.

The better answer is to change where the work happens, not whether it happens. If the model runs on a machine in your office, the file is never in transit and never in anyone else's logs. There is no vendor to vet on the question that matters most, because no vendor is in the loop. That is the case for private, local AI, and the same human-in-the-loop discipline shows up in our piece on why AI agents are safer than most owners think.

Keynote diagram showing an AI agent as a model wrapped in prompt, orchestration, context, tools, memory, and a security and governance layer
An agent is a model wrapped in tools, memory, and a security and governance layer. The model is the cheap part. The wrapper is the asset. Source: NVIDIA keynote.

What changed in the hardware

The hardware got good enough to sit on a desk. NVIDIA's DGX Spark is a desktop machine with 128GB of unified memory and about a petaFLOP of AI performance. It holds and runs real models locally, with no cloud account behind it.

Now the honest part, the part the engineering crowd fixates on. On a desk, large models are slow. The Spark's memory bandwidth is about 273 GB/s, and at that bandwidth a 70-billion-parameter model generates under three tokens a second. By data-center standards that is poor, and the comparison to a Mac Studio with far higher bandwidth is not flattering.

But notice who is doing the judging. That criticism comes from people measuring the machine as a benchmark. They ask how many tokens per second it produces, because that is the number that wins arguments among engineers. It is the right question for a data center and the wrong question for a law office.

For the owner of a boutique advisory practice, generation speed is a secondary concern. The primary concern is absolute data isolation. If a local model takes fifteen extra seconds to read a complex contract, but that contract never leaves the building and never touches a third-party server, that is not a defect. It is the price of digital sovereignty, and for the right firm it is cheap. Speed you can buy back later with better hardware. A document that has already left your control you cannot get back at all.

There is a ceiling above the desktop when you need it. NVIDIA's DGX Station steps up to far more memory and compute, enough to run very large models on a single machine you still own.

MachineMemoryWhat it runs
DGX Spark128GB unifiedSmall and midsize models at an interactive pace
DGX StationUp to 748GB coherentUp to roughly 1 trillion parameters

Specs from NVIDIA's public product pages. Price varies and not every firm needs the larger machine. The point is not that one box is enough for everyone. It is that the entire range now exists on your side of the wall.

The NVIDIA Blackwell RTX GPU die, listed with 6144 CUDA cores and 1 petaFLOP FP4 AI performance
The Blackwell RTX GPU inside DGX Spark: data-center silicon, now sitting on a desk you own. Source: NVIDIA keynote.

What you are actually buying

This is where most hardware write-ups stop, and where the real decision begins. You are not investing in the model. The model is the cheap part. You are investing in the harness: a secure internal setup where AI can reach your matter files, your policies, and your decades of institutional memory, while staying cut off from the open internet.

That is harder than downloading a model, and worth far more. It means deciding what the system may read and what it must never see. It means a logged, reviewable boundary between your archives and the outside world. It means the AI knows your business the way a senior associate does, without any of that knowledge ever being exposed to a vendor. The model is interchangeable. The harness, tuned to how your firm actually works, is the asset that compounds. That is the part Bace builds.

Where local AI fits, and where it does not

I want to be honest about this, because the hype skips it. Local AI fits when the data is sensitive and the work is repeatable and bounded. Summarizing discovery. Drafting a first-pass claim response. Pulling key dates out of a stack of contracts. Searching ten years of matter files in plain language. For a Wilmette estate-planning practice, those are short, bounded jobs a desktop handles at a comfortable pace, which is the honest reason the speed ceiling above does not bite here.

It does not fit everything. Work that never touches private data, like general research or marketing copy, is usually cheaper and easier on a public tool. The skill is sorting the two, not pretending one tool wins everywhere.

And there is a real cost to ownership. You buy the machine, you keep it running, and someone builds the harness around it. Most firms do not want to do that alone, which is the whole reason this is a service and not a product.

A simple way to think about the decision

Here is the test I give owners. Take a task you wish AI could handle. Ask one question: would you be comfortable emailing this file to a stranger?

If the answer is no, that task belongs behind your own wall. If the answer is yes, the cloud is fine. Most firms find they have a clear pile on each side, and the sensitive pile is exactly the work they most wanted help with.

You can sort your own list with a quick prompt. Paste this into any AI tool you already use:

SAMPLE PROMPT

"I run a [type of firm] on Chicago's North Shore. Here is a list of tasks I want AI to help with: [paste your list]. For each one, tell me whether it touches confidential client data. Sort them into two groups: tasks safe for a public AI tool, and tasks that should run on a private model I host myself. Explain each call in one sentence."

That gives you a map. The local pile is your starting point for a harness of your own.

What this looks like in practice

Imagine a Lake Forest family office with twenty years of records across taxes, trusts, and investments. They want to ask plain questions of their own files without those files leaving the office. A local model behind a proper harness can read everything and answer, with nothing crossing the internet.

Or imagine a Highland Park insurance agency that processes claims with health details inside them. A private model can draft the first response and flag missing information, and the protected data stays on agency hardware the whole time.

These are hypothetical, not client stories. The point is the pattern. The sensitive work and the AI live in the same building, under your control.

The takeaway

The old rule was: pick powerful AI or pick data control. That rule is breaking, and the firms that understand why are not waiting for it to finish. The model on your desk may be a step slower than the one in the cloud. It is also entirely yours, and so is everything it reads.

The trend is not loud, and it is not hype. It is the quiet, predictable result of two facts meeting: open models are now free, and the hardware to run them privately now fits in a closet. For firms whose whole value is discretion, that combination is not optional for long. It is where this goes.

If you want to know which of your work belongs behind your own wall, and what the harness around it should look like, that is exactly what a short audit answers. A free 30-minute review, in person on the North Shore or on video, ends with a one-page plan your team can act on inside a quarter.

Frequently Asked Questions

Why buy hardware when the AI model is free to download? +

Because the model was never the hard part. The value is the harness around it: the secure boundary, the access rules, and the connection to your own files that lets the model know your business while staying off the public internet. You are buying control, not the model.

Is a desktop AI box fast enough for real work? +

For the bounded tasks a firm runs, like summaries, drafts, and search, yes. For raw speed on the largest models a desktop is limited by memory bandwidth, and big models run slowly. The honest move is to match the task to the machine, not to chase a benchmark you do not need.

Does running AI in house mean my data never leaves the building? +

That is the whole point. When the model runs on a machine you own, the file is processed on site and nothing is sent to an outside vendor. You still secure the machine itself, the same as any server holding client data.

How does a small firm get started without buying hardware first? +

Start by mapping which tasks actually touch confidential data. That list tells you whether an in-house setup is worth it and how large it needs to be. A free audit will sort that with you before you spend a dollar on hardware.

Related Articles

About the author

Michael Pavlovskyi

Written by

Michael Pavlovskyi

Founder, Bace Agency

Michael builds custom Claude and GPT workflows for insurance agencies, law firms, and PE firms on Chicago's North Shore. Speaker at Northwestern and Lake Forest College on practical AI adoption for professional services.

Connect on LinkedIn

Want to see how AI fits in your firm?

Book a free 30-minute AI audit. No obligation, no pitch deck.

Book a Free AI Audit →