Best Local LLM for Lawyers: Here's What Actually Works

Every time you paste a client contract into ChatGPT, you’re potentially violating attorney-client privilege. I know that sounds dramatic, but it’s not. That text goes to OpenAI’s servers. It sits there. And depending on your settings and their current privacy policy — which, by the way, changed at least twice in 2025 alone — it may be used to train future models.

For most people, that’s just a privacy concern. For lawyers, it’s a professional conduct issue with real consequences.


The good news is that running AI on your own machine, fully offline, is not that hard anymore. You don’t need a PhD in machine learning or a server rack in your office. By the time you finish reading this, you’ll know exactly what hardware you need, which models to use, and how to have the whole thing running in about 15 minutes.

Why Lawyers Cannot Use Cloud AI for Client Work

So here’s the basic problem. When you use ChatGPT, Claude, or Gemini to analyze a client document, that text leaves your machine. It travels to a cloud server, gets processed, and a response comes back. Simple enough. But what happens to the text in between? That’s where it gets messy.

OpenAI’s terms say your data may be used to improve their models unless you specifically opt out — and even then, they retain it for some period for safety monitoring. Google is similar. Anthropic is a bit better about it, but “a bit better” is not a legal standard.

Rule 1.6 of the ABA Model Rules of Professional Conduct says you have a duty to make reasonable efforts to prevent unauthorized disclosure of client information. Using a cloud AI tool without fully understanding how that tool stores and uses your data? That is probably not a “reasonable effort.” Bar associations in New York, California, and Florida have all issued guidance in 2024 and early 2025 warning lawyers to be careful with AI tools and client data. The New York State Bar’s April 2024 report was pretty blunt about it — don’t put confidential client information into a system you don’t control.

Local AI fixes this completely. The model runs on your machine. Nothing goes anywhere. No API call, no server, no third party.

And look, I get it — the cloud tools are just more convenient. But “convenient” is not a defense when a client files a grievance.

What Local AI Can Actually Do for Legal Work (and What It Can’t)

Let me be honest about this because a lot of articles oversell it.

A good local model — say, Llama 3.3 70B running on decent hardware — can do some genuinely useful things. Contract drafting is one of them. Not full contracts from scratch (you wouldn’t want that anyway), but first drafts of standard clauses, suggestions for language, flagging things that look off. I’ve used it to draft NDA boilerplate and it came back with something about 80% usable, which is honestly not bad.

Summarizing long documents is probably where it shines most. Paste in a 60-page deposition and ask for the key points. You’ll get a clean summary in maybe 10 seconds. Not perfect, but enough to get oriented before you read the full thing. Same with case files, long correspondence chains, and research memos.

Legal research support is trickier. You can paste in a case you’ve already found and ask the model to explain it or pull out the relevant principles. That works fine. What it cannot do is go look things up. There’s no connection to Westlaw, LexisNexis, or any live database. 

And this is the part I really want you to remember: local LLMs hallucinate citations. They will confidently give you a case name, a citation, a holding — and it may be completely made up. I found this out the hard way when I asked a 7B model to list supporting cases for a contract argument. Three of the five cases it gave me either didn’t exist or had holdings that said the opposite of what the model claimed. Always verify. Always.

Other things it handles well: drafting client emails in plain language, proofreading briefs, identifying missing clauses in NDAs, and cleaning up legal writing that’s gotten too dense.

What it can’t do: replace your judgment, access live databases, guarantee accuracy, or tell you what the law says in a jurisdiction it wasn’t trained on well.

What Hardware Does a Lawyer Actually Need?

Good news here — you almost certainly don’t need new hardware just for this. But if you’re buying something, here’s how I’d think about it.

The main thing that matters for running local AI on text tasks is RAM, not GPU. Legal work is mostly text — no images, no audio, no video generation — so a dedicated GPU is basically wasted money for this use case. What you want is enough RAM to hold the model in memory so it can run fast.

A 7B model (the smallest useful size) needs about 6–8GB of RAM to run. A 13B model needs around 12–16GB. A 70B model, which is close to GPT-4 quality, needs roughly 40–48GB. So the RAM requirement goes up fast as you go to bigger models.
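Those figures follow a rough rule of thumb you can sanity-check yourself. This sketch uses my own ballpark factors, not official numbers: roughly 0.6 bytes per parameter for a 4- to 5-bit quantized model file, plus about 4GB of headroom for context, the runtime, and the OS.

```shell
# Ballpark RAM estimate (my own rule of thumb, not an official figure):
# model file ~= 0.6 bytes per parameter at Q4/Q5 quantization,
# plus ~4 GB headroom for context, the runtime, and the OS.
estimate_gb() { awk -v p="$1" 'BEGIN { printf "%.0f", p * 0.6 + 4 }'; }

small=$(estimate_gb 7)    # 7B model
medium=$(estimate_gb 13)  # 13B model
large=$(estimate_gb 70)   # 70B model
echo "7B: ~${small} GB   13B: ~${medium} GB   70B: ~${large} GB"
```

The estimates land close to the ranges above, which is all a rule of thumb needs to do.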

At the minimum end, any mini PC with 16GB RAM in the $300–500 range will run 7B and 13B models fine. Beelink and Minisforum both make decent options in that range. The N100 or N305 chips in those machines are slow but they work — you’ll wait maybe 10–20 seconds for responses, which is livable for drafting work. There’s a full breakdown of mini PC options in our Best Mini PC for Local LLM Under $500 guide.

If you want something faster and more comfortable to use daily, 32GB RAM is the sweet spot. Either a 32GB mini PC or the base Mac Mini M4 (which starts at around $699 with 16GB but goes to $999 for 24GB — honestly get the 24GB). Apple Silicon is genuinely excellent for this because of how their unified memory works. The model sits in memory shared between the CPU and the integrated GPU, and inference is fast even without a dedicated graphics card. We’ve done a full Mac Mini M4 vs Mini PC comparison if you want to dig into that.

For the power user — and by this I mean a firm that wants to run a full 70B model that’s actually close to GPT-4 in quality — the Mac Mini M4 Pro with 64GB unified memory does it really well. Around $1,400. That’s not nothing, but it’s also a one-time cost with zero ongoing subscription fees. Compare that to paying $20–40/month per person for cloud AI access and it pays for itself in under two years. More on that in our Best Mini PC for Ollama rundown.
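The payback math is easy to run yourself. This sketch uses the figures above; the two-seat firm and the $30/month midpoint are my own illustrative assumptions, not something from any vendor's pricing page:

```shell
# Break-even in months = one-time hardware cost / total monthly subscription spend.
# $1,400 one-time vs. $30/month per seat (midpoint of the $20-40 range).
hardware=1400
per_seat=30
seats=2
months=$(awk -v h="$hardware" -v m="$per_seat" -v s="$seats" \
  'BEGIN { printf "%.1f", h / (m * s) }')
echo "Break-even for a ${seats}-lawyer firm: ${months} months"
```

Change `seats` to match your firm; every additional seat shortens the payback period proportionally.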

No dedicated GPU needed. I want to say that clearly because I spent two hours confused about this when I first set this up.

Best Local AI Models for Legal Work

There are a lot of models out there. For legal work specifically, here are the ones I’d actually recommend.


Llama 3.1 8B and Llama 3.3 70B are the first thing I’d tell anyone to start with. Meta’s Llama series is widely supported, runs on almost every tool, and the quality — especially the 70B version — is honestly good enough for real legal drafting work. The 8B is fast but misses nuance on complex contracts. The 70B is slower but gets it right much more often. If you’re on a Mac Mini M4 Pro with 64GB, run the 70B. If you’re on a 16GB mini PC, start with the 8B and see if it’s enough.

DeepSeek-R1 is one I started using in January 2025 and it’s become my go-to for anything that needs careful structured reasoning. Things like: “Here’s this contract, walk me through each party’s obligations and where the risk sits.” It does that kind of systematic analysis really well. The quantized 7B version runs on 16GB hardware, the 32B needs 32GB. DeepSeek had some controversy earlier this year about data practices on their cloud service, but running it locally means none of that applies — you’re just using the model weights, no connection to DeepSeek’s servers.

Mistral 7B is the fastest option on limited hardware. If you’re on an older machine with 16GB RAM and you mostly need quick drafts and email rewrites, Mistral is very responsive. It’s not as strong as Llama 3.1 on complex reasoning, but for day-to-day quick tasks it’s good enough and feels snappy.

For all of these, you want the GGUF quantized version. Specifically Q4_K_M is the sweet spot — it reduces the model size significantly while keeping quality close to the original. You download these from Ollama’s model library (or Hugging Face if you want more control). Ollama handles the quantization format automatically, so you don’t really need to think about this if you’re using Ollama.
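The size savings from quantization are easy to see with a rough calculation. Full 16-bit precision stores 2 bytes per weight; Q4_K_M averages roughly 4.5 bits per weight (that bits-per-weight figure is an approximation, not an exact spec):

```shell
# Approximate file sizes for an 8B-parameter model:
# FP16: 2 bytes/weight.  Q4_K_M: ~4.5 bits/weight = ~0.5625 bytes/weight.
params=8   # parameter count in billions
fp16=$(awk -v p="$params" 'BEGIN { printf "%.0f", p * 2 }')
q4=$(awk -v p="$params" 'BEGIN { printf "%.1f", p * 4.5 / 8 }')
echo "FP16: ~${fp16} GB   Q4_K_M: ~${q4} GB"
```

That ~4.5GB figure lines up with the roughly 5GB download you'll see when you pull llama3.1:8b below.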

How to Set It Up in 15 Minutes

This is actually the part that surprised me most when I first tried it. I expected it to be complicated. It’s not.

  1. Go to ollama.com and download the installer for your operating system. There are builds for Mac, Windows, and Linux. Run it. That’s the whole installation.
  2. Open a terminal (on Mac: press Cmd+Space, type “Terminal”, hit enter) and run:

ollama pull llama3.1:8b

  3. This downloads the model. It’s about 5GB, so it’ll take a few minutes depending on your connection. Once it’s done, run:

ollama run llama3.1:8b

  4. You’ll get a prompt. Type something. It responds. That’s it — you’re running local AI.
  5. If you want a proper chat interface instead of the terminal, install Open WebUI. It gives you a ChatGPT-style browser interface. If you have Docker installed, the command is:

docker run -d -p 3000:8080 --add-host=host.docker.internal:host-gateway -v open-webui:/app/backend/data --name open-webui ghcr.io/open-webui/open-webui:main

  6. Then open your browser and go to http://localhost:3000. If Docker feels intimidating, honestly just stick with the terminal for now — it works fine.
  7. Test it. Paste in a short contract clause and ask it to identify any ambiguities. See how it does.
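Everything here talks to localhost only. Ollama also exposes a local HTTP API (port 11434 by default), which is handy if you later want to script document reviews instead of typing into a chat. A minimal sketch, assuming the default port and the model pulled above; the prompt text is just an example:

```shell
# Build a request for Ollama's local /api/generate endpoint.
# Send it with:  curl http://localhost:11434/api/generate -d @payload.json
cat > payload.json <<'EOF'
{
  "model": "llama3.1:8b",
  "prompt": "List three common exclusions in NDA confidentiality clauses.",
  "stream": false
}
EOF
cat payload.json
```

The request and response never leave your machine — the API is just a local door into the same model you ran in the terminal.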

One thing I got wrong the first time: I tried to run a 13B model on a machine with 16GB RAM and also had Chrome open with about 30 tabs. The model kept crashing because there wasn’t enough free RAM. Close other applications before running bigger models. Sounds obvious but I missed it.

5 Ready-to-Use Prompts for Legal Work

Copy and paste these directly into Ollama or Open WebUI. Replace the bracketed parts with your actual content.

For reviewing an NDA:

“Review this NDA and list any clauses that are unusual, one-sided, or missing entirely. Focus on confidentiality scope, term length, exclusions, and remedies: [paste NDA here]”

For summarizing a deposition:

“Summarize the key points from this deposition in plain English. Note any contradictions or statements that might be significant: [paste deposition text]”

For client communication:

“Draft a professional email to a client explaining the following legal situation in plain, non-technical language. The client has no legal background: [explain the situation]”

For contract analysis:

“Identify the main obligations of each party in this contract, any liability limitations, and any clauses that seem unusual or potentially problematic: [paste contract]”

For brief proofreading:

“Proofread this legal brief. Flag any sentences that are unclear, any arguments that seem underdeveloped, and any language that could be interpreted in multiple ways: [paste brief]”

These prompts work better if you add a line at the start specifying the jurisdiction or the type of matter. Something like “This is a commercial lease dispute in Texas” gives the model context and the responses get noticeably better.
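If you reuse these prompts often, a tiny shell snippet saves retyping. This sketch just assembles a context-prefixed prompt; the contract file and matter description are placeholders, not real data:

```shell
# Prepend matter context to a reusable review prompt.
# contract.txt stands in for your actual document.
printf 'Sample clause: Tenant shall maintain the premises.\n' > contract.txt

context="This is a commercial lease dispute in Texas."
task="Identify the main obligations of each party in this contract, any liability limitations, and any clauses that seem unusual or potentially problematic:"
prompt="${context}
${task}
$(cat contract.txt)"
printf '%s\n' "$prompt"
```

From there you can pipe the assembled prompt straight into the model, e.g. with `printf '%s' "$prompt" | ollama run llama3.1:8b`.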

Where to Go From Here

So to recap: cloud AI is a confidentiality risk for legal work, local AI eliminates that risk entirely, the setup is not complicated, and the hardware doesn’t have to be expensive.

The single thing that makes the biggest difference is the amount of RAM in your machine. More RAM = bigger models = better quality output. If you’re buying hardware specifically for this, 32GB is where I’d start, and the Mac Mini M4 is probably the easiest option for a non-technical person who just wants something that works without fiddling.

For next steps on the hardware side, the Best Mini PC for Ollama guide has a full breakdown of what’s worth buying in 2026, including some options that came out in February that are pretty good value.

