You have passed through the hardest part. Your GPU is handed over to Proxmox. It sits there, waiting. Idle. Ready to infer Llama 3 at scale.
But here is where most homelabbers stumble.
For previous articles in the series, visit: Self Host Series
They built the hardware correctly. They passed the GPU through without causing any damage. Then they stood at a crossroads with three paths and zero idea which one would not collapse at 3 AM.

The question sounds simple: where do I run Ollama? But it is not about where. It is about isolation versus speed, about following the crowd versus optimising for your specific constraints. It is about making a decision today that does not haunt you in six months.
This guide walks through the messy reality of these three paths. Not the marketing versions. Not the “in theory” versions. The actual, practical, wear-it-out versions that matter when your model inference hangs, and you need to know where to look.
Path One: The Virtual Machine. Full Isolation, Full Cost.
A Proxmox VM using KVM is a clean concept. You spin up an entire operating system in a sandbox. Ubuntu runs its own kernel. Your GPU appears as a native device. If Llama 3 decides to commit a kernel panic, your Proxmox host does not care. It keeps running. It keeps serving DNS. It keeps backing up your other workloads.
This isolation is not free. Every VM needs 1 to 2 gigabytes just to boot. Multiply that by three instances, and you have lost 3 to 6 gigabytes compared to what LXC could do. Your RAM is gone before your model loads.
CPU overhead hits hard, too. KVM adds 15 to 25 per cent CPU penalty on pure compute tasks. But here is the secret that changes everything: when your GPU does 95 per cent of the inference work, that CPU overhead becomes invisible. You are not paying it where it matters.
When to Choose the VM
You choose the VM when you want production-grade isolation. When you want to experiment without fear. When you have enough RAM that the overhead does not sting. When you want Proxmox snapshots that let you revert bad decisions in seconds.
You choose the VM when you are not that experienced yet. Because, despite what optimisation guides say, simplicity is the real feature. You pass through the GPU once in the UI. It works. You move on.
The Honest Pain Points
Disk space gets consumed fast. VMs require full filesystem snapshots. Start and stop times crawl (30 to 60 seconds, no exaggeration). You are managing both a guest operating system and a hypervisor layer. That is more things that can break.
But the biggest pain point is the one nobody talks about: you are now responsible for managing an entire OS inside Proxmox. That means security patches. That means updates. That means when something goes wrong in the guest OS, you have to SSH in and troubleshoot like you are managing a regular server. Because you are.
You are just managing it inside a virtual machine.
Path Two: LXC Containers. Speed for the Impatient.
LXC does not emulate hardware. It compartmentalises one Linux kernel into isolated userspace environments. Think of it as a lightweight chroot with teeth.
All containers share the host’s kernel, but each sees its own filesystem, its own network, its own process tree. No hypervisor overhead. No second OS to manage. Just the kernel doing kernel things, very efficiently.
For pure CPU workloads, LXC is measurably faster. Less than 5 per cent performance loss compared to native execution. On AI inference, where the GPU does the heavy lifting, the difference disappears entirely. But on a tight 8GB system, every megabyte of RAM saved is RAM available for your model’s context window. On constrained hardware, this matters.
Boot time is nearly instant. 1 to 2 seconds. A container costs about 0.5 gigabytes of RAM. You can run 6+ LXC containers on the same hardware that runs 3 or 4 VMs.
The Setup Complexity You Need to Know
Here is where enthusiasm crashes into reality.
To give an LXC container access to your GPU, you cannot click a checkbox. You hand-edit config files and manually map devices through cgroup. You need to map /dev/nvidia0, /dev/nvidiactl, /dev/nvidia-uvm, /dev/nvidia-cap1, and /dev/nvidia-cap2. These are not abstractions. These are device major and minor numbers that vary per system.
Get them wrong, and the container cannot see the GPU. You will spend three hours debugging why nvidia-smi returns a communication error.
The NVIDIA driver inside the container must be the same version as on the host. Not close. Exactly. Driver 525 on the host and driver 530 in the container means failure. This is not a “usually works” situation. This is a hard requirement.
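To make that concrete, the hand edits in /etc/pve/lxc/<container-id>.conf look roughly like the sketch below. Treat the major numbers and the nvidia-caps paths as examples from one particular system, not values to copy blindly; ls -l /dev/nvidia* shows yours.
# Device numbers are examples; confirm with: ls -l /dev/nvidia* /dev/nvidia-caps/
lxc.cgroup2.devices.allow: c 195:* rwm
lxc.cgroup2.devices.allow: c 234:* rwm
lxc.cgroup2.devices.allow: c 509:* rwm
lxc.mount.entry: /dev/nvidia0 dev/nvidia0 none bind,optional,create=file
lxc.mount.entry: /dev/nvidiactl dev/nvidiactl none bind,optional,create=file
lxc.mount.entry: /dev/nvidia-uvm dev/nvidia-uvm none bind,optional,create=file
lxc.mount.entry: /dev/nvidia-caps/nvidia-cap1 dev/nvidia-caps/nvidia-cap1 none bind,optional,create=file
lxc.mount.entry: /dev/nvidia-caps/nvidia-cap2 dev/nvidia-caps/nvidia-cap2 none bind,optional,create=file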
The security trade-off is real but often ignored. Privileged containers make GPU access easy but weaken isolation from the host kernel. A kernel exploit inside the container reaches the host. Unprivileged containers are more secure, but make GPU configuration even more complex.
When to Choose LXC
You choose LXC when you are running 5+ containers on a single host, and RAM is your actual bottleneck. When you understand Linux config files and cgroup manipulation. When you have already debugged device mapping before and understand where the pain lives.
You do not choose LXC because you read that it is faster. You choose it because you have a specific constraint and you understand the cost of solving it.
LXC is the advanced user option. Pretend it does not exist until you have real experience with the VM path.
Path Three: Docker Inside a Proxmox VM. The Boring Genius Option.
Docker inside a VM sounds redundant. Two layers of isolation. But this is where elegance lives.
You create one Ubuntu VM in Proxmox. You install Docker on that VM. You run your Ollama container inside it. That is it.
You get the Proxmox snapshot safety net (revert a bad model in seconds). You get the Docker ecosystem simplicity (99 per cent of tutorials assume this exact setup). You get GPU passthrough that works because the VM sees the GPU, and Docker asks the VM politely for access.
Open-WebUI, the ChatGPT-like interface for your LLMs, is Docker-first. Integrating it with bare LXC requires workarounds that multiply the complexity. With Docker, it is one Docker Compose command.
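As a sketch of what that looks like, here is a minimal docker-compose.yml pairing Ollama with Open-WebUI. The image tags, the host ports, and the OLLAMA_BASE_URL value are the commonly used defaults, not gospel; adjust to taste.
services:
  ollama:
    image: ollama/ollama
    ports:
      - "11434:11434"
    volumes:
      - ollama:/root/.ollama
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: all
              capabilities: [gpu]
  open-webui:
    image: ghcr.io/open-webui/open-webui:main
    ports:
      - "3000:8080"
    environment:
      - OLLAMA_BASE_URL=http://ollama:11434
    depends_on:
      - ollama
volumes:
  ollama:
One docker compose up -d later, Ollama is serving on port 11434 and Open-WebUI is on port 3000, already pointed at it.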
The Performance Reality
Docker adds 5 to 10 per cent overhead on network or I/O workloads. For GPU inference, the difference is negligible because the GPU does the heavy lifting, not the container runtime. You are not paying the full VM penalty. The VM is just a container host now, idle while the GPU works.
The total overhead (VM plus Docker) is roughly 20 to 22 per cent. On a system where the GPU is doing 95 per cent of the work, you are paying overhead on 5 per cent of the task. Effective overhead is actually less than 2 per cent.
The Ecosystem Advantage
This is the underrated factor.
Every Ollama tutorial uses Docker. Every Open-WebUI guide assumes Docker. Every AI homelab subreddit post does this exact thing. You are not fighting the ecosystem. You are flowing with it.
Setup is trivial once the VM exists:
curl -fsSL https://get.docker.com | sh
distribution=$(. /etc/os-release; echo $ID$VERSION_ID)
curl -s -L https://nvidia.github.io/nvidia-docker/gpgkey | sudo apt-key add -
curl -s -L https://nvidia.github.io/nvidia-docker/$distribution/nvidia-docker.list | \
sudo tee /etc/apt/sources.list.d/nvidia-docker.list
sudo apt update && sudo apt install -y nvidia-docker2
sudo systemctl restart docker
Then pull your model:
docker run -d --gpus all --name ollama -p 11434:11434 -v ollama:/root/.ollama ollama/ollama
That is it. Ollama is running, GPU-accelerated, accessible at <VM_IP>:11434. No cgroup device files. No driver version matching. No config file hand-editing at midnight.
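If you want proof beyond the container starting, a quick check from any machine on the network (assuming the port is reachable):
curl http://<VM_IP>:11434/api/tags
A fresh install returns a small JSON document with an empty model list, which is Ollama’s way of saying it is alive and waiting for its first pull.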
The Honest Comparison: What Each Path Actually Costs
Setup Complexity and Getting Started
The VM path wins here with no competition. You click buttons in the Proxmox UI, select your GPU, and you are done. The entire process takes 10 minutes if you move slowly.
LXC demands config file editing. You cannot avoid /etc/pve/lxc/yourcontainer.conf cgroup device rules. If you have never edited container config files before, expect to spend an hour learning the syntax and another hour debugging why it does not work.
Docker inside a VM sits in the middle. The VM setup is UI-based (easy), then Docker installation is a single bash command (also easy). You get simplicity without sacrificing ecosystem support.
GPU Configuration: The Real Complexity Cost
VM takes one PCIe passthrough setting in the Proxmox UI. That is genuinely it. Trivial.
LXC is complex beyond what most tutorials suggest. You need exact device major and minor numbers (195:*, 509:*, 234:* for NVIDIA), and these vary per system. You need to map /dev/nvidia0, /dev/nvidiactl, /dev/nvidia-uvm, /dev/nvidia-cap1, and /dev/nvidia-cap2 into cgroup rules. You need the driver version inside the container to match the host exactly. This is not a “usually works” situation. This is a hard requirement.
Docker-in-VM inherits the VM’s trivial GPU access. The container asks the VM for a GPU. The VM says yes. Done.
Performance Overhead: The Math That Matters
The VM adds 15 to 25 per cent CPU penalty on pure compute tasks. But remember this: your GPU is doing 95 per cent of the inference work. The CPU overhead tax applies only to 5 per cent of the task. Effective overhead is less than 2 per cent on total inference throughput.
LXC cuts overhead dramatically. 3 to 5 per cent for compute-intensive tasks. But again, on GPU-heavy inference, this advantage disappears. You save overhead where it does not matter (CPU), while the GPU does the same work regardless.
Docker inside a VM adds roughly 5 to 10 per cent container overhead on top of the VM layer. Total overhead becomes 20 to 22 per cent. Apply the same math: you are paying overhead on 5 per cent of the task, so effective overhead is negligible on GPU inference.
RAM Consumption: Where the Numbers Bite
This is where the numbers actually matter if your host has constraints.
A VM needs 1 to 2 gigabytes just to boot Ubuntu. Each additional VM costs another 3 to 4 gigabytes. On a 32GB host, you get 3 to 4 VMs maximum before running out of RAM.
LXC containers cost only 0.5 gigabytes per container. The same 32GB host runs 6+ LXC containers comfortably. If you are running 5+ workloads on one host, LXC makes sense. For a single Ollama instance? The RAM savings are invisible.
Docker-in-VM uses the VM’s RAM baseline (4GB), then the Docker daemon adds another 200–400MB. Total for the setup is around 4.2GB as a base cost. Reasonable for most homelabs.
Kernel Isolation: Security and Stability
VM creates complete separation. The guest has its own kernel. A kernel exploit inside the VM does not reach the host. A resource exhaustion event (the model consuming 100 per cent of CPU) does not affect other VMs. This is production-grade isolation.
LXC shares the host’s kernel with all containers. A kernel exploit affects everything. A container with loose cgroup limits can hog CPU and affect every other container. The isolation is process-level, not kernel-level. This is fine for trusted workloads (your own code). Risky for untrusted AI models (models downloaded from HuggingFace or elsewhere).
Docker-in-VM inherits kernel isolation from the VM, so it gets the same security benefits.
GPU Sharing Across Workloads
The VM path gives one GPU to one VM. You cannot share a single GPU across multiple VMs.
LXC theoretically allows sharing one GPU across multiple containers if you map the device into multiple container configs. In practice, this is complex and rarely works as expected.
Docker-in-VM gives one VM one GPU. Multiple Docker containers inside the same VM can access that GPU through the Docker daemon, which handles resource sharing automatically.

The Bottom Line on Trade-Offs
VM isolates perfectly, but it costs RAM and some upfront setup understanding (minimal once you have done it once).
LXC saves RAM and adds overhead nowhere that matters, but adds configuration complexity that most beginners regret spending time on.
Docker-in-VM balances everything reasonably and follows ecosystem conventions that 99 per cent of tutorials expect. It is boring, but boring works.
The Recommendation That Works
Use Docker inside an Ubuntu VM.
This is not the fastest. The LXC path saves 2 to 4 gigabytes of RAM. But speed and efficiency are not why most homelabs fail. They fail because they fought the ecosystem.
Here is why Docker inside a VM wins:
Every tutorial you find assumes this setup. You do not spend mental energy fighting conventions. Every AI tool you want to add runs in Docker. Open-WebUI, SearXNG, document indexers, all of them expect Docker. You add them with docker-compose, not by creating separate containers and debugging networking.
GPU setup is one-time, not per-container. You pass the GPU to the VM once. Every Docker container inside automatically has access. LXC requires manual setup for each container. Multiply that by 3 containers and you have spent four hours on configuration that Docker handles for free.
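A one-line sanity check proves the point: spin up a throwaway container and ask it for nvidia-smi. The CUDA image tag below is an example; any recent tag works.
docker run --rm --gpus all nvidia/cuda:12.2.0-base-ubuntu22.04 nvidia-smi
If the driver table prints, every container started with --gpus all on this VM sees the card, with no per-container configuration.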
The RAM overhead is not invisible, but it is acceptable. You lose 4 gigabytes to the OS and Docker daemon. If your host has 32GB, you have 28GB for models and inference. The context window loss is real but not fatal for most home inference setups.
Expansion is clean. Want to add a second service alongside Ollama? docker-compose handles it. You are not learning new mental models or architecture patterns. You are just adding another container.
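To make that concrete, adding SearXNG is a few lines appended under the services: key of the compose file sketched earlier. The image name and internal port are the commonly used defaults, so treat this as a starting point.
  searxng:
    image: searxng/searxng:latest
    ports:
      - "8081:8080"
Run docker compose up -d again and the new container joins the stack without touching Ollama.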
The Beginner Trap That Costs Hours
Many homelabbers hear LXC has 3 to 5 per cent overhead versus VMs’ 20 per cent overhead and think they have found a secret optimisation.
They spin up a privileged LXC container. They spend four hours debugging cgroup device rules. They get frustrated. They abandon the attempt and go back to VMs anyway.
Here is the actual math that matters.
If your GPU (RTX 4060, RTX 4070, whatever) is capable of 50 to 100 TFLOPS and your CPU can do 0.1 TFLOPS, the GPU is doing at least 500 times more work. Saving 15 per cent of that CPU compute saves you roughly 0.015 TFLOPS. Your GPU is generating 60 TFLOPS. You have saved noise.
The VM overhead is a tax on 5 per cent of the task. The effective overhead is actually less than 2 per cent on the total inference pipeline.
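Spelled out: 0.05 × 0.25 = 0.0125, about 1.25 per cent in the worst case. That is where the “less than 2 per cent” figure comes from, derived rather than asserted.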
LXC makes sense when you are genuinely resource-constrained and running multiple containers. For a single Ollama instance on a modern homelab? VM wins on simplicity, and it is not even close.
The Advanced Path (Only If Constrained)
If your host has only 8GB of RAM and you cannot upgrade, you can run Docker inside an unprivileged LXC container. This combines the isolation benefits of unprivileged containers with Docker’s ecosystem.
Expect to spend 2 to 3 hours debugging. The GPU device mapping becomes more complex. Docker-in-LXC requires cgroup nesting. This is not entry-level work.
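For reference, the Proxmox side of that nesting requirement is a couple of lines in the container’s config (or the equivalent toggles under Options → Features in the UI). This is only a sketch; the GPU device mapping from earlier still has to be layered on top.
# /etc/pve/lxc/<container-id>.conf
unprivileged: 1
features: keyctl=1,nesting=1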
Verdict: only if your host has 16GB of RAM or less and you are running 3+ workloads simultaneously. Otherwise, a VM is simpler.
The Setup That Actually Works (Copy-Paste This)
Create an Ubuntu 22.04 LTS VM in Proxmox with 4 CPU cores, 8GB RAM.
Pass through your GPU via PCI in the Proxmox UI.
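One step worth making explicit: the VM needs the NVIDIA driver before Docker can hand the GPU to containers. SSH in and install it first. The driver version below is an example; ubuntu-drivers devices will tell you what your card wants.
sudo apt update && sudo apt install -y nvidia-driver-535
sudo reboot
nvidia-smi    # after the reboot, this should list your passed-through GPU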
Then install Docker and the NVIDIA container runtime:
curl -fsSL https://get.docker.com | sh && \
distribution=$(. /etc/os-release; echo $ID$VERSION_ID) && \
curl -s -L https://nvidia.github.io/nvidia-docker/gpgkey | sudo apt-key add - && \
curl -s -L https://nvidia.github.io/nvidia-docker/$distribution/nvidia-docker.list | \
sudo tee /etc/apt/sources.list.d/nvidia-docker.list && \
sudo apt update && sudo apt install -y nvidia-docker2 && \
sudo systemctl restart docker
Pull Ollama:
docker run -d --gpus all --name ollama -p 11434:11434 -v ollama:/root/.ollama ollama/ollama
Verify it works:
docker logs ollama
Done. Your Ollama instance is running. GPU-accelerated. Accessible at <VM_IP>:11434. No fighting config files. No driver version mismatches.
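To go one step further and confirm the GPU is actually doing the work, pull a model and run a prompt (llama3 here is just an example tag from the Ollama library):
docker exec -it ollama ollama run llama3 "Say hello in five words."
watch -n 1 nvidia-smi    # in a second terminal on the VM; GPU utilisation should spike during generation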
Next Steps
You have chosen your architecture. The infrastructure is ready.
On Day 4, you pull the trigger. Create the Ubuntu VM. Install Docker. Pull your first model. Run your first prompt through Open-WebUI.
You will have a fully functional local LLM inference engine that does not phone home. That does not require cloud subscriptions. That does not expose your data to third parties.
The hardware is prepped. The decision is made. The foundation is solid.
Now you build.