The Definitive Guide to Proxmox GPU Passthrough — Local AI & Gaming

So, you’ve assembled your server hardware, installed Proxmox VE, and you’re staring at that clean web interface, ready to build a powerhouse for local AI development. There’s just one problem: by default, your expensive, powerful GPU is sitting idle, completely invisible to your future virtual machines.


Proxmox, as the host operating system, takes ownership of all hardware, and it doesn’t share nicely. This is where Proxmox GPU Passthrough comes in — it is the secret sauce, the critical link that bridges your physical hardware with your virtualised AI environment.

The goal is to perform a technique called VFIO (Virtual Function I/O). In simple terms, we will instruct the Proxmox host, “Hands off this GPU.” We will isolate the graphics card from the host’s control so that a specific Virtual Machine (VM) can seize it and use it with near-total ownership.

Why Is This Worth the Effort?

Getting this right is a game-changer. When GPU passthrough works, you grant your VM 95–99% of the GPU’s native performance. AI models train faster, inference is instantaneous, and GPU-accelerated applications run seamlessly. But be warned: this process is notoriously finicky. A single missed step can lead to system instability, VM crashes, the dreaded black screen on boot, or the infamous NVIDIA Error 43.

  1. Phase 1: BIOS & Hardware Preparation
  2. Phase 2: Configuring the Proxmox Host
  3. Phase 3: Building and Tweaking the Virtual Machine

Let’s start.

II. Phase 1: BIOS and Hardware Prep (Don’t Skip This)

Before we touch a single line of code in the Proxmox terminal, we must prepare the very foundation of our server: the BIOS/UEFI. A large share of passthrough failures can be traced back to incorrect settings at this level. Users often get excited and jump straight to the software, only to be met with cryptic errors. Do not be one of them.

Reboot your server and enter your BIOS/UEFI setup (usually by pressing DEL, F2, or F12 on boot). The exact naming and location of these settings vary between motherboard manufacturers, but the core technologies are the same.

Mandatory BIOS/UEFI Checklist:

1. Enable Virtualisation Technology (VT-d or AMD-Vi)

This is the cornerstone of everything we are about to do. This technology allows the motherboard’s chipset to manage and direct I/O (Input/Output) requests from virtual machines to physical hardware devices.

  • For Intel Users: Find and enable Intel VT-d (Intel Virtualisation Technology for Directed I/O). This is often located under “Chipset,” “System Agent,” or “North Bridge” settings. Do not confuse it with VT-x, which is for CPU virtualisation and should already be enabled.
  • For AMD Users: Find and enable AMD-Vi, which may also be labelled simply IOMMU. It’s typically found in the “Advanced” or “CPU Configuration” sections. Do not confuse it with SVM Mode, which is AMD’s CPU virtualisation (the counterpart of Intel VT-x) and should also be enabled.

2. Enable IOMMU

While enabling VT-d or AMD-Vi often handles this, some motherboards have a separate, explicit IOMMU setting. If you see it, ensure it is Enabled. The IOMMU (Input-Output Memory Management Unit) is the hardware component that translates device addresses, making it possible to create isolated “lanes” for devices — a prerequisite for passthrough.

3. Enable “Above 4G Decoding”

Modern GPUs require a large address space in system memory. This setting allows 64-bit PCIe devices to use address spaces above the 4GB mark. Without it, the GPU may fail to initialise correctly within the VM. You’ll typically find this in the “PCI Subsystem Settings” or “Chipset Configuration.” While you are there, if you see an option for Re-Size BAR Support, you can enable that as well, though it’s less critical for the passthrough itself.

Once you have enabled these three settings, save your changes and reboot the server. Let it boot into Proxmox. The hardware is now ready.

III. Phase 2: Configuring the Proxmox Host (The Terminal Work)

With the BIOS correctly configured, it’s time to instruct the Proxmox host OS to prepare for passthrough. This phase involves editing system configuration files via the terminal. You can do this through the Proxmox web UI by selecting your node and opening the Shell, or by connecting directly via SSH.

Step 1: Edit the GRUB Bootloader 

GRUB is the bootloader that starts Proxmox. We need to pass kernel parameters to it on boot to activate the IOMMU. (Note: if your host boots via systemd-boot instead of GRUB, typically the case for UEFI installs on ZFS, the same parameters go in /etc/kernel/cmdline and are applied with proxmox-boot-tool refresh.)

First, open the GRUB configuration file with a text editor:

nano /etc/default/grub

Find the line that looks like this:

GRUB_CMDLINE_LINUX_DEFAULT="quiet"

We will add parameters here. The exact parameter depends on your CPU.

  • For Intel CPUs: Add intel_iommu=on iommu=pt. The line should look like this: GRUB_CMDLINE_LINUX_DEFAULT="quiet intel_iommu=on iommu=pt"
  • For AMD CPUs: Add amd_iommu=on iommu=pt. The line should look like this: GRUB_CMDLINE_LINUX_DEFAULT="quiet amd_iommu=on iommu=pt"

The intel_iommu=on (or amd_iommu=on) parameter explicitly enables the IOMMU, while iommu=pt (passthrough mode) tells the kernel to apply IOMMU translation only to devices that will be passed through, rather than to all devices, which avoids unnecessary overhead on the host.

After editing, save the file (Ctrl+X, then Y, then Enter).

Now, you must apply these changes. Run the following command:

update-grub

Then, reboot the host for the changes to take effect:

reboot
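Once the host is back up, it is worth confirming that the kernel actually enabled the IOMMU before going any further. A quick, read-only check (the exact wording of the messages varies between Intel and AMD platforms):

dmesg | grep -e DMAR -e IOMMU -e AMD-Vi

On Intel you should see a line such as "DMAR: IOMMU enabled"; on AMD, lines beginning with "AMD-Vi". If nothing IOMMU-related appears, revisit the BIOS settings from Phase 1 and the GRUB line above.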

Step 2: Load Required Kernel Modules

Next, we need to ensure Proxmox loads the necessary VFIO modules on every boot. These modules are what facilitate the device isolation.

Edit the /etc/modules file:

nano /etc/modules

Add the following four lines to the end of the file:

vfio
vfio_iommu_type1
vfio_pci
vfio_virqfd

Save and close the file. These modules will now load automatically on the next boot. (On newer Proxmox releases with kernel 6.2 or later, vfio_virqfd has been merged into the core vfio module; if the boot log complains that it cannot be found, that line can safely be removed.)
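These modules only take effect after a reboot, so there is nothing to see yet; once you have completed the final reboot at the end of this phase, you can confirm they are loaded with a quick check like:

lsmod | grep vfio

You should see vfio, vfio_pci and vfio_iommu_type1 listed (plus vfio_virqfd on older kernels).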

Step 3: Finding Your GPU’s Vendor IDs

Now we need to identify the specific hardware we want to isolate. Every PCIe device has a unique set of IDs. We’ll use the lspci command to find them. Run the following command, replacing nvidia with amd if you have an AMD card:

lspci -nnk | grep -i nvidia

You will see an output with two or more lines for your GPU. Most modern GPUs are multi-function devices, containing the graphics processor and an audio controller (for HDMI/DisplayPort audio). We need to pass through all the functions of the card.

Look for output similar to this:

01:00.0 VGA compatible controller [0300]: NVIDIA Corporation GP107 [GeForce GTX 9050 Ti] [10de:1c82] (rev a1)
01:00.1 Audio device [0403]: NVIDIA Corporation GP107GL High Definition Audio Controller [10de:0fb9] (rev a1)

Yes, you read it right, a 9050 Ti, coz I am from the future…

The critical information here is the pair of four-character vendor and device IDs in brackets: [10de:1c82] and [10de:0fb9]. Make a note of these. We will need them shortly.
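While you are at it, it is worth confirming that the GPU and its audio function sit in an IOMMU group of their own (this matters later; see the troubleshooting section). A small shell loop, nothing Proxmox-specific, dumps every group and its devices:

for d in /sys/kernel/iommu_groups/*/devices/*; do
  g=${d%/devices/*}; g=${g##*/}
  echo "Group $g: $(lspci -nns ${d##*/})"
done

Ideally your two GPU functions (01:00.0 and 01:00.1 in the example above) appear in a group that contains nothing else.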

Step 4: Blacklisting Drivers to Enable VFIO

By default, Proxmox will try to load standard graphics drivers (nouveau for NVIDIA, amdgpu for AMD) for your GPU, giving the host control. We must prevent this so the vfio-pci driver can bind to the card instead.

Create a new configuration file for this purpose:

nano /etc/modprobe.d/blacklist.conf

Add the following lines to this new file. If you have an AMD card, blacklist amdgpu and radeon instead of the NVIDIA entries.

blacklist nouveau
blacklist nvidia
blacklist nvidiafb
blacklist rivafb

Save and close the file.

Next, we will tell Proxmox which device the vfio-pci driver should claim. Create another new file:

nano /etc/modprobe.d/vfio.conf

Add the following line, replacing the IDs with the ones you found in Step 3. The IDs must be comma-separated.

options vfio-pci ids=10de:1c82,10de:0fb9

Save and close the file. This line tells vfio-pci, “These are the devices I want you to reserve for passthrough.”
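If you want extra insurance that vfio-pci wins the race against the graphics driver at boot, some setups also add softdep rules to the same file. This is optional and not needed for most installs, but it is harmless:

softdep nouveau pre: vfio-pci
softdep nvidia pre: vfio-pci

(For an AMD card, the equivalent would be softdep amdgpu pre: vfio-pci.)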

Finally, update the kernel’s initial RAM disk and reboot one last time to apply all changes:

update-initramfs -u
reboot

After the host reboots, you can verify that the VFIO driver has successfully claimed the GPU by running lspci -nnk. The output for your GPU should now show Kernel driver in use: vfio-pci. If it does, your host is perfectly configured.
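The relevant part of the output should look something like this, with your own card name and IDs:

01:00.0 VGA compatible controller [0300]: NVIDIA Corporation ...
        Kernel driver in use: vfio-pci
01:00.1 Audio device [0403]: NVIDIA Corporation ...
        Kernel driver in use: vfio-pci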

IV. Phase 3: The VM Configuration (The “Error 43” Fix)

With the host prepared, we can finally create the VM and grant it the GPU.

Step 1: Creating the Virtual Machine

In the Proxmox web UI, create a new VM with these critical settings:

OS: Select the appropriate OS type (e.g., Linux).

System:

  • Machine: q35. This is non-negotiable. The default (i440fx) is a legacy chipset and does not properly support modern PCIe passthrough.
  • BIOS: OVMF (UEFI). This is highly recommended for GPU passthrough as it is more compatible with modern hardware initialisation. You will need to create an EFI disk for the VM as well.

CPU/Memory/Network: Configure as needed for your workload.
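If you prefer doing this from the shell, a rough qm equivalent of the settings above looks like the following. Treat it as a sketch: the VM ID (100), the storage name (local-lvm), and the sizes are placeholders you will want to adapt.

qm create 100 --name ai-box --ostype l26 \
  --machine q35 --bios ovmf --efidisk0 local-lvm:1,efitype=4m,pre-enrolled-keys=0 \
  --cpu host --cores 8 --memory 32768 \
  --scsihw virtio-scsi-single --scsi0 local-lvm:64 \
  --net0 virtio,bridge=vmbr0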

Step 2: Adding the PCI Device

Once the VM is created, select it, go to the Hardware tab, and click Add > PCI Device.

  • Device: Select your GPU from the dropdown list.

Critical Settings (check all that apply):

  • All Functions: Check this box. This ensures all parts of the card (like the audio controller) are passed through together.
  • ROM-Bar: Check this. This helps the GPU’s firmware (ROM) initialize correctly inside the VM.
  • Primary GPU: Do NOT check this unless it is the only graphics card in your system and you want it to handle the VM’s console output directly. For a server with onboard graphics or a secondary GPU for the host, leave this unchecked.

Click Add. The PCI device will now appear in your VM’s hardware list.
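For reference, the shell equivalent of this step (assuming VM ID 100 and the 01:00 address we found earlier) is roughly:

qm set 100 --hostpci0 0000:01:00,pcie=1,rombar=1

Passing the address without a function suffix (01:00 rather than 01:00.0) is what the “All Functions” checkbox does in the UI; append ,x-vga=1 only if this card should also act as the VM’s primary display.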

Step 3: The “Secret” Settings for the NVIDIA Error 43 Fix

For years, NVIDIA’s consumer-grade GeForce drivers would detect when they were running inside a virtual machine and shut down, reporting a dreaded “Error 43” in the Windows Device Manager or causing similar issues in Linux. This was a business decision to segment their consumer (GeForce) and professional (Quadro) product lines.

While modern drivers (post-465) have largely relaxed this, it can still surface depending on the card and driver version. The following settings are the definitive safety net to prevent it.

We need to manually edit the VM’s configuration file. Using the Proxmox node shell, open the file (replace 100 with your VM's ID):

nano /etc/pve/qemu-server/100.conf

Method 1: The Modern Fix

For many modern systems, simply hiding the KVM hypervisor from the guest is enough. Find the line that specifies your CPU (e.g., cpu: x86-64-v2-AES) and modify it to:

cpu: host,hidden=1

This passes the host CPU’s features directly to the VM while hiding the KVM hypervisor signature, so the guest cannot easily tell it is running in a virtualised environment.

Method 2: The Classic “Nuclear Option”

If the above doesn’t work, or for maximum compatibility, add the following line to the bottom of the configuration file. This line provides a more extensive set of arguments to QEMU (the underlying hypervisor).

args: -cpu 'host,+kvm_pv_unhalt,+kvm_pv_eoi,hv_vendor_id=NV43FIX,kvm=off'

Let’s break this down:

  • hv_vendor_id=NV43FIX: This replaces the hypervisor's reported vendor ID with an arbitrary string that the NVIDIA driver doesn't recognize as a known hypervisor.
  • kvm=off: This directly tells the guest OS to hide the KVM virtualization interface.

Save the file and close it. You should now be able to start your VM and install the GPU drivers as you would on a bare-metal machine.
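Once the guest OS is up and the driver is installed, a quick sanity check from inside a Linux guest (assuming an NVIDIA card with the proprietary driver) is:

lspci -nnk | grep -iA3 nvidia
nvidia-smi

If nvidia-smi prints the card’s name, driver version and memory instead of an error, passthrough is working and Error 43 is not in play. On a Windows guest, the equivalent check is the device status in Device Manager.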

V. Troubleshooting Common Issues

Problem: “IOMMU Group is not valid”

  • Explanation: The IOMMU groups devices together based on your motherboard’s topology. If your GPU is in the same group as other essential devices (like a USB controller or the motherboard chipset itself), Proxmox will refuse to pass it through to prevent system instability.
  • Solution (Last Resort): The Proxmox kernel ships with the ACS override patch, which lets you tell the kernel to be less strict about these groupings. Edit /etc/default/grub again and add pcie_acs_override=downstream,multifunction to the GRUB_CMDLINE_LINUX_DEFAULT line (see the example below), then run update-grub and reboot. This is a powerful but potentially risky fix, as it can break the isolation between devices.
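On an Intel host, the resulting GRUB line would look something like this (on AMD, append the same option after the amd_iommu parameters):

GRUB_CMDLINE_LINUX_DEFAULT="quiet intel_iommu=on iommu=pt pcie_acs_override=downstream,multifunction"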

Problem: “VM starts, but the screen is black.”

  • Explanation: This often means the Proxmox host has reclaimed the GPU before the VM could.
  • Solution: Shut down the VM and, on the Proxmox host shell, run lspci -k | grep -A 2 -E "VGA|3D". Verify that the Kernel driver in use for your GPU is vfio-pci. If it shows nouveau or nvidia, your blacklisting has failed. Double-check your /etc/modprobe.d/ files and make sure you have run update-initramfs -u and rebooted.

VI. Conclusion & Your Next Steps

Nice! You have navigated one of the most challenging but powerful configurations in the Proxmox ecosystem. 

By carefully preparing your hardware, instructing the host OS, and configuring your VM, you have successfully isolated your GPU and handed it over to a virtual machine. You have bridged the physical and virtual worlds, unlocking the full potential of your hardware for demanding tasks like AI and machine learning.

Your hardware and hypervisor are now ready. The next step is to build the software environment on top of this foundation.

Stay tuned for the next article.
