You stand at the intersection of two digital desires that once seemed incompatible. You want the raw power of large language models. You also want to keep your documents, code, and proprietary knowledge locked away from cloud servers and data brokers. This tension has driven countless developers to ask the same question: Is there a way to have both?
The answer exists, and it exists locally.
PrivateGPT is not simply another chatbot interface. It is an architecture for taking hundreds, thousands, even tens of thousands of documents and transforming them into a searchable, queryable knowledge system that runs entirely within your control.
Unlike Open WebUI, which lets you chat with a single file at a time, PrivateGPT ingests entire document collections, embeds them into vector space, and exposes them through a programmatic API. It is the heavy lifter of the local RAG ecosystem.
This guide walks you through installing PrivateGPT and connecting it to your existing Ollama setup. By the end, you will have a production-ready system capable of indexing your entire digital life and answering questions about it — without a single byte leaving your machine.
Understanding RAG: Why This Matters
Before diving into terminal commands, it is worth understanding what you are actually building. RAG stands for Retrieval Augmented Generation. It addresses a fundamental problem with large language models: they hallucinate. They confabulate. They make up convincing answers to questions they do not actually understand.
RAG solves this by changing the workflow. Instead of sending your question straight to the LLM, the system first retrieves relevant information from your knowledge base. Imagine studying for an exam. You do not walk into the exam hall relying purely on memory. You read the textbook first, find the relevant chapter, and then answer the question grounded in what you just read. That is RAG. That is PrivateGPT.
The technical pipeline works like this. You upload your documents. PrivateGPT parses them, chunks them into semantically meaningful pieces, and converts each chunk into a numerical representation called an embedding. These embeddings get stored in a vector database. When you ask a question, PrivateGPT converts your question to an embedding, finds the most similar document chunks, and feeds those chunks to the LLM alongside your query.
The LLM generates a response grounded in what it retrieved. Crucially, you get citations pointing to exactly where the answer came from.
Now here is the privacy angle that makes this relevant. All of this — parsing, embedding, retrieval, generation — happens on your machine. Your network. Your GPU or CPU. Nothing touches a cloud API. No vendor knows what documents you are asking about. No third party is training models on your proprietary code or confidential legal documents. This is the architecture that enterprises, legal firms, and healthcare organizations need.
PrivateGPT implements this architecture with production-grade tooling. It exposes REST APIs compatible with OpenAI’s standard. It supports streaming responses. It handles concurrent requests. It integrates seamlessly with Ollama for the underlying language models. It uses Qdrant as the vector database. It leverages LlamaIndex for the RAG orchestration. These are battle-tested components, not hobby projects.
Prerequisites: The Non-Negotiable Requirements
PrivateGPT is particular about its dependencies. Skip or substitute any of these, and your installation will fail cryptically. Do not proceed without reading this section.
Python 3.11 Is Mandatory
PrivateGPT requires Python 3.11. Not 3.10. Not 3.12. Not 3.13. Exactly 3.11.
This is not arbitrary. The underlying dependencies — particularly certain versions of LlamaIndex and LangChain — have pinned requirements on the Python version. Attempting to run PrivateGPT on Python 3.12 will result in resolution conflicts that Poetry cannot overcome. Attempting 3.10 will miss critical features those libraries depend on.
Verify your current Python version before proceeding.
python --version
If you have Python 3.11 installed but it is not your default, you can use a version manager like pyenv to make it the default for your project directory.
Installing Python 3.11
On macOS (using Homebrew)
brew install python@3.11
On Linux (Ubuntu/Debian)
sudo apt update
sudo apt install python3.11 python3.11-venv python3.11-dev
On Windows
Download the installer directly from python.org. During installation, check the box to add Python to PATH. Verify with:
python --version
Poetry: The Dependency Manager
Poetry is the tool that manages your project dependencies with precision. It replaces the chaos of pip and requirements.txt with a pyproject.toml file and a lock file that pin every dependency to an exact version.
Install Poetry using pipx (recommended)
pipx install poetry
If you do not have pipx, install it first. Pipx keeps Poetry in an isolated environment, preventing version conflicts.
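If pipx is missing, the usual route is to install it with the Python you already have. The commands below follow the pipx project's documented install steps; on Debian or Ubuntu, sudo apt install pipx also works.
# Install pipx for your user and add it to PATH
python3 -m pip install --user pipx
python3 -m pipx ensurepath
Open a new terminal afterwards so the PATH change takes effect, then confirm with pipx --version.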
Verify your Poetry installation
poetry --version
The output should show Poetry version 1.8.3 or higher. If you have an older version, update it immediately.
poetry self update 1.8.3
Make: The Build Tool
PrivateGPT uses Makefiles to automate common tasks. You need the make utility installed on your system.
On macOS
brew install make
On Linux
sudo apt install build-essential
On Windows (using Chocolatey)
choco install make
Ollama: Running Locally
PrivateGPT does not include its own LLM. It relies on Ollama to provide one. Ollama must be installed and running as a service on your machine before you start PrivateGPT.
Go to ollama.ai and download the installer for your operating system. Install it and verify that it starts automatically on boot. Test that Ollama is running by opening your terminal and checking:
curl http://localhost:11434
You should receive a response. If not, Ollama is not running.
Git: Version Control
You will clone the PrivateGPT repository using Git, so make sure Git is installed.
git --version
If Git is not installed, get it from git-scm.com.
Step 1: Clone the Repository
Navigate to a directory where you want to keep your PrivateGPT installation. This should be a location where you have write permissions and that is not within your system Python directories.
git clone https://github.com/zylon-ai/private-gpt
cd private-gpt
This creates a local copy of the PrivateGPT repository in a folder called private-gpt. All subsequent commands assume you are inside this directory.
Step 2: Set Python 3.11 as Your Local Version
PrivateGPT needs to know that it should use Python 3.11 specifically. If you have multiple Python versions installed, tell Poetry explicitly.
Using pyenv (if installed)
pyenv local 3.11
Alternatively, create a .python-version file in the directory
echo "3.11" > .python-version
Verify that Poetry recognizes Python 3.11
poetry env info
Look for Python version in the output. It should say 3.11.
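If poetry env info reports a different interpreter, point Poetry at Python 3.11 explicitly with Poetry's env use command. Adjust the interpreter name or path to match where Python 3.11 lives on your system.
# Tell Poetry which interpreter to use for this project's virtual environment
poetry env use python3.11
poetry env info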
Step 3: Pull the Required Models in Ollama
PrivateGPT needs two models running in Ollama. The first is the language model that generates responses. The second is the embedding model that converts text to vectors.
In a terminal (separate from where you will run PrivateGPT), start the Ollama service if it is not already running
ollama serve
Then, in another terminal, pull the models. You can choose different models according to your use case, and pick larger-parameter models if your system specifications allow it.
ollama pull llama3.1
ollama pull nomic-embed-text
The first command downloads llama3.1 (approximately 4GB). The second downloads nomic-embed-text (approximately 275MB). These will take a few minutes depending on your internet speed.
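To confirm both downloads completed, list the models Ollama has stored locally.
ollama list
Both llama3.1 and nomic-embed-text should appear in the output along with their sizes.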
Nomic Embed is specifically chosen because it excels at embedding long documents. It handles up to 8192 tokens in a single embedding, matching the context length of OpenAI’s Ada model but with better performance on semantic search tasks. This matters when you are indexing dense legal documents, technical specifications, or academic papers.
Step 4: Install PrivateGPT with Poetry
Back in your terminal in the private-gpt directory, install PrivateGPT with the Ollama extras.
poetry install --extras "ui llms-ollama embeddings-ollama vector-stores-qdrant"
This single command does the heavy lifting. Poetry reads pyproject.toml, resolves all dependencies to compatible versions, creates a virtual environment specifically for this project, and installs everything in that isolated space.
Breaking down what each extra provides:
- ui: Installs the Gradio-based web interface so you can interact with PrivateGPT through localhost:8001
- llms-ollama: Enables PrivateGPT to communicate with Ollama for language model inference
- embeddings-ollama: Enables PrivateGPT to ask Ollama to generate embeddings for documents and queries
- vector-stores-qdrant: Installs the Qdrant vector database support; Qdrant runs embedded and does not require separate installation
The installation may take several minutes. Poetry is downloading Python packages, verifying their hashes, and building wheels (pre-compiled binaries) for packages that need compilation. On the first run, this is slow. Subsequent runs use the cache and finish much faster.
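If you want to confirm the environment was created correctly, Poetry can show it and sample the packages inside it. These are standard Poetry commands; the head filter just trims the output.
# List the virtual environments Poetry manages for this project
poetry env list
# Spot-check the packages that were installed
poetry show | head -n 20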
Step 5: Verify That Ollama Is Running
PrivateGPT will not start if Ollama is not running. Make absolutely certain that Ollama is actively serving on localhost:11434.
In a separate terminal
curl http://localhost:11434
You should see a plain-text response reading “Ollama is running”. If you see a connection refused error, Ollama is not running. Start it in another terminal with ollama serve.
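For a more detailed check, Ollama's REST API can list the models it is serving. The /api/tags endpoint is part of Ollama's standard API; the optional jq filter assumes jq is installed.
curl http://localhost:11434/api/tags
# Optional: show only the model names (requires jq)
curl -s http://localhost:11434/api/tags | jq '.models[].name'
Both llama3.1 and nomic-embed-text should appear in the list.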
Step 6: Start PrivateGPT
Now comes the moment when you bring it all together. From the private-gpt directory, run PrivateGPT with the Ollama profile.
On macOS and Linux
PGPT_PROFILES=ollama make run
On Windows PowerShell
$env:PGPT_PROFILES="ollama"; make run
On Windows Command Prompt (CMD)
set PGPT_PROFILES=ollama
make run
The PGPT_PROFILES environment variable tells PrivateGPT which configuration file to load. The ollama profile specifically configures PrivateGPT to use Ollama for the LLM, Ollama for embeddings, and Qdrant for vector storage.
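Each profile name corresponds to a settings file in the repository root: settings.yaml holds the defaults and settings-ollama.yaml layers the Ollama-specific configuration on top. You can list the profiles that ship with your checkout; the exact set varies between releases.
ls settings*.yaml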
The first startup takes longer because PrivateGPT initializes Qdrant and verifies all connections. You will see output similar to:
INFO: Uvicorn running on http://127.0.0.1:8000
INFO: Application startup complete
The API is now running on localhost:8000. The web UI is running on localhost:8001.
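Before opening a browser, you can confirm the API is up from another terminal using PrivateGPT's health route; adjust the port if your configuration differs from this guide.
curl http://localhost:8000/health
A small JSON status response means the server is ready.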
Step 7: Access the Web Interface
Open your web browser and navigate to http://localhost:8001.
You will see the Gradio interface. It has two main tabs: Chat and Query Files.
The Chat tab lets you talk directly to the LLM without any documents providing context. This is useful for general questions or testing that the system is responding.
The Query Files tab is where the magic happens.
Step 8: Upload Your First Document
In the Query Files tab, click the Upload Files button. Select a PDF, text file, Word document, or any other supported format. Start with something you know well — a device manual, a recipe, a technical article, anything with specific information you can later ask about.
Once selected, the file begins ingesting. You will see an “Ingesting” progress bar. This is PrivateGPT doing the work. It is parsing the document, splitting it into chunks, generating embeddings for each chunk, and storing those embeddings in Qdrant. The time this takes depends on the file size and your hardware. A 50-page PDF typically takes 30 seconds to 2 minutes.
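The same ingestion can be scripted against the API instead of the web UI, which is how you would automate bulk uploads. The sketch below assumes a file named manual.pdf in the current directory and the ingest routes described in PrivateGPT's API documentation; check the interactive API docs on your API port for the exact routes your version exposes.
# Upload a single document through the API (multipart form upload)
curl -X POST http://localhost:8000/v1/ingest/file -F "file=@manual.pdf"
# List everything that has been ingested so far
curl http://localhost:8000/v1/ingest/list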
Step 9: Query Your Document
Once ingestion completes, ask a specific question about the document. If you uploaded a device manual, ask “What is the warranty period?” If you uploaded a recipe, ask “What temperature should the oven be set to?”
PrivateGPT retrieves the relevant chunks and passes them to the LLM. You get back an answer grounded in your document.
Critically, you also get citations. PrivateGPT shows you exactly which document chunk the answer came from. This is not a hallucination. This is retrieved fact.
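You can also inspect the retrieval step on its own, without any generation, which makes the citations easy to verify. The request below uses the chunks endpoint from PrivateGPT's API documentation; the question is just an example and the port assumes this guide's defaults.
# Ask only for the most relevant chunks, not a generated answer
curl -X POST http://localhost:8000/v1/chunks \
  -H "Content-Type: application/json" \
  -d '{"text": "What is the warranty period?", "limit": 3}'
The response contains the top-ranked chunks and the documents they came from, which is exactly what gets handed to the LLM during a normal query.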
Understanding the Architecture You Built
What you have now is more than just a chatbot. You have built three layers working in concert.
The ingestion layer sits at the bottom. It parses documents, generates embeddings, and stores them in the vector database. This layer is triggered when you upload files.
The retrieval layer sits in the middle. When you submit a query, this layer converts your question to an embedding and searches Qdrant for similar chunks. The vector database returns the top matching chunks, ranked by semantic similarity.
The generation layer sits on top. It takes your query and the retrieved chunks and sends them to Ollama’s llama3.1 model. The LLM reads everything in context and generates a response. All of this happens on your machine.
The API supporting all three layers is exposed at localhost:8000. This means you can programmatically ingest documents, retrieve context, and generate responses without ever touching the web interface. This is where PrivateGPT becomes a tool for building production systems, not just a chatbot.
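As a sketch of what programmatic use looks like, the request below asks the OpenAI-style chat endpoint to ground its answer in your ingested documents and to return the sources it used. The use_context and include_sources fields follow PrivateGPT's API documentation; set stream to true if you want token-by-token output.
curl -X POST http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "messages": [{"role": "user", "content": "What is the warranty period?"}],
    "use_context": true,
    "include_sources": true,
    "stream": false
  }'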
What Comes Next
You have the foundation. From here, you can expand in multiple directions.
You can upload more documents and build a knowledge base specific to your needs. Legal teams can ingest entire contract templates and case law. Developers can upload API documentation and internal coding standards. Researchers can upload academic papers and lab notes.
You can configure different embedding models if nomic-embed-text does not fit your use case. You can swap the LLM from llama3.1 to a different model that Ollama supports.
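Swapping the LLM is a two-step change: pull the new model in Ollama, then update the ollama profile to reference it and restart. The model name below is only an example, and the exact field names live in settings-ollama.yaml, so check that file before editing.
ollama pull mistral
# Edit settings-ollama.yaml so the configured LLM name matches, then restart:
PGPT_PROFILES=ollama make run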
You can access the API programmatically, building custom applications on top of PrivateGPT instead of relying on the web interface.
You can deploy this entire system to a private data center or VPC if you need enterprise-grade isolation.
The core achievement is this: You have taken control of your data. You have built a system that respects your privacy by design. You have created a research engine that understands your documents as deeply as any cloud-hosted service would — but every computation stays within your control.
This is the foundation. Build on it.
Conclusion: Your Private Knowledge Fortress Is Complete
The infrastructure is ready. You have installed Python 3.11, Poetry, and Make. You have installed PrivateGPT and connected it to Ollama. You have uploaded your first document and queried it successfully. You have seen citations that prove the system is not hallucinating but retrieving.
You now own a complete, private, locally-running Retrieval Augmented Generation pipeline. No cloud APIs. No data brokers. No black boxes.
This completes your journey through the home lab series. You have the hardware from Day 1. You have the operating system from Day 4. You have the language models from Day 8. You have the open interface from Day 9. And now you have the research engine from Day 10.
In the next article, we focus on the most critical step that ties everything together: protecting all of it. We will implement automated backups, manage recovery scenarios, and ensure that if disaster strikes, your entire system can be restored from scratch.
Continue to Day 11: Automating Backups for Your AI Home Lab.