Ollama + Open WebUI: Build Your Own Private AI Assistant in 10 Minutes

What Is the Local AI Stack?

Two projects, one purpose: a private, offline AI assistant that never charges per query.

Ollama — runs LLMs locally. One command to pull a model, one to run it.
Open WebUI — a ChatGPT-like interface for Ollama. Chat, upload files, search the web, switch models.

Together they form one of the most accessible local AI stacks in 2026: no cloud dependency, no subscription, and no data leaving your machine.

What You’ll Get

A fully private ChatGPT alternative running on your own hardware
A 10-minute install with every command verified
A ready-to-use Docker Compose configuration
Room to grow: web search, multi-user, Home Assistant integration

Why Self-Host an LLM?

Data Privacy

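Every message sent to ChatGPT or Claude is processed and stored on someone else’s server, and depending on the plan and your settings, the provider’s terms may allow your conversations to be used for training. For sensitive code, business documents, or personal information, local inference removes that exposure: prompts and responses stay on your machine.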

Cost

Service	Monthly	Limits
ChatGPT Plus	$20	50 msgs/3h on GPT-4
Claude Pro	$20	Limited
Ollama + Open WebUI	$0	Unlimited

Your GPU is already paid for.

Model Flexibility

Switch between Llama 3, Qwen 2.5, Mistral, Gemma — any model Ollama supports — with a dropdown. No waitlists, no forced upgrades.

Enough reasons. Let’s install.

Prerequisites

Any computer (Linux, macOS, Windows)
Docker (Docker Desktop on macOS and Windows, Docker Engine on Linux; see docker.com)
8GB RAM minimum (16GB recommended)
GPU optional — accelerates inference; CPU works for smaller models
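
The checklist above can be turned into a quick preflight script. This is a minimal sketch, not an official tool: the function names ram_verdict and detect_ram_gb are made up for illustration, and the thresholds simply mirror the 8GB minimum and 16GB recommendation stated in the list.

```shell
#!/bin/sh
# Preflight sketch for the prerequisites above (hypothetical helper names).

# Classify a RAM size in whole GB against the 8GB minimum / 16GB recommendation.
ram_verdict() {
  if [ "$1" -lt 8 ]; then
    echo "below minimum"
  elif [ "$1" -lt 16 ]; then
    echo "meets minimum"
  else
    echo "recommended"
  fi
}

# Print installed RAM in GB, rounded to the nearest integer; prints nothing if unknown.
detect_ram_gb() {
  if [ -r /proc/meminfo ]; then
    # Linux: MemTotal is reported in kB
    awk '/^MemTotal:/ { printf "%.0f\n", $2 / 1024 / 1024 }' /proc/meminfo
  elif command -v sysctl >/dev/null 2>&1; then
    # macOS: hw.memsize is reported in bytes
    sysctl -n hw.memsize 2>/dev/null | awk '{ printf "%.0f\n", $1 / 1024 / 1024 / 1024 }'
  fi
}

ram_gb=$(detect_ram_gb)
if [ -n "$ram_gb" ]; then
  echo "RAM: ${ram_gb}GB ($(ram_verdict "$ram_gb"))"
else
  echo "RAM: could not detect"
fi

if command -v docker >/dev/null 2>&1; then
  echo "Docker: found"
else
  echo "Docker: not found"
fi
```

Rounding rather than truncating matters here: a machine sold as 16GB usually reports slightly less usable memory, and truncation would misclassify it.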

Installation

Step 1: Install Ollama

curl -fsSL https://ollama.com/install.sh | sh

Verify:

ollama --version
# ollama version is 0.5.x (your version number may differ)

macOS and Windows: the install script above is for Linux only; download the installer from ollama.com/download instead.

Keep your terminal open — the next step uses it.

Step 2: Pull a Model

Start with a solid 7B model:

ollama pull qwen2.5:7b

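The download is roughly 4-5GB. If downloads are slow (e.g., in China), either:

Via proxy: the Ollama background server performs the download, so the proxy must be set for that process, not only for the terminal where you type ollama pull. If you start the server by hand, run export HTTPS_PROXY=http://127.0.0.1:7890 and then ollama serve in the same terminal; if Ollama runs as a systemd service, add the variable to the service’s environment and restart it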
Manual import: download GGUF files from hf-mirror.com and import via ollama create — see Ollama import docs

Test it:

ollama run qwen2.5:7b "Explain Docker in one sentence"

If you see output, you’re running LLMs locally.
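
Beyond the CLI, Ollama serves a local HTTP API, by default on port 11434 (the same port used in the connection step later in this guide). The sketch below sends a prompt to the /api/generate endpoint with curl. The helper names and the OLLAMA_URL variable are hypothetical, and the payload builder is deliberately naive: it assumes the model name and prompt contain no double quotes or backslashes.

```shell
#!/bin/sh
# Sketch: send a prompt through Ollama's local HTTP API instead of the CLI.
# OLLAMA_URL is a variable made up for this script; it defaults to Ollama's standard address.
OLLAMA_URL="${OLLAMA_URL:-http://localhost:11434}"

# Build the JSON body for POST /api/generate.
# Limitation: no escaping is done, so avoid double quotes and backslashes
# in the arguments, or build the body with a JSON tool such as jq.
generate_payload() {
  printf '{"model":"%s","prompt":"%s","stream":false}' "$1" "$2"
}

# Send a prompt and print the raw JSON reply; the answer is in its "response" field.
ask_ollama() {
  curl -fsS "$OLLAMA_URL/api/generate" -d "$(generate_payload "$1" "$2")"
}

# Show the request body that would be sent (no network access needed).
generate_payload "qwen2.5:7b" "Explain Docker in one sentence"
echo
```

With Ollama running, ask_ollama qwen2.5:7b "Explain Docker in one sentence" returns a single JSON object. Setting "stream" to false is what makes the reply arrive in one piece rather than token by token.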

Step 3: Install Open WebUI

Make sure Docker is running, then pick one method.

Method A: One-liner (fast)

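docker run -d \
  --name open-webui \
  -p 3000:8080 \
  --add-host=host.docker.internal:host-gateway \
  -v open-webui:/app/backend/data \
  --restart always \
  ghcr.io/open-webui/open-webui:main

What this does:

-p 3000:8080 maps container port 8080 to port 3000 on your machine
--add-host=host.docker.internal:host-gateway lets the container reach Ollama on the host (required on Linux; Docker Desktop already provides this name)
-v open-webui:/app/backend/data persists chat history and config in a named volume
--restart always restarts the container after a crash and whenever Docker starts, including after a reboot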

Method B: docker-compose.yml (recommended — easier to add services later)

Create a docker-compose.yml:

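services:
  open-webui:
    image: ghcr.io/open-webui/open-webui:main
    container_name: open-webui  # keeps the name that "docker logs open-webui" expects
    ports:
      - "3000:8080"  # host port 3000 -> container port 8080
    extra_hosts:
      - "host.docker.internal:host-gateway"  # lets the container reach Ollama on a Linux host
    volumes:
      - open-webui:/app/backend/data  # persists chat history and config
    restart: always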

volumes:
  open-webui:

Then run:

docker compose up -d

Open http://localhost:3000 and create a local account. The first account you create becomes the administrator, and its data stays on your machine. After signing in you’ll see the chat interface.

Step 4: Connect to Ollama

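By default, the Open WebUI container looks for Ollama at http://host.docker.internal:11434, so an Ollama instance on the same host is usually picked up without any configuration. When connected, you’ll see a green status indicator at the top and your model (qwen2.5:7b) in the model dropdown.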

If it doesn’t connect:

Avatar → Settings → External Connections
Ollama Base URL: http://host.docker.internal:11434
Save — the status refreshes immediately.

Done. You now have a private ChatGPT.

What Can You Do With It?

Private Code Review

Drag a source file into Open WebUI for AI review. Your code never leaves your network — suitable for proprietary projects where GitHub Copilot or ChatGPT aren’t options.

Document Q&A

Upload PDFs, Word docs, or text files. Ask questions about their content:

“Summarize the key findings in this report.”
“Extract all technical requirements from pages 12-18.”

Daily Tasks

Draft emails, generate reports from bullet points, explain technical concepts. A 7B model handles these comfortably.

Open WebUI also supports multi-model mode — ask the same question to Llama and Qwen side by side, pick the best answer.

Model Selection

Use Case	Model	RAM	Notes
Entry	`qwen2.5:7b`	8GB	Strong Chinese, best value
Code	`codellama:7b`	8GB	Code-specific training
Balanced	`mistral:7b`	8GB	Fast, strong in English
Advanced	`llama3.1:8b`	8GB	Meta’s Llama 3.1, strong all-rounder
High-end	`qwen2.5:32b`	24GB+	Approaches GPT-4 on some tasks
Production	`llama3.1:70b`	48GB+	Needs A100-class GPU

Start with a 7B model. Upgrade when you know what you need.
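
To make the table concrete, here is a small sketch that maps available memory to a starting model. The function name suggest_model is hypothetical, and the thresholds are assumptions: 8GB follows the minimum from the Prerequisites section, while 24GB and 48GB follow the High-end and Production rows.

```shell
#!/bin/sh
# Sketch: pick a starting model from available memory in whole GB.
# Thresholds are assumptions taken from this guide, not limits enforced by Ollama.
suggest_model() {
  if [ "$1" -ge 48 ]; then
    echo "llama3.1:70b"
  elif [ "$1" -ge 24 ]; then
    echo "qwen2.5:32b"
  elif [ "$1" -ge 8 ]; then
    echo "qwen2.5:7b"
  else
    echo "none"
  fi
}

# Print the suggestion for a few common memory sizes.
for gb in 4 8 16 24 64; do
  echo "${gb}GB -> $(suggest_model "$gb")"
done
```

Treat the result as a first guess. Context length and whatever else is running on the machine both eat into the memory a model can use.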

FAQ

Can’t connect to Ollama — no green indicator

Most likely the Docker container can’t reach your host. Make sure Ollama is running, then check the Ollama URL in Open WebUI settings:

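Windows/macOS: http://host.docker.internal:11434
Linux: http://172.17.0.1:11434 (the default Docker bridge gateway). Ollama listens only on 127.0.0.1 by default, so on Linux you also need to set OLLAMA_HOST=0.0.0.0 in the Ollama service’s environment and restart it before the container can reach it.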
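
The two cases above can be scripted. The sketch below prints the URL to try for the current OS and checks whether Ollama answers at all. The function names are hypothetical; /api/tags is Ollama's endpoint for listing installed models, used here only as a liveness probe.

```shell
#!/bin/sh
# Sketch: choose the Ollama base URL for Open WebUI and check that Ollama is up.

# Map an OS name as printed by `uname -s` to the URL suggested in this FAQ.
ollama_url_for_os() {
  case "$1" in
    Linux) echo "http://172.17.0.1:11434" ;;
    *)     echo "http://host.docker.internal:11434" ;;
  esac
}

# Succeeds if an Ollama server answers at the given base URL.
ollama_reachable() {
  curl -fsS --max-time 3 "$1/api/tags" >/dev/null 2>&1
}

os=$(uname -s)
echo "URL to enter in Open WebUI: $(ollama_url_for_os "$os")"

# This probe runs on the host, so it only confirms that Ollama itself is running.
if ollama_reachable "http://localhost:11434"; then
  echo "Ollama is running on this machine"
else
  echo "Ollama is not answering on localhost:11434; start it first"
fi
```

If the host-side probe succeeds but Open WebUI still shows no models, the problem lies between the container and the host, not in Ollama itself.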

Model download keeps failing

Ollama supports resumable downloads. Just re-run ollama pull qwen2.5:7b — it continues where it left off. If it keeps failing, switch to manual GGUF import.

Open WebUI shows a blank page after starting

Clear your browser cache or open an incognito window. Check Docker logs:

docker logs open-webui

If you see Uvicorn running on http://0.0.0.0:8080, the service is healthy — the issue is browser-side.
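
The same check can be automated by searching the log for that startup line. A minimal sketch follows; logs_look_healthy is a hypothetical helper, and the sample log line in the demo is illustrative, not copied from a real run.

```shell
#!/bin/sh
# Sketch: decide whether the Open WebUI backend has started, based on its log text.

# Reads log text on stdin; succeeds if Uvicorn's startup line is present.
logs_look_healthy() {
  grep -q "Uvicorn running on"
}

# Demo with a sample line so the script runs without Docker.
if echo "INFO:     Uvicorn running on http://0.0.0.0:8080" | logs_look_healthy; then
  echo "backend is up; suspect the browser"
else
  echo "backend has not started; read the log for errors"
fi
```

Against a real container, pipe the logs in: docker logs open-webui 2>&1 | logs_look_healthy. The 2>&1 is needed because docker logs replays the container's stderr output on stderr.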

Model responses are poor quality

For Chinese content, qwen2.5:7b is a strong 7B choice. First verify that the model you expect is actually selected in the Open WebUI dropdown. If quality is still lacking, try qwen2.5:14b, which needs roughly 16GB of RAM.

Going Further

Web Search

Open WebUI supports SearXNG integration for live internet access. Add the searxng service below under the existing services: key in your docker-compose.yml (do not add a second services: line), then run docker compose up -d:

services:
  searxng:
    image: searxng/searxng:latest
    ports:
      - "4000:8080"

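In Open WebUI’s settings, enable Web Search, choose SearXNG as the engine, and enter the SearXNG URL. Because both containers share the Compose network, Open WebUI can typically reach it at http://searxng:8080 instead of the host port 4000. SearXNG must also have JSON output enabled in its own settings for Open WebUI to read the results.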

Custom Models

Adjust system prompts, temperature, and context length from the Open WebUI interface.
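
The same three settings can also be fixed at the Ollama level with a Modelfile, so that every client sees them. The sketch below writes one to a temporary directory. The model name reviewer, the parameter values, and the system prompt are hypothetical choices for illustration.

```shell
#!/bin/sh
# Sketch: bake a system prompt, temperature, and context length into a named model.

# Write a Modelfile to the given path.
write_modelfile() {
  cat > "$1" <<'EOF'
FROM qwen2.5:7b
PARAMETER temperature 0.3
PARAMETER num_ctx 8192
SYSTEM You are a concise code reviewer. Point out bugs before style issues.
EOF
}

# Use a scratch directory so nothing in the current folder is overwritten.
workdir=$(mktemp -d)
write_modelfile "$workdir/Modelfile"
echo "Wrote $workdir/Modelfile:"
cat "$workdir/Modelfile"
```

To register it, run ollama create reviewer -f followed by the printed path, then ollama run reviewer. The new model should then appear in the Open WebUI dropdown alongside the base model.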

Limitations

7B models aren’t GPT-4. Fine for drafting and explaining; don’t expect architectural insights.
CPU is slow — roughly 5-10 tokens/second for a 7B model.
Large models need VRAM — 32B+ requires 24GB+ GPU memory.
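
To put the CPU figure in perspective, this sketch works out how long a reply takes at the quoted rates. The 500-token reply length is an assumption chosen for illustration (roughly a few paragraphs), and seconds_for_tokens is a hypothetical helper.

```shell
#!/bin/sh
# Sketch: estimate generation time from reply length and token rate.

# Whole seconds needed to generate $1 tokens at $2 tokens per second, rounded up.
seconds_for_tokens() {
  echo $(( ($1 + $2 - 1) / $2 ))
}

# Compare the low and high ends of the CPU range quoted above.
for rate in 5 10; do
  echo "500 tokens at ${rate} tok/s: $(seconds_for_tokens 500 "$rate")s"
done
```

At these rates a longer answer takes between about one and two minutes on CPU, which is fine for drafting but noticeable in back-and-forth chat.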

In exchange for these limits you get a free, private, unlimited AI assistant. Whether that tradeoff works for you is something to judge on your own hardware.

Next Steps

Install it. Try a few models. See what works on your hardware.

Coming up next: hooking Ollama into Home Assistant for an AI-powered smart home.

Further reading: New to Docker? Start with Docker’s official get-started guide. For the OpenCode AI coding assistant tutorial, see OpenCode Complete Guide.