Run local with Ollama

Run AI on your own laptop. Free.

Pair aiDex with Ollama and turn your own machine into the LLM backend. Zero cost per call, zero data leaving your computer, fully offline-capable.

Free per call

Open-weight models (Llama, Qwen, Mistral, DeepSeek) run locally on Ollama with zero API spend.

Stays on your machine

No third-party LLM provider sees your prompts or responses. The model runs entirely on your hardware, and none of the content is sent to OpenAI, Anthropic, Google, or anyone else.

No rate limits, no outages

Your model lives on your disk. No API quotas, no provider outages, no waiting in line: it answers as fast as your hardware will let it.

Your hardware, your rules

Pick the model size that fits your RAM. Mix local + cloud freely. Swap weights any time.

Three steps to local AI

From zero to chatting with a local model in under five minutes.

1. Install Ollama
Ollama is a free, open-source runner that downloads and serves open-weight models over a local HTTP API. Install it once, pull whatever models you want, and it runs in the background.
macOS
    brew install ollama
Linux
    curl -fsSL https://ollama.com/install.sh | sh
Windows
    winget install Ollama.Ollama
Detailed download options for every OS at ollama.com/download.
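Once Ollama is installed, a quick way to confirm the background server is answering is to query its local HTTP API. The sketch below assumes the default port 11434. `OLLAMA_URL` is a hypothetical variable introduced here for convenience, not something Ollama or aiDex reads.

```sh
# Base URL of the local Ollama server.
# Assumption: default port 11434; OLLAMA_URL is a hypothetical override.
ollama_url() {
  printf '%s\n' "${OLLAMA_URL:-http://localhost:11434}"
}

# Succeeds (exit 0) if the server answers on /api/version, fails otherwise.
ollama_is_up() {
  curl -fsS --max-time 5 "$(ollama_url)/api/version" >/dev/null 2>&1
}
```

Usage: `ollama_is_up && echo "Ollama is running" || echo "Ollama is not reachable"`.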
2. Pull a model
Pick a model and tell Ollama to download it. The first pull can take a few minutes depending on your connection (the default llama3.2 weights are about 2 GB); after that the model lives on your disk and starts in seconds.
ollama pull llama3.2
ollama run llama3.2
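Besides the interactive `ollama run` prompt, the pulled model can be queried through the same local HTTP API that aiDex will talk to. This is a minimal sketch, assuming the default port and a prompt without double quotes or backslashes; `generate_payload` and `ask_local` are illustrative helper names.

```sh
# Build the JSON body for a non-streaming /api/generate request.
# The prompt is inserted verbatim, so it must not contain double quotes
# or backslashes (a real client would JSON-escape it).
generate_payload() {
  printf '{"model":"%s","prompt":"%s","stream":false}\n' "$1" "$2"
}

# Send one prompt to the local server and print the raw JSON reply.
ask_local() {
  curl -fsS http://localhost:11434/api/generate -d "$(generate_payload "$1" "$2")"
}
```

Usage: `ask_local llama3.2 "Why is the sky blue?"`. The generated text is in the `response` field of the returned JSON object.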
3. Connect aiDex to your Ollama
aiDex connects to your Ollama instance over HTTPS. Run a small free tunnel on the same machine as Ollama to get an HTTPS URL for it, then paste that URL into aiDex.
Cloudflare Tunnel
    cloudflared tunnel --url http://localhost:11434
ngrok
    ngrok http 11434
Tailscale
    tailscale serve --bg --https=443 http://localhost:11434
Settings path
Settings → Provider keys → Ollama URL
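Cloudflare Tunnel is free for personal use. The quick-tunnel command above prints a random trycloudflare.com URL that changes each time you restart it, so set up a named tunnel if you need an address that stays the same. Tailscale gives you a private HTTPS endpoint reachable only from your own devices, which is ideal for solo use.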
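Before pasting the URL into aiDex, it can help to confirm that the tunnel really reaches Ollama. The sketch below asks the tunnelled endpoint for its model list via `/api/tags`. `normalize_url` and `check_tunnel` are illustrative helpers, and the URL in the usage line is a placeholder for whatever your tunnel printed.

```sh
# Drop one trailing slash so "$url/api/tags" is well-formed.
normalize_url() {
  printf '%s\n' "${1%/}"
}

# Print the model list served behind the tunnel URL.
# Returns 2 without touching the network if the URL is not https://.
check_tunnel() {
  _url=$(normalize_url "$1")
  case "$_url" in
    https://*) ;;
    *) echo "expected an https:// URL, got: $_url" >&2; return 2 ;;
  esac
  curl -fsS --max-time 10 "$_url/api/tags"
}
```

Usage: `check_tunnel https://your-tunnel.example/`. A JSON list containing the model you pulled means the tunnel is wired up correctly.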

Recommended models by hardware

Quantized GGUF defaults (q4_K_M). RAM is the lower bound for CPU inference; VRAM is what fits comfortably on a dedicated GPU so generation stays on the card.

Ollama model	Params	RAM (CPU)	VRAM (GPU)	Best for
llama3.2:3b	3B	4 GB	—	Fast everyday chat. Runs on basically any laptop with 4 GB free RAM.
llama3.1:8b	8B	8 GB	6 GB	Solid all-rounder for reasoning, writing, and code. Sweet spot for most laptops.
qwen2.5:7b	7B	8 GB	6 GB	Strong multilingual (incl. Portuguese & Spanish). Good for cross-locale teams.
mistral:7b	7B	8 GB	6 GB	Sharp for code review, refactors, and explaining unfamiliar APIs.
deepseek-r1:7b	7B	8 GB	6 GB	Reasoning-tuned distillation. Slower per token, deeper answers.
llama3.3:70b	70B	48 GB	40 GB	Frontier-tier quality locally. Only run it if you have a beefy GPU or a workstation.

Bigger models pick up more nuance but generate more slowly. Start with an 8B; step up to 70B only if you have the hardware to keep generation snappy.
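The RAM (CPU) column of the table can be turned into a small picker. This is only a sketch: the thresholds are the table's figures, the 2 GB default headroom is an assumption rather than an aiDex or Ollama recommendation, and `pick_model` is a hypothetical helper that only handles whole gigabytes.

```sh
# usage: pick_model FREE_RAM_GB [HEADROOM_GB]
# Prints the largest Llama tier from the table whose RAM (CPU) figure
# fits in free RAM minus headroom. Integer gigabytes only.
pick_model() {
  free_gb=$1
  headroom_gb=${2:-2}   # assumption: keep 2 GB back for the OS and browser
  usable=$((free_gb - headroom_gb))
  if [ "$usable" -ge 48 ]; then
    echo "llama3.3:70b"   # 70B row of the table
  elif [ "$usable" -ge 8 ]; then
    echo "llama3.1:8b"
  elif [ "$usable" -ge 4 ]; then
    echo "llama3.2:3b"
  else
    echo "none"
    return 1
  fi
}
```

Usage: `ollama pull "$(pick_model 16)"` pulls the 8B tier on a machine with 16 GB free. The other 7B models in the table share the 8 GB threshold, so they can be swapped in for `llama3.1:8b`.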

Tips for getting the most out of local

Match the model to your RAM
A model that doesn't fit in RAM spills into disk swap and slows to a crawl. Check the RAM column in the table above and pick one tier below your laptop's free RAM, which leaves headroom for the OS and your browser.
Mix local and cloud in one team
Run the chatty agents locally (free, fast) and reserve cloud calls for the frontier judge or synthesis step. Best of both: low cost, high ceiling.
Run the moderator locally too
aiDex's moderator just needs to emit small JSON plans. Llama 3.1 8B or Qwen 2.5 7B handles that easily. Set one as your team's moderator and your whole conversation runs on-device.
GPU acceleration is automatic
Ollama uses Metal on Apple Silicon, CUDA on NVIDIA, and ROCm on supported AMD cards with zero config. If you have a supported GPU, you'll see it light up the moment generation starts.
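To check whether a loaded model actually landed on the GPU, `ollama ps` lists running models with a PROCESSOR column (for example `100% GPU` or `100% CPU`). The helper below is a rough sketch that scans that output for the word GPU. `uses_gpu` is an illustrative name, and a partially offloaded model also counts as a match.

```sh
# Reads `ollama ps` output on stdin.
# Succeeds if any loaded model reports GPU use in its PROCESSOR column.
uses_gpu() {
  grep -q 'GPU'
}
```

Usage, while a model is loaded: `ollama ps | uses_gpu && echo "running on GPU" || echo "CPU only"`.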

No API bill. No data sharing. No lock-in.

Sign up for aiDex and you get the team-chat orchestration. Run the models on your own hardware and you get them free, private, and offline.
