Bring Ollama Into Your aiDex Chat: Local Models, Same Table

Run open-weight models on your own machine and mix them with cloud models in one conversation.

By The aiDex Team, Multi-model AI platform · Published Jun 9, 2026 · Updated Jun 9, 2026 · 5 min read

TL;DR

aiDex can seat local models running through Ollama at the same table as cloud models like Claude Opus 4.8, GPT-5.4, and Gemini 3.1 Pro. Run Ollama on your machine, connect it in Settings, and your local models join Solo, Compare, Judge, Pipeline, or Team conversations. Keep sensitive work fully local, or mix free local seats with paid cloud seats in one chat.

Some questions should never leave your laptop. Client contracts, unreleased code, internal financials: the moment you paste them into a cloud chatbot, you are trusting someone else's servers. Local models solve that, but they usually live in a lonely terminal window, cut off from the stronger cloud models you still need for the hard problems.

aiDex removes that wall. Models served by Ollama run on your own machine and join the same panel as the cloud heavyweights, so you decide per conversation what stays local and what goes out.

What is Ollama, and what does it do inside aiDex?

Ollama is free, open-source software that downloads and runs open-weight language models (Llama, Mistral, Gemma, Qwen, and DeepSeek family models, among others) directly on your computer. No account, no API bill: if your hardware can hold the model, it runs.

Inside aiDex, an Ollama model behaves like any other seat at the table. You can give it a Solo chat, line it up against Claude Opus 4.8 in Compare, let it vote in Judge, slot it into a Pipeline stage, or drop it into a Team conversation with up to five models. The moderator treats it exactly like a cloud participant, and it reads the same attached documents (DOCX, PDF, MD, TXT) as every other model in the chat.

How do I connect Ollama to aiDex?

Three steps, roughly ten minutes the first time:

1. Install Ollama from the official site at ollama.com (macOS, Windows, Linux).
2. Pull a model. Run ollama pull llama3 (or any model from the Ollama library) in your terminal. Ollama serves it locally, by default on port 11434.
3. Point aiDex at it. Open Settings, add your local Ollama endpoint, and your installed models show up in the Dex next to the cloud catalog.
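
Before adding the endpoint in Settings, it can help to confirm that Ollama is actually serving on the default port. Here is a minimal Python sketch, assuming Ollama's local HTTP API at http://localhost:11434 and its /api/tags route, which lists installed models. The helper names are illustrative and not part of aiDex.

```python
import json
from urllib.error import URLError
from urllib.request import urlopen

OLLAMA_URL = "http://localhost:11434"  # Ollama's default local port


def parse_model_names(tags_payload: dict) -> list[str]:
    """Extract model names from the JSON body returned by GET /api/tags."""
    return sorted(m["name"] for m in tags_payload.get("models", []))


def list_local_models(base_url: str = OLLAMA_URL, timeout: float = 3.0) -> list[str]:
    """Return installed model names, or an empty list if Ollama is unreachable."""
    try:
        with urlopen(f"{base_url}/api/tags", timeout=timeout) as resp:
            return parse_model_names(json.load(resp))
    except (URLError, OSError, ValueError):
        # Connection refused, timeout, or a body that is not valid JSON.
        return []
```

Calling list_local_models() with Ollama running should return names such as the llama3 model pulled in step 2. An empty list suggests the server is not running or no model has been pulled yet.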

From there, picking a local model is the same gesture as picking any other model: open the list, select the seat.

When do local models beat cloud models?

Treat this as decision criteria, not dogma:

Privacy and confidentiality. Prompts sent to an Ollama model are processed on your machine. For regulated or contractual content, that single fact can decide the whole question.
Cost at volume. A local model has no per-token charge. For high-volume, repetitive work (classification, extraction, first-pass summaries), free beats cheap.
Offline and latency. On a plane, behind a restrictive firewall, or on a flaky connection, a local seat keeps working.
Cloud still wins on frontier reasoning. Long multi-step reasoning, very large documents, and the hardest code remain the territory of models like Claude Opus 4.8, GPT-5.4, and Gemini 3.1 Pro, which are bigger than anything a laptop can host.
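
The criteria above can be read as a small decision function. This is only a sketch: the Task fields and the order of the checks are assumptions made for illustration (privacy and connectivity are treated as hard constraints), not a rule that aiDex enforces.

```python
from dataclasses import dataclass


@dataclass
class Task:
    confidential: bool = False        # content must not leave the machine
    offline: bool = False             # no reliable network connection
    high_volume: bool = False         # repetitive work where per-token fees add up
    frontier_reasoning: bool = False  # long multi-step reasoning, very large documents, hardest code


def suggest_seat(task: Task) -> str:
    """Return "local", "cloud", or "either" for a task."""
    # Hard constraints first: a cloud seat is not an option here.
    if task.confidential or task.offline:
        return "local"
    # Frontier reasoning is where cloud models still win.
    if task.frontier_reasoning:
        return "cloud"
    # No hard constraint: volume favors the seat with no per-token charge.
    if task.high_volume:
        return "local"
    return "either"
```

A confidential task that also needs frontier reasoning still comes back as "local" in this sketch, which mirrors the point that confidentiality can decide the whole question.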

The honest answer for most people is "both", and that is exactly why having them in one chat matters. Local seats slot into every pattern in our multi-model AI workflows guide.

How do I mix local and cloud models in one conversation?

Three patterns that work well:

Compare with a control. Send the same prompt to a local model and to GPT-5.4 or Claude Opus 4.8 in Compare mode. Over a week you learn exactly which of your tasks the free local seat handles well enough. It is the same approach we recommend in our guide to comparing AI models side by side.
Local drafts, cloud polish. In Pipeline mode, let a local model produce the rough Draft stage, then hand Critique and Polish to Claude Opus 4.8. You spend cloud tokens only on the stages that need them.
A local voice on the panel. In a Team chat, a local model adds a different perspective at zero marginal cost. It fits the patterns described in our guide to creating a multi-AI team.

One honest caveat: in a mixed panel, every model in the conversation reads the chat, including attached documents. If content must not leave your machine, keep the entire panel local. aiDex runs fully local when every seat is an Ollama model.
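
To make the "local drafts, cloud polish" pattern and the caveat concrete, here is a small Python sketch. It does not call aiDex or any model: the Seat class, the stub functions, and the seat names are hypothetical stand-ins that only show how output flows from stage to stage and how to check whether a panel is fully local.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Seat:
    name: str
    local: bool                     # True for an Ollama model on this machine
    run: Callable[[str, str], str]  # (stage, text) -> output; stand-in for a model call


def stub(seat_name: str) -> Callable[[str, str], str]:
    """Build a fake model call that records which seat handled which stage."""
    return lambda stage, text: f"{stage}@{seat_name}({text})"


def run_pipeline(prompt: str, stages: list[tuple[str, Seat]]) -> str:
    """Feed each stage's output into the next stage, in order."""
    text = prompt
    for stage, seat in stages:
        text = seat.run(stage, text)
    return text


def is_fully_local(stages: list[tuple[str, Seat]]) -> bool:
    """True only if no seat would send chat content off the machine."""
    return all(seat.local for _, seat in stages)


local = Seat("local", True, stub("local"))
cloud = Seat("cloud", False, stub("cloud"))

mixed = [("Draft", local), ("Critique", cloud), ("Polish", cloud)]
result = run_pipeline("brief", mixed)
# result == "Polish@cloud(Critique@cloud(Draft@local(brief)))"
```

Here is_fully_local(mixed) is False, because two stages use a cloud seat. Swapping those seats for local ones makes it True, which is the setup to use when content must not leave your machine.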

Ready to test it? Open aiDex, connect your Ollama endpoint, and run your first local-vs-cloud Compare.

What does this cost?

The local side costs nothing beyond your hardware and electricity: Ollama is free, and open-weight models have no per-token fee. The cloud side works however you prefer: use your own provider keys or the ones we manage, and pick the models you want. Per-message costs stay visible in the chat, and spending limits cap your monthly total, so a panel that mixes one free local seat with two paid cloud seats never surprises you. For the cloud numbers, see our cost-per-token breakdown.
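
As a worked example of that arithmetic, the sketch below prices one message across a panel with one local seat and two cloud seats. The token counts, the per-million-token prices, and the seat names are made-up placeholders, not real provider rates. Substitute the figures from your own keys or plan.

```python
from dataclasses import dataclass


@dataclass
class SeatUsage:
    name: str
    input_tokens: int
    output_tokens: int
    usd_per_m_input: float = 0.0   # stays 0.0 for a local Ollama seat
    usd_per_m_output: float = 0.0

    def cost(self) -> float:
        """Cost of this seat's share of one message, in US dollars."""
        return (self.input_tokens * self.usd_per_m_input
                + self.output_tokens * self.usd_per_m_output) / 1_000_000


def message_cost(panel: list[SeatUsage]) -> float:
    """Total cost of one message across every seat in the panel."""
    return sum(seat.cost() for seat in panel)


def messages_within_limit(limit_usd: float, per_message_usd: float) -> int:
    """How many such messages fit under a monthly spending limit."""
    return int(limit_usd // per_message_usd)


panel = [
    SeatUsage("local-8b", 2000, 500),            # free local seat
    SeatUsage("cloud-a", 2000, 500, 3.0, 15.0),  # placeholder prices
    SeatUsage("cloud-b", 2000, 500, 1.0, 4.0),   # placeholder prices
]
total = message_cost(panel)
print(f"${total:.4f} per message")  # $0.0175 with the placeholder prices above
```

With these placeholder numbers, the local seat adds nothing to the total, and a 20-dollar monthly limit would cover 1142 such messages.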

Where should I start?

Install Ollama, pull one small model (an 8B model is plenty to learn with), connect it in Settings, and put it in a Compare next to a cloud model you already trust. Ten minutes of setup buys you a permanent free seat at the table. When you are ready, open aiDex and give your laptop a chair.


Frequently asked questions

Is Ollama free to use?

Yes. Ollama is free, open-source software, and the open-weight models it runs have no per-token cost. You pay only for your own hardware and electricity. Cloud models sitting in the same aiDex chat are billed normally through your keys or managed credits.

Do I need a GPU to run Ollama?

No, but it helps. Small models in the 3B-8B parameter range run on a modern laptop CPU. A dedicated GPU or an Apple Silicon Mac makes responses much faster and lets you run larger, more capable models.
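
For a rough sense of whether your hardware can hold a model, you can estimate the memory its weights need from the parameter count and the precision of each weight. The sketch below is back-of-the-envelope arithmetic only: the 4-bit and 16-bit figures are assumptions chosen for illustration, and the estimate ignores context (KV cache) memory and runtime overhead, so real usage is higher.

```python
def weight_memory_gb(params_billions: float, bits_per_weight: int) -> float:
    """Approximate memory for the weights alone, in gigabytes (10^9 bytes)."""
    total_bytes = params_billions * 1e9 * bits_per_weight / 8
    return total_bytes / 1e9


# An 8B model with 4-bit weights: about 4 GB for the weights.
small_quantized = weight_memory_gb(8, 4)

# The same 8B model at 16 bits per weight: about 16 GB.
small_full = weight_memory_gb(8, 16)

# A 3B model with 4-bit weights: about 1.5 GB.
tiny_quantized = weight_memory_gb(3, 4)
```

If the estimate is already close to your available RAM or GPU memory, a smaller model or a lower-precision variant is the safer first choice.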

Which models can I run through Ollama?

Open-weight model families such as Llama, Mistral, Gemma, Qwen, and DeepSeek distillations, among hundreds of options in the Ollama library. Once Ollama serves them, they appear in the Dex like any other model.

Does my data stay private with a local model?

Calls to an Ollama model are processed on your machine and are not sent to a cloud provider. In a mixed panel, cloud models in the same conversation do receive the chat content, so keep strictly confidential chats local-only.

Can I use aiDex with only local models?

Yes. Every seat in a conversation can be an Ollama model, so the whole chat runs fully local. You can still switch any seat to a cloud model later if a question outgrows your hardware.
