"Should we RAG, fine-tune, or build an agent?" is the wrong question. The right question is "what is the actual job to be done — and which technique buys us the right trade-offs in cost, latency, accuracy, governance, and team capability?"
The three approaches are not mutually exclusive. Most production systems we ship use all three. But each has a clear sweet spot. This essay maps those sweet spots.
What each approach actually does
RAG (Retrieval-Augmented Generation)
The model is given relevant documents at inference time and instructed to answer based on them. The model itself is unchanged. You're swapping context, not weights.
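To make that concrete, here is a minimal sketch of the flow. The `vector_store.search()` retriever and `llm.complete()` client are placeholders standing in for whatever store and model client you use, not a specific framework's API.

```python
def answer_with_rag(question: str, vector_store, llm, k: int = 5) -> str:
    # 1. Retrieve the k chunks most relevant to the question (hypothetical store API).
    chunks = vector_store.search(question, top_k=k)

    # 2. Assemble the retrieved text into the prompt. The model weights are untouched.
    context = "\n\n".join(f"[{c.source_id}] {c.text}" for c in chunks)
    prompt = (
        "Answer the question using ONLY the context below. "
        "Cite the bracketed source ids you rely on.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}"
    )

    # 3. Single model call: swap context, not weights.
    return llm.complete(prompt)
```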
Fine-tuning
The model's weights are adjusted using labeled examples. The model learns a style, a structure, a domain vocabulary. It does not learn new facts in any reliable way — that's a common misconception.
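For orientation, this is roughly what those labeled examples look like, in a JSONL chat format similar to what hosted fine-tuning services accept (the exact schema varies by provider). Note that every example demonstrates the same structure and voice; none of them tries to teach the model a fact.

```python
import json

# Assumed JSONL chat schema; check your provider's exact format before using it.
examples = [
    {
        "messages": [
            {"role": "system", "content": "You write incident summaries in our house format."},
            {"role": "user", "content": "Summarize: payment API returned 500s for 12 minutes."},
            {"role": "assistant", "content": "IMPACT: ...\nROOT CAUSE: ...\nNEXT STEPS: ..."},
        ]
    },
    # ... hundreds more, all demonstrating the same structure and tone
]

with open("train.jsonl", "w") as f:
    for ex in examples:
        f.write(json.dumps(ex) + "\n")
```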
Agentic orchestration
Multiple LLM calls, possibly with multiple specialized models, coordinated through a planner/executor loop. The system can call tools, retrieve documents, evaluate intermediate results, and decide next steps.
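A stripped-down version of that planner/executor loop looks something like the sketch below. The `llm.plan()` call and the stub tools are assumptions for illustration, standing in for whatever framework and real integrations you use.

```python
TOOLS = {
    "search_orders": lambda order_id: {"order_id": order_id, "status": "shipped"},  # stub tool
    "issue_refund": lambda order_id: {"order_id": order_id, "refunded": True},      # stub tool
}

def run_agent(task: str, llm, max_steps: int = 8):
    history = [{"role": "user", "content": task}]
    for _ in range(max_steps):
        # The model either requests a tool call or produces a final answer.
        step = llm.plan(history, tools=list(TOOLS))           # hypothetical client API
        if step.type == "final_answer":
            return step.content
        result = TOOLS[step.tool_name](**step.arguments)      # execute the requested tool
        history.append({"role": "tool", "name": step.tool_name, "content": str(result)})
    raise RuntimeError("Agent did not converge within max_steps")
```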
The decision matrix
| Dimension | RAG | Fine-Tune | Agentic |
|---|---|---|---|
| Best for | Knowledge that changes | Style/structure that's stable | Multi-step workflows |
| Time to value | Days | Weeks | Months |
| Per-query cost | Medium-High | Low | High |
| Latency | 200-800ms | 100-300ms | 2-30s |
| Auditability | High (cite sources) | Low (opaque weights) | Medium (trace steps) |
| Drift exposure | Low | High | High |
| Team skill required | Mid | High (ML) | High (eng + AI) |
Decision tree
Start with these questions in order:
Q1: Does the task require knowledge that changes?
If yes, RAG is mandatory. Freezing that knowledge into model weights via fine-tuning creates a maintenance burden and a drift risk you don't want.
Q2: Does the output need consistent structure or domain-specific style?
If yes, consider fine-tuning on top of RAG. Use RAG for facts. Use fine-tuning for tone, format, and brand voice. This is the highest-leverage combination for most enterprise use cases.
Q3: Is the workflow genuinely multi-step?
"Multi-step" means: the system must take an action based on the result of a previous action, with branching logic, possibly across days. If yes, agentic. If no — if it's request → response — agents are overkill and will hurt your latency.
Q4: Does the workflow involve real-world tools or APIs?
If yes, agents start to earn their complexity. The whole point of an agent is to call tools, observe results, and react. If you're not calling tools, you don't need an agent.
Q5: Is governance a hard constraint?
If yes, RAG with strong citation is your friend. Fine-tuning is opaque. Agents are traceable but their reasoning loops can be hard to audit. RAG with citations is the most defensible architecture in regulated industries.
Common patterns we see in production
Pattern A: Customer-facing knowledge agent
Approach: RAG-only with a small fine-tuned response style.
Why: Knowledge changes, governance matters, latency matters. You want to be able to cite sources and audit answers. Fine-tuning is just for tone.
Pattern B: Internal back-office automation
Approach: Agentic orchestration with RAG inside.
Why: Multi-step. Touches multiple systems. Latency tolerance is higher (it's an automation, not a chat). The agent handles flow control; RAG handles each knowledge step.
Pattern C: Domain-specific document generation
Approach: Fine-tuned base model + RAG for facts + structured output.
Why: Output structure must be consistent (legal docs, medical reports, financial summaries). Fine-tuning teaches the structure. RAG provides current facts. Structured output ensures schema compliance.
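As an illustration of the structured-output piece, here is a sketch using Pydantic to validate the model's JSON against a fixed schema. The schema itself is invented for the example; a real one would mirror your document template.

```python
from pydantic import BaseModel, ValidationError

class FinancialSummary(BaseModel):
    # Invented schema for illustration only.
    period: str
    revenue_usd: float
    key_risks: list[str]
    narrative: str

def generate_summary(prompt: str, llm) -> FinancialSummary:
    raw = llm.complete(prompt, response_format="json")    # hypothetical client call
    try:
        return FinancialSummary.model_validate_json(raw)  # schema compliance enforced here
    except ValidationError:
        # Retry once with the validation errors appended, or route to a human.
        raise
```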
Pattern D: Real-time intelligence augmentation
Approach: RAG over operational data + LLM for synthesis. No agent, no fine-tune.
Why: Latency-sensitive. Single-shot. The LLM is essentially a sophisticated query renderer. Keep it simple.
What people get wrong
Mistake 1: Fine-tuning to teach facts
Fine-tuning is unreliable for fact injection. The model may reproduce the fact under some prompts and miss it under others, and you have no way to update it without retraining. RAG is the right answer.
Mistake 2: Agentic for everything
Agents have become fashionable. They are slow, expensive, and harder to debug than single-shot LLM calls. Use them when you need to. Otherwise don't.
Mistake 3: RAG without re-ranking
Vanilla vector search retrieves "semantically similar" content, which is often not "actually relevant" content. A re-ranking layer (often a small fine-tuned model or a cross-encoder) lifts retrieval quality dramatically.
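A minimal version of that re-ranking layer, assuming the `sentence-transformers` CrossEncoder and a publicly available MS MARCO checkpoint; swap in whatever re-ranker you actually use.

```python
from sentence_transformers import CrossEncoder

# A cross-encoder scores each (query, passage) pair jointly, which is far more
# precise than comparing pre-computed embeddings.
reranker = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")

def rerank(query: str, passages: list[str], top_k: int = 5) -> list[str]:
    scores = reranker.predict([(query, p) for p in passages])
    ranked = sorted(zip(passages, scores), key=lambda x: x[1], reverse=True)
    return [p for p, _ in ranked[:top_k]]

# Typical flow: over-retrieve with vector search (say, top 50), then re-rank down to 5.
```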
Mistake 4: No eval at decision time
Choosing between RAG, fine-tune, and agentic without an eval framework means you're choosing on vibes. Build a 100-example test set per use case before you commit. Then measure.
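The harness does not need to be fancy. A sketch: run each candidate architecture over the same test set and compare scores. The `grade` function here is deliberately crude (substring match); replace it with a rubric or an LLM-as-judge check as your use case demands.

```python
import json

def grade(answer: str, expected: str) -> int:
    # Simplest possible check. Replace with exact match, a rubric, or an LLM judge.
    return int(expected.lower() in answer.lower())

def evaluate(system, test_path: str) -> float:
    """Run one candidate system (RAG, fine-tuned, agentic) over a shared test set."""
    with open(test_path) as f:
        cases = [json.loads(line) for line in f]  # each line: {"input": ..., "expected": ...}

    passed = sum(grade(system(case["input"]), case["expected"]) for case in cases)
    return passed / len(cases)

# scores = {name: evaluate(sys, "testset.jsonl") for name, sys in candidates.items()}
```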
Where the field is heading
Three trends to plan for over the next 18 months:
- Long-context models are reducing the need for sophisticated RAG in some cases. If your full corpus fits in the context window, you can sometimes skip the retrieval layer entirely. Trade-off: cost.
- Smaller specialist models are getting good enough for narrow tasks. Triage classifiers, format validators, and domain-specific summarizers can run on smaller, cheaper, lower-latency models — saving the frontier model for the moments that need it.
- Agentic frameworks are converging. LangGraph, OpenAI's Assistants, Anthropic's tool use — the abstractions are stabilizing. Building agents in 2026 is meaningfully easier than in 2024.
The simple advice
For most enterprise GenAI deployments, start with the simplest architecture that solves the problem. Add complexity only when you can articulate, in writing, why the complexity earns its keep. Most teams over-architect and end up with systems they can't operate.
If you're trying to make this decision for a specific use case and want a second opinion, book 30 minutes with our CTO. We've made these calls enough times to give you a defensible answer in one conversation.