Most AI investment cases I see brought to a board are weak. Not because the AI doesn't work — but because the math is sloppy. The vendor quotes a model API price. The buyer extrapolates. Six months later the actual bill is 3x the estimate, and the CFO is the one explaining it to the audit committee.
Here is the TCO model we use. It is the model we put in front of our own CFO. It is the model we share with clients before they sign — not after.
The six cost lines
1. Model inference
This is the line every vendor quotes. Per-token pricing, multiplied by expected volume. Easy to estimate, easy to underestimate. The mistakes here are: forgetting that prompts are also tokens (sometimes the bulk of them), forgetting that RAG injects retrieved context, forgetting that retries on truncation double the cost.
Our rule: take your initial estimate and multiply by 1.8. That's your year-one budget line.
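As a sketch of that rule, here is how the prompt tokens, RAG context, and retry overhead can be folded in before applying the multiplier. The volumes and the $0.05/1K price are illustrative assumptions, not benchmarks; swap in your own vendor's numbers.

```python
def inference_budget(queries_per_year: int,
                     completion_tokens: int,
                     prompt_tokens: int,
                     rag_context_tokens: int,
                     price_per_1k_tokens: float,
                     retry_rate: float = 0.10,
                     safety_multiplier: float = 1.8) -> dict:
    """Year-one inference line: count every token, then apply the 1.8x rule."""
    # Prompts and injected RAG context are billed tokens too.
    tokens_per_query = completion_tokens + prompt_tokens + rag_context_tokens
    # A retry on truncation re-sends the whole request.
    effective_tokens = tokens_per_query * (1 + retry_rate)
    naive = queries_per_year * effective_tokens / 1000 * price_per_1k_tokens
    return {"naive_estimate": naive, "budget_line": naive * safety_multiplier}
```

At a 10% assumed retry rate, the budget line lands well above the naive quote, which is the point of the exercise.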
2. Embedding + vector storage
If you're doing RAG (and you probably are), every document gets embedded — and re-embedded when you swap embedding models. Storage of vectors at scale (typical enterprise: 10-50M vectors) is non-trivial. Add the operational cost of the vector DB itself.
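A back-of-envelope version of this line, with placeholder assumptions for the embedding API price, vector storage rate, and the managed vector DB subscription (all seven inputs are stand-ins for your vendor's quotes):

```python
def embedding_line(num_docs: int,
                   avg_tokens_per_doc: int,
                   embed_price_per_1k: float,
                   reembeds_per_year: int,
                   num_vectors: int,
                   storage_per_m_vectors_month: float,
                   vector_db_service_annual: float) -> float:
    """Annual embedding + vector storage spend (all prices are assumptions)."""
    corpus_tokens = num_docs * avg_tokens_per_doc
    # Every re-embed cycle (e.g., a quarterly model swap) re-pays the corpus.
    embed_spend = corpus_tokens / 1000 * embed_price_per_1k * reembeds_per_year
    storage_spend = num_vectors / 1_000_000 * storage_per_m_vectors_month * 12
    return embed_spend + storage_spend + vector_db_service_annual
```

Run it and notice how little of the line is the embedding API itself; storage and the managed DB service usually dominate.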
3. Orchestration + tooling
The framework you use (LangChain, LangGraph, LlamaIndex, or in-house). The tool integrations. The prompt management system. The eval harness. The observability stack. Each of these is either a SaaS subscription, an engineering build, or both.
4. Compute for self-hosted components
Even cloud-API-first deployments end up running things locally — embedding models, classifiers, smaller specialist models for triage. GPU compute, even for inference, adds a meaningful line.
5. People
The biggest line item, almost always understated. A production GenAI system needs: an ML engineer for the model layer, a backend engineer for the orchestration, a data engineer for the corpus, a domain expert for the eval, an SRE for the operational layer, and a compliance partner for the governance. You're not getting away with one full-stack developer.
6. Risk + insurance
Increasingly, regulated industries need cyber-AI insurance riders. Audits cost real money. Incident response costs real money. Failed audits cost catastrophic money. Budget 5-10% of the technology line.
Worked example: 100-seat enterprise copilot
Let's run the numbers for a fictional but typical deployment — an internal knowledge copilot for 100 mid-market enterprise users. ~5,000 queries per user per year. RAG over a 50K-document corpus.
| Cost line | Annual estimate |
|---|---|
| Model inference (500K queries × 4K avg tokens × $0.05/1K tokens) | $100,000 |
| Embedding + vector storage (50K docs, re-embedded quarterly) | $24,000 |
| Orchestration platform + observability tooling | $48,000 |
| Self-hosted compute (classifiers, embedding model) | $36,000 |
| People (1 ML eng, 1 backend, 0.5 SRE, 0.25 compliance) | $420,000 |
| Risk + audit + insurance | $32,000 |
| Total Year 1 TCO | $660,000 |
Most vendors will quote you the first line ($100K) and call it the cost. The actual cost is 6.6x higher. If you go to your board with $100K and come back asking for $660K, you've burned your credibility.
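The table reduces to a few lines of arithmetic. A sketch that reproduces the totals:

```python
def year_one_tco() -> dict:
    """The six-line TCO from the worked example, summed."""
    lines = {
        "model_inference": 100_000,
        "embedding_vector_storage": 24_000,
        "orchestration_observability": 48_000,
        "self_hosted_compute": 36_000,
        "people": 420_000,
        "risk_audit_insurance": 32_000,
    }
    total = sum(lines.values())
    return {
        "lines": lines,
        "total": total,
        # The gap between the vendor's quote and what you actually spend.
        "multiple_over_vendor_quote": total / lines["model_inference"],
    }
```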
The ROI conversation
The TCO is one side. The savings/revenue is the other. For the same hypothetical 100-seat copilot:
- If each user saves 1 hour per week (conservative), at $80/hr fully-loaded, that's $4,160 per user per year, or $416K in total annual productivity savings across the 100 seats. Net of TCO: a loss.
- If each user saves 3 hours per week (typical for high-knowledge-work users), savings rise to $1.25M/year. Net of TCO: ~$590K positive.
- If the copilot also reduces external consulting spend by 5% (because the team can do more in-house) — add another $150-300K/year.
The point is not the exact number. The point is to put real numbers, with stated assumptions, with stated sensitivities, in front of the CFO. That's what defends the budget.
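A minimal scenario model along these lines, with the hours-saved assumption made an explicit input. The 1- and 3-hour figures come from the bullets above; the 2-hour base case is my assumption.

```python
def roi_scenarios(users: int = 100,
                  fully_loaded_rate: float = 80.0,
                  weeks_per_year: int = 52,
                  tco: float = 660_000.0) -> dict:
    """Net ROI under three hours-saved-per-week assumptions."""
    # 1 and 3 hrs/week come from the text; 2 hrs is an assumed base case.
    hours_per_week = {"conservative": 1, "base": 2, "optimistic": 3}
    results = {}
    for name, hours in hours_per_week.items():
        gross = users * hours * weeks_per_year * fully_loaded_rate
        results[name] = {"gross_savings": gross, "net_of_tco": gross - tco}
    return results
```

Stating the assumptions as function parameters is the whole trick: the CFO can challenge any one of them without the model falling apart.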
The hidden costs that wreck deployments
Drift in the wild
Model providers update their models. Your output quality degrades silently, and you discover it at a customer escalation. The remediation cost (re-evaluation, prompt updates, possibly a rollback) is rarely in the original budget.
Token inflation
Year over year, RAG context windows grow as you add more documents. Conversation lengths grow as users get more sophisticated. Year-2 cost-per-conversation is typically 30-50% higher than year-1, even with the same model.
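A quick way to make that inflation visible, using the midpoint of the 30-50% range as an assumed annual rate:

```python
def cost_per_conversation_forecast(year1_cost: float,
                                   annual_inflation: float = 0.40,
                                   years: int = 3) -> list:
    """Same model, same pricing: cost still compounds as context grows."""
    return [round(year1_cost * (1 + annual_inflation) ** y, 2)
            for y in range(years)]
```

At an assumed $0.20 per conversation in year 1, the year-3 figure is roughly double, with no model change at all.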
The retraining trap
If you fine-tuned a model, every base-model update means re-fine-tuning. Plan for this — quarterly cycles are realistic.
Incident response
One bad hallucination at the wrong customer can cost more than a year of model spend. Real incident-response cost includes communication, remediation, regulatory disclosure, and the engineering time to root-cause.
What to take to your CFO
- A six-line TCO with stated assumptions per line and a sensitivity table.
- An ROI model with three scenarios (conservative, base, optimistic) and the hours-saved or revenue-driven assumption per user.
- A risk register with the three highest-cost failure modes and their mitigations.
- A 3-year forecast that bakes in token inflation and model upgrade cycles.
- A go/no-go threshold — what business outcome do we need to see in month 6 to keep funding?
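One hedged sketch of that 3-year forecast: inflate the token-driven lines, grow the people line with salaries, and hold the rest flat. The 40% token inflation and 5% salary growth are assumptions to replace with your own, as is the simplification that tooling, compute, and risk stay constant.

```python
def three_year_forecast(year1: dict,
                        token_inflation: float = 0.40,
                        salary_growth: float = 0.05) -> list:
    """Roll a six-line year-one TCO forward three years."""
    totals = []
    for y in range(3):
        # Token-driven lines compound with context and conversation growth.
        variable = ((year1["inference"] + year1["embedding"])
                    * (1 + token_inflation) ** y)
        # Tooling, compute, and risk held flat for simplicity.
        fixed = year1["orchestration"] + year1["compute"] + year1["risk"]
        people = year1["people"] * (1 + salary_growth) ** y
        totals.append(round(variable + fixed + people))
    return totals
```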
Walk in with that and you'll get a rational conversation. Walk in without it and you'll get a "let's revisit next quarter."
If you'd like a template version of this TCO model, pre-built as an Excel workbook, drop us a line. We'll send you the worksheet we use internally, no strings.