Most AI investment cases I see brought to a board are weak. Not because the AI doesn't work — but because the math is sloppy. The vendor quotes a model API price. The buyer extrapolates. Six months later the actual bill is 3x the estimate, and the CFO is the one explaining it to the audit committee.

Here is the TCO model we use. It is the model we put in front of our own CFO. It is the model we share with clients before they sign — not after.

Rule of thumb — A serious GenAI workload has six cost lines. Vendors will quote you one. Don't sign anything until you've modeled all six.

The six cost lines

1. Model inference

This is the line every vendor quotes. Per-token pricing, multiplied by expected volume. Easy to estimate, easy to underestimate. The mistakes here are: forgetting that prompts are also tokens (sometimes the bulk of them), forgetting that RAG injects retrieved context, forgetting that retries on truncation double the cost.

Our rule: take your initial estimate and multiply it by 1.8. That's your year-one budget line.
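That estimate discipline can be sketched in a few lines. This is a hypothetical helper, not anyone's official calculator — the parameter names and defaults (retry rate, the 1.8 contingency) are assumptions you should replace with your own workload numbers:

```python
def inference_budget(queries_per_year, prompt_tokens, context_tokens,
                     output_tokens, price_per_1k, retry_rate=0.05,
                     contingency=1.8):
    """Year-one inference budget: count ALL tokens, then apply contingency.

    Hypothetical parameters -- prompts, RAG context, and outputs all bill
    as tokens, and truncation retries re-bill the whole call.
    """
    tokens_per_query = prompt_tokens + context_tokens + output_tokens
    base = queries_per_year * tokens_per_query / 1000 * price_per_1k
    base *= 1 + retry_rate       # retries on truncation re-bill the call
    return base * contingency    # the 1.8x year-one rule of thumb

# Example: 500K queries/yr, 10K tokens per query, $0.02 blended per 1K tokens
estimate = inference_budget(500_000, 1_500, 7_500, 1_000, 0.02)
print(f"${estimate:,.0f}")  # -> $189,000
```

The point of writing it down is that every input becomes an explicit, arguable assumption rather than a number buried in a vendor slide.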

2. Embedding + vector storage

If you're doing RAG (and you probably are), every document gets embedded — and re-embedded when you swap embedding models. Storage of vectors at scale (typical enterprise: 10-50M vectors) is non-trivial. Add the operational cost of the vector DB itself.

3. Orchestration + tooling

The framework you use (LangChain, LangGraph, LlamaIndex, or in-house). The tool integrations. The prompt management system. The eval harness. The observability stack. Each of these is either a SaaS subscription, an engineering build, or both.

4. Compute for self-hosted components

Even cloud-API-first deployments end up running things locally — embedding models, classifiers, smaller specialist models for triage. GPU compute, even for inference, adds a meaningful line.

5. People

The biggest line item, almost always understated. A production GenAI system needs: an ML engineer for the model layer, a backend engineer for the orchestration, a data engineer for the corpus, a domain expert for the eval, an SRE for the operational layer, and a compliance partner for the governance. You're not getting away with one full-stack developer.

6. Risk + insurance

Increasingly, regulated industries need cyber-AI insurance riders. Audits cost real money. Incident response costs real money. Failed audits cost catastrophic money. Budget 5-10% of the technology line.

Worked example: 100-seat enterprise copilot

Let's run the numbers for a fictional but typical deployment — an internal knowledge copilot for 100 mid-market enterprise users. ~5,000 queries per user per year. RAG over a 50K-document corpus.

| Cost line | Annual estimate |
| --- | --- |
| Model inference (500K queries × 10K avg tokens × $0.02/1K) | $100,000 |
| Embedding + vector storage (50K docs, quarterly re-embeds) | $24,000 |
| Orchestration platform + observability tooling | $48,000 |
| Self-hosted compute (classifiers, embedding model) | $36,000 |
| People (1 ML eng, 1 backend, 0.5 SRE, 0.25 compliance) | $420,000 |
| Risk + audit + insurance | $32,000 |
| **Total Year 1 TCO** | **$660,000** |

Most vendors will quote you the first line ($100K) and call it the cost. The actual cost is 6.6x higher. If you go to your board with $100K and come back asking for $660K, you've burned your credibility.
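The six lines above reduce to a few lines of arithmetic — useful as the skeleton of the spreadsheet you actually take to the board. The figures are the worked example's, not benchmarks:

```python
# The worked example's six cost lines -- swap in your own figures.
tco = {
    "model_inference":       100_000,
    "embedding_and_vectors":  24_000,
    "orchestration_tooling":  48_000,
    "self_hosted_compute":    36_000,
    "people":                420_000,
    "risk_and_insurance":     32_000,
}
total = sum(tco.values())
vendor_quote = tco["model_inference"]  # the only line most vendors quote
print(f"Total: ${total:,} -- {total / vendor_quote:.1f}x the vendor quote")
```

Keeping the model this explicit means any stakeholder can challenge a single line without re-litigating the whole case.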

The ROI conversation

The TCO is one side. The savings/revenue is the other. For the same hypothetical 100-seat copilot:

  • If each user saves 1 hour per week (conservative), at $80/hr fully loaded — that's ~$4,160 per user per year, or $416K in total annual productivity savings. Net of TCO: a loss.
  • If each user saves 3 hours per week (typical for high-knowledge-work users), savings rise to $1.25M/year. Net of TCO: ~$590K positive.
  • If the copilot also reduces external consulting spend by 5% (because the team can do more in-house) — add another $150-300K/year.

The point is not the exact number. The point is to put real numbers, with stated assumptions, with stated sensitivities, in front of the CFO. That's what defends the budget.
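The scenarios above follow from one simple savings formula. A minimal sketch, assuming the same 100 seats and $80/hr loaded rate — the hours-saved figures are the assumptions a CFO will (rightly) attack first:

```python
def annual_savings(users, hours_saved_per_week, loaded_rate=80, weeks=52):
    """Productivity savings under a simple hours-saved model (assumed)."""
    return users * hours_saved_per_week * loaded_rate * weeks

TCO = 660_000  # the worked example's year-1 total
for label, hours in [("conservative", 1), ("base", 3), ("optimistic", 5)]:
    savings = annual_savings(100, hours)
    print(f"{label:>12}: ${savings:>9,} savings, net ${savings - TCO:>+9,}")
```

Running the three scenarios side by side is exactly the sensitivity view the board needs: the conservative case loses money, so the debate centers on whether 3 hours/week is defensible.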

The hidden costs that wreck deployments

Drift in the wild

Model providers update their models. Your evaluation degrades silently. You discover it at a customer escalation. The remediation cost (re-evaluation, prompt updates, possibly a rollback) is rarely in the original budget.

Token inflation

Year over year, RAG context windows grow as you add more documents. Conversation lengths grow as users get more sophisticated. Year-2 cost-per-conversation is typically 30-50% higher than year-1, even with the same model.

The retraining trap

If you fine-tuned a model, every base-model update means re-fine-tuning. Plan for this — quarterly cycles are realistic.

Incident response

One bad hallucination at the wrong customer can cost more than a year of model spend. Real incident-response cost includes communication, remediation, regulatory disclosure, and the engineering time to root-cause.

What to take to your CFO

  1. A six-line TCO with stated assumptions per line and a sensitivity table.
  2. An ROI model with three scenarios (conservative, base, optimistic) and the hours-saved or revenue-driven assumption per user.
  3. A risk register with the three highest-cost failure modes and their mitigations.
  4. A 3-year forecast that bakes in token inflation and model upgrade cycles.
  5. A go/no-go threshold — what business outcome do we need to see in month 6 to keep funding?
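Item 4 — the 3-year forecast with token inflation baked in — can be sketched like this. The 40%/year inflation rate and flat non-inference lines are simplifying assumptions, not predictions:

```python
def three_year_forecast(year1_inference, other_fixed, inflation=0.40):
    """3-year TCO forecast: inference compounds with token inflation
    (assumed 40%/yr); other lines held flat for simplicity."""
    forecast = []
    for year in range(3):
        inference = year1_inference * (1 + inflation) ** year
        forecast.append(round(inference + other_fixed))
    return forecast

# Worked example: $100K inference, $560K other lines in year 1
print(three_year_forecast(100_000, 560_000))  # -> [660000, 700000, 756000]
```

In practice you would also model model-upgrade cycles (re-fine-tuning, re-evaluation) as step costs in the years they land, rather than smoothing them into the inflation rate.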

Walk in with that and you'll get a rational conversation. Walk in without it and you'll get a "let's revisit next quarter."

If you'd like a template version of this TCO model — pre-built in Excel, ready for a board review — drop us a line. We'll send you the worksheet we use internally, no strings attached.