Healthcare clients ask us a variant of the same question every week: "Can we use GenAI without violating HIPAA?" The answer is yes — but the path is narrow, the documentation burden is real, and most off-the-shelf vendor stacks won't get you there.

This is the architecture we deploy in healthcare environments — including for clients like Almoosa Health Group in KSA, where we built the patient-facing platform handling appointments, digital records, video consults, and wearable integration at scale. The patterns generalize. The compliance regime changes (HIPAA in the US, GDPR + national laws in EU, NPHIES in KSA) but the architecture moves are the same.

The core principle: treat PHI like radioactive material. Move it as little as possible. When you must move it, contain it. When you must process it with an LLM, do so on infrastructure where you control every link in the chain, from network to disk to memory.

Why most GenAI stacks aren't HIPAA-safe out of the box

The default GenAI stack — frontier model API + cloud vector DB + SaaS orchestration — fails at HIPAA in three places:

  1. Model providers. Many model providers will not sign a Business Associate Agreement (BAA), or will sign one only with specific products under specific configurations. Default API access usually isn't BAA-covered.
  2. Logging and telemetry. Most observability stacks log full prompts and responses. If those contain PHI, your telemetry is now a HIPAA-covered system that wasn't designed to be one (a scrubbing sketch follows this list).
  3. Vector storage. Embedding PHI into a third-party vector DB without a BAA, or without proper encryption and access control, is a covered-data leak.
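To make point 2 concrete, here is a minimal sketch of a telemetry scrubber: a logging filter that redacts PHI-shaped strings before any handler or exporter sees them. The regexes are illustrative stand-ins for the fine-tuned detector described in Layer 1 below.

```python
import logging
import re

# Illustrative patterns only; a real deployment calls the Layer 1 NER model.
PHI_PATTERNS = [
    re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),        # SSN-shaped
    re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.]+\b"),  # email addresses
]

class PHIRedactingFilter(logging.Filter):
    """Scrub PHI-shaped substrings from records before they are emitted."""
    def filter(self, record: logging.LogRecord) -> bool:
        msg = record.getMessage()
        for pattern in PHI_PATTERNS:
            msg = pattern.sub("[REDACTED]", msg)
        record.msg, record.args = msg, None
        return True  # keep the (now scrubbed) record

# Attach to the logger your GenAI pipeline writes through;
# handler-level filters work too.
logging.getLogger("genai").addFilter(PHIRedactingFilter())
```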

The five-layer HIPAA-safe GenAI architecture

Layer 1: PHI Detection & Redaction (input)

Every input to the GenAI system passes through a PHI detection layer first. We run an on-premises NER model fine-tuned for the 18 HIPAA Safe Harbor identifiers (names, dates, geographic data smaller than a state, phone numbers, fax, email, SSN, MRN, account, certificate, vehicle, device, URL, IP, biometric, photo, etc.).

Detected PHI gets one of three treatments: (a) redacted to a placeholder before reaching the LLM, (b) tokenized to a reversible reference that the post-processor can re-hydrate, or (c) blocked entirely with a "this conversation requires a human" route.
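Here is a minimal sketch of that routing. The policy table, entity labels, and token vault are illustrative; the spans come from whatever detector you run in this layer.

```python
import uuid
from dataclasses import dataclass
from enum import Enum

class Treatment(Enum):
    REDACT = "redact"      # placeholder before the LLM sees it
    TOKENIZE = "tokenize"  # reversible reference, re-hydrated post-LLM
    BLOCK = "block"        # route the conversation to a human

# Illustrative policy; real routing depends on identifier type and use case.
POLICY = {"NAME": Treatment.TOKENIZE, "SSN": Treatment.REDACT, "BIOMETRIC": Treatment.BLOCK}

@dataclass
class Entity:      # one span flagged by the detector
    label: str     # e.g. "NAME", "SSN"
    start: int
    end: int

def apply_phi_policy(text: str, entities: list[Entity], vault: dict[str, str]) -> str | None:
    """Return the sanitized prompt, or None when the request must be blocked."""
    # Rewrite right-to-left so earlier character offsets stay valid.
    for ent in sorted(entities, key=lambda e: e.start, reverse=True):
        action = POLICY.get(ent.label, Treatment.REDACT)  # fail closed on unknown labels
        if action is Treatment.BLOCK:
            return None
        if action is Treatment.TOKENIZE:
            ref = f"[[PHI:{uuid.uuid4().hex[:8]}]]"
            vault[ref] = text[ent.start:ent.end]  # post-processor re-hydrates from here
            replacement = ref
        else:
            replacement = f"[{ent.label}]"
        text = text[:ent.start] + replacement + text[ent.end:]
    return text
```

The vault lives inside the HIPAA boundary, and tokens are only re-hydrated after the output validator (Layer 4) has passed the response.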

Layer 2: BAA-covered LLM

Use only LLMs from providers that will sign a BAA, and only the specific deployment modes covered by that BAA. As of 2026 this includes: Microsoft Azure OpenAI Service (with BAA + Healthcare Data Solutions), Google Vertex AI (with BAA + healthcare configurations), AWS Bedrock (with BAA), and Anthropic via certain enterprise programs. Always verify the current state with the provider's compliance team — this list changes.

For the highest-sensitivity workloads, consider self-hosted open-source models (Llama, Mistral) running on your own HIPAA-compliant infrastructure. The trade-off is model quality versus data isolation.
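What "self-hosted" looks like at the call site, as a sketch: an OpenAI-compatible client pointed at an internal gateway (e.g. vLLM). The base URL, token, and model name below are placeholders; the point is that nothing resolves outside your boundary.

```python
from openai import OpenAI

# Placeholder endpoint and token: issued by your own infrastructure and IdP,
# never a public vendor API.
client = OpenAI(
    base_url="https://llm.internal.example.org/v1",
    api_key="internal-service-token",
)

sanitized_prompt = "Summarize the visit note for [NAME]."  # Layer 1 output only

response = client.chat.completions.create(
    model="llama-3.1-70b-instruct",  # an open-weights model you host yourself
    messages=[{"role": "user", "content": sanitized_prompt}],
    temperature=0.2,
)
```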

Layer 3: Scoped, encrypted vector store

The embedding store lives inside your HIPAA boundary — not in a third-party SaaS. Postgres with pgvector + at-rest encryption + row-level security. Or a managed vector DB only if it's covered by your BAA. Embeddings of PHI are themselves PHI for the purposes of the regulation — treat them accordingly.
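A minimal sketch of that setup with psycopg, assuming Postgres with the pgvector extension. The table name, embedding dimension, and session-variable convention are illustrative, and RLS only binds if the app connects as a non-owner role without BYPASSRLS.

```python
import psycopg

# Run once at provisioning time.
SCHEMA = """
CREATE EXTENSION IF NOT EXISTS vector;

CREATE TABLE IF NOT EXISTS record_chunks (
    id         bigserial PRIMARY KEY,
    patient_id uuid        NOT NULL,
    chunk      text        NOT NULL,
    embedding  vector(768) NOT NULL  -- dimension is model-specific
);

-- Row-level security: a session only sees its own patient's rows.
ALTER TABLE record_chunks ENABLE ROW LEVEL SECURITY;
CREATE POLICY patient_isolation ON record_chunks
    USING (patient_id = current_setting('app.patient_id')::uuid);
"""

def search(conn: psycopg.Connection, patient_id: str, query_vec: list[float]) -> list[str]:
    with conn.cursor() as cur:
        # Scope the session to one patient before touching the index.
        cur.execute("SELECT set_config('app.patient_id', %s, false)", (patient_id,))
        cur.execute(
            "SELECT chunk FROM record_chunks ORDER BY embedding <=> %s::vector LIMIT 5",
            ("[" + ",".join(map(str, query_vec)) + "]",),
        )
        return [row[0] for row in cur.fetchall()]
```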

Layer 4: Output validator + audit log

Every LLM response passes through an output validator that re-checks for accidental PHI leakage (the LLM has a habit of repeating things back) and validates against a deny-list of unacceptable output patterns (medical diagnoses without confidence statements, prescriptions, etc.).
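The validator's shape, as a sketch; the deny-list patterns are illustrative, and phi_detector stands in for the same Layer 1 model run against the output.

```python
import re
from typing import Callable

# Illustrative deny-list; a real one is maintained under clinical governance.
DENY_PATTERNS = [
    re.compile(r"\byou (should|must) take\b", re.I),  # prescriptive advice
    re.compile(r"\byour diagnosis is\b", re.I),       # unhedged diagnosis
]

def validate_output(text: str, phi_detector: Callable[[str], bool]) -> tuple[bool, list[str]]:
    """Re-check a response before it leaves the boundary.

    LLMs routinely echo input back, so output is scanned as strictly as input.
    """
    violations = [p.pattern for p in DENY_PATTERNS if p.search(text)]
    if phi_detector(text):  # any detected identifier fails the response
        violations.append("phi_leak")
    return (len(violations) == 0, violations)
```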

The audit log captures: input (with PHI redacted in a separate, restricted log if needed), retrieved sources, model used, output, validator decisions, timestamp, user identity. Retention follows your regulatory schedule — typically 6 years minimum under HIPAA.
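As a sketch, one append-only JSON line per interaction is enough to start; the field names mirror the list above and are not a mandated schema.

```python
import json
from dataclasses import asdict, dataclass
from datetime import datetime, timezone

@dataclass
class AuditRecord:
    user_id: str
    timestamp: str
    model: str
    redacted_input: str           # PHI-bearing originals go to the restricted log
    retrieved_sources: list[str]
    output: str
    validator_ok: bool
    validator_violations: list[str]

def write_audit(record: AuditRecord, sink) -> None:
    # Append-only sink; retention follows your schedule (>= 6 years under HIPAA).
    sink.write(json.dumps(asdict(record)) + "\n")

record = AuditRecord(
    user_id="clinician-42",
    timestamp=datetime.now(timezone.utc).isoformat(),
    model="llama-3.1-70b-instruct",
    redacted_input="Summarize the visit note for [NAME].",
    retrieved_sources=["record_chunks:1289"],
    output="...",
    validator_ok=True,
    validator_violations=[],
)
```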

Layer 5: Human-in-the-loop for clinical decisions

For anything touching clinical decision support, the LLM is a draft generator — not the decision maker. A clinician approves, edits, or rejects. This is non-negotiable. The LLM can speed up clinical documentation, summarization, scheduling, billing — but it doesn't make clinical decisions in our deployments, full stop.
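The enforcement point can be tiny. A sketch of a release gate that fails closed, with hypothetical status names:

```python
from enum import Enum

class DraftStatus(Enum):
    DRAFT = "draft"        # produced by the LLM, never patient-visible
    APPROVED = "approved"  # clinician accepted as-is
    EDITED = "edited"      # clinician's edited text is what ships
    REJECTED = "rejected"

def release(final_text: str, status: DraftStatus, clinician_id: str | None) -> str | None:
    """Nothing reaches a patient or a chart without an explicit clinician action."""
    if clinician_id is None or status not in (DraftStatus.APPROVED, DraftStatus.EDITED):
        return None  # fail closed: no reviewer, no release
    return final_text
```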

What we deploy at Almoosa-class clients

The patient-facing platform for one of KSA's most respected health groups runs:

  • Patient appointments and scheduling — GenAI suggests optimal slots, but does not autonomously book appointments that touch multi-physician resources.
  • Digital records access — Patients query their own records via natural language. The LLM has scoped access, never crosses patient boundaries, and every query is logged.
  • Video consult prep summaries — Pre-consultation, the system generates a summary of the patient's recent records for the physician. Physician reviews before the consult.
  • Wearable data integration — Continuous data ingestion. AI surfaces anomalies for nurse triage, never auto-alerts patients on clinical concerns.
  • Bilingual (Arabic/English) support — All layers operate bilingually with culturally appropriate phrasing.

Documentation an HHS audit will ask for

  1. BAAs with every covered third party in the chain (model provider, infrastructure, observability, etc.).
  2. System architecture diagram showing PHI flow boundaries.
  3. Risk assessment with explicit treatment of PHI exposure scenarios and their mitigations.
  4. Audit log access procedures with role-based access.
  5. Incident response plan for any suspected breach.
  6. Workforce training records on HIPAA & AI-specific risks.
  7. Validation reports for the PHI detection model showing false-negative rates by identifier type (a measurement sketch follows this list).
  8. Penetration test results focused on the AI surfaces.
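For item 7, the measurement itself is simple once you have a labeled test set; a sketch, assuming gold and predicted annotations as (identifier_type, span) pairs:

```python
from collections import Counter

def false_negative_rates(
    gold: list[tuple[str, str]], predicted: set[tuple[str, str]]
) -> dict[str, float]:
    """Per-identifier-type false-negative rate on a labeled test set.

    A gold span the detector missed is the error class an auditor cares
    about most: missed PHI is PHI that reached the LLM.
    """
    missed, total = Counter(), Counter()
    for id_type, span in gold:
        total[id_type] += 1
        if (id_type, span) not in predicted:
            missed[id_type] += 1
    return {t: missed[t] / total[t] for t in total}
```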

Common failure modes to avoid

The "shadow IT" deployment

A clinician uses ChatGPT (consumer) to summarize patient notes. PHI in the prompt. No BAA. Breach. This happens constantly. The mitigation is policy + tooling + a sanctioned alternative the staff actually wants to use.

The "we'll add HIPAA later" deployment

A team builds an AI feature in a non-HIPAA environment with the intent to "harden it" before production. The hardening always takes 5x longer than estimated and frequently requires rebuilding the data layer. Build inside the boundary from day one.

The over-trust failure

An LLM correctly cites a source, but the source itself is outdated or wrong. The clinician trusts the citation. Patient is harmed. Mitigation: corpus governance — every retrievable source has an effective date, an authority, and a periodic-review schedule.
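A sketch of that governance metadata; the field names are illustrative, but every retrievable chunk should carry something equivalent.

```python
from dataclasses import dataclass
from datetime import date, timedelta

@dataclass
class SourceMetadata:
    source_id: str
    effective_date: date       # when the content became authoritative
    authority: str             # who vouches for it (guideline body, committee)
    review_interval_days: int  # the periodic-review schedule

    def is_retrievable(self, today: date) -> bool:
        # A source past its review date drops out of retrieval until re-approved.
        return today <= self.effective_date + timedelta(days=self.review_interval_days)
```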

The frontier moves fast — your boundaries should not

Models will get better. Cost will fall. New providers will sign BAAs. Use those advances. But the architectural boundaries — what touches PHI, what doesn't, where the audit trail lives, who's accountable — should not move with each new model release. Hold those constant. Iterate on what's inside.

If you're scoping a HIPAA-bounded GenAI deployment and want a second pair of eyes, book 30 minutes with our CTO. We've shipped the architecture above multiple times — and we'd rather help you avoid the failure modes than watch you discover them.