When the SaaS is the right answer, and when you need the architecture around the agent — not just the agent itself.
We use Vapi as part of our stack on roughly half of voice engagements. This isn’t a hit piece — it’s the framework we use when scoping with a buyer.
TL;DR
Buy Vapi when you need a working voice agent fast, your flow is simple, and you have an engineer to maintain it.
Build with Shakan when you need versioned evals, deep AU integrations (HotDoc, Tyro Health, Salestrekker), stateful LangGraph orchestration across CRM and calendar, and the freedom to swap voice vendors without rewriting your business logic.
| Dimension | Vapi | Shakan AI |
|---|---|---|
| Setup cost | Self-serve sign-up, free tier; minutes to a working demo | $20K+ implementation, 4–10 week build with scoping and evals |
| Monthly cost | Per-minute usage pricing (SaaS); predictable at low volume | $3K+ MRR retainer covering ops, eval runs, model upgrades |
| Time-to-value | Hours to days for a prototype; weeks for a production rollout | Phase 1 live in 3–4 weeks; full system in 6–10 weeks |
| IP ownership | You configure prompts and flows; Vapi owns the runtime | You own the code, prompts, evals, infrastructure and source escrow |
| Customisation depth | Strong inside Vapi's flow model; limited beyond it | Arbitrary state machines (LangGraph), custom tools, bespoke logic |
| Observability | Built-in call logs and transcripts; basic analytics | LangSmith + OpenTelemetry tracing, p50/p95/p99, cost-per-call dashboards |
| Evals & guardrails | Manual review of recordings; no native eval harness | Versioned golden datasets, regression suites in CI, refusal handling |
| Vendor lock-in | Tightly coupled to Vapi's runtime and pricing | Portable: Retell, Vapi, ElevenLabs, Deepgram swappable behind your interface |
| Multi-system orchestration | Outbound webhooks; downstream logic lives elsewhere | First-class CRM, calendar, billing and post-call workflow orchestration |
| AU compliance & integrations | Generic webhooks; HotDoc, Tyro Health, Cliniko not natively wired | HotDoc, Tyro Health, Cliniko, HubSpot AU, AHPRA-aware refusal patterns built in |
| Who builds it | Your team configures Vapi's UI | Senior engineer ships the system end-to-end |
| What happens at scale | Per-minute pricing dominates total cost; flow complexity hits a ceiling | Architecture absorbs scale; cost-per-call tuned via model routing and fallback chains |
On roughly half our voice engagements, Vapi is the voice layer and Shakan owns the architecture around it: a LangGraph state machine that handles intent routing, tool selection and CRM writes; an eval harness in CI; observability via LangSmith and OpenTelemetry; and AU-specific integrations with HotDoc, Tyro Health, Cliniko, HubSpot AU and Salestrekker.
Vapi handles what it’s good at — speech I/O, telephony, fast iteration on conversational flow. Shakan handles what enterprise voice systems actually need to ship: versioned prompts, regression tests, refusal handling, fallback chains (Sonnet 4.6 → Haiku 4.5 → static), and post-call workflows that move data into the systems your operators actually use.
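A minimal sketch of what a fallback chain like that can look like, assuming each tier is wrapped as a plain callable. The names `Responder`, `respond_with_fallback` and `static_responder` are hypothetical, and the model tiers are stand-ins rather than real SDK calls.

```python
from typing import Callable, Optional, Sequence, Tuple

# Hypothetical responder type: takes the caller's utterance, returns a reply.
Responder = Callable[[str], str]

STATIC_REPLY = "Sorry, I can't help with that right now. Transferring you to reception."


def static_responder(_utterance: str) -> str:
    """Last tier: a canned reply that cannot fail."""
    return STATIC_REPLY


def respond_with_fallback(utterance: str, chain: Sequence[Responder]) -> Tuple[str, int]:
    """Try each tier in order; return (reply, index of the tier that answered)."""
    last_error: Optional[Exception] = None
    for tier, responder in enumerate(chain):
        try:
            reply = responder(utterance)
            if reply.strip():  # treat an empty reply as a failure, move on
                return reply, tier
        except Exception as exc:  # timeout, rate limit, provider outage
            last_error = exc
    raise RuntimeError("every tier in the fallback chain failed") from last_error
```

In use, the chain would be ordered primary model, cheaper model, static reply, and the returned tier index is worth logging so that fallback rates show up on a dashboard.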
Shakan engagements start at $20K+ for implementation (typically 4–10 weeks) and $3K+ MRR for ongoing operations, eval runs and model upgrades.
Vapi’s SaaS pricing is per-minute, and it wins on cost at low volumes: it’s the obvious choice if you take fewer than ~500 calls/month and your flow is simple. The crossover comes when complexity, volume or integration depth makes engineering time cheaper than the platform overhead.
Yes — and Shakan engagements are designed for it. We treat the voice provider (Vapi, Retell, ElevenLabs, Deepgram) as a swappable layer behind a typed interface. Your LangGraph state machine, your tools, your evals and your CRM logic stay put when you change vendors. We've done lift-and-shift work for clients who outgrew Vapi's pricing model at scale.
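As a rough illustration of the "swappable layer behind a typed interface" idea, here is a sketch using a Python `Protocol`. `VoiceProvider`, `RecordingProvider` and `confirm_booking` are hypothetical names invented for this example; a real adapter would wrap a vendor SDK behind the same method.

```python
from dataclasses import dataclass, field
from typing import List, Protocol


class VoiceProvider(Protocol):
    """Hypothetical interface: the only surface business logic may touch."""

    name: str

    def say(self, call_id: str, text: str) -> None:
        ...


@dataclass
class RecordingProvider:
    """Stand-in adapter that records what it was asked to say."""

    name: str
    spoken: List[str] = field(default_factory=list)

    def say(self, call_id: str, text: str) -> None:
        self.spoken.append(f"{call_id}: {text}")


def confirm_booking(provider: VoiceProvider, call_id: str, slot: str) -> str:
    """Business logic: depends on the interface, never on a specific vendor."""
    message = f"You're booked for {slot}."
    provider.say(call_id, message)
    return message
```

Because `confirm_booking` only sees `VoiceProvider`, changing vendors means writing one new adapter class rather than touching the booking logic.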
A versioned golden dataset of 50–500 real call transcripts per intent, scored on a rubric that mixes deterministic checks (tool selection, schema validity, escalation triggers) with model-graded checks (tone, helpfulness, factual grounding). The same suite runs in CI, in staging and on sampled production traffic. Vapi gives you call recordings and basic analytics — useful, but not a regression test.
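A simplified sketch of how one golden case might be scored under such a rubric. Everything here is illustrative: the dataclasses, the `REQUIRED_ARGS` schema table, the `grader` callable (standing in for a model-graded check) and the 0.7 threshold are assumptions, not a description of any particular harness.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass(frozen=True)
class GoldenCase:
    transcript: str
    expected_tool: str
    must_escalate: bool


@dataclass(frozen=True)
class AgentOutput:
    tool: str
    tool_args: dict
    escalated: bool
    reply: str


# Hypothetical schema: required argument names per tool.
REQUIRED_ARGS = {"book_appointment": {"patient_name", "slot"}}


def deterministic_checks(case: GoldenCase, out: AgentOutput) -> Dict[str, bool]:
    """Exact pass/fail checks that need no model in the loop."""
    return {
        "tool_selection": out.tool == case.expected_tool,
        "schema_valid": REQUIRED_ARGS.get(out.tool, set()).issubset(out.tool_args),
        "escalation": out.escalated == case.must_escalate,
    }


def score_case(case: GoldenCase, out: AgentOutput,
               grader: Callable[[str, str], float], min_grade: float = 0.7) -> dict:
    """Combine deterministic checks with a model-graded score in [0, 1]."""
    checks = deterministic_checks(case, out)
    grade = grader(case.transcript, out.reply)
    passed = all(checks.values()) and grade >= min_grade
    return {"checks": checks, "grade": grade, "passed": passed}
```

Running `score_case` over every case in a dataset version and failing the build when any case regresses is what turns transcripts into a regression test.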
Yes, on roughly half of voice engagements. When Vapi is the right fit for the workload — fast prototyping, simple flows, low-to-moderate volume — we orchestrate around it rather than reinventing it. Where Vapi's flow model can't express the workflow, we drop to a custom LangGraph runtime calling Deepgram and ElevenLabs directly.
For a healthcare practice taking ~2,000 calls/month at an average of five minutes per call (about 10,000 minutes a month), Vapi's per-minute pricing typically lands in the low-to-mid four figures monthly. Once you add the engineering time to wire up HotDoc, Tyro Health, the CRM and post-call workflows, total cost-of-ownership often exceeds a Shakan retainer. The crossover depends on volume, complexity and how much your own team can absorb.
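The arithmetic behind that comparison can be sketched as follows. Only the call volume, the call length and the $3K retainer floor come from this page; the per-minute rate, engineering hours and hourly rate are placeholder assumptions to replace with your own numbers.

```python
def monthly_minutes(calls_per_month: int, avg_minutes_per_call: int) -> int:
    """Total billable minutes in a month."""
    return calls_per_month * avg_minutes_per_call


def usage_cost(minutes: int, cents_per_minute: int) -> int:
    """Per-minute platform spend in whole dollars (integer maths, no float drift)."""
    return minutes * cents_per_minute // 100


def in_house_tco(usage_dollars: int, eng_hours: int, eng_hourly_rate: int) -> int:
    """Platform usage plus the engineering time your own team spends each month."""
    return usage_dollars + eng_hours * eng_hourly_rate


minutes = monthly_minutes(2_000, 5)          # 10,000 minutes, from the example above
usage = usage_cost(minutes, 30)              # ASSUMED 30 cents/min all-in -> $3,000
tco = in_house_tco(usage, 20, 150)           # ASSUMED 20 h/month at $150/h -> $6,000
RETAINER_FLOOR = 3_000                       # "$3K+ MRR" floor quoted on this page
print(minutes, usage, tco, tco > RETAINER_FLOOR)  # prints: 10000 3000 6000 True
```

The result is only as good as the assumed inputs; with a small engineering load or a lower per-minute rate, the comparison can easily go the other way.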
Native integrations with HotDoc, Tyro Health, Cliniko, Salestrekker and HubSpot AU; AHPRA-aware refusal patterns for clinical contexts; AUSTRAC-aware triggers for financial-services callers; data residency in AU regions; and familiarity with how AU practices actually run reception and intake. Vapi doesn't ship those out of the box, and shouldn't, because they're vertical-specific.
45 minutes with a senior architect. We’ll tell you honestly whether Vapi alone is enough — or whether the architecture around it justifies a Shakan engagement.