Price a voice agent before you build it.

Compare STT, TTS, and LLM providers at your real call volume. See cost per minute, cost per call, and where the money goes.

  • No signup
  • Edit every rate
  • Includes hosting and latency

01 Providers

Pick your stack

02 Volume and call shape

Tell us what a call looks like

Caller speaks more | Agent speaks more

03 Rate overrides

Override any rate

Defaults load from the provider you picked. Edit any line to model a custom contract or volume discount.

04 Result

Your cost breakdown

Per minute

$0.0166

Per call

$0.083

Per month

$415

Line              Share    Cost per call
Speech-to-text    36.1%    $0.0300
LLM               36.7%    $0.0305
Text-to-speech    27.1%    $0.0225
Hosting            0.1%    $0.0001
Per call                   $0.0831

Tokens per call

Input

10,238

Output

488

Total

10,726

Input tokens grow with every turn because conversation history is replayed each time.

Planning estimate. Real bills shift with retries, volume discounts, and silence trimming.

05 Latency

Where the milliseconds go

Voice-to-voice latency is the only number your caller feels. Tune each stage to see what moves the total and what your stack needs to stay under 500ms.

Voice-to-voice latency

825ms (Sluggish), on a scale of 0ms to 1000ms

Input path: 135ms (16%)
AI processing: 555ms (67%)
Output path: 135ms (16%)

Latency guide: Natural ≤ 200ms, Acceptable 201–500ms, Sluggish > 500ms
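As a rough sketch of how the latency guide can be applied, the snippet below sums the three path totals shown in the panel, computes each path's share, and labels the result. The function name `classify_latency` and the dictionary layout are illustrative assumptions, not the calculator's own code.

```python
def classify_latency(total_ms: float) -> str:
    """Label a voice-to-voice latency using the guide's thresholds."""
    if total_ms <= 200:
        return "Natural"
    if total_ms <= 500:
        return "Acceptable"
    return "Sluggish"


# Path totals as displayed in the latency panel.
paths = {"Input path": 135, "AI processing": 555, "Output path": 135}

total_ms = sum(paths.values())
# Each path's share of the total, rounded to a whole percent.
shares = {name: round(100 * ms / total_ms) for name, ms in paths.items()}

print(total_ms, classify_latency(total_ms), shares)
```

With these values the total is 825ms, AI processing accounts for roughly two thirds of it, and the label is Sluggish, so the AI stages are where tuning has the most room to help.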

06 Stage timings

Per-stage breakdown

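Input path (ms per stage): mic, opus encode, network, packet, jitter, opus decode

AI processing (ms per stage): transcription, LLM, sentence aggregation, TTS

Output path (ms per stage): opus encode, packet, network, jitter, opus decode, speaker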

07 Methodology

How the numbers are built

Every line above is auditable. Below is the math, in the order the calculator runs it.

A typical voice agent costs $0.05 to $0.30 per minute, driven by three lines: speech-to-text (STT), the language model (LLM), and text-to-speech (TTS). The LLM is usually the largest line because input tokens grow with conversation history. Hosting adds a small per-minute charge that drops as concurrency per vCPU rises.

Speech-to-text

Provider rate times minutes of audio processed in the call.

STT cost = call length (min) × rate per minute

Text-to-speech

Characters the agent generates times the per-character rate.

TTS cost = words/min × call length × characters/word × agent speech share × rate per character
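A minimal sketch of the TTS line, assuming the word count is words per minute times call length. The input values below (150 words per minute, 5 minutes, 6 characters per word, a 50% agent share, and $0.00001 per character) are illustrative assumptions, not confirmed calculator defaults.

```python
def tts_cost(words_per_min: float, call_minutes: float, chars_per_word: float,
             agent_share: float, rate_per_char: float) -> float:
    """Cost of synthesizing only the agent's share of the spoken words."""
    total_words = words_per_min * call_minutes
    agent_chars = total_words * chars_per_word * agent_share
    return agent_chars * rate_per_char


# Hypothetical inputs chosen for illustration only.
example_tts = tts_cost(150, 5, 6, 0.5, 0.00001)
print(f"${example_tts:.4f}")
```

Because only the agent's speech is synthesized, moving the speech-share slider toward the caller lowers this line directly.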

LLM input

Conversation history grows each turn, so input tokens scale quadratically with call length.

Input tokens = (words/min × tokens/word ÷ turns/min) × (turns/min × length) × (turns/min × length + 1) ÷ 2
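The formula above can be sketched as a small function. The inputs used here (150 words per minute, 1.3 tokens per word, 4 turns per minute) are assumptions picked for illustration. With a 5-minute call they happen to reproduce the 10,238 input tokens shown in the result panel, but the calculator's actual defaults are not stated on the page.

```python
def input_tokens(words_per_min: float, tokens_per_word: float,
                 turns_per_min: float, call_minutes: float) -> float:
    """Total LLM input tokens when the full history is replayed every turn."""
    tokens_per_turn = words_per_min * tokens_per_word / turns_per_min
    turns = turns_per_min * call_minutes
    # Turn k re-reads k turns of history, so the total is
    # tokens_per_turn * (1 + 2 + ... + turns).
    return tokens_per_turn * turns * (turns + 1) / 2


# Assumed inputs; doubling the call length roughly quadruples the tokens.
five_min = input_tokens(150, 1.3, 4, 5)
ten_min = input_tokens(150, 1.3, 4, 10)
print(five_min, ten_min, ten_min / five_min)
```

Under these assumptions a 10-minute call reads about 3.9 times as many input tokens as a 5-minute call, which is the quadratic growth the text describes.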

LLM output

Output grows linearly with how much the agent speaks.

Output tokens = words/min × tokens/word × agent speech share × call length

LLM cost

Input and output tokens billed at their respective rates.

LLM cost = input tokens × input rate + output tokens × output rate
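As a sketch of the LLM line, the snippet below prices the token counts from the result panel (10,238 input, 488 output). The rates of $2.50 and $10.00 per million tokens are assumptions chosen for illustration; check the provider's live pricing before relying on them.

```python
def llm_cost(input_tokens: float, output_tokens: float,
             input_rate: float, output_rate: float) -> float:
    """Bill input and output tokens at their own per-token rates."""
    return input_tokens * input_rate + output_tokens * output_rate


# Assumed per-token rates, converted from a per-million-token price.
INPUT_RATE = 2.50 / 1_000_000
OUTPUT_RATE = 10.00 / 1_000_000

example_llm = llm_cost(10_238, 488, INPUT_RATE, OUTPUT_RATE)
print(f"${example_llm:.4f}")
```

Even though output tokens cost more per token in this example, the input side dominates the bill because the replayed history makes input volume about twenty times larger.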

Hosting

vCPU-minute rate times call length, divided by the agents you can run on each vCPU.

Hosting cost = (vCPU rate × call length) ÷ agents per vCPU
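A short sketch of how concurrency changes the hosting line. The vCPU rate of $0.0002 per minute and the 5-minute call are hypothetical values used only to show the shape of the curve.

```python
def hosting_cost(vcpu_rate_per_min: float, call_minutes: float,
                 agents_per_vcpu: int) -> float:
    """Per-call hosting cost when one vCPU is shared by several agents."""
    if agents_per_vcpu < 1:
        raise ValueError("agents_per_vcpu must be at least 1")
    return vcpu_rate_per_min * call_minutes / agents_per_vcpu


# Hypothetical rate and call length; only the concurrency varies.
for agents in (4, 8, 16):
    print(agents, f"${hosting_cost(0.0002, 5, agents):.6f}")
```

Each doubling of agents per vCPU halves the per-call hosting cost, which is why this line stays small next to STT, LLM, and TTS.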

Cost per call

Everything a single call burns across all four lines.

Per call = STT + LLM + TTS + hosting

Monthly cost

Per-call cost scaled by monthly volume, plus any fixed hosting.

Monthly = per call × monthly calls + fixed hosting
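The per-call and monthly roll-up can be sketched as follows, using the four per-call lines from the result panel. The monthly volume of 5,000 calls and the zero fixed hosting charge are assumptions made for this example.

```python
# Per-call cost lines as displayed in the result panel (USD).
lines = {
    "Speech-to-text": 0.0300,
    "LLM": 0.0305,
    "Text-to-speech": 0.0225,
    "Hosting": 0.0001,
}


def monthly_cost(per_call: float, monthly_calls: int,
                 fixed_hosting: float = 0.0) -> float:
    """Scale the per-call cost by volume and add any fixed hosting."""
    return per_call * monthly_calls + fixed_hosting


per_call = sum(lines.values())
# Share of each line in the per-call total, as a percentage.
shares = {name: round(100 * cost / per_call, 1) for name, cost in lines.items()}

# Assumed volume: 5,000 calls per month, no fixed hosting.
monthly = monthly_cost(per_call, 5_000)
print(f"${per_call:.4f}", shares, f"${monthly:.2f}")
```

The shares match the breakdown in the result panel, and the monthly figure lands close to the $415 shown there; the small gap comes from summing the rounded per-call lines.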

08 Latency

Total voice-to-voice latency

End-to-end delay from the caller speaking to the agent voice arriving back. Every stage in the latency panel adds to this total.

Total = mic + opus encode + network + packet + jitter + opus decode + transcription + LLM + sentence aggregation + TTS + opus encode + packet + network + jitter + opus decode + speaker
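A minimal sketch of the sum above, keeping the sixteen stages in the order listed. Every per-stage timing here is a placeholder, not a measurement and not a calculator default; the values were picked only so the three paths add up to the 135ms, 555ms, and 135ms shown in the latency panel.

```python
from collections import defaultdict

# (path, stage, milliseconds); timings are hypothetical placeholders.
STAGES = [
    ("input", "mic", 40), ("input", "opus encode", 20),
    ("input", "network", 25), ("input", "packet", 20),
    ("input", "jitter", 25), ("input", "opus decode", 5),
    ("ai", "transcription", 150), ("ai", "LLM", 300),
    ("ai", "sentence aggregation", 15), ("ai", "TTS", 90),
    ("output", "opus encode", 20), ("output", "packet", 20),
    ("output", "network", 25), ("output", "jitter", 25),
    ("output", "opus decode", 5), ("output", "speaker", 40),
]


def total_latency(stages) -> int:
    """Voice-to-voice latency is the plain sum of every stage."""
    return sum(ms for _, _, ms in stages)


def path_totals(stages) -> dict:
    """Group stage timings by path to see where the time goes."""
    totals = defaultdict(int)
    for path, _, ms in stages:
        totals[path] += ms
    return dict(totals)


slowest = max(STAGES, key=lambda stage: stage[2])
print(total_latency(STAGES), path_totals(STAGES), slowest)
```

Since the total is a straight sum, shaving time from the single slowest stage moves the result further than small trims spread across the transport stages.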

How it works

The calculator loads published per-minute (STT), per-character (TTS), and per-token (LLM) rates for major providers, then scales them by call volume, conversation length, words per minute, and turn rate. Input tokens follow a quadratic growth model that mirrors how conversation history accumulates each turn. Hosting cost is the per-vCPU-minute rate divided by the number of concurrent agents per vCPU.

Need a real number for your stack?

A 30-minute call with a RaftLabs founder turns this estimate into a build plan: provider shortlist, latency budget, and a fixed-price scope.

Voice AI cost questions

Sizing a voice agent rollout, validated against real production deployments.

The calculator uses published per-minute, per-character, and per-token rates from each provider and applies industry-standard call shape assumptions. Real bills shift with volume discounts, custom contracts, retries, silence trimming, and barge-in handling. Treat the output as a planning range, not a quote.
Four lines: speech-to-text per minute, LLM tokens (input plus output), text-to-speech per character, and hosting per vCPU minute. Conversation length is the biggest multiplier because input tokens grow with history. Concurrency per vCPU controls hosting cost per call.
Under 200ms voice-to-voice feels natural. 200 to 500ms is acceptable for transactional calls. Over 500ms feels slow and callers start talking over the agent. The latency panel above shows where time is spent across the input, AI processing, and output paths.
Not yet. The calculator runs in your browser and resets on reload. Screenshot the breakdown or copy the inputs into your own sheet for now.
Rates are hardcoded against published pricing at the time of last update. Always verify the live rate with each provider before committing budget. Every input is editable, so you can override any rate that has moved.
Input tokens are everything the model reads: system prompt, tools, conversation history, and the new user turn. Output tokens are what the model writes back. Input grows quadratically with conversation length because history is replayed each turn; output grows linearly.
Lead with the constraint: cost, latency, voice quality, or compliance. For cost, pair Deepgram Nova or Whisper for STT with Flash V2.5 or Aura-2 for TTS and GPT-4o Mini or Llama for the LLM. For latency, pick the fastest STT and TTS even at a higher rate. For voice quality, ElevenLabs Multilingual carries the most expressive output.
Concurrent voice sessions a single vCPU can serve before performance degrades. Higher concurrency cuts hosting cost per call but raises latency variance under load. Most production stacks run 4 to 16 agents per vCPU depending on STT and barge-in load.