Conversation Runtime
Deterministic dialog execution engine with finite state machines, turn detection, and optional LLM hooks.
The Conversation Runtime is the core differentiator of voicetyped. It is not a chatbot builder — it is a deterministic runtime for executing voice dialog flows. Dialogs are defined as finite state machines (FSMs) that process events (speech, DTMF, timeouts, backend results) and produce actions (play TTS, transfer, hang up, call hooks).
Why a State Machine?
Most voice automation platforms use either:
- Scripted flows — rigid, hard to maintain
- LLM-driven conversations — unpredictable, hard to audit, slow
voicetyped uses a finite state machine because:
- Deterministic — the same input always produces the same output
- Auditable — every state transition is logged
- Fast — no LLM inference latency on the critical path
- Reliable — no hallucinations, no unexpected behavior
- Serializable — call state survives restarts
- LLM-optional — add LLM nodes where you actually need them
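As a rough illustration of the determinism and auditability points above, here is a minimal sketch of a table-driven state machine in Python. The `TRANSITIONS` table, `Session` class, and `replay` helper are hypothetical names invented for this example and are not part of voicetyped. The table is a simplified slice of the helpdesk dialog shown later on this page.

```python
from dataclasses import dataclass, field

# Hypothetical transition table: (current state, event type) -> next state.
TRANSITIONS = {
    ("start", "speech"): "classify_issue",
    ("start", "dtmf"): "transfer_to_human",
    ("start", "timeout"): "no_input",
    ("no_input", "tts_complete"): "start",
}


@dataclass
class Session:
    state: str = "start"
    log: list = field(default_factory=list)  # audit trail of transitions


def step(session, event):
    """Apply one event. Events with no matching entry leave the state unchanged."""
    next_state = TRANSITIONS.get((session.state, event), session.state)
    session.log.append((session.state, event, next_state))
    session.state = next_state
    return next_state


def replay(events):
    """Run a fresh session through a sequence of events."""
    session = Session()
    for event in events:
        step(session, event)
    return session
```

Replaying the same event sequence twice yields the same final state and the same transition log, which is the property that makes calls reproducible and auditable.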
Configuration
# /etc/voice-gateway/config.yaml — runtime section
runtime:
  dialog_dir: /etc/voice-gateway/dialogs/  # Directory containing dialog YAML files
  default_timeout: 10s                     # Default timeout per state
  max_concurrent_calls: 100                # Maximum simultaneous calls
  max_dialog_depth: 50                     # Maximum state transitions per call
  state_store: memory                      # memory, redis, postgres
  barge_in: true                           # Allow caller to interrupt TTS
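To make two of these settings concrete, the following Python sketch shows one way duration strings such as `10s` could be parsed and how a `max_dialog_depth` limit could be enforced. Both `parse_duration` and `DepthGuard` are hypothetical helpers. The set of accepted units and the behavior when the limit is exceeded are assumptions, since this page does not specify them.

```python
import re

# Assumed unit suffixes; only "s" appears in the examples on this page.
_UNITS = {"ms": 0.001, "s": 1.0, "m": 60.0}


def parse_duration(text):
    """Convert a string like '10s' into seconds as a float."""
    match = re.fullmatch(r"(\d+(?:\.\d+)?)(ms|s|m)", text.strip())
    if match is None:
        raise ValueError(f"unrecognized duration: {text!r}")
    value, unit = match.groups()
    return float(value) * _UNITS[unit]


class DepthGuard:
    """Counts state transitions and raises once the configured limit is passed."""

    def __init__(self, max_depth):
        self.max_depth = max_depth
        self.count = 0

    def record_transition(self):
        self.count += 1
        if self.count > self.max_depth:
            # Assumption: exceeding the limit is treated as an error.
            raise RuntimeError("max_dialog_depth exceeded")
```

A transition limit of this kind guards against dialogs that loop forever, for example a menu that keeps repeating on timeout.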
Dialog Definition
Dialogs are defined in YAML files in the dialog_dir directory. Each file defines one dialog flow.
Basic Structure
# /etc/voice-gateway/dialogs/helpdesk.yaml
name: helpdesk
description: IT helpdesk intake flow
version: "1.0"
# Variables available throughout the dialog
variables:
  caller_name: ""
  issue_type: ""
  ticket_id: ""
# Routing: which calls use this dialog
routing:
  match:
    - sip_to: "sip:helpdesk@*"
    - sip_to: "sip:+18001234567@*"
states:
  # Initial state: every dialog must have a 'start' state
  start:
    on_enter:
      - action: play_tts
        text: >
          Thank you for calling IT support.
          Please briefly describe your issue.
    transitions:
      - event: speech
        target: classify_issue
      - event: dtmf
        digits: "0"
        target: transfer_to_human
      - event: timeout
        after: 15s
        target: no_input
  classify_issue:
    on_enter:
      - action: call_hook
        service: issue_classifier
        method: Classify
        payload:
          transcript: "{{ .Event.Transcript }}"
    transitions:
      - event: hook_result
        condition: "{{ .Result.Category == 'password_reset' }}"
        target: password_reset
      - event: hook_result
        condition: "{{ .Result.Category == 'hardware' }}"
        target: hardware_issue
      - event: hook_result
        target: general_issue
      - event: hook_error
        target: fallback
  password_reset:
    on_enter:
      - action: set_variable
        name: issue_type
        value: password_reset
      - action: play_tts
        text: >
          I understand you need a password reset.
          Let me create a ticket for you.
      - action: call_hook
        service: ticketing
        method: CreateTicket
        payload:
          type: password_reset
          caller: "{{ .Call.CallerID }}"
    transitions:
      - event: hook_result
        target: ticket_created
      - event: hook_error
        target: fallback
  hardware_issue:
    on_enter:
      - action: set_variable
        name: issue_type
        value: hardware
      - action: play_tts
        text: >
          For hardware issues, I will transfer you
          to our on-site support team.
    transitions:
      - event: tts_complete
        target: transfer_to_hardware
  general_issue:
    on_enter:
      - action: play_tts
        text: >
          I have noted your issue. A ticket has been created
          and a support engineer will contact you shortly.
      - action: call_hook
        service: ticketing
        method: CreateTicket
        payload:
          type: general
          caller: "{{ .Call.CallerID }}"
          transcript: "{{ .Event.Transcript }}"
    transitions:
      - event: hook_result
        target: ticket_created
      - event: hook_error
        target: fallback
  ticket_created:
    on_enter:
      - action: set_variable
        name: ticket_id
        value: "{{ .Result.TicketID }}"
      - action: play_tts
        text: >
          Your ticket number is {{ .Variables.ticket_id }}.
          Is there anything else I can help you with?
    transitions:
      - event: speech
        condition: "{{ contains .Event.Transcript 'yes' }}"
        target: start
      - event: speech
        target: goodbye
      - event: timeout
        after: 10s
        target: goodbye
  transfer_to_human:
    on_enter:
      - action: play_tts
        text: "Transferring you to a human agent. Please hold."
      - action: transfer
        target: "sip:support-queue@pbx.internal"
  transfer_to_hardware:
    on_enter:
      - action: transfer
        target: "sip:hardware-team@pbx.internal"
  no_input:
    on_enter:
      - action: play_tts
        text: "I did not hear anything. Let me try again."
    transitions:
      - event: tts_complete
        target: start
  fallback:
    on_enter:
      - action: play_tts
        text: >
          I am having trouble processing your request.
          Let me transfer you to a human agent.
      - action: transfer
        target: "sip:support-queue@pbx.internal"
  goodbye:
    on_enter:
      - action: play_tts
        text: "Thank you for calling IT support. Goodbye."
      - action: hangup
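Because every transition names a target state, a dialog file can be checked for dangling references before it is deployed. The Python sketch below is a hypothetical linter, not a voicetyped tool. It assumes the dialog has already been loaded into a plain dictionary with the same shape as the YAML above, and it checks only two things: that a `start` state exists and that every transition target is a defined state.

```python
def validate_dialog(dialog):
    """Return a list of human-readable problems; an empty list means none were found."""
    problems = []
    states = dialog.get("states", {})

    # The dialog format requires a 'start' state.
    if "start" not in states:
        problems.append("missing required 'start' state")

    # Every transition target must refer to a defined state.
    for name, body in states.items():
        for transition in body.get("transitions", []):
            target = transition.get("target")
            if target not in states:
                problems.append(
                    f"{name}: transition targets unknown state {target!r}"
                )
    return problems
```

Running a check like this in CI catches typos in state names, which would otherwise only surface when a live call reaches the broken transition.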
Events
The runtime processes these event types:
| Event | Source | Description |
|---|---|---|
| speech | Speech Gateway | Caller said something (transcript available) |
| dtmf | Media Gateway | Caller pressed a key |
| timeout | Runtime clock | No event received within the configured time |
| hook_result | Integration Gateway | Backend service responded |
| hook_error | Integration Gateway | Backend service failed |
| tts_complete | Speech Gateway | TTS playback finished |
| call_started | Media Gateway | Call was connected |
| call_terminated | Media Gateway | Call was ended (by either party) |
Event Data
Each event carries contextual data accessible in templates:
# Speech event
.Event.Transcript # Full transcript text
.Event.Confidence # Confidence score (0.0–1.0)
.Event.Language # Detected language
.Event.DurationMs # Speech duration in milliseconds
# DTMF event
.Event.Digit # The digit pressed (0–9, *, #)
.Event.DurationMs # Key press duration
# Hook result event
.Result # The full response object from the backend
.Result.FieldName # Access specific fields
# Call context (always available)
.Call.SessionID # Unique call identifier
.Call.CallerID # Caller phone number
.Call.CalledNumber # Dialed number
.Call.StartTime # Call start timestamp
.Variables # User-defined variables
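To show how dotted paths such as `.Event.Transcript` map onto event data, here is a small Python sketch that substitutes plain placeholders from a nested dictionary. It is an illustration only: `lookup` and `render` are hypothetical names, and the sketch handles simple field lookups but not the conditions or functions (such as `contains`) used elsewhere on this page. The real template engine may behave differently.

```python
import re

# Matches placeholders of the form {{ .A.B.C }} (field lookups only).
_PLACEHOLDER = re.compile(r"\{\{\s*((?:\.\w+)+)\s*\}\}")


def lookup(context, path):
    """Walk a dotted path like '.Variables.ticket_id' through nested dicts."""
    value = context
    for key in path.strip(".").split("."):
        value = value[key]
    return value


def render(template, context):
    """Replace each placeholder with the string form of the looked-up value."""
    return _PLACEHOLDER.sub(
        lambda match: str(lookup(context, match.group(1))), template
    )
```

For example, rendering `"Your ticket number is {{ .Variables.ticket_id }}."` against a context whose `Variables` entry holds a `ticket_id` produces the sentence with the ticket number filled in. A missing field raises `KeyError` in this sketch.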
Actions
Actions are executed when entering a state or during transitions:
play_tts
Renders text to speech and plays it to the caller:
- action: play_tts
  text: "Hello, how can I help you?"
  voice: en_US-amy-medium  # Optional: override default voice
  speed: 1.0               # Optional: playback speed
  barge_in: true           # Optional: allow caller to interrupt
call_hook
Calls a customer backend service via the Integration Gateway:
- action: call_hook
  service: ticketing       # Registered service name
  method: CreateTicket     # HTTP endpoint path
  payload:                 # Data to send
    type: "{{ .Variables.issue_type }}"
    caller: "{{ .Call.CallerID }}"
  timeout: 5s              # Optional: override default timeout
transfer
Transfers the call to another SIP endpoint:
- action: transfer
  target: "sip:queue@pbx.internal"
  headers:                 # Optional: custom SIP headers
    X-Transfer-Reason: "escalation"
hangup
Terminates the call:
- action: hangup
  reason: normal           # normal, busy, rejected
set_variable
Sets a dialog variable:
- action: set_variable
  name: issue_type
  value: "{{ .Result.Category }}"
play_audio
Plays a pre-recorded audio file:
- action: play_audio
  file: /var/lib/voice-gateway/audio/hold-music.wav
  loop: true               # Optional: loop playback
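Each entry in an action list is a mapping with an `action` key, so a runtime can execute the list by looking up one handler per action name. The Python sketch below illustrates that dispatch pattern under simplifying assumptions. The `HANDLERS` registry, the `session` dictionary layout, and the `outbox` list are all hypothetical, and the handlers only record what they would do instead of touching audio or SIP. The `"normal"` default for `hangup` is likewise an assumption.

```python
HANDLERS = {}


def handler(name):
    """Decorator that registers a function as the handler for one action name."""
    def wrap(fn):
        HANDLERS[name] = fn
        return fn
    return wrap


@handler("set_variable")
def set_variable(action, session):
    session["variables"][action["name"]] = action["value"]


@handler("play_tts")
def play_tts(action, session):
    # Record the request; a real runtime would hand this to the speech layer.
    session["outbox"].append(("tts", action["text"]))


@handler("hangup")
def hangup(action, session):
    # Assumption: 'normal' is used when no reason is given.
    session["outbox"].append(("hangup", action.get("reason", "normal")))


def run_actions(actions, session):
    """Execute actions in list order, rejecting unknown action names."""
    for action in actions:
        kind = action["action"]
        if kind not in HANDLERS:
            raise ValueError(f"unknown action: {kind}")
        HANDLERS[kind](action, session)
```

Executing actions strictly in list order matches how the `on_enter` lists on this page read, for example playing a prompt before transferring the call.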
Turn Detection
Turn detection determines when the caller has finished speaking and it is the system’s turn to respond. voicetyped uses a combination of:
- Voice Activity Detection (VAD) — detects silence after speech
- Endpoint detection — confirms the utterance is complete
- Barge-in handling — allows the caller to interrupt TTS playback
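The silence-after-speech idea behind VAD-based endpointing can be sketched in a few lines of Python. This is a deliberately simplified, energy-threshold illustration and not voicetyped's detector: `find_endpoint`, the per-frame energy values, the threshold, and the number of silent frames required are all assumptions chosen for the example.

```python
def find_endpoint(frame_energies, threshold=0.1, min_silence_frames=3):
    """Return the index of the frame where the turn is considered finished.

    A turn ends once speech has been heard and is followed by
    `min_silence_frames` consecutive frames below `threshold`.
    Returns None if no endpoint is found.
    """
    heard_speech = False
    silent_run = 0
    for index, energy in enumerate(frame_energies):
        if energy >= threshold:
            heard_speech = True
            silent_run = 0  # speech resumed, so restart the silence count
        elif heard_speech:
            silent_run += 1
            if silent_run >= min_silence_frames:
                return index
    return None
```

Requiring several consecutive silent frames keeps short pauses inside a sentence from ending the caller's turn too early. Leading silence before any speech is ignored, since that case is covered by the state's timeout event.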
Barge-In
When barge_in is enabled (default), the caller can interrupt TTS playback by speaking:
System: "Thank you for calling IT support. Our hours are—"
Caller: "I need a password reset" ← Barge-in
System: [stops TTS, processes speech]
This is controlled globally or per-action:
runtime:
  barge_in: true           # Global default
# Or per-action:
- action: play_tts
  text: "Important disclaimer..."
  barge_in: false          # Don't allow interruption
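The precedence between the two settings can be summarized in a short Python sketch: a per-action `barge_in` value wins when present, otherwise the global runtime setting applies, and barge-in is on when neither is set. The function name `effective_barge_in` and the dictionary inputs are hypothetical.

```python
def effective_barge_in(action, runtime_config):
    """Resolve whether the caller may interrupt this particular action."""
    if "barge_in" in action:
        # A per-action setting overrides the global default.
        return bool(action["barge_in"])
    # Fall back to the global setting; barge-in is enabled by default.
    return bool(runtime_config.get("barge_in", True))
```

With this rule, a single disclaimer prompt can be made uninterruptible while every other prompt in the dialog keeps the global behavior.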
DTMF Menus
Build traditional IVR menus with DTMF:
states:
  main_menu:
    on_enter:
      - action: play_tts
        text: >
          Press 1 for billing.
          Press 2 for technical support.
          Press 3 for account information.
          Press 0 to speak with an agent.
    transitions:
      - event: dtmf
        digits: "1"
        target: billing
      - event: dtmf
        digits: "2"
        target: tech_support
      - event: dtmf
        digits: "3"
        target: account_info
      - event: dtmf
        digits: "0"
        target: transfer_agent
      - event: timeout
        after: 10s
        target: main_menu  # Repeat
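The menu above can be read as an ordered list of rules, where the first rule matching the incoming event decides the next state. The Python sketch below illustrates that reading. It assumes transitions are evaluated in declaration order, which this page implies but does not state, and `select_transition` and the event dictionary shape are hypothetical.

```python
MENU_TRANSITIONS = [
    {"event": "dtmf", "digits": "1", "target": "billing"},
    {"event": "dtmf", "digits": "2", "target": "tech_support"},
    {"event": "dtmf", "digits": "3", "target": "account_info"},
    {"event": "dtmf", "digits": "0", "target": "transfer_agent"},
    {"event": "timeout", "after": "10s", "target": "main_menu"},
]


def select_transition(transitions, event):
    """Return the target of the first matching transition, or None."""
    for transition in transitions:
        if transition["event"] != event["type"]:
            continue
        # A 'digits' filter must equal the pressed key for DTMF events.
        if "digits" in transition and transition["digits"] != event.get("digit"):
            continue
        return transition["target"]
    return None
```

In this sketch a key with no matching rule, such as 9, returns `None` and the call stays in the menu until another key is pressed or the timeout fires.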
Multi-digit DTMF
Collect multi-digit input (e.g., account numbers):
states:
  collect_account:
    on_enter:
      - action: play_tts
        text: "Please enter your account number followed by the pound key."
    transitions:
      - event: dtmf
        digits: "*#"       # Terminated by #
        min_digits: 6
        max_digits: 12
        inter_digit_timeout: 3s
        target: verify_account
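The interaction between the terminator, `min_digits`, `max_digits`, and `inter_digit_timeout` can be illustrated with a small collector class in Python. This is a sketch under assumptions: `DigitCollector` is a hypothetical name, collection is assumed to finish automatically once `max_digits` is reached, and the `too_short` status is invented here because this page does not say how the runtime reports input that ends before `min_digits`.

```python
class DigitCollector:
    """Accumulates key presses until a terminator, the maximum length, or a timeout."""

    def __init__(self, terminator="#", min_digits=6, max_digits=12):
        self.terminator = terminator
        self.min_digits = min_digits
        self.max_digits = max_digits
        self.digits = ""

    def _finish(self):
        # Input shorter than min_digits is flagged instead of accepted.
        status = "complete" if len(self.digits) >= self.min_digits else "too_short"
        return status, self.digits

    def feed(self, key):
        """Process one key press and return (status, digits so far)."""
        if key == self.terminator:
            return self._finish()
        self.digits += key
        if len(self.digits) >= self.max_digits:
            return self._finish()
        return "pending", self.digits

    def on_inter_digit_timeout(self):
        """Called when no key arrives within the inter-digit timeout."""
        return self._finish()
```

The terminator itself is never added to the collected digits, so an account number entered as `123456#` is delivered as `123456`.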
Optional LLM Nodes
For states that need natural language understanding beyond keyword matching, you can add LLM nodes:
states:
  understand_request:
    on_enter:
      - action: call_hook
        service: llm_service
        method: Classify
        payload:
          prompt: >
            Classify the following customer request into one of:
            password_reset, hardware, software, network, other.
            Request: {{ .Event.Transcript }}
          max_tokens: 50
          temperature: 0.0
    transitions:
      - event: hook_result
        condition: "{{ .Result.Category == 'password_reset' }}"
        target: password_reset
      # ... more conditions
Important: LLM nodes add latency (typically 200ms–2s). Use them only where keyword matching or simple pattern matching is insufficient. Because an LLM node is an ordinary hook call, its failures surface as hook_error events, so the FSM can handle them gracefully through hook_error transitions.
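The way a slow or failing LLM call turns into an event the FSM can route is sketched below in Python. The `call_hook` function here is a hypothetical stand-in for the runtime's hook machinery, and the event dictionaries are an assumed shape. It wraps any callable, enforces a timeout, and always returns either a `hook_result` or a `hook_error` event.

```python
from concurrent.futures import ThreadPoolExecutor
from concurrent.futures import TimeoutError as FutureTimeout


def call_hook(fn, payload, timeout_s):
    """Run fn(payload) with a deadline and convert the outcome into an event."""
    pool = ThreadPoolExecutor(max_workers=1)
    future = pool.submit(fn, payload)
    try:
        result = future.result(timeout=timeout_s)
        return {"type": "hook_result", "result": result}
    except FutureTimeout:
        # The backend did not answer in time; the dialog can still move on.
        return {"type": "hook_error", "error": "timeout"}
    except Exception as exc:
        return {"type": "hook_error", "error": str(exc)}
    finally:
        # Do not block the caller while a timed-out worker finishes.
        pool.shutdown(wait=False)
```

Since every outcome is reduced to one of two event types, a state only needs a `hook_error` transition (for example to a fallback state) to stay predictable when the LLM is slow or unavailable.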
State Store
For high-availability deployments, call state can be persisted externally:
runtime:
  state_store: redis
  redis:
    address: redis:6379
    password: "${REDIS_PASSWORD}"
    db: 0
    key_prefix: "vg:session:"
This enables:
- Call state survival across pod restarts
- Graceful failover between runtime instances
- Call state inspection via external tools
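The properties above follow from call state being plain serializable data. The Python sketch below shows the idea with JSON and an in-memory stand-in for the external store. `DictStore`, `save_session`, `load_session`, and the session fields are hypothetical, and the actual storage format used by voicetyped is not documented on this page. Only the `vg:session:` key prefix is taken from the configuration above.

```python
import json


class DictStore:
    """In-memory stand-in for an external key-value store such as Redis."""

    def __init__(self):
        self.data = {}

    def set(self, key, value):
        self.data[key] = value

    def get(self, key):
        return self.data.get(key)


def save_session(store, session, key_prefix="vg:session:"):
    """Serialize the session to JSON under a prefixed key."""
    key = key_prefix + session["session_id"]
    store.set(key, json.dumps(session, sort_keys=True))


def load_session(store, session_id, key_prefix="vg:session:"):
    """Return the stored session as a dict, or None if it does not exist."""
    raw = store.get(key_prefix + session_id)
    return None if raw is None else json.loads(raw)
```

A second runtime instance that can read the same key can resume the call from the saved state name and variables, which is what makes failover possible.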
Metrics
| Metric | Type | Description |
|---|---|---|
| vg_runtime_active_sessions | Gauge | Currently active call sessions |
| vg_runtime_state_transitions | Counter | Total state transitions |
| vg_runtime_dialog_errors | Counter | Dialog execution errors |
| vg_runtime_hook_latency_seconds | Histogram | Time waiting for hook responses |
| vg_runtime_turn_duration_seconds | Histogram | Time per conversation turn |
Next Steps
- Integration Gateway — connect dialog hooks to your backend
- Dialog Hooks API — implement the hook service interface
- Speech Gateway — tune ASR for your use case