Getting Started
Install and run voicetyped in under 10 minutes.
This guide walks you through installing voicetyped, running your first call flow, and verifying that the system is operational. By the end, you will have a working SIP endpoint that transcribes inbound calls and executes a simple dialog.
Prerequisites
Before you begin, ensure you have:
- Linux host (Ubuntu 22.04+ or RHEL 8+ recommended)
- 4 GB RAM minimum (8 GB recommended for GPU-accelerated ASR)
- Go 1.21+ (if building from source)
- A SIP client for testing (e.g., Opal, Linphone, or another softphone)
- Optional: NVIDIA GPU with CUDA 12+ for accelerated speech recognition
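The prerequisites above can be checked with a short script before installing. This is a minimal sketch, not part of voicetyped: the helper names (`version_ge`, `mem_total_gb`, `preflight`) are hypothetical, and it assumes a Linux host with `awk` and `/proc/meminfo` available.

```shell
#!/bin/sh
# Pre-flight sketch for the prerequisites listed above.
# All function names here are hypothetical helpers, not voicetyped commands.

# version_ge A B: succeed when dotted version A is >= B (numeric, field by field).
version_ge() {
    awk -v a="$1" -v b="$2" 'BEGIN {
        na = split(a, x, "."); nb = split(b, y, ".")
        n = (na > nb) ? na : nb
        for (i = 1; i <= n; i++) {
            xi = (i <= na) ? x[i] + 0 : 0
            yi = (i <= nb) ? y[i] + 0 : 0
            if (xi > yi) exit 0
            if (xi < yi) exit 1
        }
        exit 0
    }'
}

# mem_total_gb [FILE]: total RAM rounded to the nearest GB (default: /proc/meminfo).
mem_total_gb() {
    awk '/^MemTotal:/ { printf "%d\n", ($2 + 524288) / 1048576 }' "${1:-/proc/meminfo}"
}

# preflight: print one line per prerequisite; never installs or changes anything.
preflight() {
    gb=$(mem_total_gb)
    if [ "$gb" -ge 4 ]; then
        echo "ok: ${gb} GB RAM"
    else
        echo "warn: ${gb} GB RAM, 4 GB is the stated minimum"
    fi
    if command -v go >/dev/null 2>&1; then
        # "go version" prints e.g. "go version go1.21.5 linux/amd64"
        v=$(go version | awk '{ sub(/^go/, "", $3); print $3 }')
        if version_ge "$v" 1.21; then
            echo "ok: Go $v"
        else
            echo "warn: Go $v, 1.21+ is needed to build from source"
        fi
    else
        echo "info: Go not found (only needed when building from source)"
    fi
    if command -v nvidia-smi >/dev/null 2>&1; then
        echo "ok: nvidia-smi found"
    else
        echo "info: nvidia-smi not found (the GPU is optional)"
    fi
}

# Example: preflight
```

Run `preflight` after sourcing the file. It only reports, so a warning does not block the installation steps that follow.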
Installation
Option 1: Quick Install (recommended)
Download and run the installer script:
curl -sSL https://get.voicetyped.com/install | sh
This installs the voice-gateway binary to /usr/local/bin/ and downloads the default ASR model (whisper-base).
Option 2: Build from Source
git clone https://github.com/voicetyped/voice-gateway.git
cd voice-gateway
make build
sudo make install
Option 3: Docker
docker pull voicetyped/voice-gateway:latest
docker run -d \
  --name voice-gateway \
  -p 5060:5060/udp \
  -p 8080:8080 \
  -p 9100:9100 \
  -v /opt/vg/models:/models \
  voicetyped/voice-gateway:latest
Download ASR Models
voicetyped uses whisper.cpp for local speech recognition. Download the model you need:
# Base model (fastest, least accurate)
voice-gateway model download whisper-base
# Medium model (recommended for production)
voice-gateway model download whisper-medium
# Large model (most accurate, requires GPU)
voice-gateway model download whisper-large-v3
Models are stored in /var/lib/voice-gateway/models/ by default.
| Model | Size | Speed | Accuracy | GPU Required |
|---|---|---|---|---|
| whisper-base | 142 MB | Real-time | Good | No |
| whisper-small | 466 MB | Real-time | Better | No |
| whisper-medium | 1.5 GB | Near real-time | High | Recommended |
| whisper-large-v3 | 3.1 GB | Slower | Highest | Yes |
Configuration
Create a configuration file at /etc/voice-gateway/config.yaml:
# /etc/voice-gateway/config.yaml
media:
  sip_port: 5060
  rtp_port_range: "10000-20000"
  codecs:
    - g711-ulaw
    - g711-alaw
    - opus
speech:
  engine: whisper
  model: whisper-medium
  language: en
  gpu: auto # auto, true, false
runtime:
  dialog_dir: /etc/voice-gateway/dialogs/
  default_timeout: 10s
  max_concurrent_calls: 10
integration:
  api_port: 8080
observability:
  metrics_port: 9100
  log_level: info
  otlp_endpoint: "" # Optional OpenTelemetry collector
security:
  mtls: false # Enable for production
  cert_dir: /etc/voice-gateway/certs/
Start voicetyped
# Start with default configuration
voice-gateway start
# Start with a specific config file
voice-gateway start --config /etc/voice-gateway/config.yaml
# Start with inline overrides
voice-gateway start --asr-model whisper-medium --sip-port 5060
You should see output similar to:
INFO Loading configuration from /etc/voice-gateway/config.yaml
INFO Media Gateway listening on :5060 (SIP/UDP)
INFO Speech Gateway ready (whisper-medium, GPU: detected)
INFO Conversation Runtime loaded 1 dialog(s)
INFO REST API listening on :8080
INFO Metrics endpoint on :9100/metrics
✓ voicetyped is running
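When starting voicetyped from a script (for example in CI or a provisioning step), it can help to wait until the service actually answers before continuing. The following is a minimal sketch under assumptions: `wait_until` and `metrics_up` are hypothetical helpers, and the probe assumes the metrics port 9100 from the configuration above.

```shell
#!/bin/sh
# Readiness sketch: retry a probe command once per second until it succeeds
# or a timeout expires. Helper names are hypothetical, not voicetyped commands.

# wait_until TIMEOUT_SECONDS CMD [ARGS...]
# Returns 0 as soon as CMD succeeds, 1 if it still fails after TIMEOUT_SECONDS.
wait_until() {
    timeout_s=$1
    shift
    elapsed=0
    until "$@" >/dev/null 2>&1; do
        if [ "$elapsed" -ge "$timeout_s" ]; then
            return 1
        fi
        sleep 1
        elapsed=$((elapsed + 1))
    done
    return 0
}

# metrics_up: probe the metrics endpoint (assumes metrics_port: 9100 on localhost).
metrics_up() {
    curl -fsS "http://localhost:9100/metrics"
}

# Example:
#   wait_until 30 metrics_up && echo "voicetyped is up"
```

Any command can serve as the probe, so the same helper works if you prefer to check the REST port or the `voice-gateway status` command instead.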
Create Your First Dialog
Create a simple dialog flow at /etc/voice-gateway/dialogs/greeting.yaml:
# /etc/voice-gateway/dialogs/greeting.yaml
name: greeting
description: Simple greeting dialog
states:
  start:
    on_enter:
      - action: play_tts
        text: "Hello, you have reached the IT helpdesk. How can I help you?"
    transitions:
      - event: speech
        target: process_request
      - event: timeout
        after: 10s
        target: no_input
  process_request:
    on_enter:
      - action: call_hook
        service: dialog_hooks
        method: OnIntent
    transitions:
      - event: hook_result
        target: respond
      - event: timeout
        after: 15s
        target: no_input
  respond:
    on_enter:
      - action: play_tts
        text: "{{ .HookResult.Response }}"
    transitions:
      - event: speech
        target: process_request
      - event: timeout
        after: 10s
        target: goodbye
  no_input:
    on_enter:
      - action: play_tts
        text: "I did not hear anything. Please try again."
    transitions:
      - event: speech
        target: process_request
      - event: timeout
        after: 10s
        target: goodbye
  goodbye:
    on_enter:
      - action: play_tts
        text: "Thank you for calling. Goodbye."
      - action: hangup
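A dialog is a small state machine, so a typo in a `target:` value leaves a transition pointing at a state that does not exist. The sketch below checks for that before you load the file. It is not a voicetyped command: `check_dialog_targets` is a hypothetical helper, and it is a line-based check rather than a YAML parser, assuming state names sit exactly two spaces under `states:` and that `target:` values are unquoted.

```shell
#!/bin/sh
# Dialog lint sketch: report transitions whose target is not a defined state.
# Assumes two-space indentation under "states:" and unquoted target values.

# check_dialog_targets FILE
# Prints one line per undefined target and returns 1 if any were found.
check_dialog_targets() {
    awk '
        /^states:[ \t]*$/ { in_states = 1; next }
        /^[^ \t#]/        { in_states = 0 }   # any other top-level key ends the block
        in_states && /^  [A-Za-z_][A-Za-z0-9_]*:[ \t]*$/ {
            name = $1
            sub(/:$/, "", name)
            defined[name] = 1                 # remember each state name
        }
        in_states && /^[ \t]+target:/ {
            target[NR] = $2                   # remember each target with its line number
        }
        END {
            bad = 0
            for (n in target)
                if (!(target[n] in defined)) {
                    printf "line %d: undefined target \"%s\"\n", n, target[n]
                    bad = 1
                }
            exit bad
        }
    ' "$1"
}

# Example:
#   check_dialog_targets /etc/voice-gateway/dialogs/greeting.yaml && echo "targets ok"
```

For the greeting dialog above, every target (`process_request`, `respond`, `no_input`, `goodbye`) is defined, so the check should pass silently.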
Test Your Setup
1. Check system status
voice-gateway status
Expected output:
voicetyped Status
Media Gateway: ✓ running (SIP :5060)
Speech Gateway: ✓ running (whisper-medium)
Runtime: ✓ running (1 dialog loaded)
Integration: ✓ running (REST :8080)
Active Calls: 0
Uptime: 2m 34s
2. Make a test call
Using a SIP softphone, dial sip:greeting@<your-server-ip>:5060. You should hear the greeting prompt from your dialog.
3. Check metrics
curl http://localhost:9100/metrics | grep voice_gateway
The output should include Prometheus metrics such as the following (values will differ on your system):
voice_gateway_active_calls 0
voice_gateway_total_calls 1
voice_gateway_asr_latency_seconds{quantile="0.99"} 0.234
voice_gateway_call_duration_seconds_sum 45.2
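To use one of these values in a script, for example to assert that the test call was counted, you can pull a single sample out of the metrics text. This is a minimal sketch: `metric_value` is a hypothetical helper, and it assumes samples without timestamps and label values without spaces, as in the output shown above.

```shell
#!/bin/sh
# Metrics sketch: extract one sample value from Prometheus text format on stdin.
# Assumes "name value" or "name{labels} value" lines with no spaces inside labels.

# metric_value NAME
# Prints the value of the first sample named NAME; returns 1 if none is found.
metric_value() {
    awk -v name="$1" '
        /^#/ { next }                          # skip HELP and TYPE comment lines
        {
            metric = $1
            i = index(metric, "{")
            if (i) metric = substr(metric, 1, i - 1)   # drop the label set
            if (metric == name) { print $2; found = 1; exit }
        }
        END { exit !found }
    '
}

# Example:
#   curl -s http://localhost:9100/metrics | metric_value voice_gateway_total_calls
```

After the test call in step 2, the example command should print a count of at least 1 if the call reached the gateway.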
Next Steps
- Architecture Overview — understand how the components fit together
- Media Gateway — configure SIP and RTP handling
- Speech Gateway — tune ASR performance
- API Reference — integrate with your backend
- Kubernetes Deployment — scale to production