Getting Started

Install and run voicetyped in under 10 minutes.

This guide walks you through installing voicetyped, running your first call flow, and verifying that the system is operational. By the end, you will have a working SIP endpoint that transcribes inbound calls and executes a simple dialog.

Prerequisites

Before you begin, ensure you have:

  • Linux host (Ubuntu 22.04+ or RHEL 8+ recommended)
  • 4 GB RAM minimum (8 GB recommended for GPU-accelerated ASR)
  • Go 1.21+ (if building from source)
  • A SIP client for testing (e.g., Opal, Linphone, or another softphone)
  • Optional: NVIDIA GPU with CUDA 12+ for accelerated speech recognition

Installation

Option 1: Install Script

Download and run the installer script:

curl -sSL https://get.voicetyped.com/install | sh

This installs the voice-gateway binary to /usr/local/bin/ and downloads the default ASR model (whisper-base).

Option 2: Build from Source

git clone https://github.com/voicetyped/voice-gateway.git
cd voice-gateway
make build
sudo make install

Option 3: Docker

docker pull voicetyped/voice-gateway:latest
docker run -d \
  --name voice-gateway \
  -p 5060:5060/udp \
  -p 8080:8080 \
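  -p 9100:9100 \
  -p 10000-20000:10000-20000/udp \
  -v /opt/vg/models:/models \
  voicetyped/voice-gateway:latest

Publishing the UDP range 10000-20000 (matching rtp_port_range in the configuration below) lets call audio reach the container; port 5060 carries only SIP signaling, so without the RTP range calls may connect with no audio.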

Download ASR Models

voicetyped uses whisper.cpp for local speech recognition. Download the model you need:

# Base model (fastest, least accurate)
voice-gateway model download whisper-base

# Medium model (recommended for production)
voice-gateway model download whisper-medium

# Large model (most accurate, requires GPU)
voice-gateway model download whisper-large-v3

Models are stored in /var/lib/voice-gateway/models/ by default.

Model             Size     Speed            Accuracy   GPU Required
whisper-base      142 MB   Real-time        Good       No
whisper-small     466 MB   Real-time        Better     No
whisper-medium    1.5 GB   Near real-time   High       Recommended
whisper-large-v3  3.1 GB   Slower           Highest    Yes
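The model table can also be read as a lookup: pick the most accurate model your hardware supports. The Go sketch below encodes the table's rows. Its selection policy (skip models where a GPU is recommended or required when no GPU is present) is an assumption for illustration, not a rule voice-gateway enforces.

```go
package main

import "fmt"

// gpuNeed mirrors the "GPU Required" column of the model table.
type gpuNeed int

const (
	gpuNo gpuNeed = iota
	gpuRecommended
	gpuYes
)

type model struct {
	name     string
	size     string
	accuracy int // rank: Good=1, Better=2, High=3, Highest=4
	gpu      gpuNeed
}

var models = []model{
	{"whisper-base", "142 MB", 1, gpuNo},
	{"whisper-small", "466 MB", 2, gpuNo},
	{"whisper-medium", "1.5 GB", 3, gpuRecommended},
	{"whisper-large-v3", "3.1 GB", 4, gpuYes},
}

// pickModel returns the most accurate model allowed by the hardware.
// Policy (assumption): without a GPU, skip models where a GPU is
// recommended or required, so CPU-only hosts stay close to real time.
func pickModel(hasGPU bool) model {
	best := models[0]
	for _, m := range models {
		if !hasGPU && m.gpu != gpuNo {
			continue
		}
		if m.accuracy > best.accuracy {
			best = m
		}
	}
	return best
}

func main() {
	for _, gpu := range []bool{false, true} {
		m := pickModel(gpu)
		fmt.Printf("GPU=%v -> %s (%s)\n", gpu, m.name, m.size)
	}
}
```

With a GPU this policy picks whisper-large-v3; the configuration in this guide uses whisper-medium instead, the model recommended for production, which trades some accuracy for speed.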

Configuration

Create a configuration file at /etc/voice-gateway/config.yaml:

# /etc/voice-gateway/config.yaml

media:
  sip_port: 5060
  rtp_port_range: "10000-20000"
  codecs:
    - g711-ulaw
    - g711-alaw
    - opus

speech:
  engine: whisper
  model: whisper-medium
  language: en
  gpu: auto  # auto, true, false

runtime:
  dialog_dir: /etc/voice-gateway/dialogs/
  default_timeout: 10s
  max_concurrent_calls: 10

integration:
  api_port: 8080

observability:
  metrics_port: 9100
  log_level: info
  otlp_endpoint: ""  # Optional OpenTelemetry collector

security:
  mtls: false  # Enable for production
  cert_dir: /etc/voice-gateway/certs/
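Before starting the gateway, you can sanity-check that rtp_port_range leaves room for max_concurrent_calls. The Go sketch below (not part of voice-gateway) parses the range and counts usable port pairs. It assumes each call uses one even/odd RTP/RTCP port pair, the RFC 3550 convention; if voice-gateway multiplexes RTCP onto the RTP port, the real capacity is higher.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parsePortRange parses values like "10000-20000" (inclusive bounds).
func parsePortRange(s string) (lo, hi int, err error) {
	parts := strings.SplitN(s, "-", 2)
	if len(parts) != 2 {
		return 0, 0, fmt.Errorf("want LOW-HIGH, got %q", s)
	}
	if lo, err = strconv.Atoi(strings.TrimSpace(parts[0])); err != nil {
		return 0, 0, err
	}
	if hi, err = strconv.Atoi(strings.TrimSpace(parts[1])); err != nil {
		return 0, 0, err
	}
	if lo < 1 || hi > 65535 || lo > hi {
		return 0, 0, fmt.Errorf("invalid range %d-%d", lo, hi)
	}
	return lo, hi, nil
}

// rtpCallCapacity counts RTP/RTCP pairs (RTP on even port N, RTCP on N+1)
// that fit inside [lo, hi]. Assumption: one pair per call.
func rtpCallCapacity(lo, hi int) int {
	if lo%2 != 0 {
		lo++ // RTP conventionally starts on an even port
	}
	if hi < lo+1 {
		return 0
	}
	return (hi - lo + 1) / 2
}

func main() {
	lo, hi, err := parsePortRange("10000-20000") // media.rtp_port_range
	if err != nil {
		fmt.Println(err)
		return
	}
	maxCalls := 10 // runtime.max_concurrent_calls
	capacity := rtpCallCapacity(lo, hi)
	fmt.Printf("RTP range fits %d calls; max_concurrent_calls=%d; ok=%v\n",
		capacity, maxCalls, capacity >= maxCalls)
}
```

With the values above, the range fits 5000 port pairs, far more than the 10 concurrent calls configured.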

Start voicetyped

# Start with default configuration
voice-gateway start

# Start with a specific config file
voice-gateway start --config /etc/voice-gateway/config.yaml

# Start with inline overrides
voice-gateway start --asr-model whisper-medium --sip-port 5060

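You should see output similar to the following (the dialog count assumes the greeting dialog from the next section is already in dialog_dir):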

INFO  Loading configuration from /etc/voice-gateway/config.yaml
INFO  Media Gateway listening on :5060 (SIP/UDP)
INFO  Speech Gateway ready (whisper-medium, GPU: detected)
INFO  Conversation Runtime loaded 1 dialog(s)
INFO  REST API listening on :8080
INFO  Metrics endpoint on :9100/metrics
✓ voicetyped is running

Create Your First Dialog

Create a simple dialog flow at /etc/voice-gateway/dialogs/greeting.yaml:

# /etc/voice-gateway/dialogs/greeting.yaml
name: greeting
description: Simple greeting dialog

states:
  start:
    on_enter:
      - action: play_tts
        text: "Hello, you have reached the IT helpdesk. How can I help you?"
    transitions:
      - event: speech
        target: process_request
      - event: timeout
        after: 10s
        target: no_input

  process_request:
    on_enter:
      - action: call_hook
        service: dialog_hooks
        method: OnIntent
    transitions:
      - event: hook_result
        target: respond
      - event: timeout
        after: 15s
        target: no_input

  respond:
    on_enter:
      - action: play_tts
        text: "{{ .HookResult.Response }}"
    transitions:
      - event: speech
        target: process_request
      - event: timeout
        after: 10s
        target: goodbye

  no_input:
    on_enter:
      - action: play_tts
        text: "I did not hear anything. Please try again."
    transitions:
      - event: speech
        target: process_request
      - event: timeout
        after: 10s
        target: goodbye

  goodbye:
    on_enter:
      - action: play_tts
        text: "Thank you for calling. Goodbye."
      - action: hangup

Test Your Setup

1. Check system status

voice-gateway status

Expected output:

voicetyped Status
  Media Gateway:    ✓ running (SIP :5060)
  Speech Gateway:   ✓ running (whisper-medium)
  Runtime:          ✓ running (1 dialog loaded)
  Integration:      ✓ running (REST :8080)
  Active Calls:     0
  Uptime:           2m 34s

2. Make a test call

Using a SIP softphone, dial sip:greeting@<your-server-ip>:5060. You should hear the greeting prompt from your dialog.

3. Check metrics

curl -s http://localhost:9100/metrics | grep voice_gateway

You will see Prometheus metrics including:

voice_gateway_active_calls 0
voice_gateway_total_calls 1
voice_gateway_asr_latency_seconds{quantile="0.99"} 0.234
voice_gateway_call_duration_seconds_sum 45.2
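To read these values from a script instead of grep, you can parse the Prometheus text exposition format directly. The Go sketch below handles only simple sample lines like the ones above (a name, optional {labels}, and a value). It skips # comment lines, ignores a trailing timestamp, and does not implement escaping rules, so prefer an official Prometheus client library for anything beyond a quick check.

```go
package main

import (
	"bufio"
	"fmt"
	"strconv"
	"strings"
)

// parseSamples maps each series (name plus any {labels}) to its value,
// keeping only metric names that start with prefix.
// Simplified: no escaping rules; a trailing timestamp is ignored.
func parseSamples(text, prefix string) (map[string]float64, error) {
	out := map[string]float64{}
	sc := bufio.NewScanner(strings.NewReader(text))
	for sc.Scan() {
		line := strings.TrimSpace(sc.Text())
		if line == "" || strings.HasPrefix(line, "#") || !strings.HasPrefix(line, prefix) {
			continue
		}
		var series, rest string
		if i := strings.Index(line, "{"); i >= 0 {
			j := strings.LastIndex(line, "}")
			if j < i {
				return nil, fmt.Errorf("unbalanced labels: %q", line)
			}
			series, rest = line[:j+1], line[j+1:]
		} else {
			f := strings.Fields(line)
			series, rest = f[0], strings.Join(f[1:], " ")
		}
		fields := strings.Fields(rest)
		if len(fields) == 0 {
			return nil, fmt.Errorf("missing value: %q", line)
		}
		v, err := strconv.ParseFloat(fields[0], 64)
		if err != nil {
			return nil, err
		}
		out[series] = v
	}
	return out, sc.Err()
}

func main() {
	sample := `voice_gateway_active_calls 0
voice_gateway_total_calls 1
voice_gateway_asr_latency_seconds{quantile="0.99"} 0.234
voice_gateway_call_duration_seconds_sum 45.2`
	m, err := parseSamples(sample, "voice_gateway_")
	if err != nil {
		fmt.Println(err)
		return
	}
	fmt.Println("p99 ASR latency:", m[`voice_gateway_asr_latency_seconds{quantile="0.99"}`])
}
```

In practice you would pass it the body of http://localhost:9100/metrics rather than the hard-coded sample.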

Next Steps