Media Gateway

SIP termination, RTP audio handling, and codec management for voicetyped.

The Media Gateway is the telephony boundary of voicetyped. It handles SIP signaling, RTP audio streams, and codec transcoding so that downstream services receive clean, normalized audio. If this component drops calls, the entire system fails — so telephony reliability is the top priority.

Responsibilities

  • SIP endpoint — Receives and processes SIP signaling (INVITE, BYE, CANCEL, re-INVITE)
  • RTP audio handling — Receives, buffers, and transmits RTP audio packets
  • Codec transcoding — Negotiates and converts between G.711 μ-law, G.711 A-law, and Opus
  • Call lifecycle — Manages the full call state: ringing, connected, on-hold, transferred, terminated
  • DTMF detection — Supports both RFC 2833 (out-of-band) and in-band DTMF
  • Jitter buffer — Smooths out network-induced audio packet timing variations
  • Packet loss handling — Implements PLC (Packet Loss Concealment) for degraded networks

Configuration

# /etc/voice-gateway/config.yaml — media section

media:
  # SIP Configuration
  sip_port: 5060               # Listening port for SIP signaling (transport is set by sip_transport)
  sip_transport: udp            # udp, tcp, or tls
  sip_tls_cert: ""              # Path to TLS cert (for SIP-TLS)
  sip_tls_key: ""               # Path to TLS key

  # RTP Configuration
  rtp_port_range: "10000-20000" # Port range for RTP media
  rtp_symmetric: true           # Use symmetric RTP (NAT traversal)

  # Codec Configuration
  codecs:
    - g711-ulaw                 # G.711 μ-law (default for North America)
    - g711-alaw                 # G.711 A-law (default for Europe)
    - opus                      # Opus (WebRTC, high quality)

  # Audio Output
  output_format: pcm16          # Output format to Speech Gateway
  output_sample_rate: 16000     # 16kHz mono PCM
  output_channels: 1            # Mono

  # Reliability
  jitter_buffer_ms: 60          # Jitter buffer size in milliseconds
  packet_loss_concealment: true # Enable PLC
  reconnection_timeout: 30s    # Time to wait before declaring call dead
  keepalive_interval: 30s      # SIP keepalive interval

SIP Signaling

Supported SIP Methods

Method      Direction   Description
INVITE      Inbound     New call setup
ACK         Both        Acknowledge call setup
BYE         Both        Terminate call
CANCEL      Inbound     Cancel pending call
re-INVITE   Both        Modify active call (hold/resume, codec change)
OPTIONS     Both        Keepalive / capability query
REFER       Outbound    Call transfer

SIP Headers

voicetyped extracts and exposes these SIP headers to downstream services:

From: "John Doe" <sip:+15551234567@carrier.example.com>
To: <sip:greeting@voicegateway.local:5060>
Call-ID: a84b4c76e66710@192.168.1.100
X-Custom-Header: custom-value

Custom headers prefixed with X- are passed through to the CallSession and are available in dialog flows.
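The header extraction described above can be illustrated with a short Python sketch. It is not voicetyped's parser: the function names `extract_headers` and `custom_headers` are hypothetical, and the sketch ignores folded (multi-line) headers and compact header forms.

```python
def extract_headers(raw_message: str) -> dict[str, str]:
    """Parse the header section of a SIP message into a name -> value dict."""
    headers = {}
    for line in raw_message.splitlines():
        if not line.strip():
            break  # a blank line separates headers from the body (SDP)
        if ":" not in line:
            continue  # status line such as "SIP/2.0 200 OK"
        name, value = line.split(":", 1)
        name = name.strip()
        if " " in name:
            continue  # request line such as "INVITE sip:... SIP/2.0"
        headers[name] = value.strip()
    return headers


def custom_headers(headers: dict[str, str]) -> dict[str, str]:
    """Keep only X- prefixed headers (SIP header names are case-insensitive)."""
    return {k: v for k, v in headers.items() if k.lower().startswith("x-")}


message = (
    "INVITE sip:greeting@voicegateway.local:5060 SIP/2.0\r\n"
    'From: "John Doe" <sip:+15551234567@carrier.example.com>\r\n'
    "Call-ID: a84b4c76e66710@192.168.1.100\r\n"
    "X-Custom-Header: custom-value\r\n"
    "\r\n"
    "v=0\r\n"
)
parsed = extract_headers(message)
print(custom_headers(parsed))
```

Splitting on the first colon only keeps SIP URIs in header values intact, since they contain colons themselves.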

SIP Registration

For deployments behind a SIP trunk or PBX, the Media Gateway can register as a SIP client:

media:
  registrations:
    - uri: sip:voicegateway@pbx.internal
      username: voicegateway
      password: "${SIP_PASSWORD}"  # Environment variable
      registrar: sip:pbx.internal:5060
      expiry: 3600

RTP Audio Handling

Audio Pipeline

RTP packets → Jitter Buffer → Codec Decode → Resampler → PCM 16 kHz mono

  1. Jitter Buffer absorbs network timing variations (configurable, default 60 ms)
  2. Codec Decode converts from the negotiated codec to raw PCM
  3. Resampler converts to 16 kHz mono PCM (required by whisper.cpp)
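To make the decode and resample stages concrete, here is a minimal Python sketch for the G.711 μ-law case. It is an illustration under stated assumptions, not the gateway's implementation: the function names are hypothetical, and the resampler uses plain linear interpolation, whereas a production resampler would normally apply a low-pass filter.

```python
def ulaw_to_pcm16(byte: int) -> int:
    """Expand one G.711 mu-law byte to a signed 16-bit linear sample."""
    u = ~byte & 0xFF              # mu-law bytes are stored bit-inverted
    sign = u & 0x80
    exponent = (u >> 4) & 0x07
    mantissa = u & 0x0F
    magnitude = (((mantissa << 3) + 0x84) << exponent) - 0x84
    return -magnitude if sign else magnitude


def upsample_2x(samples: list[int]) -> list[int]:
    """Double the sample rate (8 kHz -> 16 kHz) by linear interpolation."""
    out = []
    for i, current in enumerate(samples):
        following = samples[i + 1] if i + 1 < len(samples) else current
        out.append(current)
        out.append((current + following) // 2)  # midpoint between neighbours
    return out


def decode_ulaw_payload(payload: bytes) -> list[int]:
    """RTP payload (8 kHz mu-law) -> 16 kHz mono PCM16 samples."""
    return upsample_2x([ulaw_to_pcm16(b) for b in payload])
```

A 20 ms G.711 packet carries 160 payload bytes, so `decode_ulaw_payload` returns 320 samples for it.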

Codec Support

Codec         Bandwidth     Quality     Use Case
G.711 μ-law   64 kbps       Good        PSTN, North American carriers
G.711 A-law   64 kbps       Good        PSTN, European carriers
Opus          6–510 kbps    Excellent   WebRTC, modern VoIP

Codec negotiation happens during SIP INVITE/200 OK via SDP (Session Description Protocol). The Media Gateway prefers codecs in the order listed in configuration.
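The preference rule can be sketched in Python as follows. This is a simplified illustration, not the gateway's negotiation code: `CONFIG_TO_SDP`, `offered_codecs`, and `select_codec` are hypothetical names, the mapping from configuration names to SDP encoding names is an assumption, and only the first audio media line is considered.

```python
# Assumed mapping from this document's config names to SDP encoding names.
CONFIG_TO_SDP = {"g711-ulaw": "PCMU", "g711-alaw": "PCMA", "opus": "OPUS"}
# Static RTP payload types (RFC 3551) that may be offered without a=rtpmap.
STATIC_PAYLOADS = {0: "PCMU", 8: "PCMA"}


def offered_codecs(sdp: str) -> dict[str, int]:
    """Return {ENCODING_NAME: payload type} for the audio section of an SDP offer."""
    payload_types, rtpmap = [], {}
    for line in sdp.splitlines():
        line = line.strip()
        if line.startswith("m=audio") and not payload_types:
            # m=audio <port> <proto> <pt> <pt> ...
            payload_types = [int(pt) for pt in line.split()[3:]]
        elif line.startswith("a=rtpmap:"):
            # a=rtpmap:<pt> <encoding>/<clock rate>[/<channels>]
            pt, encoding = line[len("a=rtpmap:"):].split(None, 1)
            rtpmap[int(pt)] = encoding.split("/")[0]
    offered = {}
    for pt in payload_types:
        name = rtpmap.get(pt) or STATIC_PAYLOADS.get(pt)
        if name:
            offered[name.upper()] = pt
    return offered


def select_codec(sdp: str, preferred: list[str]):
    """Pick the first configured codec that the offer also contains."""
    offered = offered_codecs(sdp)
    for config_name in preferred:
        sdp_name = CONFIG_TO_SDP[config_name]
        if sdp_name in offered:
            return config_name, offered[sdp_name]
    return None  # no common codec: negotiation fails


offer = (
    "v=0\r\n"
    "m=audio 49170 RTP/AVP 111 8 0 101\r\n"
    "a=rtpmap:111 opus/48000/2\r\n"
    "a=rtpmap:101 telephone-event/8000\r\n"
)
print(select_codec(offer, ["g711-ulaw", "g711-alaw", "opus"]))
```

Here the caller lists Opus first, but the gateway's configured order wins, so the sketch selects G.711 μ-law on payload type 0. A `None` result corresponds to the "no common codec" case covered under Troubleshooting.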

DTMF Detection

Two DTMF methods are supported:

  1. RFC 2833 (recommended): DTMF digits sent as named telephone events in RTP (RFC 2833 has since been obsoleted by RFC 4733, which defines the same mechanism)
  2. In-band: DTMF tones detected from the audio stream using the Goertzel algorithm

media:
  dtmf:
    method: rfc2833          # rfc2833, inband, or both
    inband_sensitivity: 0.8  # Sensitivity for in-band detection (0.0–1.0)
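For in-band detection, the Goertzel algorithm measures signal power at each of the eight DTMF frequencies and picks the strongest row and column tone. The Python sketch below shows the idea for a single block of samples. It is a simplified illustration: `min_tone_ratio` is a hypothetical threshold and is not claimed to correspond to `inband_sensitivity`, and real detectors add checks for twist, tone duration, and harmonics.

```python
import math

DTMF_ROWS = (697, 770, 852, 941)       # low-group frequencies in Hz
DTMF_COLS = (1209, 1336, 1477, 1633)   # high-group frequencies in Hz
DTMF_KEYS = ("123A", "456B", "789C", "*0#D")


def goertzel_power(samples, freq, sample_rate):
    """Squared spectral magnitude of the block at a single frequency."""
    coeff = 2.0 * math.cos(2.0 * math.pi * freq / sample_rate)
    s_prev, s_prev2 = 0.0, 0.0
    for x in samples:
        s = x + coeff * s_prev - s_prev2
        s_prev2, s_prev = s_prev, s
    return s_prev * s_prev + s_prev2 * s_prev2 - coeff * s_prev * s_prev2


def detect_digit(samples, sample_rate=8000, min_tone_ratio=0.5):
    """Return the DTMF key in this block, or None if no clear tone pair."""
    n = len(samples)
    energy = sum(x * x for x in samples)
    if n == 0 or energy == 0:
        return None  # empty or silent block
    rows = [goertzel_power(samples, f, sample_rate) for f in DTMF_ROWS]
    cols = [goertzel_power(samples, f, sample_rate) for f in DTMF_COLS]
    r, c = rows.index(max(rows)), cols.index(max(cols))
    # For a clean two-tone signal this ratio is close to 1; speech or
    # other tones spread their energy elsewhere and score much lower.
    ratio = (rows[r] + cols[c]) / (energy * n / 2.0)
    return DTMF_KEYS[r][c] if ratio >= min_tone_ratio else None


def dtmf_tone(low_hz, high_hz, n=205, sample_rate=8000):
    """Generate a synthetic DTMF block for trying out the detector."""
    return [math.sin(2 * math.pi * low_hz * i / sample_rate)
            + math.sin(2 * math.pi * high_hz * i / sample_rate)
            for i in range(n)]


print(detect_digit(dtmf_tone(770, 1336)))
```

The block length of 205 samples at 8 kHz is a common choice for DTMF work, but it is an assumption here rather than a documented gateway setting.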

Call Lifecycle

State Machine

         INVITE
           │
           ▼
        RINGING ──── CANCEL ──→ TERMINATED
           │
           ▼ (200 OK + ACK)
       CONNECTED
      /    │     \
  HOLD  TRANSFER  BYE
     │     │        │
     ▼     ▼        ▼
  RESUMED  ...   TERMINATED
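The diagram can be expressed as a transition table. The Python sketch below is one possible reading of it, not the gateway's code: the trigger names (`ANSWER`, `RESUME`, and so on) are illustrative, and it assumes that a resumed call returns to CONNECTED.

```python
# Transition table derived from the diagram; trigger names are illustrative.
TRANSITIONS = {
    ("RINGING", "CANCEL"): "TERMINATED",
    ("RINGING", "ANSWER"): "CONNECTED",     # 200 OK sent and ACK received
    ("CONNECTED", "HOLD"): "HOLD",
    ("HOLD", "RESUME"): "CONNECTED",        # assumption: resumed means connected again
    ("CONNECTED", "TRANSFER"): "TRANSFER",
    ("CONNECTED", "BYE"): "TERMINATED",
}


class CallStateMachine:
    """Tracks one call and rejects transitions the diagram does not show."""

    def __init__(self):
        self.state = "RINGING"  # entered when the INVITE arrives

    def handle(self, trigger: str) -> str:
        key = (self.state, trigger)
        if key not in TRANSITIONS:
            raise ValueError(f"{trigger} is not valid in state {self.state}")
        self.state = TRANSITIONS[key]
        return self.state


call = CallStateMachine()
for trigger in ("ANSWER", "HOLD", "RESUME", "BYE"):
    print(trigger, "->", call.handle(trigger))
```

Keeping the transitions in a table makes invalid sequences, such as a HOLD after the call has terminated, fail loudly instead of silently corrupting call state.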

Call Events

The Media Gateway emits these events to downstream services:

enum CallEventType {
  CALL_STARTED = 0;
  CALL_RINGING = 1;
  CALL_CONNECTED = 2;
  CALL_ON_HOLD = 3;
  CALL_RESUMED = 4;
  CALL_TRANSFERRED = 5;
  CALL_TERMINATED = 6;
  DTMF_RECEIVED = 7;
  AUDIO_STARTED = 8;
  AUDIO_STOPPED = 9;
}

Reliability

Jitter Buffer

The adaptive jitter buffer smooths network-induced timing variations:

media:
  jitter_buffer_ms: 60       # Initial buffer size
  jitter_buffer_max_ms: 200  # Maximum buffer size
  jitter_buffer_adaptive: true # Auto-adjust based on network conditions
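One way an adaptive buffer can size itself is sketched below in Python. The jitter estimate follows the interarrival jitter formula from RFC 3550 (a running average with gain 1/16). Everything else is an assumption made for illustration: the class name, the `headroom` multiplier, and the choice to treat `jitter_buffer_ms` as a floor and `jitter_buffer_max_ms` as a ceiling are not documented gateway behavior.

```python
class AdaptiveJitterBuffer:
    """Derives a target buffer depth from measured interarrival jitter."""

    def __init__(self, initial_ms=60.0, max_ms=200.0, headroom=3.0):
        self.initial_ms = initial_ms    # mirrors jitter_buffer_ms
        self.max_ms = max_ms            # mirrors jitter_buffer_max_ms
        self.headroom = headroom        # hypothetical safety multiplier
        self.jitter_ms = 0.0
        self._last_transit_ms = None

    def on_packet(self, arrival_ms: float, rtp_time_ms: float) -> float:
        """Update the jitter estimate for one packet; return the target depth."""
        transit = arrival_ms - rtp_time_ms
        if self._last_transit_ms is not None:
            deviation = abs(transit - self._last_transit_ms)
            # RFC 3550: J = J + (|D| - J) / 16
            self.jitter_ms += (deviation - self.jitter_ms) / 16.0
        self._last_transit_ms = transit
        return self.target_ms()

    def target_ms(self) -> float:
        wanted = self.headroom * self.jitter_ms
        return min(self.max_ms, max(self.initial_ms, wanted))


buf = AdaptiveJitterBuffer()
print(buf.on_packet(0.0, 0.0))      # steady stream: stays at the initial size
print(buf.on_packet(20.0, 20.0))
print(buf.on_packet(440.0, 40.0))   # one packet arrives 400 ms late
```

The 1/16 gain makes the estimate react gradually, so a single late packet grows the buffer modestly while sustained jitter pushes it toward the maximum.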

Packet Loss Concealment

When packets are lost, PLC fills the gap using the previous audio frame. How well this works depends on the loss rate:

  • Loss < 5%: PLC maintains acceptable quality
  • Loss 5–15%: Noticeable degradation, PLC helps significantly
  • Loss > 15%: Consider network remediation
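A minimal Python sketch of frame-repetition PLC follows. It repeats the last good frame for each missing one, as described above. The fade-out on consecutive losses (`attenuation`) and the function name `conceal` are assumptions added for illustration, not documented gateway behavior.

```python
def conceal(frames, frame_size=160, attenuation=0.5):
    """Replace lost frames (None) with a faded copy of the last good frame.

    frames: list of PCM16 sample lists, with None marking a lost packet.
    """
    output = []
    last_good = None
    gain = 1.0
    for frame in frames:
        if frame is not None:
            last_good, gain = frame, 1.0
            output.append(list(frame))
        elif last_good is not None:
            gain *= attenuation  # fade so long gaps do not buzz
            output.append([int(sample * gain) for sample in last_good])
        else:
            output.append([0] * frame_size)  # nothing to repeat yet: silence
    return output


stream = [[100, -100], None, None, [8, 8]]
print(conceal(stream, frame_size=2))
```

Repetition works for short gaps because adjacent 20 ms speech frames are usually similar. As loss rises, more of the output is synthetic, which matches the degradation bands listed above.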

Reconnection

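If the RTP stream stops (for example, because of a network issue), the Media Gateway does not terminate the call immediately. After rtp_timeout with no RTP it triggers reconnection, and reconnection_timeout bounds how long it waits before declaring the call dead: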

media:
  reconnection_timeout: 30s   # Wait this long before declaring the call dead
  rtp_timeout: 10s             # Time with no RTP before triggering reconnection
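The interaction of the two timers can be sketched as a small Python function. It assumes the timers run back to back, so that the reconnection window starts when `rtp_timeout` expires. The configuration comments do not state this explicitly, so treat it as one plausible reading. The function and state names are hypothetical.

```python
def rtp_stream_state(seconds_since_last_rtp: float,
                     rtp_timeout: float = 10.0,
                     reconnection_timeout: float = 30.0) -> str:
    """Classify a call by how long its RTP stream has been silent."""
    if seconds_since_last_rtp < rtp_timeout:
        return "ACTIVE"
    # Assumption: the reconnection window begins once rtp_timeout expires.
    if seconds_since_last_rtp < rtp_timeout + reconnection_timeout:
        return "RECONNECTING"
    return "DEAD"


for silence in (2, 15, 45):
    print(silence, rtp_stream_state(silence))
```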

Metrics

The Media Gateway exposes these Prometheus metrics:

Metric                              Type        Description
vg_media_active_calls               Gauge       Currently active calls
vg_media_total_calls                Counter     Total calls since startup
vg_media_call_duration_seconds      Histogram   Call duration distribution
vg_media_rtp_packets_received       Counter     RTP packets received
vg_media_rtp_packets_lost           Counter     RTP packets lost
vg_media_rtp_jitter_ms              Histogram   RTP jitter distribution
vg_media_codec_negotiation_errors   Counter     Failed codec negotiations
vg_media_dtmf_events                Counter     DTMF events detected
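As an example of using these counters, the Python sketch below derives a loss percentage from vg_media_rtp_packets_received and vg_media_rtp_packets_lost and maps it onto the loss bands from the Packet Loss Concealment section. The function names and band labels are hypothetical, and it assumes the two counters are sampled over the same interval.

```python
def loss_percent(packets_received: int, packets_lost: int) -> float:
    """Packet loss as a percentage of packets expected in the interval."""
    expected = packets_received + packets_lost
    if expected == 0:
        return 0.0  # no traffic in the interval
    return 100.0 * packets_lost / expected


def classify_loss(percent: float) -> str:
    """Map a loss percentage onto the bands used in this document."""
    if percent < 5:
        return "acceptable"   # PLC maintains acceptable quality
    if percent <= 15:
        return "degraded"     # noticeable, PLC helps significantly
    return "remediate"        # consider network remediation


print(classify_loss(loss_percent(packets_received=950, packets_lost=50)))
```

In practice the inputs would be counter increases over a window rather than totals since startup, so that old traffic does not mask a current problem.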

Troubleshooting

No audio after call connects

  1. Check that the RTP port range is open in your firewall
  2. Verify rtp_symmetric: true is set if behind NAT
  3. Check codec negotiation — ensure at least one common codec between caller and gateway

Calls drop after 30 seconds

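This usually indicates a SIP NAT traversal issue. A common cause is that the ACK for the 200 OK never reaches the gateway, so the INVITE transaction times out after about 32 seconds (64 × T1 in RFC 3261) and the call is torn down. Ensure: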

  1. SIP keepalive is enabled (keepalive_interval: 30s)
  2. The SIP proxy or SBC is configured to relay RTP

High jitter or packet loss

  1. Increase jitter buffer size (jitter_buffer_ms: 100)
  2. Enable adaptive jitter buffer
  3. Check network path for congestion
  4. Consider dedicated network paths for RTP traffic

Next Steps