Self-hosted · Private · Programmable

Private Voice Infrastructure
for Regulated Systems

A self-hosted voice gateway that turns live phone calls into secure, real-time programmable sessions, with on-prem speech recognition and dialog orchestration. Regulated teams can automate and integrate voice workflows without sending audio to the cloud.

  • No cloud dependency.
  • No vendor lock-in.
  • Built for compliance-first environments.
voice-gateway — deploy
# Install voicetyped
$ curl -sSL https://get.voicetyped.com/install | sh

# Start with local speech recognition
$ voice-gateway start --asr-model whisper-medium --sip-port 5060

INFO Media Gateway listening on :5060 (SIP/UDP)
INFO Speech Gateway ready (whisper-medium, GPU: detected)
INFO REST API listening on :8080
INFO Metrics endpoint on :9100/metrics
✓ voicetyped is running
SIP / WebRTC · ConnectRPC · whisper.cpp · Go · Kubernetes · Prometheus · OpenTelemetry · mTLS

Four services. One platform.

voicetyped is composed of four purpose-built services that handle every layer of voice automation — from receiving phone calls to integrating with your backend systems.

Media Gateway

Handles inbound and outbound phone calls using standard telephony protocols (SIP/RTP). Manages the full call lifecycle — connect, hold, hang up — and outputs a clean audio stream per call.

  • Telephony endpoint (SIP) with full call lifecycle
  • Audio capture and format conversion (G.711, Opus)
  • Normalized PCM stream output per call
  • Built with Go + PJSIP
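
To make the format-conversion step concrete, here is a minimal Python sketch of expanding G.711 mu-law bytes into 16-bit linear PCM. It follows the standard G.711 mu-law expansion; the function names are illustrative and are not part of voicetyped, whose Media Gateway is built with Go and PJSIP.

```python
import struct

ULAW_BIAS = 0x84  # bias (132) added to the magnitude during mu-law encoding


def ulaw_to_linear(byte: int) -> int:
    """Expand one G.711 mu-law byte to a signed 16-bit PCM sample."""
    u = ~byte & 0xFF              # mu-law bytes are transmitted bit-inverted
    sign = u & 0x80               # top bit: 1 means negative
    exponent = (u >> 4) & 0x07    # 3-bit segment number
    mantissa = u & 0x0F           # 4-bit step within the segment
    magnitude = (((mantissa << 3) + ULAW_BIAS) << exponent) - ULAW_BIAS
    return -magnitude if sign else magnitude


def decode_ulaw_frame(payload: bytes) -> bytes:
    """Decode a payload of mu-law bytes to little-endian 16-bit PCM."""
    samples = [ulaw_to_linear(b) for b in payload]
    return struct.pack(f"<{len(samples)}h", *samples)
```

A typical 20 ms G.711 frame at 8 kHz carries 160 bytes, which this sketch expands to 320 bytes of PCM.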

Speech Gateway

On-premise speech recognition powered by whisper.cpp. Transcribes calls in real time with partial and final results — no audio sent to any external service.

  • On-premise speech-to-text — no cloud required
  • Partial and final transcripts
  • Batching and audio segmentation
  • Per-call worker pool management
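
One way a consumer might handle partial and final transcripts is sketched below in Python. The result kinds ("partial", "final") and the class name are assumptions made for illustration; the actual voicetyped event shape may differ.

```python
from dataclasses import dataclass, field


@dataclass
class TranscriptBuffer:
    """Tracks the live partial hypothesis and the committed final segments."""
    finals: list = field(default_factory=list)
    partial: str = ""

    def on_result(self, kind: str, text: str) -> None:
        if kind == "partial":
            # A partial replaces the previous hypothesis for the current segment.
            self.partial = text
        elif kind == "final":
            # A final commits the segment and clears the hypothesis.
            self.finals.append(text)
            self.partial = ""
        else:
            raise ValueError(f"unknown result kind: {kind!r}")

    def display_text(self) -> str:
        """Committed text followed by the current, still-changing partial."""
        parts = self.finals + ([self.partial] if self.partial else [])
        return " ".join(parts)
```

Partials are useful for live display and early intent hints, while only finals should drive actions that are hard to undo.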

Conversation Runtime

The core differentiator: a rule-based dialog engine that follows predefined conversation flows. Every rule-based path is deterministic and auditable, and optional LLM nodes can be added for states that need natural language understanding.

  • Turn detection & dialog state
  • Deterministic, rule-based conversation flows
  • Tool invocation framework
  • Optional LLM node integration
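
A deterministic, rule-based flow can be pictured as a small state machine. The Python sketch below is an illustration only: the flow format, state names, and action names are hypothetical, and only the `speech_final` event name comes from the API examples on this page.

```python
# Hypothetical flow definition; the real voicetyped flow format may differ.
FLOW = {
    "greeting": {
        "transitions": [
            {"on": "speech_final", "contains": "password reset",
             "next": "verify_identity", "action": "lookup_account"},
            {"on": "dtmf", "digit": "0",
             "next": "operator", "action": "transfer_to_operator"},
        ],
        "on_timeout": "goodbye",
    },
    "verify_identity": {"transitions": [], "on_timeout": "goodbye"},
    "operator": {"transitions": [], "on_timeout": "goodbye"},
    "goodbye": {"transitions": [], "on_timeout": "goodbye"},
}


def step(state, event):
    """Return (next_state, action). Unmatched input stays in the same state."""
    node = FLOW[state]
    if event["type"] == "timeout":
        return node["on_timeout"], None
    for rule in node["transitions"]:
        if rule["on"] != event["type"]:
            continue
        if rule["on"] == "speech_final" and rule["contains"] in event["transcript"].lower():
            return rule["next"], rule["action"]
        if rule["on"] == "dtmf" and rule["digit"] == event["digit"]:
            return rule["next"], rule["action"]
    return state, None
```

Because `step` is a pure function of the current state and the event, the same inputs always produce the same transition, which is what makes a flow like this easy to audit and replay.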

Integration Gateway

Connects to customer backend systems via REST and webhooks. Built-in authentication, retry logic, rate limiting, and circuit breaking for production reliability.

  • REST & webhook backend integration
  • Authentication & retry logic
  • Rate limiting & circuit breaking
  • Production-grade reliability
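
For readers unfamiliar with the reliability patterns listed above, here is a minimal Python sketch of retry with exponential backoff combined with a circuit breaker. It illustrates the general technique, not the Integration Gateway's implementation; all names, thresholds, and delays are assumptions.

```python
import time


class CircuitOpenError(Exception):
    """Raised when calls are suspended because the backend keeps failing."""


class CircuitBreaker:
    def __init__(self, failure_threshold=3, reset_after=30.0, clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.reset_after = reset_after
        self.clock = clock
        self.failures = 0
        self.opened_at = None  # None means the circuit is closed

    def call(self, fn):
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.reset_after:
                raise CircuitOpenError("backend calls are suspended")
            # Cool-down elapsed: let one trial call through (half-open).
            self.opened_at = None
        try:
            result = fn()
        except Exception:
            self.failures += 1
            if self.failures >= self.failure_threshold:
                self.opened_at = self.clock()
            raise
        self.failures = 0  # any success resets the failure count
        return result


def call_with_retry(breaker, fn, attempts=3, base_delay=0.1, sleep=time.sleep):
    """Retry fn with exponential backoff; stop at once if the circuit opens."""
    for attempt in range(attempts):
        try:
            return breaker.call(fn)
        except CircuitOpenError:
            raise
        except Exception:
            if attempt == attempts - 1:
                raise
            sleep(base_delay * 2 ** attempt)
```

The clock and sleep functions are injectable so the behavior can be tested without waiting for real time to pass.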

End-to-end voice pipeline

From phone call ingestion to backend integration. Every layer is self-hosted, auditable, and runs without internet connectivity.

SIP / WebRTC Endpoint
  ↓ RTP audio
Media Gateway
  ↓ PCM stream
Speech Gateway (Local ASR + TTS)
  ↓ Transcripts
Conversation Runtime
  ↓ Actions
Integration Gateway
  ↓ REST / HTTP
Customer Backend

From call to action in milliseconds

voicetyped processes inbound calls through a deterministic pipeline that your engineering team fully controls.

1. Terminate the Call

Inbound phone calls are received by the Media Gateway. The raw audio is extracted and converted to a normalized PCM stream, handling codec transcoding (G.711, Opus) automatically.

2. Transcribe Locally

The Speech Gateway runs whisper.cpp locally to produce real-time partial and final transcripts. No audio ever leaves your infrastructure. GPU acceleration is optional but recommended.

3. Process with Dialog FSM

The Conversation Runtime, a finite-state machine (FSM), evaluates transcripts against your defined conversation rules. It handles turn detection, timeouts, keypad input (DTMF), and action triggers deterministically.

4. Execute Actions

Resolved actions trigger calls to your backend via the Integration Gateway, which handles authentication, retries, rate limiting, and circuit breaking using established reliability patterns.

5. Respond via TTS

Text-to-speech is rendered locally and streamed back to the caller through the Media Gateway. The caller hears a natural response while the FSM advances to the next state.

6. Observe Everything

Every call, transcript, and action is logged. Prometheus metrics and OpenTelemetry traces give you full visibility into call duration, ASR latency, queue depth, and more.

A programmable API surface

voicetyped exposes clean REST APIs for your engineering teams to integrate with. You write small services that implement webhook handlers; voicetyped handles the rest.

Call Event Stream

Subscribe to call events across your deployment. Speech events, keypad input, timeouts, and backend results are streamed in real time over Server-Sent Events (SSE).

GET /v1/calls/events
Accept: text/event-stream

event: speech_final
data: {"session_id":"call-abc-123","transcript":"I need help"}

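A client consuming this stream needs to split it into events. The Python sketch below parses `event:` and `data:` lines following the general Server-Sent Events framing rules; it assumes each event's data is a JSON object, as in the example above, and the function name is illustrative.

```python
import json


def parse_sse(lines):
    """Yield (event_name, data_dict) for each event in an iterable of text lines."""
    event, data = "message", []   # "message" is the SSE default event name
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":
            # A blank line terminates the current event.
            if data:
                yield event, json.loads("\n".join(data))
            event, data = "message", []
        elif line.startswith(":"):
            continue              # comment line, often used as a keep-alive
        else:
            name, _, value = line.partition(":")
            if value.startswith(" "):
                value = value[1:]  # a single leading space is not part of the value
            if name == "event":
                event = name and value
            elif name == "data":
                data.append(value)
```

In practice the lines would come from a streaming HTTP response to `GET /v1/calls/events`; an event is only emitted once its terminating blank line arrives.
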
Call Control

Programmatically control active calls. Play TTS, transfer calls, or hang up — all through a clean REST interface.

POST /v1/calls/{id}/tts
POST /v1/calls/{id}/hangup
POST /v1/calls/{id}/transfer
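
As a sketch of how a backend might build these requests, the Python helper below constructs a POST for each endpoint using only the standard library. The base URL, the JSON body fields (such as `text`), and the helper name are assumptions; only the three paths come from the list above.

```python
import json
import urllib.parse
import urllib.request

# Port 8080 matches the startup log on this page; the host is an assumption.
BASE_URL = "http://localhost:8080"

ACTIONS = {"tts", "hangup", "transfer"}


def call_control_request(call_id, action, body=None):
    """Build (but do not send) a POST request for a call-control action."""
    if action not in ACTIONS:
        raise ValueError(f"unsupported action: {action!r}")
    safe_id = urllib.parse.quote(call_id, safe="")
    url = f"{BASE_URL}/v1/calls/{safe_id}/{action}"
    data = json.dumps(body or {}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=data,
        method="POST",
        headers={"Content-Type": "application/json"},
    )


# Hypothetical usage; the "text" field name is illustrative only.
# req = call_control_request("call-abc-123", "tts", {"text": "One moment, please."})
# urllib.request.urlopen(req)
```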

Dialog Hooks

Implement a webhook endpoint in your backend. voicetyped POSTs JSON to your URL when a dialog event matches.

// voicetyped sends:
POST https://your-server/on-intent
{"transcript":"password reset",
 "session_id":"call-abc-123"}

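A webhook receiver for this payload can be very small. The Python sketch below uses only the standard library; the `/on-intent` path and the request fields come from the example above, while the response shape (`action`, `text`) is a hypothetical placeholder, since the expected reply format is not specified here.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer


def handle_intent(payload: dict) -> dict:
    """Map a dialog event to a reply. The reply fields are illustrative only."""
    transcript = payload.get("transcript", "").lower()
    if "password reset" in transcript:
        return {"action": "say", "text": "I can help you reset your password."}
    return {"action": "say", "text": "Sorry, I did not catch that."}


class HookHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        if self.path != "/on-intent":
            self.send_error(404)
            return
        length = int(self.headers.get("Content-Length", 0))
        try:
            payload = json.loads(self.rfile.read(length) or b"{}")
        except json.JSONDecodeError:
            self.send_error(400, "invalid JSON")
            return
        body = json.dumps(handle_intent(payload)).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


# To try it locally (the port is arbitrary):
# HTTPServer(("127.0.0.1", 8000), HookHandler).serve_forever()
```

Keeping the decision logic in a plain function such as `handle_intent` makes it testable without starting a server.
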
Speech API

Direct access to the local ASR engine. Stream audio via WebSocket or transcribe files via REST — useful for testing, QA, and custom integrations.

POST /v1/speech/transcribe
WebSocket /v1/speech/stream
GET  /v1/speech/models

Deploy anywhere. Run everywhere.

Single VM, Kubernetes cluster, or air-gapped facility. voicetyped adapts to your infrastructure constraints.

Single Node

One binary, systemd service, local models. Perfect for development and small deployments.

# systemd service
[Unit]
Description=voicetyped voice gateway

[Service]
ExecStart=/usr/bin/voice-gateway
Restart=always
Environment=VG_ASR_MODEL=whisper-medium
Environment=VG_SIP_PORT=5060

[Install]
WantedBy=multi-user.target

Kubernetes

Helm chart with persistent volumes for models, optional GPU scheduling, and autoscaling on active call sessions.

# helm install
helm install voice-gateway \
  voicetyped/voice-gateway \
  --set asr.model=whisper-medium \
  --set asr.gpu.enabled=true \
  --set autoscaling.enabled=true

Air-Gapped

Offline installer bundle with preloaded models and zero external dependencies. Deploy in isolated networks, classified environments, and facilities with no internet access.

# Offline deployment
./voice-gateway-offline-installer.sh \
  --models-dir /opt/vg/models \
  --config /etc/voice-gateway/config.yaml \
  --no-internet

Join the waitlist

voicetyped is not yet publicly available. Join the waitlist to get early access when we launch. We'll reach out based on your expected call volume.

Ready to own your voice infrastructure?

voicetyped is coming soon. Join the waitlist for early access, or read the docs to learn more about the architecture.