Self-hosted · Private · Programmable

Private Voice Infrastructure
for Regulated Systems

A self-hosted voice gateway that turns live phone calls into secure, real-time programmable sessions, with on-prem speech recognition and dialog orchestration. Regulated teams can automate and integrate voice workflows without sending audio to the cloud.

No cloud dependency.
No vendor lock-in.
Built for compliance-first environments.

Join Waitlist · View Architecture

voice-gateway — deploy

# Install voicetyped
$ curl -sSL https://get.voicetyped.com/install | sh

# Start with local speech recognition
$ voice-gateway start --asr-model whisper-medium --sip-port 5060

INFO Media Gateway listening on :5060 (SIP/UDP)
INFO Speech Gateway ready (whisper-medium, GPU: detected)
INFO REST API listening on :8080
INFO Metrics endpoint on :9100/metrics
✓ voicetyped is running

SIP / WebRTC · ConnectRPC · whisper.cpp · Go · Kubernetes · Prometheus · OpenTelemetry · mTLS

Core Capabilities

Four services. One platform.

voicetyped is composed of four purpose-built services that handle every layer of voice automation — from receiving phone calls to integrating with your backend systems.

Media Gateway

Handles inbound and outbound phone calls using standard telephony protocols (SIP/RTP). Manages the full call lifecycle — connect, hold, hang up — and outputs a clean audio stream per call.

Telephony endpoint (SIP) with full call lifecycle
Audio capture and format conversion (G.711, Opus)
Normalized PCM stream output per call
Built with Go + PJSIP

Speech Gateway

On-premise speech recognition powered by whisper.cpp. Transcribes calls in real time with partial and final results — no audio sent to any external service.

On-premise speech-to-text — no cloud required
Partial and final transcripts
Batching and audio segmentation
Per-call worker pool management

Conversation Runtime

The core differentiator. A rule-based dialog engine that follows predefined conversation flows. Every path is deterministic and auditable — with optional LLM nodes for states that need natural language understanding.

Turn detection & dialog state
Deterministic, rule-based conversation flows
Tool invocation framework
Optional LLM node integration

Integration Gateway

Connects to customer backend systems via REST and webhooks. Built-in authentication, retry logic, rate limiting, and circuit breaking for production reliability.

REST & webhook backend integration
Authentication & retry logic
Rate limiting & circuit breaking
Production-grade reliability

System Design

End-to-end voice pipeline

From phone call ingestion to backend integration. Every layer is self-hosted, auditable, and runs without internet connectivity.

SIP / WebRTC Endpoint
  ↓ RTP audio
Media Gateway
  ↓ PCM stream
Speech Gateway (Local ASR + TTS)
  ↓ Transcripts
Conversation Runtime
  ↓ Actions
Integration Gateway
  ↓ REST / HTTP
Customer Backend

How It Works

From call to action in milliseconds

voicetyped processes inbound calls through a deterministic pipeline that your engineering team fully controls.

Terminate the Call

The Media Gateway receives inbound phone calls, extracts the raw audio, and converts it to a normalized PCM stream. Codec transcoding (G.711, Opus) is handled automatically.
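To make the transcoding step concrete, here is a minimal sketch of standard G.711 mu-law decoding into signed 16-bit PCM. It illustrates the codec math only and is not voicetyped's implementation (the Media Gateway is built with Go + PJSIP); the function names are hypothetical.

```python
def ulaw_to_pcm16(byte: int) -> int:
    """Decode one G.711 mu-law byte into a signed 16-bit PCM sample."""
    u = ~byte & 0xFF                 # mu-law bytes are transmitted bit-inverted
    sign = u & 0x80                  # top bit: sign
    exponent = (u >> 4) & 0x07       # next three bits: segment number
    mantissa = u & 0x0F              # low four bits: step within the segment
    magnitude = (((mantissa << 3) + 0x84) << exponent) - 0x84
    return -magnitude if sign else magnitude


def decode_frame(payload: bytes) -> list[int]:
    """Decode an RTP payload of mu-law bytes into a list of PCM samples."""
    return [ulaw_to_pcm16(b) for b in payload]
```

A typical 20 ms G.711 frame at 8 kHz carries 160 such bytes, so `decode_frame` would return 160 samples per packet.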

Transcribe Locally

The Speech Gateway runs whisper.cpp locally to produce real-time partial and final transcripts. No audio ever leaves your infrastructure. GPU acceleration is optional but recommended.
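One way a consumer might treat partial and final transcripts is sketched below: partials overwrite each other, finals are committed. The `speech_final` event name appears in the event stream example on this page; `speech_partial` and the class itself are assumptions made for illustration.

```python
class TranscriptBuffer:
    """Keeps committed (final) segments plus the latest in-flight partial."""

    def __init__(self):
        self.finals = []
        self.partial = ""

    def on_event(self, event_type, text):
        if event_type == "speech_partial":   # hypothetical event name
            self.partial = text              # a newer partial replaces the old one
        elif event_type == "speech_final":
            self.finals.append(text)         # finals are never revised
            self.partial = ""

    def display(self):
        """Return what a live view would show right now."""
        parts = self.finals + ([self.partial] if self.partial else [])
        return " ".join(parts)
```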

Process with Dialog FSM

The Conversation Runtime is a finite-state machine (FSM) that evaluates transcripts against your defined conversation rules. It handles turn detection, timeouts, keypad input (DTMF), and action triggers deterministically.
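A deterministic dialog flow can be pictured as a transition table. The sketch below is an illustration under assumptions: the state names, keyword matching, and table layout are hypothetical and do not represent voicetyped's flow format.

```python
# Hypothetical flow: each state maps event types to its next state.
FLOW = {
    "greeting": {
        "speech": [("password", "reset_password"), ("agent", "transfer")],
        "dtmf": {"0": "transfer"},
        "timeout": "reprompt",
    },
    "reprompt": {
        "speech": [("password", "reset_password"), ("agent", "transfer")],
        "dtmf": {"0": "transfer"},
        "timeout": "hangup",
    },
    "reset_password": {},   # terminal states have no outgoing rules
    "transfer": {},
    "hangup": {},
}


def step(state, event_type, value=""):
    """Return the next state; the same input always yields the same output."""
    rules = FLOW.get(state, {})
    if event_type == "speech":
        text = value.lower()
        for keyword, target in rules.get("speech", []):
            if keyword in text:          # first matching rule wins
                return target
    elif event_type == "dtmf":
        return rules.get("dtmf", {}).get(value, state)
    elif event_type == "timeout":
        return rules.get("timeout", state)
    return state                         # unmatched input keeps the current state
```

Because the table is plain data, every reachable path can be enumerated and reviewed ahead of time, which is what makes a flow like this auditable.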

Execute Actions

Resolved actions trigger calls to your backend via the Integration Gateway, which handles authentication, retries, rate limiting, and circuit breaking.
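For readers unfamiliar with the pattern, this is a generic sketch of retries with exponential backoff wrapped in a circuit breaker. It shows the general technique, not the Integration Gateway's code; the thresholds, delays, and names are assumptions.

```python
import time


class CircuitOpenError(Exception):
    """Raised when calls are suspended because the backend keeps failing."""


class CircuitBreaker:
    def __init__(self, failure_threshold=3, reset_after=30.0, clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.reset_after = reset_after
        self.clock = clock
        self.failures = 0
        self.opened_at = None

    def call(self, fn):
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.reset_after:
                raise CircuitOpenError("backend calls suspended")
            self.opened_at = None        # half-open: let one trial call through
        try:
            result = fn()
        except Exception:
            self.failures += 1
            if self.failures >= self.failure_threshold:
                self.opened_at = self.clock()
            raise
        self.failures = 0                # any success closes the circuit
        return result


def call_with_retry(breaker, fn, attempts=3, base_delay=0.2, sleep=time.sleep):
    """Retry fn with exponential backoff; stop at once if the circuit is open."""
    for attempt in range(attempts):
        try:
            return breaker.call(fn)
        except CircuitOpenError:
            raise
        except Exception:
            if attempt == attempts - 1:
                raise
            sleep(base_delay * 2 ** attempt)
```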

Respond via TTS

Text-to-speech is rendered locally and streamed back to the caller through the Media Gateway. The caller hears a natural response while the FSM advances to the next state.

Observe Everything

Every call, transcript, and action is logged. Prometheus metrics and OpenTelemetry traces give you full visibility into call duration, ASR latency, queue depth, and more.
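As a rough illustration of consuming the metrics endpoint, the sketch below parses the Prometheus text exposition format into a dictionary. The metric names in the usage note are hypothetical, and the parser deliberately ignores optional timestamps and label parsing.

```python
def parse_metrics(text):
    """Map each sample line ('name{labels} value') to its float value."""
    samples = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):   # skip blanks, HELP and TYPE lines
            continue
        name, _, value = line.rpartition(" ")
        samples[name] = float(value)
    return samples
```

Given a scrape containing `vg_active_calls 12` (a made-up metric name), `parse_metrics` would map `"vg_active_calls"` to `12.0`. In practice you would point a Prometheus server at the `/metrics` endpoint rather than parse it by hand.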

Developer Experience

A programmable API surface

voicetyped exposes clean REST APIs for your engineering teams to integrate with. You write small services that implement webhook handlers, and voicetyped handles the rest.

Call Event Stream

Subscribe to call events across your deployment. Speech events, keypad input, timeouts, and backend results are streamed in real time as Server-Sent Events (SSE).

GET /v1/calls/events
Accept: text/event-stream

event: speech_final
data: {"session_id":"call-abc-123","transcript":"I need help"}

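A client for the event stream above could parse it along these lines. This is a minimal sketch of Server-Sent Events parsing (fields, comments, blank-line dispatch); how you obtain the lines, for example from a streaming HTTP response, is left out, and the function name is hypothetical.

```python
import json


def parse_sse(lines):
    """Yield (event_type, data) pairs from an iterable of SSE text lines."""
    event, data = "message", []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":                      # a blank line dispatches the event
            if data:
                yield event, "\n".join(data)
            event, data = "message", []
        elif line.startswith(":"):          # comment or keep-alive line
            continue
        else:
            field, _, value = line.partition(":")
            if value.startswith(" "):       # one leading space is stripped
                value = value[1:]
            if field == "event":
                event = value
            elif field == "data":
                data.append(value)


def decode_events(lines):
    """Decode each event's data as JSON, as the example payload suggests."""
    return [(event, json.loads(data)) for event, data in parse_sse(lines)]
```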
Call Control

Programmatically control active calls. Play TTS, transfer calls, or hang up — all through a clean REST interface.

POST /v1/calls/{id}/tts
POST /v1/calls/{id}/hangup
POST /v1/calls/{id}/transfer
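The three endpoints above could be wrapped in a small helper like the one below. The paths come from this page; the base URL host, the JSON request body, and its `text` field are assumptions, since the request schema is not documented here.

```python
import json
import urllib.request

# Port 8080 matches the REST API line in the startup log; the host is an assumption.
BASE_URL = "http://localhost:8080"


def build_call_request(call_id, action, body=None):
    """Build (but do not send) a POST request for a call-control action."""
    if action not in ("tts", "hangup", "transfer"):
        raise ValueError(f"unsupported action: {action}")
    url = f"{BASE_URL}/v1/calls/{call_id}/{action}"
    data = json.dumps(body or {}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=data,
        method="POST",
        headers={"Content-Type": "application/json"},
    )
```

Sending is then a matter of `urllib.request.urlopen(build_call_request("call-abc-123", "tts", {"text": "Hello"}))`, where the `text` field is hypothetical.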

Dialog Hooks

Implement a webhook endpoint in your backend. voicetyped POSTs JSON to your URL when a dialog event matches.

// voicetyped sends:
POST https://your-server/on-intent
{"transcript":"password reset",
 "session_id":"call-abc-123"}

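A webhook receiver for the request above might look like this sketch, which uses only the Python standard library. The incoming fields match the example payload; the response shape (`action`, `session_id`) and the port are hypothetical, since the expected reply format is not specified on this page.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer


def handle_intent(payload):
    """Decide what to do with a dialog event (response shape is an assumption)."""
    transcript = payload.get("transcript", "").lower()
    session_id = payload.get("session_id")
    if "password reset" in transcript:
        return {"action": "start_password_reset", "session_id": session_id}
    return {"action": "none", "session_id": session_id}


class HookHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        if self.path != "/on-intent":
            self.send_error(404)
            return
        length = int(self.headers.get("Content-Length", 0))
        try:
            payload = json.loads(self.rfile.read(length) or b"{}")
        except json.JSONDecodeError:
            self.send_error(400, "invalid JSON")
            return
        if not isinstance(payload, dict):
            self.send_error(400, "expected a JSON object")
            return
        body = json.dumps(handle_intent(payload)).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(port=9000):
    """Run the hook receiver until interrupted (port is an arbitrary choice)."""
    HTTPServer(("0.0.0.0", port), HookHandler).serve_forever()
```

Keeping the decision logic in `handle_intent` separate from the HTTP plumbing makes it easy to unit test without starting a server.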
Speech API

Direct access to the local ASR engine. Stream audio via WebSocket or transcribe files via REST — useful for testing, QA, and custom integrations.

POST /v1/speech/transcribe
WebSocket /v1/speech/stream
GET  /v1/speech/models

Deployment

Deploy anywhere. Run everywhere.

Single VM, Kubernetes cluster, or air-gapped facility. voicetyped adapts to your infrastructure constraints.

Single Node

One binary, systemd service, local models. Perfect for development and small deployments.

# systemd service
[Service]
ExecStart=/usr/bin/voice-gateway start
Restart=always
Environment=VG_ASR_MODEL=whisper-medium
Environment=VG_SIP_PORT=5060

Kubernetes

Helm chart with persistent volumes for models, optional GPU scheduling, and autoscaling on active call sessions.
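To give a feel for autoscaling on active call sessions, the sketch below applies the standard Kubernetes Horizontal Pod Autoscaler formula to a per-pod call count. Whether the chart uses the HPA, and the target and replica limits shown, are assumptions; the real HPA also applies a tolerance band that is omitted here.

```python
import math


def desired_replicas(current_replicas, calls_per_pod, target_calls_per_pod,
                     min_replicas=1, max_replicas=10):
    """desired = ceil(current * currentMetric / targetMetric), then clamped."""
    raw = math.ceil(current_replicas * calls_per_pod / target_calls_per_pod)
    return max(min_replicas, min(max_replicas, raw))
```

For example, 2 pods averaging 30 active calls each against a target of 20 calls per pod gives `ceil(2 * 30 / 20)`, so the deployment would scale to 3 pods.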

# helm install
helm install voice-gateway \
  voicetyped/voice-gateway \
  --set asr.model=whisper-medium \
  --set asr.gpu.enabled=true \
  --set autoscaling.enabled=true

Air-Gapped

Offline installer bundle with preloaded models and zero external dependencies. Deploy in isolated networks, classified environments, and facilities with no internet access.

# Offline deployment
./voice-gateway-offline-installer.sh \
  --models-dir /opt/vg/models \
  --config /etc/voice-gateway/config.yaml \
  --no-internet

Early Access

Join the waitlist

voicetyped is not yet publicly available. Join the waitlist to get early access when we launch. We'll reach out based on your expected call volume.

Ready to own your voice infrastructure?

voicetyped is coming soon. Join the waitlist for early access, or read the docs to learn more about the architecture.

Join Waitlist · Read the Docs
