Private Voice Infrastructure for Regulated Systems
A self-hosted voice gateway that converts live phone calls into secure, real-time programmable sessions with on-prem speech recognition and dialog orchestration — enabling regulated teams to automate and integrate voice workflows without sending audio to the cloud.
- No cloud dependency.
- No vendor lock-in.
- Built for compliance-first environments.
$ curl -sSL https://get.voicetyped.com/install | sh
# Start with local speech recognition
$ voice-gateway start --asr-model whisper-medium --sip-port 5060
INFO Media Gateway listening on :5060 (SIP/UDP)
INFO Speech Gateway ready (whisper-medium, GPU: detected)
INFO REST API listening on :8080
INFO Metrics endpoint on :9100/metrics
✓ voicetyped is running
Four services. One platform.
voicetyped is composed of four purpose-built services that handle every layer of voice automation — from receiving phone calls to integrating with your backend systems.
Media Gateway
Handles inbound and outbound phone calls using standard telephony protocols (SIP/RTP). Manages the full call lifecycle — connect, hold, hang up — and outputs a clean audio stream per call.
- Telephony endpoint (SIP) with full call lifecycle
- Audio capture and format conversion (G.711, Opus)
- Normalized PCM stream output per call
- Built with Go + PJSIP
Speech Gateway
On-premise speech recognition powered by whisper.cpp. Transcribes calls in real time with partial and final results — no audio sent to any external service.
- On-premise speech-to-text — no cloud required
- Partial and final transcripts
- Batching and audio segmentation
- Per-call worker pool management
Conversation Runtime
The core differentiator. A rule-based dialog engine that follows predefined conversation flows. Every path is deterministic and auditable — with optional LLM nodes for states that need natural language understanding.
- Turn detection & dialog state
- Deterministic, rule-based conversation flows
- Tool invocation framework
- Optional LLM node integration
Integration Gateway
Connects to customer backend systems via REST and webhooks. Built-in authentication, retry logic, rate limiting, and circuit breaking for production reliability.
- REST & webhook backend integration
- Authentication & retry logic
- Rate limiting & circuit breaking
- Production-grade reliability
End-to-end voice pipeline
From phone call ingestion to backend integration. Every layer is self-hosted and auditable, and runs without internet connectivity.
From call to action in milliseconds
voicetyped processes inbound calls through a deterministic pipeline that your engineering team fully controls.
Terminate the Call
Inbound phone calls are received by the Media Gateway. The raw audio is extracted and converted to a normalized PCM stream, handling codec transcoding (G.711, Opus) automatically.
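To make the transcoding step concrete, here is a minimal sketch of decoding G.711 mu-law bytes into 16-bit linear PCM. It illustrates the codec conversion only and is not the Media Gateway's own code (which the page says is built with Go + PJSIP). The function names are hypothetical, and the A-law variant of G.711 is not covered.

```python
import struct


def ulaw_byte_to_pcm16(byte: int) -> int:
    """Decode one G.711 mu-law byte to a signed 16-bit linear sample."""
    u = ~byte & 0xFF                 # mu-law bytes are transmitted bit-inverted
    sign = u & 0x80                  # top bit carries the sign
    exponent = (u >> 4) & 0x07       # 3-bit segment number
    mantissa = u & 0x0F              # 4-bit step within the segment
    magnitude = (((mantissa << 3) + 0x84) << exponent) - 0x84
    return -magnitude if sign else magnitude


def ulaw_to_pcm16(payload: bytes) -> bytes:
    """Convert a mu-law payload (for example one RTP packet body) to little-endian PCM16."""
    samples = [ulaw_byte_to_pcm16(b) for b in payload]
    return struct.pack(f"<{len(samples)}h", *samples)
```

A 20 ms G.711 frame at 8 kHz holds 160 bytes, so `ulaw_to_pcm16` would return 320 bytes of PCM for it.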
Transcribe Locally
The Speech Gateway runs whisper.cpp locally to produce real-time partial and final transcripts. No audio ever leaves your infrastructure. GPU acceleration is optional but recommended.
Process with Dialog FSM
The Conversation Runtime, a finite-state machine (FSM), evaluates transcripts against your defined conversation rules. It handles turn detection, timeouts, keypad input (DTMF), and action triggers deterministically.
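A rule-based dialog of this kind can be pictured as a small transition table. The sketch below is an illustration under assumptions, not voicetyped's flow format: the state names, trigger syntax, action names, and the `dtmf` and `timeout` event shapes are all hypothetical. Only the `speech_final` event name and its `transcript` field appear elsewhere on this page.

```python
from typing import Optional, Tuple

# Hypothetical flow: state -> {trigger: (next_state, action)}.
# Triggers are "speech:<keyword>", "dtmf:<digit>", or "timeout".
FLOW = {
    "greeting": {
        "speech:password": ("verify", "ask_account_id"),
        "dtmf:0": ("operator", "transfer_to_agent"),
        "timeout": ("greeting", "repeat_prompt"),
    },
    "verify": {
        "dtmf:#": ("done", "reset_password"),
        "timeout": ("operator", "transfer_to_agent"),
    },
    "operator": {},
    "done": {},
}


def step(state: str, event: dict) -> Tuple[str, Optional[str]]:
    """Return (next_state, action) for one event; stay in place if no rule matches."""
    rules = FLOW[state]
    kind = event["type"]
    if kind == "speech_final":
        text = event["transcript"].lower()
        # Rules are checked in declaration order, so the outcome is repeatable.
        for trigger, target in rules.items():
            if trigger.startswith("speech:") and trigger[len("speech:"):] in text:
                return target
    elif kind == "dtmf":
        target = rules.get("dtmf:" + event["digit"])
        if target:
            return target
    elif kind == "timeout" and "timeout" in rules:
        return rules["timeout"]
    return state, None
```

Because every transition is a table lookup, the same event sequence always produces the same path, which is what makes a flow like this auditable.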
Execute Actions
Resolved actions trigger calls to your backend via the Integration Gateway. It handles authentication, retries, rate limiting, and circuit breaking — all battle-tested patterns.
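For readers unfamiliar with the patterns named here, the following is a minimal sketch of retry with exponential backoff wrapped around a circuit breaker. It shows the general technique only. The class names, thresholds, and delays are assumptions and say nothing about how the Integration Gateway is actually configured.

```python
import time


class CircuitOpenError(Exception):
    """Raised when calls are rejected because the breaker is open."""


class CircuitBreaker:
    def __init__(self, failure_threshold=3, reset_after=30.0, clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.reset_after = reset_after
        self.clock = clock
        self.failures = 0
        self.opened_at = None  # time the breaker opened, or None while closed

    def call(self, fn, *args, **kwargs):
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.reset_after:
                raise CircuitOpenError("backend circuit is open")
            # Cool-down elapsed: half-open, so one more failure re-opens it.
            self.opened_at = None
            self.failures = self.failure_threshold - 1
        try:
            result = fn(*args, **kwargs)
        except Exception:
            self.failures += 1
            if self.failures >= self.failure_threshold:
                self.opened_at = self.clock()
            raise
        self.failures = 0  # any success closes the breaker fully
        return result


def call_with_retry(breaker, fn, attempts=3, base_delay=0.2, sleep=time.sleep):
    """Retry fn through the breaker with exponential backoff between attempts."""
    for attempt in range(attempts):
        try:
            return breaker.call(fn)
        except CircuitOpenError:
            raise  # do not hammer a backend the breaker has given up on
        except Exception:
            if attempt == attempts - 1:
                raise
            sleep(base_delay * 2 ** attempt)
```

Retries absorb brief failures, while the breaker stops sending traffic to a backend that keeps failing until the cool-down has passed.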
Respond via TTS
Text-to-speech is rendered locally and streamed back to the caller through the Media Gateway. The caller hears a natural response while the FSM advances to the next state.
Observe Everything
Every call, transcript, and action is logged. Prometheus metrics and OpenTelemetry traces give you full visibility into call duration, ASR latency, queue depth, and more.
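As a rough illustration of reading those metrics outside a Prometheus server, the sketch below parses the Prometheus text exposition format into a dictionary. The `:9100/metrics` address comes from the startup log shown earlier. The function names are hypothetical, and no specific voicetyped metric names are assumed.

```python
import urllib.request


def parse_metrics(text: str) -> dict:
    """Parse Prometheus text exposition lines into {series: value}.

    Assumes samples carry no trailing timestamp (the format allows one).
    """
    samples = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):  # skip blanks, HELP and TYPE lines
            continue
        series, _, value = line.rpartition(" ")
        samples[series] = float(value)
    return samples


def fetch_metrics(url: str = "http://localhost:9100/metrics") -> dict:
    """Fetch and parse the metrics endpoint; the default URL is an assumption."""
    with urllib.request.urlopen(url, timeout=5) as resp:
        return parse_metrics(resp.read().decode("utf-8"))
```

In production a Prometheus server would scrape the endpoint directly. A helper like this is mainly useful for smoke tests and ad hoc checks.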
A programmable API surface
voicetyped exposes clean REST APIs for your engineering teams to integrate with. Your team writes small services that implement webhook handlers; voicetyped handles the rest.
Call Event Stream
Subscribe to call events across your deployment. Speech events, keypad input, timeouts, and backend results are streamed in real time over Server-Sent Events (SSE).
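A client for this stream could look roughly like the sketch below, which parses the SSE wire format with the standard library only. The endpoint path and the `speech_final` event come from the example in this section. The function names and the base URL are assumptions, JSON payloads are assumed for every event, and authentication is not shown because the page does not describe it.

```python
import json
import urllib.request


def parse_sse(lines):
    """Yield (event_name, data) for each event in an iterable of text lines."""
    event, data = "message", []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":                       # a blank line ends the current event
            if data:
                yield event, json.loads("\n".join(data))
            event, data = "message", []
        elif line.startswith(":"):           # comment or keep-alive line
            continue
        elif line.startswith("event:"):
            event = line[len("event:"):].strip()
        elif line.startswith("data:"):
            value = line[len("data:"):]
            data.append(value[1:] if value.startswith(" ") else value)


def stream_events(base_url="http://localhost:8080"):
    """Connect to the event stream and yield parsed events as they arrive."""
    req = urllib.request.Request(
        base_url + "/v1/calls/events",
        headers={"Accept": "text/event-stream"},
    )
    with urllib.request.urlopen(req) as resp:
        yield from parse_sse(line.decode("utf-8") for line in resp)
```

Keeping `parse_sse` separate from the network call means the parser can be exercised with recorded lines, without a running gateway.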
GET /v1/calls/events
Accept: text/event-stream
event: speech_final
data: {"session_id":"call-abc-123","transcript":"I need help"}
Call Control
Programmatically control active calls. Play TTS, transfer calls, or hang up — all through a clean REST interface.
POST /v1/calls/{id}/tts
POST /v1/calls/{id}/hangup
POST /v1/calls/{id}/transfer
Dialog Hooks
Implement a webhook endpoint in your backend. voicetyped POSTs JSON to your URL when a dialog event matches.
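On the receiving side, a hook can be a very small HTTP service. The sketch below uses only the Python standard library and accepts the `transcript` and `session_id` fields from the sample request in this section. The response body, the `action` values, and the port are hypothetical, since the page does not specify what voicetyped expects a hook to return.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer


def handle_intent(payload: dict) -> dict:
    """Map an incoming dialog event to a reply (reply shape is an assumption)."""
    transcript = payload.get("transcript", "").lower()
    action = "start_password_reset" if "password reset" in transcript else "none"
    return {"action": action, "session_id": payload.get("session_id")}


class HookHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        if self.path != "/on-intent":
            self.send_error(404)
            return
        length = int(self.headers.get("Content-Length", 0))
        try:
            payload = json.loads(self.rfile.read(length))
        except json.JSONDecodeError:
            self.send_error(400, "invalid JSON")
            return
        if not isinstance(payload, dict):
            self.send_error(400, "expected a JSON object")
            return
        body = json.dumps(handle_intent(payload)).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(port: int = 9000) -> None:
    """Run the hook server until interrupted."""
    HTTPServer(("0.0.0.0", port), HookHandler).serve_forever()
```

Calling `serve()` starts the listener. Keeping `handle_intent` as a plain function lets the business logic be unit tested without opening a socket.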
// voicetyped sends:
POST https://your-server/on-intent
{"transcript":"password reset",
"session_id":"call-abc-123"}
Speech API
Direct access to the local ASR engine. Stream audio via WebSocket or transcribe files via REST — useful for testing, QA, and custom integrations.
POST /v1/speech/transcribe
WebSocket /v1/speech/stream
GET /v1/speech/models
Deploy anywhere. Run everywhere.
Single VM, Kubernetes cluster, or air-gapped facility. voicetyped adapts to your infrastructure constraints.
Single Node
One binary, systemd service, local models. Perfect for development and small deployments.
# systemd service
[Service]
ExecStart=/usr/bin/voice-gateway start
Restart=always
Environment=VG_ASR_MODEL=whisper-medium
Environment=VG_SIP_PORT=5060
Kubernetes
Helm chart with persistent volumes for models, optional GPU scheduling, and autoscaling on active call sessions.
# helm install
helm install voice-gateway \
voicetyped/voice-gateway \
--set asr.model=whisper-medium \
--set asr.gpu.enabled=true \
--set autoscaling.enabled=true
Air-Gapped
Offline installer bundle with preloaded models and zero external dependencies. Deploy in isolated networks, classified environments, and facilities with no internet access.
# Offline deployment
./voice-gateway-offline-installer.sh \
--models-dir /opt/vg/models \
--config /etc/voice-gateway/config.yaml \
--no-internet
Join the waitlist
voicetyped is not yet publicly available. Join the waitlist to get early access when we launch. We'll reach out based on your expected call volume.
Ready to own your voice infrastructure?
voicetyped is coming soon. Join the waitlist for early access, or read the docs to learn more about the architecture.