Roadmap

voicetyped development roadmap — from MVP to enterprise platform.

voicetyped follows a phased development roadmap. Each phase builds on the previous one, adding capabilities while maintaining the core principles: self-hosted, private, and programmable.

Current Status

v0.1 — MVP (in progress)

The initial release focuses on proving the core value proposition: a self-hosted service that terminates SIP calls and exposes them as programmable sessions with local ASR.

v0.1 Scope

  • SIP inbound call termination
  • PCM audio extraction from RTP
  • Streaming ASR (whisper.cpp)
  • Voice activity detection
  • Turn detection
  • One dialog FSM (YAML-defined)
  • TTS playback (Piper)
  • REST API & webhook integration
  • Simple CLI installer
  • Prometheus metrics endpoint
  • DTMF detection
  • Call transfer (REFER)
  • Air-gapped installer
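
To make the voice activity detection item concrete, here is a minimal sketch of an energy-based detector over 16-bit PCM frames. It is an illustration only: the frame size, the `threshold` value, and the function names are assumptions, and voicetyped's actual detector may work quite differently.

```python
import math
import struct

FRAME_SAMPLES = 160  # assumed frame size: 20 ms of audio at 8 kHz


def frame_rms(frame: bytes) -> float:
    """Root-mean-square amplitude of a 16-bit little-endian mono PCM frame."""
    count = len(frame) // 2
    if count == 0:
        return 0.0
    samples = struct.unpack(f"<{count}h", frame[: count * 2])
    return math.sqrt(sum(s * s for s in samples) / count)


def detect_speech(pcm: bytes, threshold: float = 500.0) -> list[bool]:
    """Return one flag per frame: True when the frame energy reaches the threshold.

    The threshold is a hypothetical value chosen for this example.
    """
    step = FRAME_SAMPLES * 2  # bytes per frame (2 bytes per sample)
    return [frame_rms(pcm[i : i + step]) >= threshold for i in range(0, len(pcm), step)]


# One silent frame, one loud frame, one silent frame.
silence = struct.pack("<160h", *([0] * 160))
tone = struct.pack("<160h", *([4000, -4000] * 80))
print(detect_speech(silence + tone + silence))  # [False, True, False]
```

A production detector would typically smooth these per-frame flags over time so that short pauses do not end a turn.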

Phase 1 — Infrastructure

Focus: Solid telephony foundation and speech pipeline.

Feature          | Status      | Description
-----------------|-------------|--------------------------------------------
Media Gateway    | Complete    | SIP endpoint, RTP handling, codec support
Speech Gateway   | Complete    | whisper.cpp ASR, Piper TTS
Streaming APIs   | Complete    | SSE streaming for events, WebSocket for audio
Call Control API | Complete    | PlayTTS, Hangup, Transfer
DTMF Support     | In Progress | RFC 2833 and in-band detection
Opus Codec       | In Progress | WebRTC-compatible audio
SIP Registration | Planned     | Register with external PBX
WebRTC Gateway   | Planned     | Browser-based voice sessions
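
For readers unfamiliar with RFC 2833 DTMF, the sketch below decodes the four-byte telephone-event payload carried in RTP: an event code, an end flag, a volume field, and a duration in timestamp units. The `DtmfEvent` type and `parse_telephone_event` function are hypothetical names for this example and are not part of any voicetyped API.

```python
import struct
from dataclasses import dataclass

# Event codes 0-15 map to the sixteen DTMF symbols.
DTMF_CHARS = "0123456789*#ABCD"


@dataclass
class DtmfEvent:
    digit: str
    end: bool      # True on the packet(s) marking the end of the key press
    volume: int    # power level, expressed as 0..63 (in -dBm0)
    duration: int  # duration in RTP timestamp units


def parse_telephone_event(payload: bytes) -> DtmfEvent:
    """Decode the first 4 bytes of an RTP telephone-event payload."""
    if len(payload) < 4:
        raise ValueError("telephone-event payload must be at least 4 bytes")
    # Network byte order: event (8 bits), flags+volume (8 bits), duration (16 bits).
    event, flags, duration = struct.unpack("!BBH", payload[:4])
    if event >= len(DTMF_CHARS):
        raise ValueError(f"not a DTMF event code: {event}")
    return DtmfEvent(
        digit=DTMF_CHARS[event],
        end=bool(flags & 0x80),  # top bit is the E (end) flag
        volume=flags & 0x3F,     # low six bits are the volume
        duration=duration,
    )


print(parse_telephone_event(bytes([5, 0x8A, 0x03, 0x20])))
# DtmfEvent(digit='5', end=True, volume=10, duration=800)
```

In-band detection works differently: it analyses the audio itself for the dual tones, so it needs signal processing rather than payload parsing.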

Phase 2 — Dialog Runtime

Focus: Deterministic conversation engine and integration framework.

Feature                 | Status      | Description
------------------------|-------------|--------------------------------------
FSM Engine              | Complete    | YAML-defined state machines
Per-call State          | Complete    | Isolated session state
Timeout Handling        | Complete    | Configurable per-state timeouts
Barge-in                | Complete    | Caller can interrupt TTS
Dialog Hooks            | Complete    | Webhook hooks to customer backends
Multi-digit DTMF        | Planned     | Collect account numbers, PINs
Dialog Variables        | Complete    | Template variables in dialogs
Retry & Fallback        | In Progress | Graceful error handling
Dialog Routing          | Planned     | Route calls to dialogs by SIP headers
Conditional Transitions | Complete    | Expression-based routing
Sub-dialogs             | Planned     | Reusable dialog fragments
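
The sketch below illustrates how a deterministic dialog engine of this kind can combine per-state timeouts, conditional transitions, fallbacks, and template variables. The dictionary stands in for what a YAML dialog file might load into; its keys (`say`, `timeout_s`, `on_timeout`, `transitions`, `fallback`) are invented for this example and are not voicetyped's actual dialog schema.

```python
from typing import Optional

# Hypothetical dialog definition, shaped like a parsed YAML document.
DIALOG = {
    "initial": "greeting",
    "states": {
        "greeting": {
            "say": "Hello {caller_name}, say 'billing' or 'support'.",
            "timeout_s": 5,
            "on_timeout": "goodbye",
            "transitions": [
                {"contains": "billing", "to": "billing"},
                {"contains": "support", "to": "support"},
            ],
            "fallback": "greeting",  # re-prompt when nothing matches
        },
        "billing": {"say": "Connecting you to billing."},
        "support": {"say": "Connecting you to support."},
        "goodbye": {"say": "We did not hear anything. Goodbye."},
    },
}


def next_state(dialog: dict, current: str, transcript: Optional[str]) -> str:
    """Pick the next state; a transcript of None means the state timed out."""
    state = dialog["states"][current]
    if transcript is None:
        return state.get("on_timeout", current)
    text = transcript.lower()
    for rule in state.get("transitions", []):
        if rule["contains"] in text:  # first matching rule wins
            return rule["to"]
    return state.get("fallback", current)


def render_prompt(dialog: dict, current: str, variables: dict) -> str:
    """Fill template variables into the state's prompt text."""
    return dialog["states"][current]["say"].format(**variables)


print(render_prompt(DIALOG, "greeting", {"caller_name": "Ada"}))
print(next_state(DIALOG, "greeting", "I need BILLING help"))  # billing
print(next_state(DIALOG, "greeting", None))                   # goodbye
```

Because every transition is an explicit rule evaluated in order, the same transcript always produces the same path through the dialog, which is the property the word "deterministic" refers to here.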

Phase 3 — Enterprise Features

Focus: Multi-tenancy, access control, and production hardening.

Feature                | Status      | Description
-----------------------|-------------|--------------------------------------
Multi-tenant Isolation | Planned     | Separate tenants with resource quotas
RBAC                   | Planned     | Role-based access to APIs
Quotas                 | Planned     | Per-tenant call and resource limits
HA Deployment          | Planned     | Active-active with Redis state store
Audit Logging          | In Progress | Comprehensive audit trail
mTLS                   | In Progress | Mutual TLS between all components
Encryption at Rest     | Planned     | AES-256-GCM for stored data
Certificate Rotation   | Planned     | Automated cert management
Admin API              | Planned     | Configuration management via API
Helm Chart             | In Progress | Kubernetes deployment
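
As an illustration of what a per-tenant call limit could look like, here is a minimal in-memory sketch that caps concurrent calls per tenant. The `TenantQuota` class and the tenant names are hypothetical; since this feature is only planned, the eventual design (for example, one backed by a shared state store) may look quite different.

```python
class TenantQuota:
    """Tracks active calls per tenant against a fixed concurrency limit."""

    def __init__(self, limits: dict[str, int]) -> None:
        self._limits = limits
        self._active: dict[str, int] = {}

    def try_start_call(self, tenant: str) -> bool:
        """Admit the call if the tenant is below its limit; unknown tenants get 0."""
        limit = self._limits.get(tenant, 0)
        active = self._active.get(tenant, 0)
        if active >= limit:
            return False
        self._active[tenant] = active + 1
        return True

    def end_call(self, tenant: str) -> None:
        """Release one slot, never dropping below zero."""
        self._active[tenant] = max(0, self._active.get(tenant, 0) - 1)


quota = TenantQuota({"acme": 2})
print(quota.try_start_call("acme"))  # True
print(quota.try_start_call("acme"))  # True
print(quota.try_start_call("acme"))  # False (limit of 2 reached)
quota.end_call("acme")
print(quota.try_start_call("acme"))  # True
```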

Phase 4 — Intelligence Layer

Focus: Optional AI capabilities layered on top of the deterministic runtime.

Feature            | Status  | Description
-------------------|---------|--------------------------------------------
Intent Classifiers | Planned | Local intent classification models
LLM Nodes          | Planned | Optional LLM-powered dialog states
RAG Adapters       | Planned | Retrieval-augmented generation for knowledge
Sentiment Analysis | Planned | Real-time caller sentiment scoring
Language Detection | Planned | Automatic language detection and routing
Keyword Spotting   | Planned | Real-time keyword alerts
Multilingual ASR   | Planned | Multi-language support in single calls

Future — Expansion Opportunities

These features are being evaluated for future phases:

  • Voice Analytics — call quality scoring, agent performance metrics
  • Compliance Redaction — automatic PII detection and redaction in transcripts
  • Keyword Alerts — real-time alerting on specific phrases
  • Multilingual Support — per-call language switching
  • Offline Translation — local translation between languages
  • Recording & Playback — secure call recording with consent management
  • Call Queuing — built-in queue management for high-volume scenarios
  • Outbound Dialing — programmatic outbound call initiation

What We Will NOT Build

These are explicitly out of scope — they are customer-side concerns:

  • CRM integration (customers build this via hooks)
  • Ticketing system (customers use their existing tools)
  • Agent desktop (customers build or buy this separately)
  • Visual dialog designers (dialogs are YAML — code, not clicks)
  • Analytics dashboards (customers use Grafana or build their own)
  • Consumer-facing applications (we are infrastructure)

Release Cadence

  • Patch releases (v0.1.x): Bug fixes, security patches — as needed
  • Minor releases (v0.x.0): New features, backward compatible — monthly
  • Major releases (vX.0.0): Breaking changes — annually (with migration guides)
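
The version patterns above follow the usual major.minor.patch scheme. As a small sketch, the helper below classifies an upgrade between two tags so that tooling could, for example, warn before a breaking change. The function names and tag format are assumptions for this example, not part of the voicetyped CLI.

```python
def parse_version(tag: str) -> tuple[int, int, int]:
    """Parse a tag such as 'v0.2.1' into (major, minor, patch)."""
    major, minor, patch = tag.lstrip("v").split(".")
    return int(major), int(minor), int(patch)


def release_type(previous: str, new: str) -> str:
    """Classify an upgrade as 'major', 'minor', or 'patch'."""
    old, cur = parse_version(previous), parse_version(new)
    if cur <= old:
        raise ValueError(f"{new} is not newer than {previous}")
    if cur[0] != old[0]:
        return "major"  # breaking changes: read the migration guide first
    if cur[1] != old[1]:
        return "minor"  # new features, backward compatible
    return "patch"      # bug fixes and security patches


print(release_type("v0.1.0", "v0.1.1"))  # patch
print(release_type("v0.1.3", "v0.2.0"))  # minor
print(release_type("v0.9.0", "v1.0.0"))  # major
```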

Contributing

voicetyped uses an open-core model:

Open Source (Apache 2.0):

  • Speech Gateway
  • Basic Media Gateway
  • CLI tools
  • Client libraries

Commercial License:

  • Conversation Runtime
  • Integration Gateway
  • Admin tooling
  • Enterprise security features
  • Multi-tenant support

Contributions to open-source components are welcome. See the GitHub repository for contribution guidelines.

Feedback

We prioritize features based on customer and community feedback. To request a feature or report a bug:

  • GitHub Issues — for bug reports and feature requests
  • Discussions — for architecture questions and use-case discussions
  • Email: enterprise@voicetyped.com for commercial inquiries