Roadmap
voicetyped development roadmap — from MVP to enterprise platform.
voicetyped follows a phased development roadmap. Each phase builds on the previous one, adding capabilities while maintaining the core principles: self-hosted, private, and programmable.
Current Status
v0.1 — MVP (in progress)
The initial release focuses on proving the core value proposition: a self-hosted service that terminates SIP calls and exposes them as programmable sessions with local ASR.
v0.1 Scope
- SIP inbound call termination
- PCM audio extraction from RTP
- Streaming ASR (whisper.cpp)
- Voice activity detection
- Turn detection
- One dialog FSM (YAML-defined)
- TTS playback (Piper)
- REST API & webhook integration
- Simple CLI installer
- Prometheus metrics endpoint
- DTMF detection
- Call transfer (REFER)
- Air-gapped installer
Phase 1 — Infrastructure
Focus: Solid telephony foundation and speech pipeline.
| Feature | Status | Description |
|---|---|---|
| Media Gateway | Complete | SIP endpoint, RTP handling, codec support |
| Speech Gateway | Complete | whisper.cpp ASR, Piper TTS |
| Streaming APIs | Complete | Server-Sent Events (SSE) for events, WebSocket for audio |
| Call Control API | Complete | PlayTTS, Hangup, Transfer |
| DTMF Support | In Progress | RFC 2833 and in-band detection |
| Opus Codec | In Progress | WebRTC-compatible audio |
| SIP Registration | Planned | Register with external PBX |
| WebRTC Gateway | Planned | Browser-based voice sessions |
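
The DTMF Support row refers to RFC 2833 telephone events (since superseded by RFC 4733, which keeps the same payload layout). As a rough illustration of what out-of-band detection involves, the Python sketch below decodes the 4-byte event payload carried in RTP. It is not voicetyped's implementation; `TelephoneEvent` and `parse_telephone_event` are hypothetical names chosen for this example.

```python
import struct
from dataclasses import dataclass

# Event codes 0-15 map to the sixteen DTMF symbols (RFC 2833 / RFC 4733).
DTMF_SYMBOLS = "0123456789*#ABCD"


@dataclass(frozen=True)
class TelephoneEvent:
    digit: str      # decoded DTMF symbol
    end: bool       # E bit: set on the final packet(s) of the event
    volume: int     # power level in -dBm0 (0..63)
    duration: int   # event duration so far, in RTP timestamp units


def parse_telephone_event(payload: bytes) -> TelephoneEvent:
    """Decode one telephone-event payload (hypothetical helper)."""
    if len(payload) < 4:
        raise ValueError("telephone-event payload must be at least 4 bytes")
    # Layout: event (8 bits) | E (1) R (1) volume (6) | duration (16), network byte order.
    event, flags, duration = struct.unpack("!BBH", payload[:4])
    if event >= len(DTMF_SYMBOLS):
        raise ValueError(f"event code {event} is not a DTMF digit")
    return TelephoneEvent(
        digit=DTMF_SYMBOLS[event],
        end=bool(flags & 0x80),
        volume=flags & 0x3F,
        duration=duration,
    )


if __name__ == "__main__":
    print(parse_telephone_event(bytes([5, 0x8A, 0x03, 0x20])))
```

With the common 8000 Hz telephone-event clock, a duration of 800 units corresponds to 100 ms. In-band detection, the other half of that row, works on the audio samples themselves and is not covered by this sketch.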
Phase 2 — Dialog Runtime
Focus: Deterministic conversation engine and integration framework.
| Feature | Status | Description |
|---|---|---|
| FSM Engine | Complete | YAML-defined state machines |
| Per-call State | Complete | Isolated session state |
| Timeout Handling | Complete | Configurable per-state timeouts |
| Barge-in | Complete | Caller can interrupt TTS |
| Dialog Hooks | Complete | Webhook callbacks to customer backends |
| Multi-digit DTMF | Planned | Collect account numbers, PINs |
| Dialog Variables | Complete | Template variables in dialogs |
| Retry & Fallback | In Progress | Graceful error handling |
| Dialog Routing | Planned | Route calls to dialogs by SIP headers |
| Conditional Transitions | Complete | Expression-based routing |
| Sub-dialogs | Planned | Reusable dialog fragments |
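
To make several of the rows above concrete (state machines, per-state timeouts, template variables, conditional transitions), here is a minimal Python sketch of a deterministic dialog step function. The dictionary stands in for what a YAML dialog file might load into; every key name (`initial`, `timeout_s`, `on_timeout`, `when`, `to`, `fallback`) and both function names are assumptions made for illustration, not voicetyped's actual dialog schema. Plain Python lambdas stand in for the expression language.

```python
from typing import Optional

# Hypothetical dialog definition, shaped like a loaded YAML document.
DIALOG = {
    "initial": "greeting",
    "states": {
        "greeting": {
            "prompt": "Hello {caller_name}, say billing or support.",
            "timeout_s": 5,            # per-state timeout
            "on_timeout": "goodbye",   # state entered when the caller stays silent
            "transitions": [
                {"when": lambda text: "billing" in text, "to": "billing"},
                {"when": lambda text: "support" in text, "to": "support"},
            ],
            "fallback": "greeting",    # re-prompt when nothing matches
        },
        "billing": {"prompt": "Connecting you to billing.", "timeout_s": 0,
                    "on_timeout": "goodbye", "transitions": [], "fallback": "goodbye"},
        "support": {"prompt": "Connecting you to support.", "timeout_s": 0,
                    "on_timeout": "goodbye", "transitions": [], "fallback": "goodbye"},
        "goodbye": {"prompt": "Goodbye.", "timeout_s": 0,
                    "on_timeout": "goodbye", "transitions": [], "fallback": "goodbye"},
    },
}


def render_prompt(dialog: dict, state: str, variables: dict) -> str:
    """Fill template variables into the prompt of the given state."""
    return dialog["states"][state]["prompt"].format(**variables)


def next_state(dialog: dict, current: str, utterance: Optional[str]) -> str:
    """Return the next state; an utterance of None models a timeout."""
    state = dialog["states"][current]
    if utterance is None:
        return state["on_timeout"]
    text = utterance.lower()
    # The first matching transition wins, so the result is deterministic.
    for transition in state["transitions"]:
        if transition["when"](text):
            return transition["to"]
    return state["fallback"]


if __name__ == "__main__":
    print(render_prompt(DIALOG, DIALOG["initial"], {"caller_name": "Ada"}))
    print(next_state(DIALOG, "greeting", "I have a billing question"))
```

Because the same state and the same input always yield the same next state, a dialog of this shape can be unit-tested without placing a call.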
Phase 3 — Enterprise Features
Focus: Multi-tenancy, access control, and production hardening.
| Feature | Status | Description |
|---|---|---|
| Multi-tenant Isolation | Planned | Separate tenants with resource quotas |
| RBAC | Planned | Role-based access to APIs |
| Quotas | Planned | Per-tenant call and resource limits |
| HA Deployment | Planned | Active-active with Redis state store |
| Audit Logging | In Progress | Comprehensive audit trail |
| mTLS | In Progress | Mutual TLS between all components |
| Encryption at Rest | Planned | AES-256-GCM for stored data |
| Certificate Rotation | Planned | Automated cert management |
| Admin API | Planned | Configuration management via API |
| Helm Chart | In Progress | Kubernetes deployment |
Phase 4 — Intelligence Layer
Focus: Optional AI capabilities layered on top of the deterministic runtime.
| Feature | Status | Description |
|---|---|---|
| Intent Classifiers | Planned | Local intent classification models |
| LLM Nodes | Planned | Optional LLM-powered dialog states |
| RAG Adapters | Planned | Retrieval-augmented generation over knowledge sources |
| Sentiment Analysis | Planned | Real-time caller sentiment scoring |
| Language Detection | Planned | Automatic language detection and routing |
| Keyword Spotting | Planned | Real-time keyword alerts |
| Multilingual ASR | Planned | Recognition of multiple languages within a single call |
Future — Expansion Opportunities
These features are being evaluated for future phases:
- Voice Analytics — call quality scoring, agent performance metrics
- Compliance Redaction — automatic PII detection and redaction in transcripts
- Keyword Alerts — real-time alerting on specific phrases
- Multilingual Support — per-call language switching
- Offline Translation — local translation between languages
- Recording & Playback — secure call recording with consent management
- Call Queuing — built-in queue management for high-volume scenarios
- Outbound Dialing — programmatic outbound call initiation
What We Will NOT Build
These are explicitly out of scope — they are customer-side concerns:
- CRM integration (customers build this via hooks)
- Ticketing system (customers use their existing tools)
- Agent desktop (customers build or buy this separately)
- Visual dialog designers (dialogs are YAML — code, not clicks)
- Analytics dashboards (customers use Grafana or build their own)
- Consumer-facing applications (we are infrastructure)
Release Cadence
- Patch releases (v0.1.x): Bug fixes, security patches — as needed
- Minor releases (v0.x.0): New features, backward compatible — monthly
- Major releases (vX.0.0): Breaking changes — annually (with migration guides)
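
The cadence above follows the familiar major.minor.patch split. As a small illustration, the Python sketch below classifies an upgrade between two version tags so that tooling could, for example, flag when a migration guide is likely to apply. The `parse_version` and `classify_release` helpers are hypothetical and not part of any voicetyped tool; the sketch assumes plain `vMAJOR.MINOR.PATCH` tags with no pre-release suffixes.

```python
def parse_version(tag: str) -> tuple[int, int, int]:
    """Parse a tag such as 'v0.1.2' into (major, minor, patch)."""
    parts = tag.removeprefix("v").split(".")
    if len(parts) != 3:
        raise ValueError(f"expected vMAJOR.MINOR.PATCH, got {tag!r}")
    major, minor, patch = (int(p) for p in parts)
    return major, minor, patch


def classify_release(previous: str, new: str) -> str:
    """Return 'major', 'minor', or 'patch' for an upgrade from previous to new."""
    old, cur = parse_version(previous), parse_version(new)
    if cur <= old:
        raise ValueError(f"{new} is not newer than {previous}")
    if cur[0] != old[0]:
        return "major"   # breaking changes: check the migration guide
    if cur[1] != old[1]:
        return "minor"   # new features, backward compatible
    return "patch"       # bug fixes and security patches


if __name__ == "__main__":
    print(classify_release("v0.1.0", "v0.1.1"))
    print(classify_release("v0.1.3", "v0.2.0"))
    print(classify_release("v0.9.0", "v1.0.0"))
```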
Contributing
voicetyped uses an open-core model:
Open Source (Apache 2.0):
- Speech Gateway
- Basic Media Gateway
- CLI tools
- Client libraries
Commercial License:
- Conversation Runtime
- Integration Gateway
- Admin tooling
- Enterprise security features
- Multi-tenant support
Contributions to open-source components are welcome. See the GitHub repository for contribution guidelines.
Feedback
We prioritize features based on customer and community feedback. To request a feature or report a bug:
- GitHub Issues — for bug reports and feature requests
- Discussions — for architecture questions and use-case discussions
- Email — enterprise@voicetyped.com for commercial inquiries