# Kubernetes Deployment
Deploy voicetyped on Kubernetes with Helm, GPU scheduling, and autoscaling.
The Kubernetes deployment model is designed for production workloads requiring high availability, autoscaling, and GPU-accelerated speech processing. voicetyped ships as a Helm chart with sensible defaults and extensive customization options.
## Prerequisites
- Kubernetes 1.26+
- Helm 3.12+
- PersistentVolume provisioner (for model storage)
- Optional: NVIDIA GPU Operator (for GPU-accelerated ASR)
- Optional: cert-manager (for automatic TLS certificate management)
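A quick sanity check of the client-side tooling before installing:

```bash
# Confirm kubectl and Helm versions against the prerequisites above
kubectl version
helm version

# Confirm a StorageClass is available for model storage
kubectl get storageclass
```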
## Quick Start
```bash
# Add the voicetyped Helm repository
helm repo add voicetyped https://charts.voicetyped.com
helm repo update

# Install with defaults
helm install voice-gateway voicetyped/voice-gateway \
  --namespace voice-gateway \
  --create-namespace

# Install with custom values
helm install voice-gateway voicetyped/voice-gateway \
  --namespace voice-gateway \
  --create-namespace \
  -f values.yaml
```
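To confirm the release came up cleanly:

```bash
# Check release status and watch pods become Ready
helm status voice-gateway --namespace voice-gateway
kubectl get pods --namespace voice-gateway --watch
```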
## Helm Values

### Complete `values.yaml`
```yaml
# values.yaml

# Global settings
global:
  image:
    repository: voicetyped/voice-gateway
    tag: "latest"
    pullPolicy: IfNotPresent

# ──────────────────────────────────
# Media Gateway
# ──────────────────────────────────
mediaGateway:
  replicas: 2
  resources:
    requests:
      cpu: "500m"
      memory: "512Mi"
    limits:
      cpu: "2"
      memory: "1Gi"
  service:
    type: LoadBalancer  # or NodePort
    sipPort: 5060
    annotations: {}
  config:
    sipTransport: udp
    rtpPortRange: "10000-20000"
    codecs:
      - g711-ulaw
      - g711-alaw
      - opus
    jitterBufferMs: 60
  # Host networking required for RTP
  hostNetwork: true
  dnsPolicy: ClusterFirstWithHostNet
  autoscaling:
    enabled: true
    minReplicas: 2
    maxReplicas: 10
    targetCPUUtilization: 70

# ──────────────────────────────────
# Speech Gateway
# ──────────────────────────────────
speechGateway:
  replicas: 1
  resources:
    requests:
      cpu: "2"
      memory: "4Gi"
    limits:
      cpu: "4"
      memory: "8Gi"
      nvidia.com/gpu: 1  # Request 1 GPU
  config:
    engine: whisper
    model: whisper-medium
    language: en
    maxWorkers: 4
    vad:
      threshold: 0.5
      minSilenceMs: 500
  # Model storage
  modelStorage:
    enabled: true
    storageClass: "standard"  # Your PV storage class
    size: 10Gi
    mountPath: /models
  # GPU scheduling
  gpu:
    enabled: true
    type: nvidia  # nvidia or amd
    count: 1
    nodeSelector:
      nvidia.com/gpu.present: "true"
    tolerations:
      - key: nvidia.com/gpu
        operator: Exists
        effect: NoSchedule
  autoscaling:
    enabled: true
    minReplicas: 1
    maxReplicas: 5
    targetGPUUtilization: 80

# ──────────────────────────────────
# Conversation Runtime
# ──────────────────────────────────
runtime:
  replicas: 2
  resources:
    requests:
      cpu: "250m"
      memory: "256Mi"
    limits:
      cpu: "1"
      memory: "512Mi"
  config:
    maxConcurrentCalls: 100
    defaultTimeout: 10s
    bargeIn: true
    stateStore: redis  # Use Redis for HA
  # Load dialog definitions from a ConfigMap
  dialogConfigMap: voice-gateway-dialogs
  autoscaling:
    enabled: true
    minReplicas: 2
    maxReplicas: 20
    targetCPUUtilization: 60

# ──────────────────────────────────
# Integration Gateway
# ──────────────────────────────────
integration:
  replicas: 2
  resources:
    requests:
      cpu: "250m"
      memory: "256Mi"
    limits:
      cpu: "1"
      memory: "512Mi"
  service:
    apiPort: 8080
  autoscaling:
    enabled: true
    minReplicas: 2
    maxReplicas: 10
    targetCPUUtilization: 60

# ──────────────────────────────────
# Redis (for state store)
# ──────────────────────────────────
redis:
  enabled: true
  architecture: replication
  auth:
    enabled: true
    existingSecret: voice-gateway-redis
  replica:
    replicaCount: 3
    persistence:
      enabled: true
      size: 1Gi

# ──────────────────────────────────
# Observability
# ──────────────────────────────────
observability:
  metrics:
    enabled: true
    port: 9100
    serviceMonitor:
      enabled: true  # Create a Prometheus ServiceMonitor
      interval: 15s
  tracing:
    enabled: false
    otlpEndpoint: ""

# ──────────────────────────────────
# Security
# ──────────────────────────────────
security:
  mtls:
    enabled: true
  certManager:
    enabled: true  # Use cert-manager for certificates
    issuerRef:
      name: voice-gateway-ca
      kind: ClusterIssuer
  networkPolicy:
    enabled: true

# ──────────────────────────────────
# Ingress (for HTTP API)
# ──────────────────────────────────
ingress:
  enabled: false
  className: nginx
  hosts:
    - host: vg-api.internal
      paths:
        - path: /
          pathType: Prefix
  tls: []
```
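Individual values can also be overridden at upgrade time without editing the file. For example, to switch the ASR model (the key path follows the values file above; `whisper-large` is illustrative):

```bash
helm upgrade voice-gateway voicetyped/voice-gateway \
  --namespace voice-gateway \
  --reuse-values \
  --set speechGateway.config.model=whisper-large
```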
## Architecture on Kubernetes
```text
┌─────────────────────────────────────────────────┐
│               Kubernetes Cluster                │
│                                                 │
│  ┌──────────────┐      ┌────────────────┐       │
│  │ LoadBalancer │      │ ServiceMonitor │       │
│  │ :5060 (SIP)  │      │  (Prometheus)  │       │
│  └──────┬───────┘      └────────────────┘       │
│         │                                       │
│  ┌──────▼────────┐                              │
│  │ Media Gateway │  (hostNetwork, 2-10 pods)    │
│  └──────┬────────┘                              │
│         │                                       │
│  ┌──────▼─────────┐                             │
│  │ Speech Gateway │  (GPU nodes, 1-5 pods)      │
│  │ + PV (models)  │                             │
│  └──────┬─────────┘                             │
│         │                                       │
│  ┌──────▼───────┐      ┌───────────┐            │
│  │   Runtime    │──────│   Redis   │            │
│  │ (2-20 pods)  │      │   (HA)    │            │
│  └──────┬───────┘      └───────────┘            │
│         │                                       │
│  ┌──────▼─────────┐                             │
│  │ Integration GW │  (2-10 pods)                │
│  │ :8080 (REST)   │                             │
│  └────────────────┘                             │
└─────────────────────────────────────────────────┘
```
## Dialog ConfigMap
Store dialog definitions in a ConfigMap:
```bash
# Create the ConfigMap from dialog files
kubectl create configmap voice-gateway-dialogs \
  --from-file=/path/to/dialogs/ \
  --namespace voice-gateway
```
Or declaratively:
```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: voice-gateway-dialogs
  namespace: voice-gateway
data:
  helpdesk.yaml: |
    name: helpdesk
    states:
      start:
        on_enter:
          - action: play_tts
            text: "Welcome to IT support."
        transitions:
          - event: speech
            target: process
  # ... more dialogs
```
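To push updated dialogs to a running deployment, regenerate the ConfigMap in place and restart the Runtime pods so they reload the definitions. The deployment name below is an assumption; check `kubectl get deployments -n voice-gateway` for the name the chart actually generates.

```bash
# Regenerate the ConfigMap from local files and apply it in place
kubectl create configmap voice-gateway-dialogs \
  --from-file=/path/to/dialogs/ \
  --namespace voice-gateway \
  --dry-run=client -o yaml | kubectl apply -f -

# Restart Runtime pods to pick up the new dialogs (name is illustrative)
kubectl rollout restart deployment/voice-gateway-runtime \
  --namespace voice-gateway
```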
## GPU Scheduling
### NVIDIA GPU Operator
Install the NVIDIA GPU Operator if not already present:
```bash
helm repo add nvidia https://helm.ngc.nvidia.com/nvidia
helm install gpu-operator nvidia/gpu-operator \
  --namespace gpu-operator \
  --create-namespace
```
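The operator can take several minutes to roll out drivers and device plugins; confirm its pods are healthy before scheduling GPU workloads:

```bash
kubectl get pods --namespace gpu-operator
```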
### GPU Node Labels

The GPU Operator's node feature discovery normally applies GPU labels automatically; label nodes manually only if you manage drivers outside the operator:
```bash
kubectl label nodes gpu-node-1 nvidia.com/gpu.present=true
```
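To verify that labeled nodes actually advertise schedulable GPUs:

```bash
# Nodes matching the label used by speechGateway.gpu.nodeSelector
kubectl get nodes -l nvidia.com/gpu.present=true

# Allocatable GPU count on a specific node
kubectl describe node gpu-node-1 | grep nvidia.com/gpu
```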
### Multiple GPU Types
If you have different GPU types, use node affinity:
```yaml
speechGateway:
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
          - matchExpressions:
              - key: nvidia.com/gpu.product
                operator: In
                values:
                  - Tesla-T4
                  - NVIDIA-A100-SXM4-40GB
```
## Network Configuration

### SIP Load Balancing

SIP over UDP needs special load-balancing care: mid-call packets must keep reaching the pod that owns the SIP dialog. Use `hostNetwork: true` on Media Gateway pods and a UDP-capable load balancer:
```yaml
mediaGateway:
  hostNetwork: true
  service:
    type: LoadBalancer
    annotations:
      # AWS NLB
      service.beta.kubernetes.io/aws-load-balancer-type: nlb
      # GCP
      cloud.google.com/l4-rbs: "enabled"
```
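SIP endpoints usually need to see the caller's real source address, which the default Service routing obscures. If the chart passes arbitrary Service fields through (an assumption; it is not shown in the default values above), `externalTrafficPolicy: Local` preserves the client IP and avoids an extra node hop:

```yaml
mediaGateway:
  service:
    type: LoadBalancer
    # Preserve the client source IP; traffic is only delivered to
    # nodes that run a Media Gateway pod.
    externalTrafficPolicy: Local
```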
### Network Policies

When `security.networkPolicy.enabled: true` is set, the Helm chart creates NetworkPolicies that restrict traffic between components; an illustrative policy follows the list:
- Media Gateway accepts SIP/RTP from external
- Speech Gateway only accepts from Media Gateway
- Runtime only accepts from Speech Gateway
- Integration Gateway only accepts from Runtime and external API clients
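As a rough sketch of the second rule, a generated policy might look like this. The `app.kubernetes.io/component` labels are assumptions; the chart's actual selectors may differ:

```yaml
# Illustrative only: approximates the chart-generated ingress rule
# that limits Speech Gateway traffic to the Media Gateway.
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: speech-gateway-ingress
  namespace: voice-gateway
spec:
  podSelector:
    matchLabels:
      app.kubernetes.io/component: speech-gateway
  policyTypes:
    - Ingress
  ingress:
    - from:
        - podSelector:
            matchLabels:
              app.kubernetes.io/component: media-gateway
```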
## Scaling Guidelines
| Concurrent Calls | Media GW Pods | Speech GW Pods (GPU) | Runtime Pods | Integration Pods |
|---|---|---|---|---|
| 10 | 2 | 1 | 2 | 2 |
| 50 | 3 | 2 | 4 | 3 |
| 100 | 5 | 3 | 8 | 5 |
| 500 | 10 | 5 | 20 | 10 |
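Treat these figures as starting points rather than guarantees; real capacity depends on codec choice, ASR model size, and dialog complexity. One way to apply a row is to pin the autoscaler floors in `values.yaml`, for example for roughly 100 concurrent calls:

```yaml
# Starting point for ~100 concurrent calls; tune against observed load
mediaGateway:
  autoscaling:
    minReplicas: 5
speechGateway:
  autoscaling:
    minReplicas: 3
runtime:
  autoscaling:
    minReplicas: 8
integration:
  autoscaling:
    minReplicas: 5
```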
## Next Steps
- Air-Gapped Deployment — deploy without internet access
- Security — mTLS, RBAC, and audit logging
- Observability — Prometheus and OpenTelemetry setup