# Kubernetes Deployment
Deploy voicetyped on Kubernetes with Helm, GPU scheduling, and autoscaling.
The Kubernetes deployment model is designed for production workloads requiring high availability, autoscaling, and GPU-accelerated speech processing. voicetyped ships as a Helm chart with sensible defaults and extensive customization options.
## Prerequisites
- Kubernetes 1.26+
- Helm 3.12+
- PersistentVolume provisioner (for model storage)
- Optional: NVIDIA GPU Operator (for GPU-accelerated ASR)
- Optional: cert-manager (for automatic TLS certificate management)
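A quick sanity check of the client-side tooling before installing:

```bash
# Confirm kubectl and Helm versions against the prerequisites above
kubectl version
helm version

# Confirm a StorageClass is available for model storage
kubectl get storageclass
```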
## Quick Start
```bash
# Add the voicetyped Helm repository
helm repo add voicetyped https://charts.voicetyped.com
helm repo update

# Install with defaults
helm install voice-gateway voicetyped/voice-gateway \
  --namespace voice-gateway \
  --create-namespace

# Install with custom values
helm install voice-gateway voicetyped/voice-gateway \
  --namespace voice-gateway \
  --create-namespace \
  -f values.yaml
```
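To confirm the release came up cleanly:

```bash
# Check release status and watch pods become Ready
helm status voice-gateway --namespace voice-gateway
kubectl get pods --namespace voice-gateway --watch
```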
## Helm Values

### Complete `values.yaml`
```yaml
# values.yaml

# Global settings
global:
  image:
    repository: voicetyped/voice-gateway
    tag: "latest"
    pullPolicy: IfNotPresent

# ──────────────────────────────────
# Media Gateway
# ──────────────────────────────────
mediaGateway:
  replicas: 2
  resources:
    requests:
      cpu: "500m"
      memory: "512Mi"
    limits:
      cpu: "2"
      memory: "1Gi"
  service:
    type: LoadBalancer  # or NodePort
    sipPort: 5060
    annotations: {}
  config:
    sipTransport: udp
    rtpPortRange: "10000-20000"
    codecs:
      - g711-ulaw
      - g711-alaw
      - opus
    jitterBufferMs: 60
  # Host networking required for RTP
  hostNetwork: true
  dnsPolicy: ClusterFirstWithHostNet
  autoscaling:
    enabled: true
    minReplicas: 2
    maxReplicas: 10
    targetCPUUtilization: 70

# ──────────────────────────────────
# Speech Gateway
# ──────────────────────────────────
speechGateway:
  replicas: 1
  resources:
    requests:
      cpu: "2"
      memory: "4Gi"
    limits:
      cpu: "4"
      memory: "8Gi"
      nvidia.com/gpu: 1  # Request 1 GPU
  config:
    engine: whisper
    model: whisper-medium
    language: en
    maxWorkers: 4
    vad:
      threshold: 0.5
      minSilenceMs: 500
  # Model storage
  modelStorage:
    enabled: true
    storageClass: "standard"  # Your PV storage class
    size: 10Gi
    mountPath: /models
  # GPU scheduling
  gpu:
    enabled: true
    type: nvidia  # nvidia or amd
    count: 1
    nodeSelector:
      nvidia.com/gpu.present: "true"
    tolerations:
      - key: nvidia.com/gpu
        operator: Exists
        effect: NoSchedule
  autoscaling:
    enabled: true
    minReplicas: 1
    maxReplicas: 5
    targetGPUUtilization: 80

# ──────────────────────────────────
# Conversation Runtime
# ──────────────────────────────────
runtime:
  replicas: 2
  resources:
    requests:
      cpu: "250m"
      memory: "256Mi"
    limits:
      cpu: "1"
      memory: "512Mi"
  config:
    maxConcurrentCalls: 100
    defaultTimeout: 10s
    bargeIn: true
    stateStore: redis  # Use Redis for HA
  # Load dialog definitions from a ConfigMap
  dialogConfigMap: voice-gateway-dialogs
  autoscaling:
    enabled: true
    minReplicas: 2
    maxReplicas: 20
    targetCPUUtilization: 60

# ──────────────────────────────────
# Integration Gateway
# ──────────────────────────────────
integration:
  replicas: 2
  resources:
    requests:
      cpu: "250m"
      memory: "256Mi"
    limits:
      cpu: "1"
      memory: "512Mi"
  service:
    apiPort: 8080
  autoscaling:
    enabled: true
    minReplicas: 2
    maxReplicas: 10
    targetCPUUtilization: 60

# ──────────────────────────────────
# Redis (for state store)
# ──────────────────────────────────
redis:
  enabled: true
  architecture: replication
  auth:
    enabled: true
    existingSecret: voice-gateway-redis
  replica:
    replicaCount: 3
    persistence:
      enabled: true
      size: 1Gi

# ──────────────────────────────────
# Observability
# ──────────────────────────────────
observability:
  metrics:
    enabled: true
    port: 9100
    serviceMonitor:
      enabled: true  # Create a Prometheus ServiceMonitor
      interval: 15s
  tracing:
    enabled: false
    otlpEndpoint: ""

# ──────────────────────────────────
# Security
# ──────────────────────────────────
security:
  mtls:
    enabled: true
  certManager:
    enabled: true  # Use cert-manager for certificates
    issuerRef:
      name: voice-gateway-ca
      kind: ClusterIssuer
  networkPolicy:
    enabled: true

# ──────────────────────────────────
# Ingress (for HTTP API)
# ──────────────────────────────────
ingress:
  enabled: false
  className: nginx
  hosts:
    - host: vg-api.internal
      paths:
        - path: /
          pathType: Prefix
  tls: []
```
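Individual values can also be overridden at upgrade time without editing the file. For example, to switch the ASR model (the key path follows the values file above; `whisper-large` is illustrative):

```bash
helm upgrade voice-gateway voicetyped/voice-gateway \
  --namespace voice-gateway \
  --reuse-values \
  --set speechGateway.config.model=whisper-large
```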
## Architecture on Kubernetes
```text
┌─────────────────────────────────────────────────┐
│               Kubernetes Cluster                │
│                                                 │
│  ┌──────────────┐      ┌────────────────┐       │
│  │ LoadBalancer │      │ ServiceMonitor │       │
│  │ :5060 (SIP)  │      │  (Prometheus)  │       │
│  └──────┬───────┘      └────────────────┘       │
│         │                                       │
│  ┌──────▼────────┐                              │
│  │ Media Gateway │  (hostNetwork, 2-10 pods)    │
│  └──────┬────────┘                              │
│         │                                       │
│  ┌──────▼─────────┐                             │
│  │ Speech Gateway │  (GPU nodes, 1-5 pods)      │
│  │ + PV (models)  │                             │
│  └──────┬─────────┘                             │
│         │                                       │
│  ┌──────▼───────┐      ┌───────────┐            │
│  │   Runtime    │──────│   Redis   │            │
│  │ (2-20 pods)  │      │   (HA)    │            │
│  └──────┬───────┘      └───────────┘            │
│         │                                       │
│  ┌──────▼─────────┐                             │
│  │ Integration GW │  (2-10 pods)                │
│  │ :8080 (REST)   │                             │
│  └────────────────┘                             │
└─────────────────────────────────────────────────┘
```
## Dialog ConfigMap
Store dialog definitions in a ConfigMap:
```bash
# Create the ConfigMap from dialog files
kubectl create configmap voice-gateway-dialogs \
  --from-file=/path/to/dialogs/ \
  --namespace voice-gateway
```
Or declaratively:
```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: voice-gateway-dialogs
  namespace: voice-gateway
data:
  helpdesk.yaml: |
    name: helpdesk
    states:
      start:
        on_enter:
          - action: play_tts
            text: "Welcome to IT support."
        transitions:
          - event: speech
            target: process
  # ... more dialogs
```
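To push updated dialogs to a running deployment, regenerate the ConfigMap in place and restart the Runtime pods so they reload the definitions. The deployment name below is an assumption; check `kubectl get deployments -n voice-gateway` for the name the chart actually generates.

```bash
# Regenerate the ConfigMap from local files and apply it in place
kubectl create configmap voice-gateway-dialogs \
  --from-file=/path/to/dialogs/ \
  --namespace voice-gateway \
  --dry-run=client -o yaml | kubectl apply -f -

# Restart Runtime pods to pick up the new dialogs (name is illustrative)
kubectl rollout restart deployment/voice-gateway-runtime \
  --namespace voice-gateway
```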
## GPU Scheduling
### NVIDIA GPU Operator
Install the NVIDIA GPU Operator if not already present:
```bash
helm repo add nvidia https://helm.ngc.nvidia.com/nvidia
helm install gpu-operator nvidia/gpu-operator \
  --namespace gpu-operator \
  --create-namespace
```
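The operator can take several minutes to roll out drivers and device plugins; confirm its pods are healthy before scheduling GPU workloads:

```bash
kubectl get pods --namespace gpu-operator
```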
### GPU Node Labels

The GPU Operator's node feature discovery normally applies GPU labels automatically; label nodes manually only if you manage drivers outside the operator:
```bash
kubectl label nodes gpu-node-1 nvidia.com/gpu.present=true
```
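To verify that labeled nodes actually advertise schedulable GPUs:

```bash
# Nodes matching the label used by speechGateway.gpu.nodeSelector
kubectl get nodes -l nvidia.com/gpu.present=true

# Allocatable GPU count on a specific node
kubectl describe node gpu-node-1 | grep nvidia.com/gpu
```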
### Multiple GPU Types
If you have different GPU types, use node affinity:
```yaml
speechGateway:
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
          - matchExpressions:
              - key: nvidia.com/gpu.product
                operator: In
                values:
                  - Tesla-T4
                  - NVIDIA-A100-SXM4-40GB
```
## Network Configuration

### SIP Load Balancing

SIP over UDP needs special load-balancing care: mid-call packets must keep reaching the pod that owns the SIP dialog. Use `hostNetwork: true` on Media Gateway pods and a UDP-capable load balancer:
```yaml
mediaGateway:
  hostNetwork: true
  service:
    type: LoadBalancer
    annotations:
      # AWS NLB
      service.beta.kubernetes.io/aws-load-balancer-type: nlb
      # GCP
      cloud.google.com/l4-rbs: "enabled"
```
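SIP endpoints usually need to see the caller's real source address, which the default Service routing obscures. If the chart passes arbitrary Service fields through (an assumption; it is not shown in the default values above), `externalTrafficPolicy: Local` preserves the client IP and avoids an extra node hop:

```yaml
mediaGateway:
  service:
    type: LoadBalancer
    # Preserve the client source IP; traffic is only delivered to
    # nodes that run a Media Gateway pod.
    externalTrafficPolicy: Local
```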
### Network Policies

When `security.networkPolicy.enabled: true` is set, the Helm chart creates NetworkPolicies that restrict traffic between components; an illustrative policy follows the list:
- Media Gateway accepts SIP/RTP from external
- Speech Gateway only accepts from Media Gateway
- Runtime only accepts from Speech Gateway
- Integration Gateway only accepts from Runtime and external API clients
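As a rough sketch of the second rule, a generated policy might look like this. The `app.kubernetes.io/component` labels are assumptions; the chart's actual selectors may differ:

```yaml
# Illustrative only: approximates the chart-generated ingress rule
# that limits Speech Gateway traffic to the Media Gateway.
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: speech-gateway-ingress
  namespace: voice-gateway
spec:
  podSelector:
    matchLabels:
      app.kubernetes.io/component: speech-gateway
  policyTypes:
    - Ingress
  ingress:
    - from:
        - podSelector:
            matchLabels:
              app.kubernetes.io/component: media-gateway
```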
## Scaling Guidelines
| Concurrent Calls | Media GW Pods | Speech GW Pods (GPU) | Runtime Pods | Integration Pods |
|---|---|---|---|---|
| 10 | 2 | 1 | 2 | 2 |
| 50 | 3 | 2 | 4 | 3 |
| 100 | 5 | 3 | 8 | 5 |
| 500 | 10 | 5 | 20 | 10 |
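Treat these figures as starting points rather than guarantees; real capacity depends on codec choice, ASR model size, and dialog complexity. One way to apply a row is to pin the autoscaler floors in `values.yaml`, for example for roughly 100 concurrent calls:

```yaml
# Starting point for ~100 concurrent calls; tune against observed load
mediaGateway:
  autoscaling:
    minReplicas: 5
speechGateway:
  autoscaling:
    minReplicas: 3
runtime:
  autoscaling:
    minReplicas: 8
integration:
  autoscaling:
    minReplicas: 5
```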
## Next Steps
- Air-Gapped Deployment — deploy without internet access
- Security — mTLS, RBAC, and audit logging
- Observability — Prometheus and OpenTelemetry setup