Audio Codecs

Telepath supports two audio codecs for SIP transport. The choice between them affects audio quality, bandwidth, and compatibility.

G.711 (Narrowband)

Technical Specifications:
  • Sampling Rate: 8 kHz
  • Bit Rate: 64 kbps (8 bits per sample × 8,000 samples per second)
  • Audio Bandwidth: roughly 300 Hz - 3.4 kHz
  • Latency: ~1-2ms encoding
  • Codec Variants:
    • PCMU (µ-law) - Common in North America
    • PCMA (A-law) - Common in Europe and Asia
Characteristics:
  • Oldest and most widely supported codec
  • Universal compatibility with all carriers
  • Minimal processing overhead (same 64 kbps stream size as G.722)
  • Acceptable voice quality for traditional telephony
  • Logarithmic companding only, no predictive compression (13/14-bit linear samples are reduced to 8 bits, so some precision is lost)
Best For:
  • Legacy carriers with limited codec support
  • Cost-sensitive applications
  • Regions where PCMU/PCMA is standard
  • When wideband audio cannot be negotiated end to end (note that G.711 still uses 64 kbps, the same as G.722)
Example Use Cases:
  • Public switched telephone network (PSTN)
  • Legacy PBX systems
  • International calls through traditional carriers
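To illustrate how the PCMU variant fits each sample into 8 bits, here is a minimal sketch of the textbook continuous µ-law curve (µ = 255). It is not the segmented, bit-exact encoder defined by G.711, and the function names are illustrative rather than part of any Telepath API.

```python
import math

MU = 255  # µ-law parameter used by G.711 PCMU

def mu_law_compress(x: float) -> float:
    """Map a linear sample in [-1, 1] onto the companded scale [-1, 1]."""
    return math.copysign(math.log1p(MU * abs(x)) / math.log1p(MU), x)

def mu_law_expand(y: float) -> float:
    """Inverse of mu_law_compress."""
    return math.copysign(math.expm1(abs(y) * math.log1p(MU)) / MU, y)

def quantize_8bit(y: float) -> int:
    """Round a companded value in [-1, 1] to one of 256 codes (0..255)."""
    return min(255, int((y + 1.0) / 2.0 * 256))
```

The curve gives quiet samples a disproportionate share of the 256 codes: a sample at 1% of full scale already lands more than 20% of the way up the companded scale, which is why 8 bits are enough for intelligible speech.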

G.722 (Wideband/HD Voice)

Technical Specifications:
  • Sampling Rate: 16 kHz
  • Bit Rate: 64 kbps (4 bits per sample on average × 16,000 samples per second)
  • Audio Bandwidth: roughly 50 Hz - 7 kHz (wideband)
  • Latency: ~1-2ms encoding
  • Compression: Sub-band ADPCM (Adaptive Differential PCM)
Characteristics:
  • Double the frequency range of G.711
  • Noticeably higher audio quality
  • Same bit rate as G.711 (64 kbps)
  • Better for AI processing
  • More details preserved in voice
Benefits:
  • Clearer, more natural sound
  • AI agents can better understand speech
  • Reduced background noise
  • Better speaker recognition
Best For:
  • AI voice agent applications ⭐ (Recommended)
  • Modern SIP providers
  • Applications prioritizing audio quality
  • When compatibility allows
Example Use Cases:
  • OpenAI Realtime API (benefits from wider frequency range)
  • ElevenLabs conversational AI
  • Custom AI implementations
We recommend G.722 for all AI voice applications. The wider frequency range helps AI models understand speech better while using the same bandwidth as G.711.
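The 64 kbps figure for both codecs follows from sample rate times bits per sample. The sketch below works through that arithmetic and the resulting RTP payload size. The 20 ms packet interval is an assumption (a common packetization time, not a documented Telepath setting), and the function names are illustrative.

```python
def bit_rate_kbps(sample_rate_hz: int, bits_per_sample: int) -> float:
    """Raw codec bit rate in kbps."""
    return sample_rate_hz * bits_per_sample / 1000

def payload_bytes(kbps: float, packet_ms: int = 20) -> int:
    """Audio payload bytes per packet, excluding RTP/UDP/IP headers."""
    return int(kbps * 1000 * packet_ms / 1000 / 8)

g711 = bit_rate_kbps(8_000, 8)    # 8 kHz sampling, 8 bits per sample
g722 = bit_rate_kbps(16_000, 4)   # 16 kHz sampling, 4 bits per sample on average
```

Both calls give 64 kbps, so under the 20 ms assumption each packet carries 160 bytes of audio regardless of which codec is negotiated.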

Codec Selection Guide

By Carrier

Carrier    | Recommended | Fallback
Twilio     | G.722       | G.711 PCMU
Telnyx     | G.722       | G.711 PCMU
Vonage     | G.722       | G.711 PCMU
Bandwidth  | G.711 PCMU  | G.722
SignalWire | G.722       | G.711 PCMU
Plivo      | G.722       | G.711 PCMU

By AI Provider

AI Provider      | Recommended | Why
OpenAI Realtime  | G.722       | Wider frequency range aids speech recognition
ElevenLabs       | G.722       | Clearer input improves response quality
Custom WebSocket | G.722       | Better for most AI models

Codec Negotiation

Carriers typically support codec lists in priority order:
Preferred: G.722, G.711 PCMU, G.711 PCMA
If G.722 is unavailable, the carrier will fall back to G.711. In Telepath:
  • Automatically negotiates best codec
  • Falls back gracefully if unavailable
  • No manual codec selection needed
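The fallback behaviour can be pictured as picking the first entry in a priority list that the other side also supports. The following is a minimal sketch of that idea, not Telepath's actual negotiation logic. `negotiate_codec` is a hypothetical helper; the codec names are the SDP encoding names, listed with their static RTP payload types from RFC 3551.

```python
from typing import Optional, Sequence

# Static RTP payload types defined in RFC 3551.
RTP_PAYLOAD_TYPES = {"G722": 9, "PCMU": 0, "PCMA": 8}

# Priority order from this guide: G.722, then G.711 PCMU, then G.711 PCMA.
PREFERRED = ("G722", "PCMU", "PCMA")

def negotiate_codec(preferred: Sequence[str], remote: Sequence[str]) -> Optional[str]:
    """Return the first preferred codec the remote side also offers, else None."""
    remote_set = set(remote)
    for codec in preferred:
        if codec in remote_set:
            return codec
    return None
```

With a remote side that offers only PCMU and PCMA, the helper returns PCMU, which mirrors the G.722 to G.711 fallback described above.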

Voice Activity Detection (VAD)

VAD automatically detects when someone is speaking and handles silence intelligently.

What is VAD?

Voice Activity Detection:
  1. Analyzes audio in real-time
  2. Distinguishes speech from silence/background noise
  3. Controls AI agent behavior based on detection
  4. Enables natural interruption handling

Benefits

Natural Conversation Flow:
  • AI knows when to start listening
  • Detects when caller has finished speaking
  • Handles natural pauses appropriately
Natural Interruptions:
  • Caller can interrupt the AI agent
  • AI recognizes when caller starts speaking
  • Seamless barge-in support
Reduced Latency:
  • No need to wait for fixed timeouts
  • Real-time detection of turn boundaries
  • Faster response times
Background Noise Filtering:
  • Distinguishes speech from noise
  • Better audio quality to AI agent

How Telepath’s VAD Works

Outbound VAD (AI agent speaking)

  1. Silence Detection: AI finishes speaking
  2. End-of-Turn Detection: VAD identifies when AI is done
  3. Caller’s Turn: System switches to listening
  4. Caller Speech: VAD detects incoming audio
  5. AI Processing: Audio sent to agent
Adaptive Sensitivity:
  • Adjusts to background noise levels
  • Learns from conversation patterns
  • Handles various environments (quiet offices, noisy call centers)

Inbound VAD (Caller speaking)

  1. Speech Detection: Caller speaks (VAD detects)
  2. Audio Collection: Collected in real-time
  3. Interruption Detection: If AI is speaking and caller starts…
  4. Barge-In: Audio forwarded to AI immediately
  5. AI Processing: Agent handles interruption
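One way to picture the interruption step is a counter over per-frame speech decisions taken while the AI is speaking. This is a sketch under stated assumptions, not Telepath's implementation: the 20 ms frame size and the function name are hypothetical, and the idea of requiring a minimum run of caller speech before barging in is inferred from the `min_speech_duration_ms` field in the API example.

```python
from typing import Iterable, Optional

def barge_in_time_ms(
    caller_speech_flags: Iterable[bool],
    frame_ms: int = 20,        # assumed frame size
    min_speech_ms: int = 100,  # mirrors min_speech_duration_ms in the API example
) -> Optional[int]:
    """Return the time (ms) at which barge-in would fire, or None if it never does.

    Each flag says whether the caller was speaking during one frame
    while the AI agent was still talking.
    """
    run_ms = 0
    for i, is_speech in enumerate(caller_speech_flags):
        # A silent frame resets the run, so short clicks or coughs are ignored.
        run_ms = run_ms + frame_ms if is_speech else 0
        if run_ms >= min_speech_ms:
            return (i + 1) * frame_ms
    return None
```

Requiring a continuous run trades a small delay for fewer false interruptions from brief noises.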

VAD Parameters

Telepath uses intelligent defaults, but you can fine-tune behavior:
End-of-Turn Timeout:
  • Default: 800ms of silence
  • Adjustable: 400ms - 2000ms
  • Lower: More aggressive (interrupts sooner)
  • Higher: More patient (allows natural pauses)
Speech Start Threshold:
  • Default: Automatic
  • Effect: How quickly VAD detects speech start
Noise Level Adaptation:
  • Default: Enabled
  • Effect: Adjusts sensitivity to environment
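To make the effect of the end-of-turn timeout concrete, the sketch below runs a simple silence counter over per-frame speech decisions. It assumes 20 ms frames and a hypothetical function name; Telepath's real detector is adaptive and is not documented at this level.

```python
from typing import Iterable, Optional

def end_of_turn_ms(
    speech_flags: Iterable[bool],
    frame_ms: int = 20,     # assumed frame size
    timeout_ms: int = 800,  # default end-of-turn timeout from this guide
) -> Optional[int]:
    """Return the time (ms) at which end of turn is declared, or None."""
    heard_speech = False
    silence_ms = 0
    for i, is_speech in enumerate(speech_flags):
        if is_speech:
            heard_speech = True
            silence_ms = 0
        elif heard_speech:
            # Only silence that follows speech counts toward the timeout.
            silence_ms += frame_ms
            if silence_ms >= timeout_ms:
                return (i + 1) * frame_ms
    return None

# 100 ms of speech, a 500 ms pause, 100 ms more speech, then silence.
with_pause = [True] * 5 + [False] * 25 + [True] * 5 + [False] * 50
```

With the 800 ms default, the 500 ms pause in `with_pause` is tolerated and the turn ends only after the second burst of speech. With a 400 ms timeout the turn would end during the pause, which is the "more aggressive" behaviour described above.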

Configuring VAD

Via Dashboard

  1. Open your connection settings
  2. Go to Advanced > VAD Configuration
  3. Adjust parameters:
    • End-of-turn timeout
    • Sensitivity level
    • Noise adaptation
  4. Save and test

Via API

{
  "vad_config": {
    "end_of_turn_timeout_ms": 800,
    "sensitivity": "adaptive",
    "noise_adaptation": true,
    "min_speech_duration_ms": 100
  }
}
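Before sending a payload like the one above, it can help to check it locally against the ranges given in this guide. The following is a hedged sketch: `validate_vad_config` is a hypothetical helper, only the 400-2000 ms timeout range comes from this page, and the other checks (types, non-negative duration) are assumptions. Allowed values for `sensitivity` are not documented here, so it is not checked.

```python
import json
from typing import List

END_OF_TURN_RANGE_MS = (400, 2000)  # adjustable range stated in this guide

def _is_int(value) -> bool:
    # bool is a subclass of int in Python, so exclude it explicitly.
    return isinstance(value, int) and not isinstance(value, bool)

def validate_vad_config(payload: dict) -> List[str]:
    """Return a list of problems found in payload["vad_config"] (empty if none)."""
    cfg = payload.get("vad_config")
    if not isinstance(cfg, dict):
        return ["vad_config must be an object"]
    errors = []
    lo, hi = END_OF_TURN_RANGE_MS
    timeout = cfg.get("end_of_turn_timeout_ms")
    if timeout is not None and not (_is_int(timeout) and lo <= timeout <= hi):
        errors.append(f"end_of_turn_timeout_ms must be an integer between {lo} and {hi}")
    noise = cfg.get("noise_adaptation")
    if noise is not None and not isinstance(noise, bool):
        errors.append("noise_adaptation must be true or false")
    min_speech = cfg.get("min_speech_duration_ms")
    if min_speech is not None and not (_is_int(min_speech) and min_speech >= 0):
        errors.append("min_speech_duration_ms must be a non-negative integer")
    return errors

EXAMPLE = json.loads("""
{
  "vad_config": {
    "end_of_turn_timeout_ms": 800,
    "sensitivity": "adaptive",
    "noise_adaptation": true,
    "min_speech_duration_ms": 100
  }
}
""")
```

Calling `validate_vad_config(EXAMPLE)` returns an empty list, while a timeout of 300 would be reported as out of range.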

Testing VAD Settings

Test Natural Pauses:
  1. Call your agent
  2. Speak, then pause for 1-2 seconds
  3. Observe if agent responds appropriately
  4. Adjust if needed
Test Interruptions:
  1. Let AI agent speak
  2. Interrupt mid-sentence
  3. Verify agent immediately receives your speech
  4. Ensure smooth barge-in
Test Noise Handling:
  1. Call from noisy environment
  2. Verify agent can still understand
  3. Check for unwanted interruptions
  4. Adjust sensitivity if needed

Audio Quality Optimization

Best Practices

Network:
  • Use wired connections when possible
  • Keep packet loss below 1%
  • Keep jitter below 50ms
Carrier Configuration:
  • Enable G.722 if available
  • Use UDP or TLS (both fine)
  • Optimize for your region
AI Agent:
  • Use latest model versions
  • Keep API credentials current
  • Test with various speakers
Monitoring:
  • Check dashboard for codec used
  • Monitor audio quality metrics
  • Review VAD decisions in SIP traces
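The packet loss and jitter targets above can be checked against your own packet captures. The sketch below uses the interarrival jitter estimator from RFC 3550 and a plain loss percentage; the function names are illustrative, and how the Telepath dashboard computes its own figures is not stated on this page.

```python
from typing import Sequence

def interarrival_jitter_ms(send_ms: Sequence[float], recv_ms: Sequence[float]) -> float:
    """RFC 3550 running jitter estimate: J += (|D| - J) / 16 per packet."""
    jitter = 0.0
    for i in range(1, len(send_ms)):
        # D is the change in transit time between consecutive packets.
        d = (recv_ms[i] - recv_ms[i - 1]) - (send_ms[i] - send_ms[i - 1])
        jitter += (abs(d) - jitter) / 16
    return jitter

def packet_loss_percent(expected: int, received: int) -> float:
    """Share of expected packets that never arrived, in percent."""
    if expected <= 0:
        return 0.0
    return 100 * max(0, expected - received) / expected

def within_targets(loss_percent: float, jitter_ms: float) -> bool:
    """Targets from this guide: packet loss below 1% and jitter below 50 ms."""
    return loss_percent < 1.0 and jitter_ms < 50.0
```

For example, 995 packets received out of 1,000 expected is 0.5% loss, which stays inside the target as long as jitter is also under 50 ms.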

Troubleshooting Audio Issues

Poor Clarity:
  1. Check which codec is in use
  2. Try switching to G.722 if not used
  3. Verify AI provider credentials
  4. Test with different phone models
Frequent Interruptions:
  1. Check whether VAD sensitivity is set too high
  2. Increase end-of-turn timeout
  3. Enable noise adaptation
  4. Test in less noisy environment
Delayed Responses:
  1. Check AI provider latency
  2. Verify codec negotiation
  3. Check network conditions
  4. Review carrier-side metrics
Background Noise Issues:
  1. Enable VAD noise adaptation
  2. Test from cleaner environment
  3. Adjust sensitivity thresholds
  4. Try different microphone

Advanced Codec Topics

Custom Codecs

For advanced deployments with specific requirements, see Advanced Integration, which covers custom codec handling and edge cases.

Codec Transcoding

If your carrier only supports G.711 but you want G.722’s benefits:
  • Option 1: Telepath transcodes (minimal latency impact)
  • Option 2: Request carrier to enable G.722
  • Option 3: Use different carrier with G.722 support
Transcoding adds ~5-10ms latency but preserves quality benefits.

Performance Metrics

Monitor codec performance in the dashboard:
  • Codec Used: Which codec actually negotiated
  • Packet Loss: % of lost packets per codec
  • Jitter: Audio timing variance
  • Quality Metrics: MOS score (Mean Opinion Score)
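For readers unfamiliar with MOS, the sketch below shows the standard mapping from an E-model transmission rating (R-factor) to an estimated MOS, as defined in ITU-T G.107. Whether the Telepath dashboard derives its MOS this way is an assumption, and the function name is illustrative.

```python
def r_factor_to_mos(r: float) -> float:
    """Convert an E-model R-factor (0..100) to an estimated MOS (1.0..4.5)."""
    if r <= 0:
        return 1.0
    if r >= 100:
        return 4.5
    return 1 + 0.035 * r + r * (r - 60) * (100 - r) * 7e-6
```

An R-factor of 93.2, the E-model default with no impairments, maps to a MOS of about 4.4; lower ratings indicate progressively more noticeable degradation.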

FAQ

Should I always use G.722?
Yes, if your carrier supports it. Use G.711 only if required for compatibility.

Can I change codec mid-call?
No, codec is negotiated at call start. To change, hang up and reconnect.

How does VAD handle music?
Adaptive VAD learns conversation patterns and handles music appropriately.

What if VAD is too aggressive?
Increase the end-of-turn timeout to allow longer pauses.

Can I disable VAD?
VAD is essential for natural conversation. Disabling is not recommended.

What audio formats does Telepath support internally?
PCM 16-bit, 8kHz or 16kHz. Codecs handle conversion.