Audio Codecs
Telepath supports two audio codecs for SIP transport. The choice between them affects audio quality, bandwidth, and compatibility.
G.711 (Narrowband)
Technical Specifications :
Sampling Rate : 8 kHz
Bit Rate : 64 kbps (8 bits per sample × 8,000 samples per second)
Bandwidth : ~3.4 kHz audio bandwidth (roughly 300-3400 Hz)
Latency : ~1-2ms encoding
Codec Variants :
PCMU (µ-law) - Common in North America
PCMA (A-law) - Common in Europe and Asia
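To make the PCMU variant concrete, here is a minimal sketch of µ-law companding in Python, following the widely used reference algorithm for G.711. It shows how a 16-bit linear sample is squeezed into 8 bits on a logarithmic scale and expanded again. The function names are illustrative and are not part of any Telepath API.

```python
BIAS = 0x84   # 132, added before finding the segment so small values land in segment 0
CLIP = 32635  # largest magnitude that can be encoded after adding the bias


def linear_to_ulaw(sample: int) -> int:
    """Encode one signed 16-bit PCM sample as an 8-bit mu-law byte."""
    sign = 0x80 if sample < 0 else 0x00
    magnitude = min(abs(sample), CLIP) + BIAS
    # The segment (exponent) is the position of the highest set bit above bit 7.
    exponent = 7
    mask = 0x4000
    while exponent > 0 and not (magnitude & mask):
        exponent -= 1
        mask >>= 1
    mantissa = (magnitude >> (exponent + 3)) & 0x0F
    # mu-law bytes are transmitted bit-inverted.
    return ~(sign | (exponent << 4) | mantissa) & 0xFF


def ulaw_to_linear(byte: int) -> int:
    """Decode one 8-bit mu-law byte back to a signed linear sample."""
    byte = ~byte & 0xFF
    sign = byte & 0x80
    exponent = (byte >> 4) & 0x07
    mantissa = byte & 0x0F
    magnitude = (((mantissa << 3) + BIAS) << exponent) - BIAS
    return -magnitude if sign else magnitude


encoded = linear_to_ulaw(1000)
decoded = ulaw_to_linear(encoded)
print(hex(encoded), decoded)  # 0xce 988
```

The round trip returns 988 rather than 1000: the quantization step grows with signal level, which is why G.711 is compact but not bit-exact.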
Characteristics :
Oldest and most widely supported codec
Universal compatibility with all carriers
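Half the size of 16-bit linear PCM at the same sampling rate
Acceptable voice quality for traditional telephony
Lightly compressed (logarithmic companding to 8 bits per sample), so quality loss is small but not zero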
Best For :
Legacy carriers with limited codec support
Cost-sensitive applications
Regions where PCMU/PCMA is standard
When G.722 is unavailable (both codecs use 64 kbps, so G.711 saves no bandwidth over G.722)
Example Use Cases :
Public switched telephone network (PSTN)
Legacy PBX systems
International calls through traditional carriers
G.722 (Wideband/HD Voice)
Technical Specifications :
Sampling Rate : 16 kHz
Bit Rate : 64 kbps (4 bits per input sample on average, compressed)
Bandwidth : ~7 kHz audio bandwidth (roughly 50-7000 Hz, wideband)
Latency : ~1-2ms encoding
Compression : Sub-band ADPCM (Adaptive Differential PCM)
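The bit rates of both codecs follow from simple arithmetic, sketched below in Python. G.711 sends 8 bits for each of 8,000 samples per second. G.722 at 64 kbps splits the 16 kHz input into two sub-bands sampled at 8 kHz each and spends 6 bits on the lower band and 2 bits on the upper band. The 20 ms packet interval is a common RTP default and is an assumption here, not a documented Telepath setting.

```python
def bit_rate_kbps(samples_per_second: int, bits_per_sample: int) -> float:
    """Bit rate of a fixed-rate stream in kilobits per second."""
    return samples_per_second * bits_per_sample / 1000


def payload_bytes(rate_kbps: float, packet_ms: int = 20) -> int:
    """Encoded audio bytes carried in one packet of the given duration."""
    return int(rate_kbps * 1000 * packet_ms / 1000 / 8)


g711_kbps = bit_rate_kbps(8000, 8)
# G.722: lower sub-band (6 bits) plus upper sub-band (2 bits), each at 8,000 samples/s.
g722_kbps = bit_rate_kbps(8000, 6) + bit_rate_kbps(8000, 2)

print(g711_kbps, g722_kbps)      # 64.0 64.0
print(payload_bytes(g711_kbps))  # 160
print(payload_bytes(g722_kbps))  # 160
```

Both codecs therefore produce the same packet size on the wire; G.722 spends those bits on twice the audio bandwidth.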
Characteristics :
Roughly double the audio bandwidth of G.711 (about 7 kHz versus 3.4 kHz)
Noticeably higher audio quality
Same bit rate as G.711 (64 kbps)
Better for AI processing
More details preserved in voice
Benefits :
Clearer, more natural sound
AI agents can better understand speech
Reduced background noise
Better speaker recognition
Best For :
AI voice agent applications ⭐ (Recommended)
Modern SIP providers
Applications prioritizing audio quality
When compatibility allows
Example Use Cases :
OpenAI Realtime API (benefits from wider frequency range)
ElevenLabs conversational AI
Custom AI implementations
We recommend G.722 for all AI voice applications. The wider frequency range helps AI models understand speech better while using the same bit rate as G.711.
Codec Selection Guide
By Carrier
| Carrier | Recommended | Fallback |
| --- | --- | --- |
| Twilio | G.722 | G.711 PCMU |
| Telnyx | G.722 | G.711 PCMU |
| Vonage | G.722 | G.711 PCMU |
| Bandwidth | G.711 PCMU | G.722 |
| SignalWire | G.722 | G.711 PCMU |
| Plivo | G.722 | G.711 PCMU |
By AI Provider
| AI Provider | Recommended | Why |
| --- | --- | --- |
| OpenAI Realtime | G.722 | Wider frequency range aids speech recognition |
| ElevenLabs | G.722 | Clearer input improves response quality |
| Custom WebSocket | G.722 | Better for most AI models |
Codec Negotiation
Carriers typically support codec lists in priority order:
Preferred: G.722, G.711 PCMU, G.711 PCMA
If G.722 is unavailable, the carrier will fall back to G.711.
In Telepath :
Automatically negotiates best codec
Falls back gracefully if unavailable
No manual codec selection needed
Voice Activity Detection (VAD)
VAD automatically detects when someone is speaking and handles silence intelligently.
What is VAD?
Voice Activity Detection:
Analyzes audio in real-time
Distinguishes speech from silence/background noise
Controls AI agent behavior based on detection
Enables natural interruption handling
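As a rough illustration of the speech-versus-silence decision, the sketch below classifies a frame by its RMS energy. Production VADs, presumably including Telepath's, use more robust features; the fixed threshold of 500 and the function names are assumptions chosen for the example.

```python
import math


def rms(frame: list[int]) -> float:
    """Root-mean-square level of a frame of signed 16-bit PCM samples."""
    if not frame:
        return 0.0
    return math.sqrt(sum(s * s for s in frame) / len(frame))


def is_speech(frame: list[int], threshold: float = 500.0) -> bool:
    """Treat a frame as speech when its energy exceeds a fixed threshold."""
    return rms(frame) > threshold


# 20 ms frames at 8 kHz: pure silence, and a 440 Hz tone standing in for a voice.
silence = [0] * 160
tone = [int(5000 * math.sin(2 * math.pi * 440 * n / 8000)) for n in range(160)]

print(is_speech(silence))  # False
print(is_speech(tone))     # True
```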
Benefits
Natural Conversation Flow :
AI knows when to start listening
Detects when caller has finished speaking
Handles natural pauses appropriately
Natural Interruptions :
Caller can interrupt the AI agent
AI recognizes when caller starts speaking
Seamless barge-in support
Reduced Latency :
No need to wait for fixed timeouts
Real-time detection of turn boundaries
Faster response times
Background Noise Filtering :
Distinguishes speech from noise
Better audio quality to AI agent
How Telepath’s VAD Works
Outbound VAD (AI agent speaking)
Silence Detection : AI finishes speaking
End-of-Turn Detection : VAD identifies when AI is done
Caller’s Turn : System switches to listening
Caller Speech : VAD detects incoming audio
AI Processing : Audio sent to agent
Adaptive Sensitivity :
Adjusts to background noise levels
Learns from conversation patterns
Handles various environments (quiet offices, noisy call centers)
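One common way to adapt sensitivity to background noise is to track a running noise floor and set the speech threshold as a multiple of it. The sketch below uses an exponential moving average updated only on non-speech frames. The class name, the smoothing factor, and the margin are all illustrative assumptions, not Telepath parameters.

```python
class AdaptiveThreshold:
    """Speech threshold that follows the background noise level."""

    def __init__(self, initial_floor: float = 100.0, alpha: float = 0.05, margin: float = 3.0):
        self.floor = initial_floor  # estimated noise level (RMS)
        self.alpha = alpha          # smoothing factor for the moving average
        self.margin = margin        # speech must exceed floor * margin

    def update(self, frame_rms: float) -> bool:
        """Classify one frame and, if it is not speech, fold it into the noise floor."""
        speech = frame_rms > self.floor * self.margin
        if not speech:
            self.floor = (1 - self.alpha) * self.floor + self.alpha * frame_rms
        return speech


quiet_office = AdaptiveThreshold()
print(quiet_office.update(400.0))  # True: 400 is well above a floor of 100

noisy_call_center = AdaptiveThreshold()
for _ in range(200):
    noisy_call_center.update(250.0)       # steady background noise raises the floor
print(noisy_call_center.update(400.0))    # False: now treated as noise
print(noisy_call_center.update(2000.0))   # True: clearly above the raised threshold
```

The same 400-level frame is speech in a quiet room and noise in a loud one, which is the behavior an adaptive threshold is meant to provide.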
Inbound VAD (Caller speaking)
Speech Detection : Caller speaks (VAD detects)
Audio Collection : Collected in real-time
Interruption Detection : If AI is speaking and caller starts…
Barge-In : Audio forwarded to AI immediately
AI Processing : Agent handles interruption
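The inbound flow can be pictured as a small state machine. The sketch below is a hypothetical model, not Telepath's code: when caller speech arrives while the agent is talking, it records a barge-in, stops agent playback, and forwards the audio. All names here are invented for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class CallSession:
    agent_speaking: bool = False
    forwarded: list[bytes] = field(default_factory=list)  # audio sent on to the AI agent
    events: list[str] = field(default_factory=list)       # log of notable transitions

    def on_caller_frame(self, frame: bytes, speech: bool) -> None:
        """Handle one inbound frame together with the VAD decision for it."""
        if not speech:
            return  # silence or noise is not forwarded
        if self.agent_speaking:
            # Caller talked over the agent: stop playback and record the barge-in.
            self.agent_speaking = False
            self.events.append("barge_in")
        self.forwarded.append(frame)


session = CallSession(agent_speaking=True)
session.on_caller_frame(b"noise", speech=False)
session.on_caller_frame(b"hello", speech=True)
session.on_caller_frame(b"there", speech=True)
print(session.events)          # ['barge_in']
print(len(session.forwarded))  # 2
```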
VAD Parameters
Telepath uses intelligent defaults, but you can fine-tune behavior:
End-of-Turn Timeout :
Default : 800ms of silence
Adjustable : 400ms - 2000ms
Lower : More aggressive (interrupts sooner)
Higher : More patient (allows natural pauses)
Speech Start Threshold :
Default : Automatic
Effect : How quickly VAD detects speech start
Noise Level Adaptation :
Default : Enabled
Effect : Adjusts sensitivity to environment
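The end-of-turn timeout is easiest to understand as a silence counter. The following sketch, which assumes 20 ms frames and uses hypothetical names, signals end of turn once continuous silence after speech reaches the configured timeout (800 ms by default, matching the value above).

```python
class EndOfTurnDetector:
    """Signals end of turn after enough continuous silence following speech."""

    def __init__(self, timeout_ms: int = 800, frame_ms: int = 20):
        self.timeout_ms = timeout_ms
        self.frame_ms = frame_ms
        self.silence_ms = 0
        self.heard_speech = False

    def feed(self, speech: bool) -> bool:
        """Process one frame's VAD decision; return True when the turn has ended."""
        if speech:
            self.heard_speech = True
            self.silence_ms = 0  # any speech resets the silence counter
            return False
        if not self.heard_speech:
            return False  # leading silence before the caller says anything
        self.silence_ms += self.frame_ms
        if self.silence_ms >= self.timeout_ms:
            self.heard_speech = False
            self.silence_ms = 0
            return True
        return False


detector = EndOfTurnDetector(timeout_ms=800)
decisions = [True] * 10 + [False] * 40  # 200 ms of speech, then 800 ms of silence
ended_at = [i for i, d in enumerate(decisions) if detector.feed(d)]
print(ended_at)  # [49]
```

A lower timeout fires after fewer silent frames, so the agent responds sooner but is more likely to cut into a natural pause.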
Configuring VAD
Via Dashboard
Open your connection settings
Go to Advanced → VAD Configuration
Adjust parameters:
End-of-turn timeout
Sensitivity level
Noise adaptation
Save and test
Via API
{
  "vad_config": {
    "end_of_turn_timeout_ms": 800,
    "sensitivity": "adaptive",
    "noise_adaptation": true,
    "min_speech_duration_ms": 100
  }
}
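Before sending a configuration, it can help to build and validate it in code. The helper below is a hypothetical sketch: the field names and the 400-2000 ms range come from this page, while the function itself and the rule that the minimum speech duration must be non-negative are assumptions. The endpoint and authentication for submitting the payload are not shown because they are not specified here.

```python
import json


def build_vad_config(
    end_of_turn_timeout_ms: int = 800,
    sensitivity: str = "adaptive",
    noise_adaptation: bool = True,
    min_speech_duration_ms: int = 100,
) -> str:
    """Return the VAD configuration as a JSON string, rejecting out-of-range timeouts."""
    if not 400 <= end_of_turn_timeout_ms <= 2000:
        raise ValueError("end_of_turn_timeout_ms must be between 400 and 2000")
    if min_speech_duration_ms < 0:
        raise ValueError("min_speech_duration_ms must not be negative")
    payload = {
        "vad_config": {
            "end_of_turn_timeout_ms": end_of_turn_timeout_ms,
            "sensitivity": sensitivity,
            "noise_adaptation": noise_adaptation,
            "min_speech_duration_ms": min_speech_duration_ms,
        }
    }
    return json.dumps(payload, indent=2)


# A more patient agent that tolerates longer pauses.
print(build_vad_config(end_of_turn_timeout_ms=1200))
```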
Testing VAD Settings
Test Natural Pauses :
Call your agent
Speak, then pause for 1-2 seconds
Observe if agent responds appropriately
Adjust if needed
Test Interruptions :
Let AI agent speak
Interrupt mid-sentence
Verify agent immediately receives your speech
Ensure smooth barge-in
Test Noise Handling :
Call from noisy environment
Verify agent can still understand
Check for unwanted interruptions
Adjust sensitivity if needed
Audio Quality Optimization
Best Practices
Network :
Use wired connections when possible
Keep packet loss below 1%
Keep jitter below 50 ms
Carrier Configuration :
Enable G.722 if available
Use UDP or TLS transport (either works)
Optimize for your region
AI Agent :
Use latest model versions
Keep API credentials current
Test with various speakers
Monitoring :
Check dashboard for codec used
Monitor audio quality metrics
Review VAD decisions in SIP traces
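To check the network targets above against your own captures, the sketch below computes interarrival jitter with the running estimator from RFC 3550 and a simple packet loss percentage. It assumes send and receive timestamps are already in milliseconds and that sequence numbers do not wrap; the function names are illustrative.

```python
def interarrival_jitter_ms(send_ms: list[float], recv_ms: list[float]) -> float:
    """RFC 3550 jitter estimate: smoothed absolute change in transit time."""
    jitter = 0.0
    for i in range(1, len(send_ms)):
        # Difference in spacing between consecutive packets at receiver vs sender.
        d = (recv_ms[i] - recv_ms[i - 1]) - (send_ms[i] - send_ms[i - 1])
        jitter += (abs(d) - jitter) / 16
    return jitter


def packet_loss_percent(received_seq: list[int]) -> float:
    """Share of packets missing between the lowest and highest sequence number seen."""
    if not received_seq:
        return 0.0
    unique = set(received_seq)
    expected = max(unique) - min(unique) + 1
    return 100.0 * (expected - len(unique)) / expected


# Packets sent every 20 ms; the second one arrives 10 ms late.
print(interarrival_jitter_ms([0, 20, 40], [5, 35, 45]))  # 1.2109375
print(packet_loss_percent([1, 2, 4, 5]))                 # 20.0
```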
Troubleshooting Audio Issues
Poor Clarity :
Check which codec is in use
Try switching to G.722 if not used
Verify AI provider credentials
Test with different phone models
Frequent Interruptions :
Lower VAD sensitivity if it is set too high
Increase end-of-turn timeout
Enable noise adaptation
Test in less noisy environment
Delayed Responses :
Check AI provider latency
Verify codec negotiation
Check network conditions
Review carrier-side metrics
Background Noise Issues :
Enable VAD noise adaptation
Test from cleaner environment
Adjust sensitivity thresholds
Try different microphone
Advanced Codec Topics
Custom Codecs
For advanced deployments with specific requirements:
See Advanced Integration for custom codec handling and edge cases.
Codec Transcoding
If your carrier only supports G.711 but you want G.722’s benefits:
Option 1 : Telepath transcodes (minimal latency impact)
Option 2 : Request carrier to enable G.722
Option 3 : Use different carrier with G.722 support
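Transcoding adds ~5-10ms latency. It cannot restore frequencies that the G.711 leg never carried, so the full wideband benefit requires G.722 on the carrier side (Options 2 and 3).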
Monitor codec performance in the dashboard:
Codec Used : Which codec actually negotiated
Packet Loss : % of lost packets per codec
Jitter : Audio timing variance
Quality Metrics : MOS score (Mean Opinion Score)
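For context on the MOS value, a common way to estimate it is to convert an E-model transmission rating (R-factor) using the mapping from ITU-T G.107. The sketch below implements only that conversion; how Telepath derives the MOS shown in the dashboard is not stated here, so treat this as background rather than a description of the product.

```python
def r_factor_to_mos(r: float) -> float:
    """Map an E-model R-factor to an estimated MOS (ITU-T G.107 conversion)."""
    if r <= 0:
        return 1.0
    if r >= 100:
        return 4.5
    return 1 + 0.035 * r + r * (r - 60) * (100 - r) * 7e-6


# 93.2 is the default R-factor of the E-model with no impairments.
print(round(r_factor_to_mos(93.2), 2))  # 4.41
```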
FAQ
Should I always use G.722?
Yes, if your carrier supports it. Use G.711 only if required for compatibility.
Can I change codec mid-call?
No, codec is negotiated at call start. To change, hang up and reconnect.
How does VAD handle music?
Adaptive VAD learns conversation patterns and handles music appropriately.
What if VAD is too aggressive?
Increase the end-of-turn timeout to allow longer pauses.
Can I disable VAD?
VAD is essential for natural conversation. Disabling is not recommended.
What audio formats does Telepath support internally?
PCM 16-bit, 8kHz or 16kHz. Codecs handle conversion.
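For a sense of scale, the sketch below computes the size of one frame of 16-bit PCM at each internal rate and unpacks raw bytes into samples. The 20 ms frame length and little-endian byte order are assumptions made for the example; this page does not specify either.

```python
import struct


def pcm_frame_bytes(sample_rate_hz: int, frame_ms: int = 20, bytes_per_sample: int = 2) -> int:
    """Size in bytes of one frame of mono linear PCM."""
    return sample_rate_hz * frame_ms // 1000 * bytes_per_sample


def decode_pcm16(data: bytes) -> list[int]:
    """Unpack little-endian signed 16-bit samples from raw bytes."""
    count = len(data) // 2
    return list(struct.unpack(f"<{count}h", data[: count * 2]))


print(pcm_frame_bytes(8000))   # 320
print(pcm_frame_bytes(16000))  # 640
print(decode_pcm16(struct.pack("<3h", 1, -2, 3)))  # [1, -2, 3]
```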