Codec & VAD

Audio Codecs

Telepath supports two audio codecs for SIP transport. The choice between them affects audio quality, bandwidth, and compatibility.

G.711 (Narrowband)

Technical Specifications:

Sampling Rate: 8 kHz
Bit Rate: 64 kbps (8 bits per sample at 8,000 samples per second)
Audio Bandwidth: 300-3400 Hz (narrowband)
Latency: ~1-2ms encoding
Codec Variants:
- PCMU (µ-law) - Common in North America and Japan
- PCMA (A-law) - Common in Europe and most other regions
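To make the 8-bits-per-sample figure concrete, here is a minimal Python port of the classic G.711 µ-law (PCMU) encoding routine. It illustrates the codec and is not Telepath's implementation. The names `linear_to_ulaw` and `encode_ulaw` are hypothetical.

```python
# Sketch of G.711 µ-law (PCMU) companding: one signed 16-bit sample -> one byte.
BIAS = 0x84    # 132, added to the magnitude before locating the segment
CLIP = 32635   # largest magnitude that still fits in 15 bits after adding BIAS


def linear_to_ulaw(sample: int) -> int:
    """Compand a signed 16-bit PCM sample to an 8-bit µ-law code."""
    sign = 0x80 if sample < 0 else 0x00
    magnitude = min(abs(sample), CLIP) + BIAS
    # Find the segment (exponent): the highest set bit among bits 7..14.
    exponent = 7
    mask = 0x4000
    while exponent > 0 and not (magnitude & mask):
        exponent -= 1
        mask >>= 1
    # Four mantissa bits select the step within that segment.
    mantissa = (magnitude >> (exponent + 3)) & 0x0F
    # µ-law transmits the bitwise complement of sign | exponent | mantissa.
    return ~(sign | (exponent << 4) | mantissa) & 0xFF


def encode_ulaw(samples: list[int]) -> bytes:
    """Encode a frame of 16-bit samples; the output is exactly one byte per sample."""
    return bytes(linear_to_ulaw(s) for s in samples)


SAMPLE_RATE_HZ = 8000
BITS_PER_SAMPLE = 8
print(SAMPLE_RATE_HZ * BITS_PER_SAMPLE)  # 64000 bits/s = 64 kbps
```

A 20 ms frame at 8 kHz is 160 samples, so `encode_ulaw` returns 160 bytes. Nearby inputs such as 1000 and 1001 map to the same code. That is the small quantization loss G.711 accepts in exchange for one byte per sample.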

Characteristics:

Oldest and most widely supported codec
Universal compatibility with all carriers
Very low processing cost (simple per-sample encoding)
Acceptable voice quality for traditional telephony
Little quality loss (logarithmic companding only, no predictive compression)

Best For:

Legacy carriers with limited codec support
Cost-sensitive applications
Regions where PCMU/PCMA is standard
Call paths that cannot carry wideband audio end to end (G.711 uses the same 64 kbps as G.722, so it does not save bandwidth)

Example Use Cases:

Public switched telephone network (PSTN)
Legacy PBX systems
International calls through traditional carriers

G.722 (Wideband/HD Voice)

Technical Specifications:

Sampling Rate: 16 kHz
Bit Rate: 64 kbps (4 bits per sample at 16,000 samples per second); 56 and 48 kbps modes are also defined
Audio Bandwidth: 50-7000 Hz (wideband)
Latency: ~1-2ms encoding
Compression: SB-ADPCM (Sub-Band Adaptive Differential PCM)

Characteristics:

More than double the audio bandwidth of G.711 (50-7000 Hz vs 300-3400 Hz)
Noticeably higher audio quality
Same bit rate as G.711 (64 kbps)
Better input for AI speech processing
More details preserved in voice
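Both codecs reach 64 kbps by different routes. G.711 sends 8 bits for each of 8,000 samples per second, and G.722 sends 4 bits for each of 16,000. The sketch below works through that arithmetic and the resulting packet sizes. It assumes 20 ms of audio per RTP packet, which is a common default but not a documented Telepath setting, and IPv4 with no link-layer overhead. The `Codec` class is purely illustrative.

```python
from dataclasses import dataclass

# Assumption: RTP (12) + UDP (8) + IPv4 (20) header bytes per packet; link-layer framing ignored.
HEADER_BYTES = 12 + 8 + 20


@dataclass(frozen=True)
class Codec:
    name: str
    sample_rate_hz: int
    bits_per_sample: int

    def bitrate_bps(self) -> int:
        """Codec payload bit rate."""
        return self.sample_rate_hz * self.bits_per_sample

    def payload_bytes(self, ptime_ms: int) -> int:
        """Audio payload carried by one RTP packet holding ptime_ms of audio."""
        return self.bitrate_bps() * ptime_ms // 8000  # bits/s * ms / (1000 ms/s * 8 bits/byte)

    def ip_bitrate_bps(self, ptime_ms: int) -> int:
        """Bit rate on the wire once per-packet headers are added."""
        packet_bits = (self.payload_bytes(ptime_ms) + HEADER_BYTES) * 8
        return packet_bits * 1000 // ptime_ms


G711 = Codec("G.711", sample_rate_hz=8000, bits_per_sample=8)
G722 = Codec("G.722", sample_rate_hz=16000, bits_per_sample=4)

for codec in (G711, G722):
    print(codec.name, codec.bitrate_bps(), codec.payload_bytes(20), codec.ip_bitrate_bps(20))
```

Both lines print 64000 bps of payload, 160 bytes per packet and 80000 bps on the wire. This is why switching from G.711 to G.722 costs no extra network bandwidth.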

Benefits:

Clearer, more natural sound
AI agents can better understand speech
Clearer high-frequency consonants such as "s" and "f", which G.711 cuts off above 3.4 kHz
Better speaker recognition

Best For:

AI voice agent applications ⭐ (Recommended)
Modern SIP providers
Applications prioritizing audio quality
When compatibility allows

Example Use Cases:

OpenAI Realtime API (benefits from wider frequency range)
ElevenLabs conversational AI
Custom AI implementations

We recommend G.722 for all AI voice applications. The wider frequency range helps AI models understand speech better while using the same bandwidth as G.711.

Codec Selection Guide

By Carrier

Carrier	Recommended	Fallback
Twilio	G.722	G.711 PCMU
Telnyx	G.722	G.711 PCMU
Vonage	G.722	G.711 PCMU
Bandwidth	G.711 PCMU	G.722
SignalWire	G.722	G.711 PCMU
Plivo	G.722	G.711 PCMU

By AI Provider

AI Provider	Recommended	Why
OpenAI Realtime	G.722	Wider frequency range aids speech recognition
ElevenLabs	G.722	Clearer input improves response quality
Custom WebSocket	G.722	Better for most AI models

Codec Negotiation

During call setup, each side lists the codecs it supports in priority order (in the SDP offer and answer):

Preferred: G.722, G.711 PCMU, G.711 PCMA

If G.722 is unavailable, the carrier will fall back to G.711. In Telepath:

Automatically negotiates the best codec both sides support
Falls back gracefully if unavailable
No manual codec selection needed
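Under the hood, this negotiation happens in SDP. The offer's `m=audio` line lists RTP payload types in preference order, and the `a=rtpmap` lines name them. The sketch below picks the first offered codec that the local side also supports. It is a simplified illustration and not Telepath's negotiation logic: `LOCAL_SUPPORT` and the function names are hypothetical, and multi-stream SDP subtleties are ignored.

One real quirk is worth knowing. RFC 3551 registers G.722 as `G722/8000` even though the codec samples at 16 kHz, so do not infer the sample rate from the RTP clock rate.

```python
import re
from typing import Optional

# Hypothetical local support list, in our own order of preference.
LOCAL_SUPPORT = ["G722", "PCMU", "PCMA"]

# Static RTP payload types from RFC 3551 that may appear without an rtpmap line.
STATIC_PAYLOAD_TYPES = {"0": "PCMU", "8": "PCMA", "9": "G722"}

RTPMAP = re.compile(r"a=rtpmap:(\d+) ([^/\s]+)/")


def offered_audio_codecs(sdp: str) -> list[str]:
    """Codec names from the first m=audio line, in the order the offerer listed them."""
    payload_types: list[str] = []
    names: dict[str, str] = {}
    for raw in sdp.splitlines():
        line = raw.strip()
        if line.startswith("m=audio") and not payload_types:
            payload_types = line.split()[3:]  # m=audio <port> <proto> <fmt> ...
        elif (match := RTPMAP.match(line)):
            names[match.group(1)] = match.group(2).upper()
    return [names.get(pt, STATIC_PAYLOAD_TYPES.get(pt, f"PT{pt}")) for pt in payload_types]


def choose_codec(offered: list[str], supported: list[str]) -> Optional[str]:
    """First offered codec we also support, or None if there is no overlap."""
    for name in offered:
        if name in supported:
            return name
    return None


OFFER = """v=0
o=- 0 0 IN IP4 192.0.2.10
s=-
c=IN IP4 192.0.2.10
t=0 0
m=audio 49170 RTP/AVP 9 0 8 101
a=rtpmap:9 G722/8000
a=rtpmap:0 PCMU/8000
a=rtpmap:8 PCMA/8000
a=rtpmap:101 telephone-event/8000
"""

print(choose_codec(offered_audio_codecs(OFFER), LOCAL_SUPPORT))  # G722
```

Against a legacy offer such as `m=audio 5004 RTP/AVP 0 8`, the same functions fall back to PCMU.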

Voice Activity Detection (VAD)

VAD automatically detects when someone is speaking and handles silence intelligently.

What is VAD?

Voice Activity Detection:

Analyzes audio in real-time
Distinguishes speech from silence/background noise
Controls AI agent behavior based on detection
Enables natural interruption handling
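The simplest form of VAD compares each short audio frame's energy against a threshold. Telepath's detector is not documented at this level, so the sketch below only illustrates the basic idea. It assumes 20 ms frames of signed 16-bit samples and uses an arbitrary -40 dBFS threshold. The function names are hypothetical.

```python
import math

SILENCE_DBFS = -96.0  # assumption: treat an all-zero frame as the 16-bit noise floor


def frame_dbfs(frame: list[int]) -> float:
    """RMS level of a frame of signed 16-bit samples, in dB relative to full scale."""
    if not frame:
        return SILENCE_DBFS
    rms = math.sqrt(sum(s * s for s in frame) / len(frame))
    if rms == 0:
        return SILENCE_DBFS
    return max(SILENCE_DBFS, 20 * math.log10(rms / 32768))


def is_speech(frame: list[int], threshold_dbfs: float = -40.0) -> bool:
    """Energy-threshold VAD decision for one frame."""
    return frame_dbfs(frame) > threshold_dbfs


# 20 ms at 16 kHz = 320 samples per frame.
loud = [8000, -8000] * 160
quiet = [50, -50] * 160
print(is_speech(loud), is_speech(quiet))  # True False
```

A fixed threshold like this misfires as soon as the background gets louder or quieter. That limitation is what adaptive sensitivity addresses.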

Benefits

Natural Conversation Flow:

AI knows when to start listening
Detects when caller has finished speaking
Handles natural pauses appropriately

Natural Interruptions:

Caller can interrupt the AI agent
AI recognizes when caller starts speaking
Seamless barge-in support

Reduced Latency:

No need to wait for fixed timeouts
Real-time detection of turn boundaries
Faster response times

Background Noise Filtering:

Distinguishes speech from noise
Better audio quality to AI agent

How Telepath’s VAD Works

Outbound VAD (AI agent speaking)

Silence Detection: AI finishes speaking
End-of-Turn Detection: VAD identifies when AI is done
Caller’s Turn: System switches to listening
Caller Speech: VAD detects incoming audio
AI Processing: Audio sent to agent

Adaptive Sensitivity:

Adjusts to background noise levels
Learns from conversation patterns
Handles various environments (quiet offices, noisy call centers)
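One common way to adjust to background noise is to track a running noise-floor estimate and treat only frames some margin above it as speech. The sketch below shows that general technique. It is not a description of Telepath's algorithm and does not model the "learns from conversation patterns" behavior. `NoiseFloorTracker`, the 10 dB margin and the tracking rates are hypothetical values chosen for 20 ms frames.

```python
class NoiseFloorTracker:
    """Hypothetical adaptive VAD threshold: speech = level above noise floor + margin."""

    def __init__(self, margin_db: float = 10.0, rise: float = 0.002,
                 fall: float = 0.2, initial_floor_dbfs: float = -60.0) -> None:
        self.floor_dbfs = initial_floor_dbfs
        self.margin_db = margin_db
        self.rise = rise  # slow upward tracking: speech bursts barely lift the floor
        self.fall = fall  # fast downward tracking: floor drops quickly when it gets quiet

    def is_speech(self, level_dbfs: float) -> bool:
        """Classify one frame's level (dBFS), then update the noise-floor estimate."""
        level = max(level_dbfs, -96.0)  # clamp digital silence (-inf) to the 16-bit floor
        speech = level > self.floor_dbfs + self.margin_db
        rate = self.rise if level > self.floor_dbfs else self.fall
        self.floor_dbfs += rate * (level - self.floor_dbfs)
        return speech


tracker = NoiseFloorTracker()
for _ in range(1000):  # about 20 s of steady call-center noise at -45 dBFS
    tracker.is_speech(-45.0)
print(tracker.floor_dbfs)  # floor has risen toward -45
print(tracker.is_speech(-45.0), tracker.is_speech(-20.0))  # False True
```

Note the warm-up period. For the first few seconds, the -45 dBFS noise still counts as speech until the floor catches up, which is the price of a slow rise rate. In a quiet office the floor falls quickly, so a level that would be ignored in a call center is then detected as speech.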

Inbound VAD (Caller speaking)

Speech Detection: Caller speaks (VAD detects)
Audio Collection: Collected in real-time
Interruption Detection: If AI is speaking and caller starts…
Barge-In: Audio forwarded to AI immediately
AI Processing: Agent handles interruption
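Taken together, the outbound and inbound flows amount to a small turn-taking state machine. The sketch below is one hypothetical way to model it. The state names, method names and log messages are illustrative and do not correspond to a Telepath API. Streaming of caller audio to the agent is not modeled.

```python
from enum import Enum, auto


class TurnState(Enum):
    AGENT_SPEAKING = auto()
    LISTENING = auto()         # caller's turn, no speech detected yet
    CALLER_SPEAKING = auto()
    AGENT_PROCESSING = auto()  # caller finished; the agent is preparing a reply


class TurnManager:
    """Hypothetical turn-taking state machine driven by VAD events."""

    def __init__(self) -> None:
        self.state = TurnState.LISTENING
        self.log: list[str] = []

    def agent_started_speaking(self) -> None:
        self.state = TurnState.AGENT_SPEAKING

    def agent_finished_speaking(self) -> None:
        # Outbound flow: the end of the agent's turn hands the floor to the caller.
        if self.state is TurnState.AGENT_SPEAKING:
            self.state = TurnState.LISTENING

    def caller_speech_detected(self) -> None:
        # Inbound flow: speech while the agent is talking is a barge-in.
        if self.state is TurnState.AGENT_SPEAKING:
            self.log.append("barge_in: stop agent playback")
        self.state = TurnState.CALLER_SPEAKING

    def caller_end_of_turn(self) -> None:
        if self.state is TurnState.CALLER_SPEAKING:
            self.state = TurnState.AGENT_PROCESSING
            self.log.append("end_of_turn: agent responds")


manager = TurnManager()
manager.agent_started_speaking()
manager.caller_speech_detected()  # caller interrupts mid-sentence
manager.caller_end_of_turn()
print(manager.state, manager.log)
```

In the interrupted call above, the log records the barge-in before the caller's turn ends. In a normal exchange, where the agent finishes first, only the end-of-turn entry appears.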

VAD Parameters

Telepath uses sensible defaults, but you can fine-tune the following parameters.

End-of-Turn Timeout:

Default: 800ms of silence
Adjustable: 400ms - 2000ms
Lower: More aggressive (the agent responds sooner but may cut in during natural pauses)
Higher: More patient (allows natural pauses, but responses start later)

Speech Start Threshold:

Default: Automatic
Effect: How quickly VAD detects speech start

Noise Level Adaptation:

Default: Enabled
Effect: Adjusts sensitivity to environment
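The end-of-turn timeout works on runs of silent frames. The sketch below assumes 20 ms frames with a speech/silence decision per frame from some VAD. It uses the documented 800 ms default and 400-2000 ms range, and it borrows `min_speech_duration_ms` from the API configuration example as a guard against short noise blips. The class and helper names are hypothetical.

```python
from typing import Iterable, Optional

FRAME_MS = 20  # assumption: one VAD decision per 20 ms frame


class EndOfTurnDetector:
    """Declares end of turn after end_of_turn_timeout_ms of silence following real speech."""

    def __init__(self, end_of_turn_timeout_ms: int = 800,
                 min_speech_duration_ms: int = 100) -> None:
        if not 400 <= end_of_turn_timeout_ms <= 2000:
            raise ValueError("end_of_turn_timeout_ms must be within 400-2000 ms")
        self.silence_frames_needed = end_of_turn_timeout_ms // FRAME_MS
        self.speech_frames_needed = max(1, min_speech_duration_ms // FRAME_MS)
        self.speech_run = 0
        self.silence_run = 0
        self.in_speech = False

    def push(self, is_speech: bool) -> bool:
        """Feed one frame's VAD decision; returns True on the frame that ends the turn."""
        if is_speech:
            self.speech_run += 1
            self.silence_run = 0  # any speech resets the pause timer
            if self.speech_run >= self.speech_frames_needed:
                self.in_speech = True
            return False
        self.speech_run = 0
        if not self.in_speech:
            return False  # silence after a too-short blip never ends a turn
        self.silence_run += 1
        if self.silence_run >= self.silence_frames_needed:
            self.in_speech = False
            self.silence_run = 0
            return True
        return False


def end_of_turn_frame(decisions: Iterable[bool], **config: int) -> Optional[int]:
    """Index of the frame at which the turn ends, or None if it never does."""
    detector = EndOfTurnDetector(**config)
    for i, speech in enumerate(decisions):
        if detector.push(speech):
            return i
    return None


# 200 ms of speech, a 700 ms pause, 200 ms more speech, then silence.
frames = [True] * 10 + [False] * 35 + [True] * 10 + [False] * 60
print(end_of_turn_frame(frames))                              # 94: the pause is tolerated
print(end_of_turn_frame(frames, end_of_turn_timeout_ms=400))  # 29: the pause ends the turn
```

With the default 800 ms timeout, the 700 ms pause is treated as part of the same turn. At 400 ms, the same pause ends the turn after the first phrase, which is the "more aggressive" behavior described above.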

Configuring VAD

Via Dashboard

Open your connection settings
Go to Advanced → VAD Configuration
Adjust parameters:
- End-of-turn timeout
- Sensitivity level
- Noise adaptation
Save and test

Via API

{
  "vad_config": {
    "end_of_turn_timeout_ms": 800,
    "sensitivity": "adaptive",
    "noise_adaptation": true,
    "min_speech_duration_ms": 100
  }
}
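This section does not cover the request's endpoint or authentication. The sketch below therefore only assembles the `vad_config` body shown above and checks `end_of_turn_timeout_ms` against the documented 400-2000 ms range before you send it with your HTTP client. `build_vad_config` is a hypothetical helper, and the positive-duration check on `min_speech_duration_ms` is an assumption.

```python
import json
from typing import Any


def build_vad_config(end_of_turn_timeout_ms: int = 800,
                     sensitivity: str = "adaptive",
                     noise_adaptation: bool = True,
                     min_speech_duration_ms: int = 100) -> dict[str, Any]:
    """Assemble the vad_config request body, rejecting out-of-range values early."""
    if not 400 <= end_of_turn_timeout_ms <= 2000:
        raise ValueError("end_of_turn_timeout_ms must be between 400 and 2000")
    if min_speech_duration_ms <= 0:  # assumption: zero or negative durations are invalid
        raise ValueError("min_speech_duration_ms must be positive")
    return {
        "vad_config": {
            "end_of_turn_timeout_ms": end_of_turn_timeout_ms,
            "sensitivity": sensitivity,
            "noise_adaptation": noise_adaptation,
            "min_speech_duration_ms": min_speech_duration_ms,
        }
    }


# A more patient agent for callers who pause a lot.
body = json.dumps(build_vad_config(end_of_turn_timeout_ms=1200), indent=2)
print(body)
```

Calling `build_vad_config()` with no arguments reproduces the defaults in the JSON example.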

Testing VAD Settings

Test Natural Pauses:

Call your agent
Speak, then pause for 1-2 seconds
Observe if agent responds appropriately
Adjust if needed

Test Interruptions:

Let AI agent speak
Interrupt mid-sentence
Verify agent immediately receives your speech
Ensure smooth barge-in

Test Noise Handling:

Call from noisy environment
Verify agent can still understand
Check for unwanted interruptions
Adjust sensitivity if needed

Audio Quality Optimization

Best Practices

Network:

Use wired connections when possible
Monitor packet loss (<1%)
Reduce jitter (<50ms)
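To check a call against these targets from raw RTP data, you can compute packet loss from gaps in sequence numbers and jitter with the interarrival-jitter estimator from RFC 3550 (section 6.4.1). That estimator smooths transit-time differences with a gain of 1/16. The sketch below assumes packets arrive in the order given, ignores 16-bit sequence-number wraparound, and uses an 8,000 Hz RTP clock, which applies to both G.711 and G.722. `RtpPacket` and the function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class RtpPacket:
    seq: int          # RTP sequence number (wraparound not handled in this sketch)
    timestamp: int    # RTP timestamp, in clock-rate units
    arrival_s: float  # local receive time, in seconds


def packet_loss_percent(packets: list[RtpPacket]) -> float:
    """Share of packets missing between the lowest and highest sequence numbers seen."""
    seqs = {p.seq for p in packets}
    expected = max(seqs) - min(seqs) + 1
    return 100.0 * (expected - len(seqs)) / expected


def interarrival_jitter_ms(packets: list[RtpPacket], clock_rate: int = 8000) -> float:
    """RFC 3550 interarrival jitter estimate, converted to milliseconds."""
    jitter = 0.0
    for prev, cur in zip(packets, packets[1:]):
        # Transit-time difference D(i-1, i), in RTP timestamp units.
        d = (cur.arrival_s - prev.arrival_s) * clock_rate - (cur.timestamp - prev.timestamp)
        jitter += (abs(d) - jitter) / 16
    return jitter * 1000 / clock_rate


def within_targets(packets: list[RtpPacket]) -> bool:
    """Targets from this page: packet loss under 1% and jitter under 50 ms."""
    return packet_loss_percent(packets) < 1.0 and interarrival_jitter_ms(packets) < 50.0


# 100 packets of 20 ms audio; every other packet arrives 5 ms late, and one is lost.
stream = [RtpPacket(seq=i, timestamp=i * 160, arrival_s=i * 0.02 + (0.005 if i % 2 else 0.0))
          for i in range(100) if i != 50]
print(packet_loss_percent(stream), interarrival_jitter_ms(stream))
```

Here the single lost packet puts loss at exactly 1%, so `within_targets` returns False even though jitter (about 5 ms) is well inside the 50 ms target.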

Carrier Configuration:

Enable G.722 if available
Use UDP or TLS for SIP signaling (both work)
Optimize for your region

AI Agent:

Use latest model versions
Keep API credentials current
Test with various speakers

Monitoring:

Check dashboard for codec used
Monitor audio quality metrics
Review VAD decisions in SIP traces

Troubleshooting Audio Issues

Poor Clarity:

Check which codec is in use
Switch to G.722 if it is not already in use
Verify AI provider credentials
Test with different phone models

Frequent Interruptions:

Lower VAD sensitivity (it may be set too high)
Increase the end-of-turn timeout
Enable noise adaptation
Test in a quieter environment

Delayed Responses:

Check AI provider latency
Verify codec negotiation
Check network conditions
Review carrier-side metrics

Background Noise Issues:

Enable VAD noise adaptation
Test from a quieter environment
Adjust sensitivity thresholds
Try a different microphone

Advanced Codec Topics

Custom Codecs

For advanced deployments with specific requirements:

See Advanced Integration for custom codec handling and edge cases.

Codec Transcoding

If your carrier only supports G.711 but you want G.722's benefits:

Option 1: Let Telepath transcode (minimal latency impact)
Option 2: Ask your carrier to enable G.722
Option 3: Use a different carrier with G.722 support

Transcoding adds ~5-10ms latency, and it cannot restore the frequencies above 3.4 kHz that the G.711 leg never carried. True wideband audio requires G.722 end to end (Options 2 and 3).
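The core of the transcoding step is resampling the 8 kHz G.711 audio to the 16 kHz rate that a wideband pipeline uses. The sketch below uses plain linear interpolation for clarity; real transcoders use proper low-pass interpolation filters. It is not Telepath's implementation, and `upsample_2x` is a hypothetical name. Whatever the filter, the inserted samples are computed from the narrowband signal, so they carry no speech content above the 4 kHz limit of the original 8 kHz stream.

```python
def upsample_2x(samples: list[int]) -> list[int]:
    """Double the sample rate (e.g. 8 kHz -> 16 kHz) using linear interpolation."""
    if not samples:
        return []
    out: list[int] = []
    for a, b in zip(samples, samples[1:]):
        out.append(a)
        out.append((a + b) // 2)  # new sample halfway between two originals
    # Repeat the final sample so the output is exactly twice as long.
    out.extend([samples[-1], samples[-1]])
    return out


frame_8k = [0, 100, 200, 100]  # toy 8 kHz samples
print(upsample_2x(frame_8k))   # [0, 50, 100, 150, 200, 150, 100, 100]
```

A 20 ms G.711 frame of 160 samples becomes 320 samples, the size of a 20 ms frame at 16 kHz.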

Performance Metrics

Monitor codec performance in the dashboard:

Codec Used: Which codec was actually negotiated
Packet Loss: % of lost packets per codec
Jitter: Audio timing variance
Quality: MOS (Mean Opinion Score, on a 1-5 scale)

FAQ

Should I always use G.722?
Yes, if your carrier supports it. Use G.711 only if required for compatibility.

Can I change codec mid-call?
No, codec is negotiated at call start. To change, hang up and reconnect.

How does VAD handle music?
Adaptive VAD learns conversation patterns and handles music appropriately.

What if VAD is too aggressive?
Increase the end-of-turn timeout to allow longer pauses.

Can I disable VAD?
VAD is essential for natural conversation. Disabling is not recommended.

What audio formats does Telepath support internally?
PCM 16-bit, 8kHz or 16kHz. Codecs handle conversion.
