Overview
Deepgram provides three TTS service implementations:

- `DeepgramTTSService` for real-time streaming synthesis using Deepgram's WebSocket API, with support for interruptions and ultra-low latency
- `DeepgramHttpTTSService` for batch synthesis using Deepgram's HTTP API
- `DeepgramSageMakerTTSService` for real-time synthesis using Deepgram TTS models deployed on AWS SageMaker endpoints via HTTP/2 bidirectional streaming
Deepgram TTS API Reference
Pipecat’s API methods for Deepgram TTS integration
Example Implementation
Complete example with Silero VAD
SageMaker Example
Complete example with Deepgram on SageMaker
Deepgram Documentation
Official Deepgram Aura TTS API documentation
Voice Models
Browse available Aura voice models
Installation
To use Deepgram TTS services, install the required dependencies.

Prerequisites
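Before the account setup below, install Pipecat with its Deepgram extras. The extras name follows Pipecat's usual packaging convention and is an assumption; check the project's install docs for your version:

```shell
pip install "pipecat-ai[deepgram]"
```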
Deepgram Account Setup
Before using `DeepgramTTSService` or `DeepgramHttpTTSService`, you need:
- Deepgram Account: Sign up at Deepgram Console
- API Key: Generate an API key from your project dashboard
- Voice Selection: Choose from available Aura voice models
Required Environment Variables
DEEPGRAM_API_KEY: Your Deepgram API key for authentication
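For local development the key is typically exported in your shell or loaded from a `.env` file; the value below is a placeholder:

```shell
export DEEPGRAM_API_KEY="your-deepgram-api-key"
```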
AWS SageMaker Setup
Before using `DeepgramSageMakerTTSService`, you need:
- AWS Account: With credentials configured (via environment variables, AWS CLI, or instance metadata)
- SageMaker Endpoint: A deployed SageMaker endpoint with a Deepgram TTS model
- Voice Selection: Choose from available Aura voice models
Configuration
DeepgramTTSService
- Deepgram API key for authentication.
- Voice model to use for synthesis.
- WebSocket base URL for the Deepgram API.
- Output audio sample rate in Hz. When `None`, uses the pipeline's configured sample rate.
- Audio encoding format. Must be one of `"linear16"`, `"mulaw"`, or `"alaw"`.

DeepgramHttpTTSService
- Deepgram API key for authentication.
- Voice model to use for synthesis.
- An aiohttp session for HTTP requests. You must create and manage this session yourself.
- HTTP API base URL.
- Output audio sample rate in Hz.
- Audio encoding format.
DeepgramSageMakerTTSService
- Name of the SageMaker endpoint with the Deepgram TTS model deployed.
- AWS region where the SageMaker endpoint is deployed (e.g., `"us-east-2"`).
- Voice model to use for synthesis.
- Output audio sample rate in Hz. When `None`, uses the pipeline's configured sample rate.
- Audio encoding format.
Usage
Basic Setup
HTTP Service
SageMaker Service
Notes
- WebSocket vs HTTP vs SageMaker: The WebSocket service (`DeepgramTTSService`) and the SageMaker service (`DeepgramSageMakerTTSService`) both support real-time streaming with interruption handling, making them suitable for interactive conversations. The HTTP service (`DeepgramHttpTTSService`) is simpler but processes each request as a batch.
- Flush behavior: The WebSocket and SageMaker services automatically flush pending text when they receive an `LLMFullResponseEndFrame` or `EndFrame`, forcing Deepgram to generate audio for any remaining buffered text.
- Encoding validation: The WebSocket service validates the `encoding` parameter at initialization and raises a `ValueError` for unsupported formats.
- SageMaker deployment: The SageMaker service requires a Deepgram TTS model deployed to an AWS SageMaker endpoint. See the Deepgram SageMaker deployment guide for setup instructions.
Event Handlers
The WebSocket and SageMaker services support the standard service connection events:

| Event | Description |
|---|---|
| `on_connected` | Connected to Deepgram (WebSocket or SageMaker) |
| `on_disconnected` | Disconnected from Deepgram |
| `on_connection_error` | Connection error occurred |