Overview
Deepgram provides three STT service implementations:
- DeepgramSTTService for real-time speech recognition using Deepgram’s standard WebSocket API, with support for interim results, language detection, and voice activity detection (VAD).
- DeepgramFluxSTTService for advanced conversational AI with Flux capabilities, including intelligent turn detection, eager end-of-turn events, and enhanced speech processing for improved response timing.
- DeepgramSageMakerSTTService for real-time speech recognition using Deepgram models deployed on AWS SageMaker endpoints via HTTP/2 bidirectional streaming.
Since Deepgram Flux provides its own user turn start and end detection, you should use ExternalUserTurnStrategies to let Flux handle turn management. See User Turn Strategies for configuration details.
- Deepgram STT API Reference: Pipecat’s API methods for standard Deepgram STT
- Deepgram Flux API Reference: Pipecat’s API methods for Deepgram Flux STT
- Standard STT Example: Complete example with standard Deepgram STT
- Flux STT Example: Complete example with Deepgram Flux STT
- SageMaker Example: Complete example with Deepgram on SageMaker
- Deepgram Documentation: Official Deepgram documentation and features
- Deepgram Console: Access API keys and transcription models
Installation
To use Deepgram STT services, install Pipecat with the Deepgram extra: pip install "pipecat-ai[deepgram]"
Prerequisites
Deepgram Account Setup
Before using DeepgramSTTService or DeepgramFluxSTTService, you need:
- Deepgram Account: Sign up at Deepgram Console
- API Key: Generate an API key from your console dashboard
- Model Selection: Choose from available transcription models and features
Required Environment Variables
DEEPGRAM_API_KEY: Your Deepgram API key for authentication
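A small guard like the one below can fail fast with a readable message when the key is missing, instead of surfacing later as a WebSocket authentication error. This is a minimal sketch using only the standard library; the helper name get_deepgram_api_key is hypothetical and not part of Pipecat.

```python
import os
from typing import Mapping, Optional


def get_deepgram_api_key(env: Optional[Mapping[str, str]] = None) -> str:
    """Return DEEPGRAM_API_KEY, raising a clear error if it is missing or blank.

    Hypothetical helper: `env` defaults to os.environ but can be any mapping,
    which makes the function easy to test.
    """
    env = os.environ if env is None else env
    key = env.get("DEEPGRAM_API_KEY", "").strip()
    if not key:
        raise RuntimeError(
            "DEEPGRAM_API_KEY is not set; generate a key in the Deepgram Console."
        )
    return key
```

The returned value is what you would pass as the service's API key argument.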
AWS SageMaker Setup
Before using DeepgramSageMakerSTTService, you need:
- AWS Account: With credentials configured (via environment variables, AWS CLI, or instance metadata)
- SageMaker Endpoint: A deployed SageMaker endpoint with a Deepgram model
- Deepgram SDK: The Deepgram SDK is required for LiveOptions configuration
Configuration
DeepgramSTTService
Deepgram API key for authentication.
Custom Deepgram API base URL. Leave empty to use the default endpoint.
Audio sample rate in Hz. When None, uses the value from live_options or the pipeline’s configured sample rate.
Deepgram LiveOptions for detailed configuration. When provided, these settings are merged with the defaults. See Deepgram LiveOptions for available options.
Additional Deepgram features to enable.
Expected P99 latency, in seconds, from the end of speech to the final transcript. Override it to match your deployment.
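The sample-rate rule above (also used by DeepgramSageMakerSTTService) implies a fallback order. The sketch below reads that order as: explicit argument first, then the rate set in live_options, then the pipeline’s rate. That precedence is an interpretation of the wording, and resolve_sample_rate is a hypothetical name, not Pipecat’s internal function.

```python
from typing import Optional


def resolve_sample_rate(
    sample_rate: Optional[int],
    live_options_sample_rate: Optional[int],
    pipeline_sample_rate: int,
) -> int:
    """Pick the effective sample rate (assumed precedence, see lead-in)."""
    # 1. An explicit constructor argument wins.
    if sample_rate is not None:
        return sample_rate
    # 2. Otherwise fall back to a rate configured in LiveOptions.
    if live_options_sample_rate is not None:
        return live_options_sample_rate
    # 3. Finally use whatever the pipeline is configured with.
    return pipeline_sample_rate


print(resolve_sample_rate(None, None, 16000))   # pipeline rate
print(resolve_sample_rate(None, 8000, 16000))   # LiveOptions rate
print(resolve_sample_rate(24000, 8000, 16000))  # explicit rate
```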
The default LiveOptions are:
| Option | Default | Description |
|---|---|---|
| encoding | "linear16" | Audio encoding format. |
| language | Language.EN | Recognition language. |
| model | "nova-3-general" | Deepgram model to use. |
| channels | 1 | Number of audio channels. |
| interim_results | True | Stream partial recognition results. |
| smart_format | False | Apply smart formatting. |
| punctuate | True | Add punctuation to transcripts. |
| profanity_filter | True | Filter profanity from transcripts. |
| vad_events | False | Enable Deepgram’s built-in VAD events (deprecated). |
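"Merged with the defaults" can be pictured as a shallow override: any option you set replaces the default, and everything else keeps the values in the table. The sketch below models that with plain dicts. Treating None as "not set" is an assumption, as is representing Language.EN by the string "en"; merge_live_options is a hypothetical helper, not the Deepgram SDK or Pipecat API.

```python
from typing import Any, Dict, Optional

# Defaults copied from the table above ("en" stands in for Language.EN).
DEFAULT_LIVE_OPTIONS: Dict[str, Any] = {
    "encoding": "linear16",
    "language": "en",
    "model": "nova-3-general",
    "channels": 1,
    "interim_results": True,
    "smart_format": False,
    "punctuate": True,
    "profanity_filter": True,
    "vad_events": False,
}


def merge_live_options(overrides: Optional[Dict[str, Any]] = None) -> Dict[str, Any]:
    """Shallow-merge user options over the defaults.

    Assumption: a value of None means "not set" and keeps the default.
    """
    merged = dict(DEFAULT_LIVE_OPTIONS)  # copy so the defaults are never mutated
    for key, value in (overrides or {}).items():
        if value is not None:
            merged[key] = value
    return merged


opts = merge_live_options({"model": "nova-2", "smart_format": True, "punctuate": None})
print(opts["model"], opts["smart_format"], opts["punctuate"], opts["encoding"])
```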
DeepgramFluxSTTService
Deepgram API key for authentication.
WebSocket URL for the Deepgram Flux API.
Audio sample rate in Hz. When None, uses the pipeline’s configured sample rate.
Deepgram Flux model to use for transcription.
Audio encoding format required by the Flux API. Must be "linear16".
Configuration parameters for the Flux API. See Flux InputParams below.
Whether the bot should be interrupted when Flux detects user speech.
Flux InputParams
Parameters passed via the params constructor argument of DeepgramFluxSTTService.
| Parameter | Type | Default | Description |
|---|---|---|---|
| eager_eot_threshold | float | None | EagerEndOfTurn threshold. Lower values trigger faster responses with more LLM calls; higher values are more conservative. None disables EagerEndOfTurn. |
| eot_threshold | float | None | End-of-turn confidence threshold (server default 0.7). Lower values end turns sooner. |
| eot_timeout_ms | int | None | Time in ms after speech after which a turn ends regardless of confidence (server default 5000). |
| keyterm | list | [] | Key terms to boost recognition accuracy for specialized terminology. |
| mip_opt_out | bool | None | Opt out of Deepgram’s Model Improvement Program. |
| tag | list | [] | Tags for request identification during usage reporting. |
| min_confidence | float | None | Minimum average confidence required to produce a TranscriptionFrame. |
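To see how these parameters might reach the server, the sketch below encodes them as URL query parameters for a Flux streaming request, dropping None values so server defaults apply and repeating list values (keyterm=a&keyterm=b). It assumes Flux accepts these settings as query parameters and that min_confidence is applied client-side, so it is not sent. build_flux_query is hypothetical, and "flux-general-en" is only an example model name.

```python
from urllib.parse import urlencode


def build_flux_query(model: str, sample_rate: int, encoding: str = "linear16", **params) -> str:
    """Encode Flux InputParams-style values as a query string (illustrative sketch)."""
    if encoding != "linear16":
        raise ValueError('Flux requires encoding="linear16"')
    query = {"model": model, "sample_rate": sample_rate, "encoding": encoding}
    # Scalar settings: None means "let the server default apply", so skip it.
    for name in ("eager_eot_threshold", "eot_threshold", "eot_timeout_ms", "mip_opt_out"):
        value = params.get(name)
        if value is not None:
            query[name] = str(value).lower() if isinstance(value, bool) else value
    pairs = list(query.items())
    # List settings become repeated keys, e.g. keyterm=a&keyterm=b.
    for name in ("keyterm", "tag"):
        pairs.extend((name, item) for item in params.get(name) or [])
    # min_confidence is deliberately not sent (assumed to be a client-side filter).
    return urlencode(pairs)


print(build_flux_query("flux-general-en", 16000, eot_threshold=0.7,
                       keyterm=["Pipecat", "Deepgram"], mip_opt_out=True))
```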
DeepgramSageMakerSTTService
Name of the SageMaker endpoint where the Deepgram model is deployed.
AWS region where the SageMaker endpoint is deployed (e.g., "us-east-2").
Audio sample rate in Hz. When None, uses the value from live_options or the pipeline’s configured sample rate.
Deepgram LiveOptions for detailed configuration. When provided, these settings are merged with the defaults. See Deepgram LiveOptions for available options.
Expected P99 latency, in seconds, from the end of speech to the final transcript. Override it to match your deployment.
The default LiveOptions for the SageMaker variant are:
| Option | Default | Description |
|---|---|---|
| encoding | "linear16" | Audio encoding format. |
| language | Language.EN | Recognition language. |
| model | "nova-3" | Deepgram model to use. |
| channels | 1 | Number of audio channels. |
| interim_results | True | Stream partial recognition results. |
| punctuate | True | Add punctuation to transcripts. |
Usage
Basic DeepgramSTTService
With Custom LiveOptions
DeepgramFluxSTTService
Flux with EagerEndOfTurn
SageMaker Service
Notes
- Finalize on VAD stop: When the pipeline’s VAD detects that the user has stopped speaking, DeepgramSTTService and DeepgramSageMakerSTTService send a finalize request to Deepgram for faster delivery of the final transcript.
- Flux turn management: DeepgramFluxSTTService provides its own turn detection via StartOfTurn/EndOfTurn events and broadcasts UserStartedSpeakingFrame/UserStoppedSpeakingFrame directly. Use ExternalUserTurnStrategies to avoid conflicting VAD-based turn management.
- EagerEndOfTurn: In Flux, setting eager_eot_threshold provides faster response times by predicting the end of turn before it is confirmed. EagerEndOfTurn transcripts are pushed as InterimTranscriptionFrames. If the user resumes speaking, a TurnResumed event is fired.
- Deprecated vad_events: The vad_events option in the standard DeepgramSTTService is deprecated. Use Silero VAD instead.
- SageMaker deployment: The SageMaker service requires a Deepgram model deployed to an AWS SageMaker endpoint. See the Deepgram SageMaker deployment guide for setup instructions.
- SageMaker keepalive: The SageMaker service automatically sends KeepAlive messages every 5 seconds to maintain the connection during periods of silence.
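The keepalive behavior amounts to a background task that wakes on a fixed interval and sends a small control message until the connection closes. The asyncio sketch below shows that pattern against an arbitrary async send callable. The {"type": "KeepAlive"} payload matches Deepgram’s WebSocket KeepAlive message, but whether the SageMaker HTTP/2 stream uses exactly this payload is an assumption, and keepalive_loop is a hypothetical name.

```python
import asyncio
import json
from typing import Awaitable, Callable, List


async def keepalive_loop(send: Callable[[str], Awaitable[None]], interval: float = 5.0) -> None:
    """Send a KeepAlive message every `interval` seconds until the task is cancelled."""
    while True:
        await asyncio.sleep(interval)
        await send(json.dumps({"type": "KeepAlive"}))


async def demo() -> List[str]:
    sent: List[str] = []

    async def send(message: str) -> None:
        sent.append(message)  # stand-in for writing to the real stream

    # A short interval so the demo finishes quickly; the service uses 5 seconds.
    task = asyncio.create_task(keepalive_loop(send, interval=0.01))
    await asyncio.sleep(0.05)
    task.cancel()  # e.g. when the connection is closed
    try:
        await task
    except asyncio.CancelledError:
        pass
    return sent


print(asyncio.run(demo()))
```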
Event Handlers
All three services support the standard service connection events (on_connected, on_disconnected, on_connection_error). Additionally, DeepgramSTTService and DeepgramFluxSTTService provide service-specific events:
DeepgramSTTService
| Event | Description |
|---|---|
| on_speech_started | Speech detected in the audio stream |
| on_utterance_end | End of utterance detected by Deepgram |
DeepgramFluxSTTService
Deepgram Flux provides turn-level events for more granular conversation tracking:
| Event | Description |
|---|---|
| on_start_of_turn | Start of a new turn detected |
| on_turn_resumed | A previously paused turn has resumed |
| on_end_of_turn | End of turn detected |
| on_eager_end_of_turn | Early end-of-turn prediction |
| on_update | Transcript updated |

Except for on_turn_resumed, these handlers receive (service, transcript), where transcript is the current transcript text. The on_turn_resumed event receives only (service).
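A common use of these events is speculative responding: start work on on_eager_end_of_turn, discard it on on_turn_resumed, and commit on on_end_of_turn. The sketch below simulates that sequence with a tiny hypothetical dispatcher (FluxEventsSketch) that uses the documented event names and handler signatures. It is a plain synchronous model of the flow, not Pipecat’s event system, and the transcripts are made-up sample text.

```python
from collections import defaultdict
from typing import Callable, DefaultDict, List


class FluxEventsSketch:
    """Hypothetical stand-in that dispatches Flux-style events to registered handlers."""

    def __init__(self) -> None:
        self._handlers: DefaultDict[str, List[Callable]] = defaultdict(list)

    def event_handler(self, name: str) -> Callable:
        def register(func: Callable) -> Callable:
            self._handlers[name].append(func)
            return func
        return register

    def emit(self, name: str, *args) -> None:
        # Handlers receive the service first, mirroring the documented signatures.
        for func in self._handlers[name]:
            func(self, *args)


stt = FluxEventsSketch()
log = []


@stt.event_handler("on_eager_end_of_turn")
def on_eager(service, transcript):
    log.append(("eager", transcript))  # e.g. start a speculative LLM request


@stt.event_handler("on_turn_resumed")
def on_resumed(service):
    log.append(("resumed",))  # e.g. discard the speculative response


@stt.event_handler("on_end_of_turn")
def on_end(service, transcript):
    log.append(("end", transcript))  # e.g. commit the final response


# Simulated turn: early prediction, the user keeps talking, then the confirmed end.
stt.emit("on_start_of_turn", "")
stt.emit("on_eager_end_of_turn", "book a table")
stt.emit("on_turn_resumed")
stt.emit("on_end_of_turn", "book a table for two")
print(log)
```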