Releases: pipecat-ai/pipecat
v0.0.41
v0.0.40
Added
-
VAD parameters can now be dynamicallt updated using the
VADParamsUpdateFrame
. -
ErrorFrame
has now afatal
field to indicate the bot should exit if a fatal error is pushed upstream (false by default). A newFatalErrorFrame
that sets this flag to true has been added. -
AnthropicLLMService
now supports function calling and initial support for prompt caching.
(see https://www.anthropic.com/news/prompt-caching) -
ElevenLabsTTSService
can now specify ElevenLabs input parameters such asoutput_format
. -
TwilioFrameSerializer
can now specify Twilio's and Pipecat's desired sample rates to use. -
Added new
on_participant_updated
event toDailyTransport
. -
Added
DailyRESTHelper.delete_room_by_name()
andDailyRESTHelper.delete_room_by_url()
. -
Added LLM and TTS usage metrics. Those are enabled when
PipelineParams.enable_usage_metrics
is True. -
AudioRawFrame
s are now pushed downstream from the base output transport. This allows capturing the exact words the bot says by adding an STT service at the end of the pipeline. -
Added new
GStreamerPipelineSource
. This processor can generate image or audio frames from a GStreamer pipeline (e.g. reading an MP4 file, and RTP stream or anything supported by GStreamer). -
Added
TransportParams.audio_out_is_live
. This flag is False by default and it is useful to indicate we should not synchronize audio with sporadic images. -
Added new
BotStartedSpeakingFrame
andBotStoppedSpeakingFrame
control frames. These frames are pushed upstream and they should wrapBotSpeakingFrame
. -
Transports now allow you to register event handlers without decorators.
Changed
-
Support RTVI message protocol 0.1. This includes new messages, support for messages responses, support for actions, configuration, webhooks and a bunch of new cool stuff.
(see https://docs.rtvi.ai/) -
SileroVAD
dependency is now imported via pip'ssilero-vad
package. -
ElevenLabsTTSService
now useseleven_turbo_v2_5
model by default. -
BotSpeakingFrame
is now a control frame. -
StartFrame
is now a control frame similar toEndFrame
. -
DeepgramTTSService
now is more customizable. You can adjust the encoding and sample rate.
Fixed
-
TTSStartFrame
andTTSStopFrame
are now sent when TTS really starts and stops. This allows for knowing when the bot starts and stops speaking even with asynchronous services (like Cartesia). -
Fixed
AzureSTTService
transcription frame timestamps. -
Fixed an issue with
DailyRESTHelper.create_room()
expirations which would cause this function to stop working after the initial expiration elapsed. -
Improved
EndFrame
andCancelFrame
handling.EndFrame
should end things gracefully while aCancelFrame
should cancel all running tasks as soon as possible. -
Fixed an issue in
AIService
that would cause a yieldedNone
value to be processed. -
RTVI's
bot-ready
message is now sent when the RTVI pipeline is ready and a first participant joins. -
Fixed a
BaseInputTransport
issue that was causing incoming system frames to be queued instead of being pushed immediately. -
Fixed a
BaseInputTransport
issue that was causing start/stop interruptions incoming frames to not cancel tasks and be processed properly.
Other
-
Added
studypal
example (from to the Cartesia folks!). -
Most examples now use Cartesia.
-
Added examples
foundational/19a-tools-anthropic.py
,foundational/19b-tools-video-anthropic.py
andfoundational/19a-tools-togetherai.py
. -
Added examples
foundational/18-gstreamer-filesrc.py
andfoundational/18a-gstreamer-videotestsrc.py
that show how to useGStreamerPipelineSource
. -
Remove
requests
library usage. -
Cleanup examples and use
DailyRESTHelper
.
v0.0.39
Fixed
- Fixed a regression introduced in 0.0.38 that would cause Daily transcription to stop the Pipeline.
v0.0.38
Added
-
Added
force_reload
,skip_validation
andtrust_repo
toSileroVAD
andSileroVADAnalyzer
. This allows caching and various GitHub repo validations. -
Added
send_initial_empty_metrics
flag toPipelineParams
to request for initial empty metrics (zero values). True by default.
Fixed
-
Fixed initial metrics format. It was using the wrong keys name/time instead of processor/value.
-
STT services should be using ISO 8601 time format for transcription frames.
-
Fixed an issue that would cause Daily transport to show a stop transcription error when actually none occurred.
v0.0.37
Added
-
Added
RTVIProcessor
which implements the RTVI-AI standard.
See https://github.com/rtvi-ai -
Added
BotInterruptionFrame
which allows interrupting the bot while talking. -
Added
LLMMessagesAppendFrame
which allows appending messages to the current LLM context. -
Added
LLMMessagesUpdateFrame
which allows changing the LLM context for the one provided in this new frame. -
Added
LLMModelUpdateFrame
which allows updating the LLM model. -
Added
TTSSpeakFrame
which causes the bot say some text. This text will not be part of the LLM context. -
Added
TTSVoiceUpdateFrame
which allows updating the TTS voice.
Removed
- We remove the
LLMResponseStartFrame
andLLMResponseEndFrame
frames. These were added in the past to properly handle interruptions for theLLMAssistantContextAggregator
. But theLLMContextAggregator
is now based onLLMResponseAggregator
which handles interruptions properly by just processing theStartInterruptionFrame
, so there's no need for these extra frames any more.
Fixed
-
Fixed an issue with
StatelessTextTransformer
where it was pushing a string instead of aTextFrame
. -
TTSService
end of sentence detection has been improved. It now works with acronyms, numbers, hours and others. -
Fixed an issue in
TTSService
that would not properly flush the current aggregated sentence if anLLMFullResponseEndFrame
was found.
Performance
CartesiaTTSService
now uses websockets which improves speed. It also leverages the new Cartesia contexts which maintains generated audio prosody when multiple inputs are sent, therefore improving audio quality a lot.
v0.0.36
Added
-
Added
GladiaSTTService
. https://docs.gladia.io/chapters/speech-to-text-api/pages/live-speech-recognition -
Added
XTTSService
. This is a local Text-To-Speech service. https://github.com/coqui-ai/TTS -
Added
UserIdleProcessor
. This processor can be used to wait for any interaction with the user. If the user doesn't say anything within a given timeout a provided callback is called. -
Added
IdleFrameProcessor
. This processor can be used to wait for frames within a given timeout. If no frame is received within the timeout a provided callback is called. -
Added new frame
BotSpeakingFrame
. This frame will be continuously pushed upstream while the bot is talking. -
It is now possible to specify a Silero VAD version when using
SileroVADAnalyzer
orSileroVAD
. -
Added
AysncFrameProcessor
andAsyncAIService
. Some services likeDeepgramSTTService
need to process things asynchronously. For example, audio is sent to Deepgram but transcriptions are not returned immediately. In these cases we still require all frames (except system frames) to be pushed downstream from a single task. That's whatAsyncFrameProcessor
is for. It creates a task and all frames should be pushed from that task. So, whenever a new Deepgram transcription is ready that transcription will also be pushed from this internal task. -
The
MetricsFrame
now includes processing metrics if metrics are enabled. The processing metrics indicate the time a processor needs to generate all its output. Note that not all processors generate these kind of metrics.
Changed
-
WhisperSTTService
model can now also be a string. -
Added missing * keyword separators in services.
Fixed
-
WebsocketServerTransport
doesn't try to send frames anymore if serializers returnsNone
. -
Fixed an issue where exceptions that occurred inside frame processors were being swallowed and not displayed.
-
Fixed an issue in
FastAPIWebsocketTransport
where it would still try to send data to the websocket after being closed.
Other
-
Added Fly.io deployment example in
examples/deployment/flyio-example
. -
Added new
17-detect-user-idle.py
example that shows how to use the newUserIdleProcessor
.
v0.0.35
Changed
-
FastAPIWebsocketParams
now require a serializer. -
TwilioFrameSerializer
now requires astreamSid
.
Fixed
- Silero VAD number of frames needs to be 512 for 16000 sample rate or 256 for 8000 sample rate.
v0.0.34
Fixed
-
Fixed an issue with asynchronous STT services (Deepgram and Azure) that could interruptions to ignore transcriptions.
-
Fixed an issue introduced in 0.0.33 that would cause the LLM to generate shorter output.
v0.0.33
Changed
- Upgraded to Cartesia's new Python library 1.0.0.
CartesiaTTSService
now expects a voice ID instead of a voice name (you can get the voice ID from Cartesia's playground). You can also specify the audiosample_rate
andencoding
instead of the previousoutput_format
.
Fixed
-
Fixed an issue with asynchronous STT services (Deepgram and Azure) that could cause static audio issues and interruptions to not work properly when dealing with multiple LLMs sentences.
-
Fixed an issue that could mix new LLM responses with previous ones when handling interruptions.
-
Fixed a Daily transport blocking situation that occurred while reading audio frames after a participant left the room. Needs daily-python >= 0.10.1.
v0.0.32
Added
-
Allow specifying a
DeepgramSTTService
url which allows using on-prem Deepgram. -
Added new
FastAPIWebsocketTransport
. This is a new websocket transport that can be integrated with FastAPI websockets. -
Added new
TwilioFrameSerializer
. This is a new serializer that knows how to serialize and deserialize audio frames from Twilio. -
Added Daily transport event:
on_dialout_answered
. See https://reference-python.daily.co/api_reference.html#daily.EventHandler -
Added new
AzureSTTService
. This allows you to use Azure Speech-To-Text.
Performance
- Convert
BaseOutputTransport
andBaseOutputTransport
to fully use asyncio and remove the use of threads.
Other
-
Added
twilio-chatbot
. This is an example that shows how to integrate Twilio phone numbers with a Pipecat bot. -
Updated
07f-interruptible-azure.py
to useAzureLLMService
,AzureSTTService
andAzureTTSService
.