Releases: pipecat-ai/pipecat

v0.0.41

10 Sep 01:06
e038767

Added

  • Added LivekitFrameSerializer audio frame serializer.

Fixed

  • Fix FastAPIWebsocketOutputTransport variable name clash with subclass.

  • Fix an AnthropicLLMService issue with empty arguments in function calling.
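
The empty-arguments fix can be illustrated with a small defensive parse. This is a hypothetical sketch, not Pipecat's actual code: when a model calls a function that takes no parameters, the arguments string may be empty, which a bare `json.loads` would reject.

```python
import json

def parse_tool_arguments(raw) -> dict:
    """Parse function-call arguments, tolerating the empty string or None
    that a model may emit when a function takes no parameters."""
    if not raw:
        return {}
    return json.loads(raw)
```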

Other

  • Fixed studypal example errors.

v0.0.40

20 Aug 18:52

Added

  • VAD parameters can now be dynamically updated using the VADParamsUpdateFrame.

  • ErrorFrame now has a fatal field indicating that the bot should exit if a fatal error is pushed upstream (False by default). A new FatalErrorFrame that sets this flag to True has been added.

  • AnthropicLLMService now supports function calling and initial support for prompt caching.
    (see https://www.anthropic.com/news/prompt-caching)

  • ElevenLabsTTSService can now specify ElevenLabs input parameters such as output_format.

  • TwilioFrameSerializer can now specify the desired Twilio and Pipecat sample rates to use.

  • Added new on_participant_updated event to DailyTransport.

  • Added DailyRESTHelper.delete_room_by_name() and DailyRESTHelper.delete_room_by_url().

  • Added LLM and TTS usage metrics. Those are enabled when PipelineParams.enable_usage_metrics is True.

  • AudioRawFrames are now pushed downstream from the base output transport. This allows capturing the exact words the bot says by adding an STT service at the end of the pipeline.

  • Added new GStreamerPipelineSource. This processor can generate image or audio frames from a GStreamer pipeline (e.g. reading an MP4 file, an RTP stream or anything supported by GStreamer).

  • Added TransportParams.audio_out_is_live. This flag is False by default; set it to True to indicate that audio should not be synchronized with sporadic images.

  • Added new BotStartedSpeakingFrame and BotStoppedSpeakingFrame control frames. These frames are pushed upstream and they should wrap BotSpeakingFrame.

  • Transports now allow you to register event handlers without decorators.
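
Decorator-free registration, as in the last bullet, can be sketched with a minimal event registry. The class and handler storage here are illustrative, not Pipecat's actual implementation; only the idea of exposing the same registration path as both a plain method and a decorator is taken from the changelog entry.

```python
class BaseTransportSketch:
    """Minimal sketch of decorator-free event handler registration."""

    def __init__(self):
        self._handlers: dict[str, list] = {}

    def add_event_handler(self, event_name: str, handler) -> None:
        # Plain-method registration: no decorator required.
        self._handlers.setdefault(event_name, []).append(handler)

    def event_handler(self, event_name: str):
        # The familiar decorator form, built on the same method.
        def decorator(handler):
            self.add_event_handler(event_name, handler)
            return handler
        return decorator

    def _emit(self, event_name: str, *args) -> None:
        # Invoke registered handlers in registration order.
        for handler in self._handlers.get(event_name, []):
            handler(*args)
```

Both styles end up in the same handler list, so existing decorator-based code keeps working unchanged.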

Changed

  • Support RTVI message protocol 0.1. This includes new messages, support for message responses, actions, configuration, webhooks and a bunch of new cool stuff.
    (see https://docs.rtvi.ai/)

  • SileroVAD dependency is now imported via pip's silero-vad package.

  • ElevenLabsTTSService now uses eleven_turbo_v2_5 model by default.

  • BotSpeakingFrame is now a control frame.

  • StartFrame is now a control frame similar to EndFrame.

  • DeepgramTTSService is now more customizable. You can adjust the encoding and sample rate.

Fixed

  • TTSStartFrame and TTSStopFrame are now sent when TTS really starts and stops. This allows for knowing when the bot starts and stops speaking even with asynchronous services (like Cartesia).

  • Fixed AzureSTTService transcription frame timestamps.

  • Fixed an issue with DailyRESTHelper.create_room() expirations which would cause this function to stop working after the initial expiration elapsed.

  • Improved EndFrame and CancelFrame handling. EndFrame should end things gracefully while a CancelFrame should cancel all running tasks as soon as possible.

  • Fixed an issue in AIService that would cause a yielded None value to be processed.

  • RTVI's bot-ready message is now sent when the RTVI pipeline is ready and the first participant joins.

  • Fixed a BaseInputTransport issue that was causing incoming system frames to be queued instead of being pushed immediately.

  • Fixed a BaseInputTransport issue that was causing incoming start/stop interruption frames to not cancel tasks and not be processed properly.
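
The EndFrame/CancelFrame distinction above can be sketched with plain asyncio: a graceful end lets queued work finish, while cancellation aborts the task immediately. The names here are illustrative; Pipecat's actual shutdown logic is more involved.

```python
import asyncio

async def run_pipeline(queue: asyncio.Queue, results: list) -> None:
    # Drain queued work; an "end" sentinel finishes remaining work first,
    # while task cancellation stops processing immediately.
    while True:
        item = await queue.get()
        if item == "end":
            break
        results.append(item)

async def main():
    # Graceful shutdown: an EndFrame-style sentinel lets queued work finish.
    graceful: list = []
    q1: asyncio.Queue = asyncio.Queue()
    for item in ("a", "b", "end"):
        q1.put_nowait(item)
    await run_pipeline(q1, graceful)

    # Hard stop: CancelFrame-style cancellation aborts pending work.
    cancelled: list = []
    q2: asyncio.Queue = asyncio.Queue()
    task = asyncio.create_task(run_pipeline(q2, cancelled))
    task.cancel()
    try:
        await task
    except asyncio.CancelledError:
        pass
    return graceful, cancelled
```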

Other

  • Added studypal example (from the Cartesia folks!).

  • Most examples now use Cartesia.

  • Added examples foundational/19a-tools-anthropic.py, foundational/19b-tools-video-anthropic.py and foundational/19a-tools-togetherai.py.

  • Added examples foundational/18-gstreamer-filesrc.py and foundational/18a-gstreamer-videotestsrc.py that show how to use GStreamerPipelineSource.

  • Removed requests library usage.

  • Cleaned up examples and switched them to DailyRESTHelper.

v0.0.39

23 Jul 22:28
4b39309

Fixed

  • Fixed a regression introduced in 0.0.38 that would cause Daily transcription to stop the Pipeline.

v0.0.38

23 Jul 21:28

Added

  • Added force_reload, skip_validation and trust_repo to SileroVAD and SileroVADAnalyzer. This allows caching and various GitHub repo validations.

  • Added send_initial_empty_metrics flag to PipelineParams to request initial empty metrics (zero values). True by default.

Fixed

  • Fixed initial metrics format. It was using the wrong keys name/time instead of processor/value.

  • STT services now use the ISO 8601 time format for transcription frames.

  • Fixed an issue that would cause Daily transport to show a stop transcription error when actually none occurred.
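
The ISO 8601 timestamp fix above amounts to emitting standard, timezone-aware strings. A minimal sketch (the helper name is hypothetical):

```python
from datetime import datetime, timezone

def iso8601_now() -> str:
    # ISO 8601 with an explicit UTC offset,
    # e.g. "2024-07-23T21:28:00.123456+00:00"
    return datetime.now(timezone.utc).isoformat()
```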

v0.0.37

23 Jul 00:04
eb998aa

Added

  • Added RTVIProcessor which implements the RTVI-AI standard.
    See https://github.com/rtvi-ai

  • Added BotInterruptionFrame which allows interrupting the bot while talking.

  • Added LLMMessagesAppendFrame which allows appending messages to the current LLM context.

  • Added LLMMessagesUpdateFrame which allows changing the LLM context for the one provided in this new frame.

  • Added LLMModelUpdateFrame which allows updating the LLM model.

  • Added TTSSpeakFrame which causes the bot to say some text. This text will not be part of the LLM context.

  • Added TTSVoiceUpdateFrame which allows updating the TTS voice.
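
The append vs. update semantics of the two context frames above can be sketched with simple dataclasses. This is an illustrative model, not Pipecat's actual frame or aggregator code:

```python
from dataclasses import dataclass, field

@dataclass
class LLMMessagesAppendFrame:
    messages: list  # messages to add to the existing context

@dataclass
class LLMMessagesUpdateFrame:
    messages: list  # messages that replace the whole context

@dataclass
class LLMContextSketch:
    messages: list = field(default_factory=list)

    def process(self, frame) -> None:
        if isinstance(frame, LLMMessagesAppendFrame):
            self.messages.extend(frame.messages)   # append to context
        elif isinstance(frame, LLMMessagesUpdateFrame):
            self.messages = list(frame.messages)   # replace context
```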

Removed

  • Removed the LLMResponseStartFrame and LLMResponseEndFrame frames. These were added in the past to properly handle interruptions for the LLMAssistantContextAggregator, but the LLMContextAggregator is now based on LLMResponseAggregator, which handles interruptions properly by just processing the StartInterruptionFrame, so these extra frames are no longer needed.

Fixed

  • Fixed an issue with StatelessTextTransformer where it was pushing a string instead of a TextFrame.

  • TTSService end of sentence detection has been improved. It now works with acronyms, numbers, hours and more.

  • Fixed an issue in TTSService that would not properly flush the current aggregated sentence if an LLMFullResponseEndFrame was found.
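
The improved end-of-sentence detection can be sketched as follows. The heuristics and abbreviation list here are illustrative only, not the actual TTSService logic: the idea is that a trailing period after a digit, a single capital letter, or a known abbreviation does not end a sentence.

```python
import re

# Illustrative abbreviation list; the real service handles more cases.
_ABBREVIATIONS = {"Mr", "Mrs", "Dr", "etc", "e.g", "i.e"}

def ends_sentence(text: str) -> bool:
    """Return True if `text` looks like a complete sentence."""
    text = text.rstrip()
    if not text or text[-1] not in ".?!":
        return False
    head = text[:-1]
    last_word = head.rsplit(maxsplit=1)[-1] if head.strip() else ""
    if last_word in _ABBREVIATIONS:
        return False
    # "3." (a decimal in progress) or "A." (an acronym letter) is not
    # a sentence ending.
    if re.search(r"(\d|\b[A-Z])\.$", text):
        return False
    return True
```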

Performance

  • CartesiaTTSService now uses websockets, which improves speed. It also leverages the new Cartesia contexts, which maintain generated audio prosody when multiple inputs are sent, greatly improving audio quality.

v0.0.36

02 Jul 17:19
065cfb2

Added

  • Added GladiaSTTService. https://docs.gladia.io/chapters/speech-to-text-api/pages/live-speech-recognition

  • Added XTTSService. This is a local Text-To-Speech service. https://github.com/coqui-ai/TTS

  • Added UserIdleProcessor. This processor can be used to wait for any interaction with the user. If the user doesn't say anything within a given timeout a provided callback is called.

  • Added IdleFrameProcessor. This processor can be used to wait for frames within a given timeout. If no frame is received within the timeout a provided callback is called.

  • Added new frame BotSpeakingFrame. This frame will be continuously pushed upstream while the bot is talking.

  • It is now possible to specify a Silero VAD version when using SileroVADAnalyzer or SileroVAD.

  • Added AsyncFrameProcessor and AsyncAIService. Some services, like DeepgramSTTService, need to process things asynchronously. For example, audio is sent to Deepgram but transcriptions are not returned immediately. In these cases we still require all frames (except system frames) to be pushed downstream from a single task. That's what AsyncFrameProcessor is for: it creates a task and all frames should be pushed from that task. So, whenever a new Deepgram transcription is ready, it will also be pushed from this internal task.

  • The MetricsFrame now includes processing metrics if metrics are enabled. The processing metrics indicate the time a processor needs to generate all its output. Note that not all processors generate these metrics.
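
The single-task push model described for AsyncFrameProcessor can be sketched with an internal queue. The class below is a simplified illustration, not Pipecat's implementation: results that arrive asynchronously (e.g. transcriptions) are funneled through one queue and pushed downstream from a single task, preserving frame ordering.

```python
import asyncio

class AsyncFrameProcessorSketch:
    """Sketch: all frames are pushed downstream from one internal task."""

    def __init__(self):
        self._queue: asyncio.Queue = asyncio.Queue()
        self.pushed: list = []  # stands in for "pushed downstream"
        self._task = None

    def start(self) -> None:
        self._task = asyncio.create_task(self._push_loop())

    async def _push_loop(self) -> None:
        # The only place frames leave the processor.
        while True:
            frame = await self._queue.get()
            if frame is None:  # sentinel: stop the push task
                break
            self.pushed.append(frame)

    def queue_frame(self, frame) -> None:
        # Called from callbacks or other tasks; never pushes directly.
        self._queue.put_nowait(frame)

    async def stop(self) -> None:
        self._queue.put_nowait(None)
        await self._task

async def demo() -> list:
    p = AsyncFrameProcessorSketch()
    p.start()
    p.queue_frame("transcription-1")
    p.queue_frame("transcription-2")
    await p.stop()
    return p.pushed
```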

Changed

  • WhisperSTTService model can now also be a string.

  • Added missing * keyword separators in services.
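
The bare `*` separator mentioned above makes every following constructor parameter keyword-only, which keeps service construction self-documenting. A generic example (the class is hypothetical, not a Pipecat service):

```python
class ExampleService:
    # Everything after the bare ``*`` must be passed by keyword.
    def __init__(self, name: str, *, sample_rate: int = 16000):
        self.name = name
        self.sample_rate = sample_rate
```

Calling `ExampleService("tts", 8000)` now raises a TypeError; the sample rate must be spelled out as `sample_rate=8000`.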

Fixed

  • WebsocketServerTransport no longer tries to send frames if the serializer returns None.

  • Fixed an issue where exceptions that occurred inside frame processors were being swallowed and not displayed.

  • Fixed an issue in FastAPIWebsocketTransport where it would still try to send data to the websocket after being closed.

Other

  • Added Fly.io deployment example in examples/deployment/flyio-example.

  • Added new 17-detect-user-idle.py example that shows how to use the new UserIdleProcessor.

v0.0.35

28 Jun 18:27
8dff460

Changed

  • FastAPIWebsocketParams now requires a serializer.

  • TwilioFrameSerializer now requires a streamSid.

Fixed

  • Silero VAD number of frames needs to be 512 for a 16000 sample rate or 256 for an 8000 sample rate.
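
Both frame counts above correspond to the same 32 ms analysis window (512 / 16000 = 256 / 8000 = 0.032 s). A small helper (name hypothetical) makes the mapping explicit:

```python
def silero_num_frames(sample_rate: int) -> int:
    # Both values correspond to a 32 ms analysis window:
    # 512 / 16000 = 256 / 8000 = 0.032 s.
    frames = {16000: 512, 8000: 256}
    if sample_rate not in frames:
        raise ValueError(f"unsupported sample rate: {sample_rate}")
    return frames[sample_rate]
```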

v0.0.34

26 Jun 05:06
0ac4200

Fixed

  • Fixed an issue with asynchronous STT services (Deepgram and Azure) that could cause interruptions to ignore transcriptions.

  • Fixed an issue introduced in 0.0.33 that would cause the LLM to generate shorter output.

v0.0.33

25 Jun 19:06
e3b407d

Changed

  • Upgraded to Cartesia's new Python library 1.0.0. CartesiaTTSService now expects a voice ID instead of a voice name (you can get the voice ID from Cartesia's playground). You can also specify the audio sample_rate and encoding instead of the previous output_format.

Fixed

  • Fixed an issue with asynchronous STT services (Deepgram and Azure) that could cause static audio issues and interruptions to not work properly when dealing with multiple LLMs sentences.

  • Fixed an issue that could mix new LLM responses with previous ones when handling interruptions.

  • Fixed a Daily transport blocking situation that occurred while reading audio frames after a participant left the room. Needs daily-python >= 0.10.1.

v0.0.32

22 Jun 16:23
269d06a

Added

  • Allow specifying a DeepgramSTTService url which allows using on-prem Deepgram.

  • Added new FastAPIWebsocketTransport. This is a new websocket transport that can be integrated with FastAPI websockets.

  • Added new TwilioFrameSerializer. This is a new serializer that knows how to serialize and deserialize audio frames from Twilio.

  • Added Daily transport event: on_dialout_answered. See https://reference-python.daily.co/api_reference.html#daily.EventHandler

  • Added new AzureSTTService. This allows you to use Azure Speech-To-Text.
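
The TwilioFrameSerializer above speaks Twilio's Media Streams protocol, where audio travels as base64-encoded 8 kHz mu-law payloads inside JSON "media" events. A simplified sketch of the serialization round trip (function names hypothetical; error handling and the other Twilio event types are omitted):

```python
import base64
import json

def serialize_audio(stream_sid: str, mulaw_audio: bytes) -> str:
    # Twilio Media Streams exchange JSON "media" events whose payload
    # is base64-encoded 8 kHz mu-law audio.
    return json.dumps({
        "event": "media",
        "streamSid": stream_sid,
        "media": {"payload": base64.b64encode(mulaw_audio).decode("ascii")},
    })

def deserialize_audio(message: str) -> bytes:
    data = json.loads(message)
    return base64.b64decode(data["media"]["payload"])
```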

Performance

  • Converted BaseInputTransport and BaseOutputTransport to fully use asyncio and removed the use of threads.

Other

  • Added twilio-chatbot. This is an example that shows how to integrate Twilio phone numbers with a Pipecat bot.

  • Updated 07f-interruptible-azure.py to use AzureLLMService, AzureSTTService and AzureTTSService.