ros-ai/ros2_whisper

ROS 2 Whisper

ROS 2 inference for whisper.cpp.

Example

This example shows live transcription of the first minute of the 6th chapter of Harry Potter and the Philosopher's Stone from Audible:

[Image: harry_potter_sample]

Build

mkdir -p ros-ai/src && cd ros-ai/src && \
git clone https://github.com/ros-ai/ros2_whisper.git && cd .. && \
colcon build --symlink-install --cmake-args -DGGML_CUDA=On --no-warn-unused-cli

Demos

Configure whisper parameters in whisper.yaml.

Whisper On Key

Run the inference action server (this will download models to $HOME/.cache/whisper.cpp):

ros2 launch whisper_bringup bringup.launch.py

Run a client node (activated on space bar press):

ros2 run whisper_demos whisper_on_key

Stream

Bring up whisper:

ros2 launch whisper_bringup bringup.launch.py

Launch the live transcription stream:

ros2 run whisper_demos stream

Parameters

To enable/disable inference, you can set the active parameter from the command line with:

ros2 param set /whisper/inference active false # false/true
  • While inference is disabled, audio is still saved in the buffer, but whisper is not run.
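
If you want to toggle inference from a script rather than by hand, one option is to wrap the same command in a small helper. The following is a minimal sketch, not part of this package: the function names `build_active_command` and `set_active` are hypothetical, and it assumes the `ros2` CLI is on your PATH (i.e. your ROS 2 environment is sourced). It only reuses the node and parameter names from the command above.

```python
import shutil
import subprocess
from typing import List

# Node and parameter names are taken from the command shown above.
NODE = "/whisper/inference"
PARAM = "active"


def build_active_command(active: bool) -> List[str]:
    """Return the `ros2 param set` argument list that enables/disables inference.

    Hypothetical helper for illustration only.
    """
    # The command line expects lowercase booleans: true / false.
    value = "true" if active else "false"
    return ["ros2", "param", "set", NODE, PARAM, value]


def set_active(active: bool) -> bool:
    """Run the command if the ros2 CLI is available; return True on success."""
    if shutil.which("ros2") is None:
        # ROS 2 environment is not sourced, so there is nothing to call.
        return False
    result = subprocess.run(build_active_command(active), check=False)
    return result.returncode == 0


if __name__ == "__main__":
    # Show the command that would be executed to pause inference.
    print(" ".join(build_active_command(False)))
```

Keeping command construction separate from execution makes the helper easy to check without a running ROS 2 system.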

Available Actions

An action server is available under the name inference, with type Inference.action.

  • The feedback message regularly publishes the actively changing portion of the transcript.

  • The final result contains stale and active portions from the start of the inference.
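
The relationship between the feedback (active portion only) and the final result (stale plus active) can be pictured with a toy model. This is an illustrative sketch only: the `Transcript` class, its fields, and the `commit` argument are hypothetical and do not reflect the actual layout of Inference.action or how the package decides when text becomes stale.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Transcript:
    """Toy model of a transcript split into stale and active words (hypothetical)."""

    stale: List[str] = field(default_factory=list)   # words no longer expected to change
    active: List[str] = field(default_factory=list)  # words still being revised

    def update(self, new_active: List[str], commit: int = 0) -> None:
        """Freeze the first `commit` active words, then replace the active portion."""
        self.stale.extend(self.active[:commit])
        self.active = list(new_active)

    def feedback(self) -> str:
        """What a feedback message would carry in this model: the active portion only."""
        return " ".join(self.active)

    def result(self) -> str:
        """What the final result would carry in this model: stale followed by active."""
        return " ".join(self.stale + self.active)


if __name__ == "__main__":
    t = Transcript()
    t.update(["Mr.", "and"])                     # first hypothesis, nothing stale yet
    t.update(["Mrs.", "Dursley"], commit=2)      # first two words become stale
    print(t.feedback())  # Mrs. Dursley
    print(t.result())    # Mr. and Mrs. Dursley
```

In this model a client that only needs the latest changes would read the feedback, while a client that needs the whole text would wait for the result.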

Published Topics

Messages of type AudioTranscript.msg are published on /whisper/transcript_stream whenever the transcript is updated. Each message contains the entire transcript (stale and active).

Internally, the topic /whisper/tokens of type WhisperTokens.msg is used to transfer the model output between nodes.

Troubleshoot