automatic-speech-recognition transcribe phonemes #1173

Open
IanSweeneyAC opened this issue Jan 29, 2025 · 1 comment
Labels
enhancement New feature or request

Comments

@IanSweeneyAC

Feature request

The pipeline API examples include:

Example: Transcribe English w/ word-level timestamps.

It would be nice to be able to transcribe audio and also get phoneme-level timestamps, with something like:

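import { pipeline } from '@huggingface/transformers';

// Proposed usage: the 'phonemes' option does not exist yet; it is the feature being requested.
const transcriber = await pipeline('automatic-speech-recognition', 'Xenova/whisper-tiny.en');
const url = 'https://huggingface.co/datasets/Xenova/transformers.js-docs/resolve/main/jfk.wav';
const output = await transcriber(url, { return_timestamps: ['word', 'phonemes'] });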

Motivation

Phoneme-level timestamps would allow audio to be processed in the browser to drive the lip sync of a 3D avatar. This assumes that phoneme data is available as a building block of words during speech recognition.

Your contribution

I can animate a 3D character (e.g. Ready Player Me) in three.js from audio samples if phoneme timestamps are available.

@IanSweeneyAC IanSweeneyAC added the enhancement New feature or request label Jan 29, 2025
@xenova
Collaborator

xenova commented Feb 8, 2025

Unfortunately, I don't believe Whisper has this functionality. However, you could use a separate library, like https://www.npmjs.com/package/phonemizer, to extract phonemes from the input text.
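A phonemizer working from text yields phonemes but no timing, so one rough workaround is to combine it with the word-level timestamps the pipeline already returns. The sketch below spreads each word's phonemes evenly across that word's time span. It assumes chunks shaped like `{ text, timestamp: [start, end] }`, and both `estimatePhonemeTimestamps` and the toy lexicon are hypothetical names made up for illustration, not part of Transformers.js or phonemizer. Even spacing is only an approximation, not true acoustic alignment.

```javascript
// Hypothetical helper: spread each word's phonemes evenly over the word's time span.
// `chunks` is assumed to look like word-level ASR output: { text, timestamp: [start, end] }.
// `phonemesForWord` is a caller-supplied lookup (assumption), returning an array of phonemes.
function estimatePhonemeTimestamps(chunks, phonemesForWord) {
  const result = [];
  for (const { text, timestamp: [start, end] } of chunks) {
    const phonemes = phonemesForWord(text.trim());
    // Skip words with no known phonemes or with a missing end time.
    if (phonemes.length === 0 || end == null) continue;
    const step = (end - start) / phonemes.length;
    phonemes.forEach((phoneme, i) => {
      result.push({
        phoneme,
        timestamp: [start + i * step, start + (i + 1) * step],
      });
    });
  }
  return result;
}

// Toy lexicon for illustration only; a real lookup could be backed by a phonemizer.
const toyLexicon = { and: ['æ', 'n', 'd'], so: ['s', 'oʊ'] };
const lookup = (word) => toyLexicon[word.toLowerCase()] ?? [];

// Made-up word-level chunks standing in for transcriber output.
const chunks = [
  { text: ' And', timestamp: [0, 0.75] },
  { text: ' so', timestamp: [0.75, 1.25] },
];

const phonemeTimeline = estimatePhonemeTimestamps(chunks, lookup);
console.log(phonemeTimeline);
```

Each entry in `phonemeTimeline` could then be mapped to a viseme to drive an avatar's mouth shapes. For tighter sync, a forced-alignment or phoneme-recognition model would be needed instead of even spacing.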
