Unfortunately, I don't believe Whisper has this functionality. However, you could use a separate library, such as https://www.npmjs.com/package/phonemizer, to extract phonemes from the input text.
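As a rough sketch of how the two pieces could be combined: if word-level timestamps are available and a separate step has already produced a phoneme list for each word, each word's time span can be divided evenly among its phonemes. This is only an approximation, since real phoneme durations are not uniform. The function name `approximatePhonemeTimestamps` and the `phonemesByWord` input are hypothetical, and the `{ text, timestamp: [start, end] }` chunk shape is an assumption about what the word-level output looks like.

```javascript
// Hypothetical helper (not part of any library): spreads each word's
// [start, end] interval evenly over that word's phonemes.
// `words`: assumed word-level chunks, e.g. { text: 'hi', timestamp: [0, 1] }.
// `phonemesByWord`: assumed output of a separate phonemizer step,
// one array of phoneme strings per word, in the same order as `words`.
function approximatePhonemeTimestamps(words, phonemesByWord) {
  const result = [];
  words.forEach((word, i) => {
    const phonemes = phonemesByWord[i] ?? [];
    if (phonemes.length === 0) return; // nothing to align for this word
    const [start, end] = word.timestamp;
    const step = (end - start) / phonemes.length; // uniform duration per phoneme
    phonemes.forEach((phoneme, j) => {
      result.push({
        phoneme,
        timestamp: [start + j * step, start + (j + 1) * step],
      });
    });
  });
  return result;
}
```

The resulting list could then be mapped to visemes to drive an avatar's mouth shapes, though a forced aligner would likely give more accurate timings than an even split.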
Feature request
From the pipeline API examples:
Example: Transcribe English w/ word-level timestamps.
It would be nice to be able to transcribe audio and also get phoneme-level timestamps, something like
Motivation
Having phoneme-level timestamps would allow in-browser audio processing to drive the lip sync of a 3D avatar, assuming phoneme data is available as a building block of words in speech recognition.
Your contribution
I can animate a 3D character (e.g. Ready Player Me) in three.js from audio samples if phoneme timestamps are available.