@rickstaa (Member) commented Apr 12, 2025

This draft pull request adds support for text-to-speech (TTS) models that are compatible with the `transformers` library's `pipeline` API, so that pretrained TTS models can be used through the standard Transformers interface.

However, the implementation is not yet complete. Currently, it lacks support for [speaker embeddings](https://huggingface.co/docs/transformers/en/tasks/text-to-speech#speaker-embeddings), which limits the pipeline's customizability.

To support speaker embeddings, we have two main options:

  • Switch to a multipart request: This would be a breaking change and may require introducing a v2 of the TTS pipeline.
  • Pass the speaker embeddings in a marshalled (serialized) form within the existing request: This could maintain compatibility but may come with its own trade-offs.
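
To make the second option more concrete, here is a minimal sketch of what a marshalled speaker embedding could look like. It is not the implementation in this PR: the `speaker_embedding` field name, the float32 little-endian packing, and the base64 encoding are all assumptions chosen for illustration, and the tiny 4-value vector stands in for a real embedding.

```python
import base64
import json
import struct
from typing import List


def marshal_embedding(values: List[float]) -> str:
    """Pack floats as little-endian float32 and base64-encode the bytes."""
    raw = struct.pack(f"<{len(values)}f", *values)
    return base64.b64encode(raw).decode("ascii")


def unmarshal_embedding(encoded: str) -> List[float]:
    """Reverse marshal_embedding; reject payloads that are not float32-aligned."""
    raw = base64.b64decode(encoded, validate=True)
    if len(raw) % 4 != 0:
        raise ValueError("payload is not a whole number of float32 values")
    return list(struct.unpack(f"<{len(raw) // 4}f", raw))


# Hypothetical request body: "speaker_embedding" is an assumed field name,
# not one defined by this PR or by the transformers library.
embedding = [0.0, 0.5, -1.25, 2.0]  # values exactly representable in float32
body = json.dumps(
    {"text": "Hello there", "speaker_embedding": marshal_embedding(embedding)}
)

# Server side: decode the string field back into a list of floats.
decoded = unmarshal_embedding(json.loads(body)["speaker_embedding"])
```

The request stays a plain JSON object with one extra optional string field, which is why this route could avoid a breaking change. The trade-offs include the roughly one-third size overhead of base64 and the need to agree on dtype and byte order out of band.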

This commit adds integration for text-to-speech (TTS) models compatible with
the `transformers` library's `pipeline` API. This enables the use of pretrained
TTS models via the standard Transformers interface.