Codec downstream task support: TTS #5763

Merged: 62 commits, May 16, 2024

jctian98
Contributor

What?

Support downstream task of codec project, specifically TTS.
Model architecture: Valle (https://arxiv.org/abs/2301.02111)

Contributor

mergify bot commented Apr 25, 2024

This pull request is now in conflict :(

Collaborator

@ftshijt ftshijt left a comment

Thanks for the great work! I've noted some comments in the PR. At the current stage, the most important thing seems to be deciding how the JSON format for multi-modal data should be organized. Could you give us some examples and explain the design concept?

egs2/TEMPLATE/asr1/pyscripts/feats/dump_codec.py (outdated, resolved)
egs2/TEMPLATE/asr1/pyscripts/feats/dump_codec.py (outdated, resolved)
egs2/TEMPLATE/asr1/pyscripts/feats/dump_codec.py (outdated, resolved)
egs2/libritts/speechlm1/conf/train_multiscale.yaml (outdated, resolved)
egs2/mini_an4/speechlm1/local/recover_wav.py (outdated, resolved)
espnet2/speechlm/core_lm/builtin.py (outdated, resolved)
# previously can become incompatible. New tokens can be
# added - there are enough slots

special_tokens = [
Collaborator

Do we expect users to add their own tokens?
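For context on the question above: the quoted comment appears to describe a fixed block of reserved IDs, so that adding a special token later does not shift the IDs of tokens that already exist. A minimal sketch of that idea follows. The token names, the slot count of 32, and the `build_vocab` helper are all hypothetical and are not taken from this PR.

```python
# Hypothetical illustration only: the token names and the slot count are
# assumptions, not the contents of the PR's special_tokens list.
NUM_RESERVED_SLOTS = 32

special_tokens = ["<pad>", "<unk>", "<sos/eos>"]


def build_vocab(special, task_tokens, num_slots=NUM_RESERVED_SLOTS):
    """Place special tokens in a fixed-size block ahead of all other tokens."""
    if len(special) > num_slots:
        raise ValueError("more special tokens than reserved slots")
    # Pad the unused slots so the first non-special token always gets ID num_slots.
    padded = special + [f"<unused_{i}>" for i in range(len(special), num_slots)]
    return {tok: idx for idx, tok in enumerate(padded + task_tokens)}


old_vocab = build_vocab(special_tokens, ["codec_0", "codec_1"])
# Appending a new special token fills an unused slot; existing IDs do not move.
new_vocab = build_vocab(special_tokens + ["<tts_task>"], ["codec_0", "codec_1"])
```

Under this scheme, appending to the end of the special-token list is safe, whereas reordering or inserting in the middle would change existing IDs.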

@ftshijt ftshijt added this to the v.202405 milestone Apr 26, 2024
@jctian98
Contributor Author

@ftshijt Thanks for the review. The code is still very much a work in progress. I will respond to and clarify the remaining concerns once the framework is complete.

For now, I have only responded to a small part of the comments :)

@mergify mergify bot removed the conflicts label May 16, 2024
Collaborator

@ftshijt ftshijt left a comment

Many thanks for the implementation! I left a few minor comments. After your fixes, we can merge the current effort~

egs2/TEMPLATE/asr1/pyscripts/feats/dump_codec.py (outdated, resolved)
egs2/TEMPLATE/asr1/pyscripts/feats/dump_codec.py (outdated, resolved)
Comment on lines 7 to 18
import logging
import os
import sys
import json

logging.basicConfig(
    format="%(asctime)s | %(levelname)s | %(name)s | %(message)s",
    datefmt="%Y-%m-%d %H:%M:%S",
    level=os.environ.get("LOGLEVEL", "INFO").upper(),
    stream=sys.stdout,
)
logger = logging.getLogger("find all example list based on train_jsons")
Collaborator

Suggested change
-import logging
-import os
-import sys
-import json
-logging.basicConfig(
-    format="%(asctime)s | %(levelname)s | %(name)s | %(message)s",
-    datefmt="%Y-%m-%d %H:%M:%S",
-    level=os.environ.get("LOGLEVEL", "INFO").upper(),
-    stream=sys.stdout,
-)
-logger = logging.getLogger("find all example list based on train_jsons")
+import os
+import sys
+import json

egs2/TEMPLATE/asr1/pyscripts/utils/make_speechlm_json.py (outdated, resolved)
egs2/TEMPLATE/asr1/pyscripts/utils/make_speechlm_json.py (outdated, resolved)
espnet2/speechlm/core_lm/abs_core_lm.py (outdated, resolved)
espnet2/speechlm/core_lm/abs_core_lm.py (outdated, resolved)
espnet2/speechlm/core_lm/ar_multiscale.py (outdated, resolved)
espnet2/speechlm/core_lm/valle.py (outdated, resolved)
espnet2/train/preprocessor.py (outdated, resolved)
@ftshijt ftshijt merged commit 5ab7abc into espnet:codec May 16, 2024
0 of 2 checks passed
2 participants