Codec downstream task support: TTS #5763

jctian98 · 2024-04-25T21:46:29Z

What?

Support a downstream task of the codec project, specifically TTS.
Model architecture: VALL-E (https://arxiv.org/abs/2301.02111)

mergify · 2024-04-25T21:47:17Z

This pull request is now in conflict :(

ftshijt

Thanks for the great work! I've left some comments in the PR. It seems that, at the current stage, the most important thing is to decide how the JSON format for multi-modal data will be organized. Could you give us some examples and explain the design concept?

egs2/TEMPLATE/asr1/pyscripts/feats/dump_codec.py

egs2/TEMPLATE/asr1/pyscripts/utils/make_speechlm_json.py

egs2/libritts/speechlm1/conf/train_multiscale.yaml

egs2/mini_an4/speechlm1/local/recover_wav.py

espnet2/speechlm/core_lm/builtin.py

ftshijt · 2024-04-26T06:46:40Z

espnet2/speechlm/definitions.py

+# previously can become incompatible. New tokens can be
+# added - there are enough slots
+
+special_tokens = [


Do we expect users to add their own tokens?
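To illustrate the "enough slots" idea from the code comment above, here is a minimal sketch of one way a reserved block can keep token indices stable when special tokens are added. The block size, token names, and the helper `build_token_list` are all hypothetical and are not taken from `espnet2/speechlm/definitions.py`.

```python
# Hypothetical sketch: names and sizes are illustrative, not ESPnet's actual values.
RESERVED_SLOTS = 32  # assumed size of the block reserved for special tokens

special_tokens = ["<pad>", "<unk>", "<sos/eos>"]  # illustrative entries


def build_token_list(special, regular, reserved=RESERVED_SLOTS):
    """Put special tokens first, fill the rest of the reserved block with
    unused placeholders, then append regular tokens at a fixed offset."""
    if len(special) > reserved:
        raise ValueError("special tokens exceed the reserved block")
    placeholders = [f"<unused_{i}>" for i in range(len(special), reserved)]
    return special + placeholders + regular


codec_tokens = [f"<codec_{i}>" for i in range(4)]
before = build_token_list(special_tokens, codec_tokens)
after = build_token_list(special_tokens + ["<new_task>"], codec_tokens)

# Appending a special token consumes a placeholder slot, so regular
# tokens keep the same index and older models stay compatible.
assert before.index("<codec_0>") == after.index("<codec_0>") == RESERVED_SLOTS
assert after.index("<new_task>") == len(special_tokens)
```

Under this assumption, appending to the end of the special-token list is safe, while reordering or deleting existing entries would still break compatibility with previously trained models.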

jctian98 · 2024-04-26T14:56:45Z

@ftshijt Thanks for the review. The code is still very preliminary. I will respond to and clarify the remaining concerns once the framework is complete.

For now I have only responded to a small part of the comments :)


ftshijt

Many thanks for the implementation. I left a few minor comments; once they are addressed, we can merge the current work.

egs2/TEMPLATE/asr1/pyscripts/feats/dump_codec.py

ftshijt · 2024-05-16T06:00:28Z

egs2/TEMPLATE/asr1/pyscripts/utils/make_example_list_speechlm.py

+import logging
+import os
+import sys
+import json
+
+logging.basicConfig(
+    format="%(asctime)s | %(levelname)s | %(name)s | %(message)s",
+    datefmt="%Y-%m-%d %H:%M:%S",
+    level=os.environ.get("LOGLEVEL", "INFO").upper(),
+    stream=sys.stdout,
+)
+logger = logging.getLogger("find all example list based on train_jsons")


Suggested change

-import logging
-import os
-import sys
-import json
-
-logging.basicConfig(
-    format="%(asctime)s | %(levelname)s | %(name)s | %(message)s",
-    datefmt="%Y-%m-%d %H:%M:%S",
-    level=os.environ.get("LOGLEVEL", "INFO").upper(),
-    stream=sys.stdout,
-)
-logger = logging.getLogger("find all example list based on train_jsons")
+import os
+import sys
+import json

egs2/TEMPLATE/asr1/pyscripts/utils/make_speechlm_json.py

espnet2/speechlm/core_lm/abs_core_lm.py

espnet2/speechlm/core_lm/ar_multiscale.py

espnet2/speechlm/core_lm/valle.py

espnet2/train/preprocessor.py

Jinchuan Tian and others added 4 commits April 11, 2024 01:02

basic speechlm definition and multitask dataloader

a8ff258

make the dataloader correct

a0b7006

add multi-scale; check the training model

85681bf

incomplete version of valle

f3c57ca

mergify bot added conflicts ESPnet2 labels Apr 25, 2024

ftshijt reviewed Apr 26, 2024

ftshijt added New Features Codec labels Apr 26, 2024

ftshijt added this to the v.202405 milestone Apr 26, 2024

jctian98 added 18 commits April 26, 2024 14:22

update encodec tokenization

4d8fd97

bug fix

aa1375f

AR model training ready

4565b12

support valle training

5298c19

revise multiscale

ce4a8a6

add flashattention version control

69784f4

before start infer debug

46cddbb

work on inference postprocess

88da1bf

teacher force ready for AR

1544e5d

update sampling

179f799

fix architecture bug

516babd

update organization

c4edcb0

update config and valle model file

6a077ef

ar inference update

9cd0b95

update

f7189cf

valle inference draft

5d5cac5

apply black

514c42a

update ar inference but not very good

dfded52

jctian98 and others added 7 commits May 15, 2024 00:43

valle ready

6695be2

correct decodeing setup of ar_multiscale

72c18eb

fix

cf07453

add valle config fle

6a7822b

add docstring

7a84083

Update egs2/libritts/speechlm1/cmd.sh

9e79841

Merge branch 'codec' into codec_downstream

e4053e1

mergify bot removed the conflicts label May 16, 2024

[pre-commit.ci] auto fixes from pre-commit.com hooks

b05ee6d

for more information, see https://pre-commit.ci

ftshijt reviewed May 16, 2024

ftshijt and others added 19 commits May 16, 2024 02:39

Update egs2/TEMPLATE/asr1/pyscripts/feats/dump_codec.py

8ebd303

Update egs2/TEMPLATE/asr1/pyscripts/feats/dump_codec.py

2241df6

Update egs2/TEMPLATE/asr1/pyscripts/utils/make_speechlm_json.py

97cc86f

Update espnet2/speechlm/core_lm/abs_core_lm.py

6de1907

Update egs2/TEMPLATE/asr1/pyscripts/utils/make_token_list_speechlm.py

fc34ea2

Update egs2/TEMPLATE/asr1/scripts/feats/codec_tokenization.sh

e5054ff

clean up for merging

17968aa

update data format

c99859c

Update espnet2/speechlm/core_lm/ar_multiscale.py

5d2f00a

Update espnet2/speechlm/core_lm/valle.py

cf39c6a

Update espnet2/train/preprocessor.py

5627311

Update egs2/TEMPLATE/speechlm1/speechlm.sh

ddafad6

Update egs2/TEMPLATE/asr1/pyscripts/utils/make_token_list_speechlm.py

39329a7

Update egs2/TEMPLATE/asr1/pyscripts/utils/make_token_list_speechlm.py

615f31a

Update egs2/TEMPLATE/asr1/scripts/feats/codec_tokenization.sh

ad6bc0c

Update egs2/TEMPLATE/asr1/pyscripts/utils/make_token_list_speechlm.py

6f46918

Update egs2/TEMPLATE/speechlm1/setup.sh

5bd5835

Update espnet2/bin/speechlm_inference.py

d164013

solve conflicts with codec branch

7f1d052

ftshijt merged commit 5ab7abc into espnet:codec May 16, 2024
0 of 2 checks passed
