
Why always Downloading the tokenizer of seamlessM4T_v2_large #409

Open
Longleaves opened this issue Apr 9, 2024 · 7 comments

Comments

@Longleaves

I have already set up CHECKPOINTS_PATH and the cards, so why is it always downloading the tokenizer of seamlessM4T_v2_large when I run python app.py? Please help, thanks.

@zrthxn
Contributor

zrthxn commented Apr 14, 2024

If I understand correctly, it looks like you're using snapshot_download.
If you just load the model or tokenizer directly, the cached files will be used once downloaded.

from seamless_communication.models.unity import (
    load_unity_model,
    load_unity_text_tokenizer,
    load_unity_unit_tokenizer,
)

# Loading by name/card reuses the locally cached files after the first download.
model = load_unity_model(model_name_or_card)
unit_tokenizer = load_unity_unit_tokenizer(model_name_or_card)
text_tokenizer = load_unity_text_tokenizer(model_name_or_card)

Here model_name_or_card = "seamlessM4T_v2_large"

@amirmfarzane


How do I load the checkpoints I got from fine-tuning?

@avidale
Contributor

avidale commented Jun 24, 2024

How do I load the checkpoints I got from fine-tuning?

You can start by loading the original model (e.g. seamlessM4T_v2_large) from its card, and then use the load_checkpoint function (src/seamless_communication/cli/m4t/evaluate/evaluate.py#L365) to update the model with your fine-tuned checkpoint.
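
A minimal sketch of that flow, assuming the fine-tuned checkpoint is a plain .pt file whose state dict sits either at the top level or under a "model" key (the path and the key are placeholder assumptions; the load_checkpoint helper in the referenced file is the canonical route):

import torch

from seamless_communication.models.unity import load_unity_model

# Load the base model from its card (reuses the local cache after the first download).
model = load_unity_model("seamlessM4T_v2_large")

# Placeholder path -- point this at your fine-tuned checkpoint.
ckpt = torch.load("/path/to/finetuned_checkpoint.pt", map_location="cpu")

# Some training scripts nest the weights under a "model" key; fall back to the
# top level otherwise. Inspect ckpt.keys() if unsure.
state_dict = ckpt.get("model", ckpt)

model.load_state_dict(state_dict)
model.eval()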

Also, please take a look at the excellent note from Alisamar Husain about fine-tuning M4T models.

@amirmfarzane

amirmfarzane commented Jul 8, 2024


Thank you very much.

@RRThivyan

Hi, I have fine-tuned the model using the notes from Alisamar, but the model cannot be loaded: it throws an error that some weights are missing (final_proj.weight). I pointed seamlessm4t_v2_large.yaml at my model checkpoint but still get this error. Do fine-tuned models have different weights compared to the original model?

@amirmfarzane


If you're having trouble loading checkpoints saved after fine-tuning, you can use the load_checkpoint function in the mini-evaluation section of this notebook.

@RRThivyan

RRThivyan commented Sep 23, 2024

Hi, I followed the steps you mentioned, but as I said, it throws an error at final_proj.weight. That is my question: do the fine-tuned model weights differ from the original model's? If so, how can we use our fine-tuned model?

m4t_evaluate \
    --model_name seamlessM4T_large \
    --task ASR \
    --tgt_lang eng \
    --data_file /home/jupyter/myfiles/fleurs/test/test_manifest.json \
    --output_path eval \
    --n_samples 2000

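To see exactly which parameter names differ between a fine-tuned checkpoint and the base model (for example, whether final_proj.weight is really absent), a quick comparison along these lines can help; the checkpoint path and the "model" key are placeholder assumptions:

import torch

from seamless_communication.models.unity import load_unity_model

# Base model named in the evaluation command above.
model = load_unity_model("seamlessM4T_large")

# Placeholder path -- point this at the fine-tuned checkpoint that fails to load.
ckpt = torch.load("/path/to/finetuned_checkpoint.pt", map_location="cpu")
ckpt_state = ckpt.get("model", ckpt)  # weights may be nested under a "model" key

model_keys = set(model.state_dict().keys())
ckpt_keys = set(ckpt_state.keys())

print("Expected by the model but missing from the checkpoint:")
print(sorted(model_keys - ckpt_keys))
print("Present in the checkpoint but unknown to the model:")
print(sorted(ckpt_keys - model_keys))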
