(py37_audio2txt) [www@localhost FunASR-main]$ HYDRA_FULL_ERROR=1 python3 funasr/bin/train.py +model="/home/www/FunASR-main/model_zoo/speech_paraformer-large-contextual_asr_nat-zh-cn-16k-common-vocab8404/" +model_revision="v2.0.2" +train_data_set_list="data/list/audio_datasets.jsonl" +valid_data_set_list="data/list/audio_datasets.jsonl" +dataset_conf.batch_size=6000 +dataset_conf.num_workers=4 +train_conf.max_epoch=5 +output_dir="outputs" +device="cuda"
[2024-02-05 18:53:57,923][root][INFO] - download models from model hub: ms
{'model': 'ContextualParaformer', 'model_conf': {'ctc_weight': 0.0, 'lsm_weight': 0.1, 'length_normalized_loss': True, 'predictor_weight': 1.0, 'predictor_bias': 1, 'sampling_ratio': 0.75, 'inner_dim': 512}, 'encoder': 'SANMEncoder', 'encoder_conf': {'output_size': 512, 'attention_heads': 4, 'linear_units': 2048, 'num_blocks': 50, 'dropout_rate': 0.1, 'positional_dropout_rate': 0.1, 'attention_dropout_rate': 0.1, 'input_layer': 'pe', 'pos_enc_class': 'SinusoidalPositionEncoder', 'normalize_before': True, 'kernel_size': 11, 'sanm_shfit': 0, 'selfattention_layer_type': 'sanm'}, 'decoder': 'ContextualParaformerDecoder', 'decoder_conf': {'attention_heads': 4, 'linear_units': 2048, 'num_blocks': 16, 'dropout_rate': 0.1, 'positional_dropout_rate': 0.1, 'self_attention_dropout_rate': 0.1, 'src_attention_dropout_rate': 0.1, 'att_layer_num': 16, 'kernel_size': 11, 'sanm_shfit': 0}, 'predictor': 'CifPredictorV2', 'predictor_conf': {'idim': 512, 'threshold': 1.0, 'l_order': 1, 'r_order': 1, 'tail_threshold': 0.45}, 'frontend': 'WavFrontend', 'frontend_conf': {'fs': 16000, 'window': 'hamming', 'n_mels': 80, 'frame_length': 25, 'frame_shift': 10, 'lfr_m': 7, 'lfr_n': 6, 'cmvn_file': '/home/www/FunASR-main/model_zoo/speech_paraformer-large-contextual_asr_nat-zh-cn-16k-common-vocab8404/am.mvn'}, 'specaug': 'SpecAugLFR', 'specaug_conf': {'apply_time_warp': False, 'time_warp_window': 5, 'time_warp_mode': 'bicubic', 'apply_freq_mask': True, 'freq_mask_width_range': [0, 30], 'lfr_rate': 6, 'num_freq_mask': 1, 'apply_time_mask': True, 'time_mask_width_range': [0, 12], 'num_time_mask': 1}, 'train_conf': {'accum_grad': 1, 'grad_clip': 5, 'max_epoch': 5, 'val_scheduler_criterion': ['valid', 'acc'], 'best_model_criterion': [['valid', 'acc', 'max']], 'keep_nbest_models': 10, 'log_interval': 50}, 'optim': 'adam', 'optim_conf': {'lr': 0.0005}, 'scheduler': 'warmuplr', 'scheduler_conf': {'warmup_steps': 30000}, 'dataset': 'AudioDataset', 'dataset_conf': {'index_ds': 'IndexDSJsonl', 
'batch_sampler': 'DynamicBatchLocalShuffleSampler', 'batch_type': 'example', 'batch_size': 6000, 'max_token_length': 2048, 'buffer_size': 500, 'shuffle': True, 'num_workers': 4}, 'tokenizer': 'CharTokenizer', 'tokenizer_conf': {'unk_symbol': '', 'split_with_space': True, 'token_list': '/home/www/FunASR-main/model_zoo/speech_paraformer-large-contextual_asr_nat-zh-cn-16k-common-vocab8404/tokens.json'}, 'ctc_conf': {'dropout_rate': 0.0, 'ctc_type': 'builtin', 'reduce': True, 'ignore_nan_grad': True}, 'normalize': None, 'init_param': '/home/www/FunASR-main/model_zoo/speech_paraformer-large-contextual_asr_nat-zh-cn-16k-common-vocab8404/model.pt', 'config': '/home/www/FunASR-main/model_zoo/speech_paraformer-large-contextual_asr_nat-zh-cn-16k-common-vocab8404/config.yaml', 'is_training': True, 'model_revision': 'v2.0.2', 'train_data_set_list': 'data/list/audio_datasets.jsonl', 'valid_data_set_list': 'data/list/audio_datasets.jsonl', 'output_dir': 'outputs', 'device': 'cuda', 'model_path': '/home/www/FunASR-main/model_zoo/speech_paraformer-large-contextual_asr_nat-zh-cn-16k-common-vocab8404/'}
tables:
----------- ** dataset_classes ** --------------
| class name | class location |
| AudioDataset | funasr/datasets/audio_datasets/datasets.py:8 |
----------- ** index_ds_classes ** --------------
| class name | class location |
| IndexDSJsonl | funasr/datasets/audio_datasets/index_ds.py:10 |
----------- ** batch_sampler_classes ** --------------
| class name | class location |
| BatchSampler | funasr/datasets/audio_datasets/samplers.py:8 |
----------- ** frontend_classes ** --------------
| class name | class location |
| WavFrontend | funasr/frontends/wav_frontend.py:79 |
| WavFrontendOnline | funasr/frontends/wav_frontend.py:217 |
----------- ** encoder_classes ** --------------
| class name | class location |
| BranchformerEncoder | funasr/models/branchformer/encoder.py:295 |
| ConformerChunkEncoder | funasr/models/bat/conformer_chunk_encoder.py:316 |
| ConformerEncoder | funasr/models/conformer/encoder.py:287 |
| DFSMN | funasr/models/fsmn_vad_streaming/encoder.py:233 |
| EBranchformerEncoder | funasr/models/e_branchformer/encoder.py:178 |
| FSMN | funasr/models/fsmn_vad_streaming/encoder.py:162 |
| SANMEncoder | funasr/models/sanm/encoder.py:162 |
| SANMEncoderChunkOpt | funasr/models/scama/encoder.py:163 |
| SANMVadEncoder | funasr/models/ct_transformer_streaming/encoder.py:149 |
| TransformerEncoder | funasr/models/transformer/encoder.py:140 |
----------- ** predictor_classes ** --------------
| class name | class location |
| CifPredictor | funasr/models/paraformer/cif_predictor.py:16 |
| CifPredictorV2 | funasr/models/paraformer/cif_predictor.py:142 |
| CifPredictorV3 | funasr/models/bicif_paraformer/cif_predictor.py:96 |
----------- ** model_classes ** --------------
| class name | class location |
| BiCifParaformer | funasr/models/bicif_paraformer/model.py:38 |
| Branchformer | funasr/models/branchformer/model.py:7 |
| CAMPPlus | funasr/models/campplus/model.py:31 |
| CTTransformer | funasr/models/ct_transformer/model.py:31 |
| CTTransformerStreaming | funasr/models/ct_transformer_streaming/model.py:28 |
| Conformer | funasr/models/conformer/model.py:9 |
| ContextualParaformer | funasr/models/contextual_paraformer/model.py:44 |
| EBranchformer | funasr/models/e_branchformer/model.py:7 |
| Emotion2vec | funasr/models/emotion2vec/model.py:35 |
| FsmnVADStreaming | funasr/models/fsmn_vad_streaming/model.py:268 |
| MonotonicAligner | funasr/models/monotonic_aligner/model.py:25 |
| Paraformer | funasr/models/paraformer/model.py:27 |
| ParaformerStreaming | funasr/models/paraformer_streaming/model.py:38 |
| SANM | funasr/models/sanm/model.py:14 |
| SCAMA | funasr/models/scama/model.py:39 |
| SeacoParaformer | funasr/models/seaco_paraformer/model.py:46 |
| Transformer | funasr/models/transformer/model.py:21 |
| UniASR | funasr/models/uniasr/model.py:26 |
----------- ** decoder_classes ** --------------
| class name | class location |
| ContextualParaformerDecoder | funasr/models/contextual_paraformer/decoder.py:104 |
| DynamicConvolution2DTransformerDecoder | funasr/models/transformer/decoder.py:589 |
| DynamicConvolutionTransformerDecoder | funasr/models/transformer/decoder.py:528 |
| FsmnDecoder | funasr/models/sanm/decoder.py:199 |
| FsmnDecoderSCAMAOpt | funasr/models/scama/decoder.py:198 |
| LightweightConvolution2DTransformerDecoder | funasr/models/transformer/decoder.py:466 |
| LightweightConvolutionTransformerDecoder | funasr/models/transformer/decoder.py:405 |
| ParaformerSANDecoder | funasr/models/paraformer/decoder.py:530 |
| ParaformerSANMDecoder | funasr/models/paraformer/decoder.py:205 |
| TransformerDecoder | funasr/models/transformer/decoder.py:356 |
----------- ** normalize_classes ** --------------
| class name | class location |
| GlobalMVN | funasr/models/normalize/global_mvn.py:12 |
| UtteranceMVN | funasr/models/normalize/utterance_mvn.py:9 |
----------- ** specaug_classes ** --------------
| class name | class location |
| SpecAug | funasr/models/specaug/specaug.py:15 |
| SpecAugLFR | funasr/models/specaug/specaug.py:105 |
----------- ** tokenizer_classes ** --------------
| class name | class location |
| CharTokenizer | funasr/tokenizer/char_tokenizer.py:11 |
| SentencepiecesTokenizer | funasr/tokenizer/sentencepiece_tokenizer.py:12 |
[2024-02-05 18:53:57,974][root][INFO] - config.yaml is saved to: outputs/config.yaml
[2024-02-05 18:54:00,038][root][INFO] - init_param is not None: ('/home/www/FunASR-main/model_zoo/speech_paraformer-large-contextual_asr_nat-zh-cn-16k-common-vocab8404/model.pt',)
[2024-02-05 18:54:00,038][root][INFO] - Loading pretrained params from /home/www/FunASR-main/model_zoo/speech_paraformer-large-contextual_asr_nat-zh-cn-16k-common-vocab8404/model.pt
[2024-02-05 18:54:02,938][root][WARNING] - distributed is not initialized, only single shard
[2024-02-05 18:54:02,938][root][INFO] - in rank: 0, num of samplers: 2, total_num of samplers across ranks: 2
[2024-02-05 18:54:02,938][root][WARNING] - distributed is not initialized, only single shard
[2024-02-05 18:54:02,938][root][INFO] - in rank: 0, num of samplers: 2, total_num of samplers across ranks: 2
/home/www/.conda/envs/py37_audio2txt/lib/python3.7/site-packages/torch/utils/data/dataloader.py:566: UserWarning: This DataLoader will create 4 worker processes in total. Our suggested max number of worker in current system is 1, which is smaller than what this DataLoader is going to create. Please be aware that excessive worker creation might get DataLoader running slow or even freeze, lower the worker number to avoid potential slowness/freeze if necessary.
cpuset_checked))
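The DataLoader warning above means `dataset_conf.num_workers=4` exceeds the single CPU the system makes available to this process. A minimal sketch of capping the worker count to the usable CPUs (the variable names are illustrative, not FunASR API):

```python
import os

# Usable CPUs for this process; falls back to the total CPU count on
# platforms without sched_getaffinity (e.g. macOS, Windows).
usable_cpus = (
    len(os.sched_getaffinity(0))
    if hasattr(os, "sched_getaffinity")
    else (os.cpu_count() or 1)
)

requested_workers = 4  # dataset_conf.num_workers from the run above
num_workers = min(requested_workers, usable_cpus)
print(num_workers)
```

The same effect can be had directly on the command line by overriding `+dataset_conf.num_workers=1` (or `0`) instead of `4`.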
[2024-02-05 18:54:02,939][root][WARNING] - distributed is not initialized, only single shard
No checkpoint found at 'outputs/model.pt', starting from scratch
Training Epoch: 1: 0%| | 0/1 [00:00<?, ?it/s]
Training Epoch: 1: 0%| | 0/1 [00:00<?, ?it/s]
Checkpoint saved to outputs/model.pt.ep0
/home/www/.conda/envs/py37_audio2txt/lib/python3.7/site-packages/torch/optim/lr_scheduler.py:136: UserWarning: Detected call of lr_scheduler.step() before optimizer.step(). In PyTorch 1.1.0 and later, you should call them in the opposite order: optimizer.step() before lr_scheduler.step(). Failure to do this will result in PyTorch skipping the first value of the learning rate schedule. See more details at https://pytorch.org/docs/stable/optim.html#how-to-adjust-learning-rate
"https://pytorch.org/docs/stable/optim.html#how-to-adjust-learning-rate", UserWarning)
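The scheduler warning is only about call order within the training loop, not a training failure. A stand-in sketch of the ordering PyTorch 1.1.0+ expects (stub classes for illustration, not the real torch objects):

```python
calls = []

class StubOptimizer:
    """Stand-in for a torch.optim optimizer."""
    def step(self):
        calls.append("optimizer.step")

class StubScheduler:
    """Stand-in for a torch.optim.lr_scheduler scheduler."""
    def step(self):
        calls.append("scheduler.step")

optimizer = StubOptimizer()
scheduler = StubScheduler()

for epoch in range(2):
    # ... forward pass, loss.backward() ...
    optimizer.step()   # update parameters first
    scheduler.step()   # then advance the learning-rate schedule
```

Calling `scheduler.step()` before the first `optimizer.step()`, as this loop apparently does once, only makes PyTorch skip the first value of the schedule; beyond that it is benign.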
Training Epoch: 2: 0%| | 0/1 [00:00<?, ?it/s]
Training Epoch: 2: 0%| | 0/1 [00:00<?, ?it/s]
Checkpoint saved to outputs/model.pt.ep1
Training Epoch: 3: 0%| | 0/1 [00:00<?, ?it/s]
Training Epoch: 3: 0%| | 0/1 [00:00<?, ?it/s]
Checkpoint saved to outputs/model.pt.ep2
Training Epoch: 4: 0%| | 0/1 [00:00<?, ?it/s]
Training Epoch: 4: 0%| | 0/1 [00:00<?, ?it/s]
Checkpoint saved to outputs/model.pt.ep3
Training Epoch: 5: 0%| | 0/1 [00:00<?, ?it/s]
Training Epoch: 5: 0%| | 0/1 [00:00<?, ?it/s]
Checkpoint saved to outputs/model.pt.ep4
Training Epoch: 6: 0%| | 0/1 [00:00<?, ?it/s]
Training Epoch: 6: 0%| | 0/1 [00:00<?, ?it/s]
Checkpoint saved to outputs/model.pt.ep5
This is my complete log. Why does " 0/1 [00:00<?, ?it/s]" appear in the progress bars?
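The `0/1` is the progress bar's "batches done / batches per epoch" counter, and here the total really is 1: the log reports only 2 training samples ("in rank: 0, num of samplers: 2"), and if, as the config suggests, `batch_type: example` with `batch_size: 6000` counts examples per batch, both samples fit in a single batch. A back-of-envelope check in plain Python (not FunASR's actual sampler):

```python
import math

num_samples = 2    # "in rank: 0, num of samplers: 2" in the log above
batch_size = 6000  # dataset_conf.batch_size with batch_type "example"

# The whole 2-sample dataset fits in one batch, so each epoch is one step.
batches_per_epoch = math.ceil(num_samples / batch_size)
print(batches_per_epoch)  # → 1
```

So "0/1 [00:00<?, ?it/s]" just means 0 of 1 batches finished at the instant the bar is drawn; with a larger training list the denominator grows accordingly.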