Releases: NVIDIA/NeMo
NVIDIA Neural Modules 1.6.1
NVIDIA Neural Modules 1.6.0
ASR
- Add new features to ASR with diarization with modified tutorial and README. by @tango4j :: PR: #3007
- Enable stateful decoding of RNNT over multiple transcribe calls by @titu1994 :: PR: #3037
- Move vocabs from asr to common by @Oktai15 :: PR: #3084
- Adding parallel transcribe for ASR models - supports multi-gpu/multi-node by @VahidooX :: PR: #3017
- CTC Conformer fixes for ONNX/TS export by @borisfom :: PR: #3072
- Adding pretrained French ASR models to ctc_bpe and rnnt_bpe listings by @tbartley94 :: PR: #3225
- adding german conformer ctc and rnnt by @yzhang123 :: PR: #3242
- Add aishell and fisher dataset processing scripts for ASR by @jbalam-nv :: PR: #3203
- Better default for RNNT greedy decoding by @titu1994 :: PR: #3332
- Add uniform ASR evaluation script for all models by @titu1994 :: PR: #3334
- CTC Segmentation-Citrinet support by @ekmb :: PR: #3279
- Updates on ASR with diarization util files by @tango4j :: PR: #3359
- Asr fr by @tbartley94 :: PR: #3404
- Refactor ASR Examples Directory by @titu1994 :: PR: #3392
- Asr patches by @titu1994 :: PR: #3443
- Properly support -1 for labels in ctc char models by @titu1994 :: PR: #3487
TTS
- MixerTTS, MixerTTSDataset and small updates in tts tokenizers by @Oktai15 :: PR: #2859
- ONNX and TorchScript support for Mixer-TTS by @Oktai15 :: PR: #3082
- Update name of files to one style in TTS folder by @Oktai15 :: PR: #3189
- Update TTS Dataset, FastPitch with TTS dataset and small improvements in HiFiGAN by @Oktai15 :: PR: #3205
- Add Beta-binomial Interpolator to TTSDataset by @Oktai15 :: PR: #3230
- Normalizer to TTS models, TTS tokenizer updates, AxisKind updates by @Oktai15 :: PR: #3271
- Update Mixer-TTS, FastPitch and TTSDataset by @Oktai15 :: PR: #3366
- Minor Updates to TTS Finetuning by @blisc :: PR: #3455
NLP / NMT
- NMT timing and tokenizer stats utils by @michalivne :: PR: #3004
- Add offsets calculation to MegatronGPTModel.complete method by @dimapihtar :: PR: #3117
- NMT checkpoint averaging by @michalivne :: PR: #3096
- NMT validation examples with inputs by @michalivne :: PR: #3194
- Improve data pipeline for punctuation capitalization model and make other useful changes by @PeganovAnton :: PR: #3159
- Reduce test time of punctuation and capitalization model by @PeganovAnton :: PR: #3286
- NLP text augmentation by @michalivne :: PR: #3291
- Adding Megatron NeMo Bert support by @yidong72 :: PR: #3303
- Added Script to convert Megatron LM to .nemo file by @yidong72 :: PR: #3371
- Support Changing Number of Tensor Parallel Partitions for Megatron by @aklife97 :: PR: #3365
- Megatron AMP fix for scheduler step counter by @titu1994 :: PR: #3293
- T5 Pre-training in NeMo using Megatron by @MaximumEntropy :: PR: #3036
- NMT MIM mean variance fix by @michalivne :: PR: #3385
- NMT Shared Embeddings Weights by @michalivne :: PR: #3340
- Make saving .nemo during on_train_end configurable by @ericharper :: PR: #3427
- Byte-level Multilingual NMT by @aklife97 :: PR: #3368
- BioMegatron token classification tutorial fix to be compatible with current Megatron BERT by @yidong72 :: PR: #3435
- NMT documentation for bottleneck architecture by @michalivne :: PR: #3464
- (1) O2-style mixed precision recipe, (2) Persistent layer-norm, (3) Grad scale hysteresis, (4) gradient_as_bucket_view by @erhoo82 :: PR: #3259
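Item (3) above refers to gradient (loss) scale hysteresis in mixed-precision training. The following is a rough, framework-free sketch, assuming the usual dynamic loss-scaling scheme and modeled loosely on the dynamic grad scaler in Megatron-LM. The class name, parameters, and defaults are hypothetical and are not NeMo's API. The idea is that the scale is reduced only after several overflows instead of on the first one:

```python
class HysteresisLossScaler:
    """Hypothetical sketch of a dynamic loss scaler with hysteresis (not NeMo's code)."""

    def __init__(self, initial_scale=2.0**16, min_scale=1.0, growth_factor=2.0,
                 backoff_factor=0.5, growth_interval=1000, hysteresis=2):
        self.scale = initial_scale
        self.min_scale = min_scale
        self.growth_factor = growth_factor
        self.backoff_factor = backoff_factor
        self.growth_interval = growth_interval
        self.hysteresis = hysteresis
        self._growth_tracker = 0               # consecutive steps without overflow
        self._hysteresis_tracker = hysteresis  # overflows tolerated before backing off

    def update(self, found_inf: bool) -> None:
        if found_inf:
            # An overflow breaks the clean streak and consumes one unit of hysteresis.
            self._growth_tracker = 0
            self._hysteresis_tracker -= 1
            if self._hysteresis_tracker <= 0:
                # Once hysteresis is exhausted, every further overflow shrinks the scale.
                self.scale = max(self.scale * self.backoff_factor, self.min_scale)
        else:
            self._growth_tracker += 1
            if self._growth_tracker == self.growth_interval:
                # A long enough clean streak restores hysteresis and grows the scale.
                self._growth_tracker = 0
                self._hysteresis_tracker = self.hysteresis
                self.scale *= self.growth_factor
```

For example, with `hysteresis=2` and an initial scale of 1024.0, the first overflow leaves the scale unchanged and the second halves it to 512.0. Tolerating isolated overflows avoids shrinking the scale on one-off spikes.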
Text Normalization / Inverse Text Normalization
- Tn clean upsample by @yzhang123 :: PR: #3024
- Tn add nn wfst and doc by @yzhang123 :: PR: #3135
- Update english tn ckpt by @yzhang123 :: PR: #3143
- WFST_tutorial for ITN development by @tbartley94 :: PR: #3128
- German TN wfst by @yzhang123 :: PR: #3174
- Add ITN Vietnamese by @binh234 :: PR: #3217
- WFST TN updates by @ekmb :: PR: #3235
- Itn german refactor by @yzhang123 :: PR: #3262
- Tn german deterministic by @yzhang123 :: PR: #3308
- TN updates by @ekmb :: PR: #3285
- Added double digits to EN ITN by @yzhang123 :: PR: #3321
- TN_non_deterministic optimized by @ekmb :: PR: #3343
- Missing init for TN German by @ekmb :: PR: #3355
- Ru TN by @ekmb :: PR: #3390
- Update ContextNet models trained on more datasets by @titu1994 :: PR: #3440
NeMo Tools
- CTC Segmentation-Citrinet support by @ekmb :: PR: #3279
- Updated NumPy SDE requirement by @vsl9 :: PR: #3442
Export
- ONNX and TorchScript support for Mixer-TTS by @Oktai15 :: PR: #3082
- CTC Conformer fixes for ONNX/TS export by @borisfom :: PR: #3072
Documentation
- Merge r1.5.0 bugfixes and doc updates to main by @ericharper :: PR: #3133
- Tn add nn wfst and doc by @yzhang123 :: PR: #3135
- Add apex into by @PeganovAnton :: PR: #3214
- Final merge r1.5.0 bugfixes and doc updates to main by @ericharper :: PR: #3232
- Nemo container docker building instruction - merge to main by @fayejf :: PR: #3236
- Doc link fixes by @nithinraok :: PR: #3264
- French ASR Doc updates by @tbartley94 :: PR: #3322
- german asr doc page update by @yzhang123 :: PR: #3325
- update docs and replace speakernet with titanet in tutorials by @nithinraok :: PR: #3405
- Asr fr by @tbartley94 :: PR: #3404
- Update copyright to 2022 by @ericharper :: PR: #3426
- Update Speech Classification - VAD doc by @fayejf :: PR: #3430
- Update speaker diarization docs by @tango4j :: PR: #3419
- NMT documentation for bottleneck architecture by @michalivne :: PR: #3464
- Add verification helper function and update docs by @nithinraok :: PR: #3514
- Prompt tuning documentation by @vadam5 :: PR: #3541
Bugfixes
- Fixed wrong tgt_length for timing by @michalivne :: PR: #3050
- Update nltk version with a CVE fix by @thomasdhc :: PR: #3054
- Fix README by @ericharper :: PR: #3070
- Transformer Decoder: Fix swapped input name issue by @aklife97 :: PR: #3066
- Fixes bugs in collect_tokenizer_dataset_stats.py by @michalivne :: PR: #3060
- Attribute is not working in . by @PeganovAnton :: PR: #3099
- Merge r1.5.0 bugfixes and doc updates to main by @ericharper :: PR: #3133
- A quick fix for issue #3094 index out-of-bound when truncating long text to max_seq_length by @bugface :: PR: #3131
- Fixed two typos by @bene-ges :: PR: #3157
- Merge r1.5.0 bugfixes to main by @ericharper :: PR: #3173
- LJSpeech alignment scripts fixed for latest MFA by @m-toman :: PR: #3177
- Add apex into by @PeganovAnton :: PR: #3214
- Patch omegaconf for cfg by @fayejf :: PR: #3224
- Final merge r1.5.0 bugfixes and doc updates to main by @ericharper :: PR: #3232
- CTC Conformer fixes for ONNX/TS export by @borisfom :: PR: #3072
- Fix Masked SE for Citrinets + export Limited Context Citrinet by @titu1994 :: PR: #3216
- Fix text length type in TTSDataset for beta_binomial_interpolator by @Oktai15 :: PR: #3233
- Fix cast type in _se_pool_step_script related functions by @Oktai15 :: PR: #3239
- Doc link fixes by @nithinraok :: PR: #3264
- Escape chars fix by @ekmb :: PR: #3253
- Fix asr output - eval mode by @nithinraok :: PR: #3274
- Remove ArrayLike because it is not supported in numpy 1.18 by @PeganovAnton :: PR: #3282
- Fix megatron_gpt_ckpt_to_nemo.py with torch distributed by @yaoyu-33 :: PR: #3278
- Reduce test time of punctuation and capitalization model by @PeganovAnton :: PR: #3286
- Tn en money fix by @yzhang123 :: PR: #3290
- Fixing the bucketing_batch_size bug. by @VahidooX :: PR: #3294
- Adaptive fixed positional embeddings by @michalivne :: PR: #3263
- Fix specaugment time start for numba kernel by @titu1994 :: PR: #3299
- Fix for Stalled ASR training/eval on Pytorch 1.10+ (multigpu/multinode) by @titu1994 :: PR: #3304
- Fix bucketing list bug. by @VahidooX :: PR: #3315
- Fix MixerTTS types and dimensions by @Oktai15 :: PR: #3330
- Fix German and Vietnamese grammar by @yzhang123 :: PR: #3331
- Fix readme to show cmd by @yzhang123 :: PR: #3345
- Fix speaker label models training convergence by @nithinraok :: PR: #3354
- Tqdm get datasets by @bmwshop :: PR: #3358
- Fixed future masking in cross attention of Perceiver by @michalivne :: PR: #3314
- Fixed the bug of fixed-size bucketing. by @VahidooX :: PR: #3364
- Fix minor problems in punctuation and capitalization model by @PeganovAnton :: PR: #3376
- Megatron AMP fix for scheduler step counter by @titu1994 :: PR: #3293
- fixed the bug of bucketing when fixed-size batch is used. by @VahidooX :: PR: #3399
- TalkNet Fix by @stasbel :: PR: #3092
- Fix linear annealing not annealing lr to min_lr by @MaximumEntropy :: PR: #3400
- Resume training on SLURM multi-node multi-gpu by @itzsimpl :: PR: #3374
- Fix running token classification in multinode setting by @PeganovAnton :: PR: #3413
- Fix order of lang checking to ignore input langs by @MaximumEntropy :: PR: #3417
- NMT MIM mean variance fix by @michalivne :: PR: #3385
- Fix bug for missing variable by @MaximumEntropy :: PR: #3437
- Asr patches by @titu1994 :: PR: #3443
- Prompt tuning loss mask fix by @vadam5 :: PR: #3438
- BioMegatron token classification tutorial fix to be compatible with current Megatron BERT by @yidong72 :: PR: #3435
- Fix hysteresis loading by @MaximumEntropy :: PR: #3460
- Fix the tutorial notebooks bug by @yidong72 :: PR: #3465
- Fix the errors/bugs in ASR with diarization tutorial by @tango4j :: PR: #3461
- WFST Punct post fix + punct tutorial fixes by @ekmb :: PR: #3469
- Process correctly label ids dataset parameter + standardize type of label ids model attribute + minor changes (error messages, typing) by @PeganovAnton :: PR: #3471
- file name fix - Segmentation tutorial by @ekmb :: PR: #3474
- Patch fix for the multiple last checkpoints issue by @nithinraok :: PR: #3468
- Fix bug with arguments for TalkNet's preprocessor by @Oktai15 :: PR: #3481
- Fix description by @PeganovAnton :: PR: #3482
- typo fix in diarization notebooks by @nithinraok :: PR: #3480
- Fix check...
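PR #3400 fixes linear annealing so that the learning rate actually reaches min_lr. A hedged sketch of the intended behavior is shown below. It uses a hypothetical helper rather than NeMo's scheduler classes, and it ignores warmup:

```python
def linear_annealing_lr(step: int, max_steps: int, max_lr: float, min_lr: float) -> float:
    """Linearly decay the learning rate from max_lr at step 0 to min_lr at max_steps.

    Hypothetical helper for illustration; not NeMo's scheduler implementation.
    """
    if max_steps <= 0:
        raise ValueError("max_steps must be positive")
    # Clamp progress to [0, 1] so steps past max_steps hold at min_lr.
    progress = min(max(step, 0), max_steps) / max_steps
    # Interpolate toward min_lr, not toward 0.
    return max_lr - (max_lr - min_lr) * progress
```

Two details keep the final rate at min_lr: interpolating toward min_lr rather than toward 0, and clamping progress at 1.0.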
NVIDIA Neural Modules 1.5.1
Features
Known Issues
- Training of speaker models converges very slowly due to a bug (fixed in main: #3354).
- ASR training does not reach adequate WER due to a bug in Numba SpecAugment (fixed in main: #3299). For details, refer to #3288 (comment). As a temporary workaround, disable Numba SpecAugment by setting the flag at https://github.com/NVIDIA/NeMo/blob/main/nemo/collections/asr/modules/audio_preprocessing.py#L471 to False in the SpecAugment section of the YAML config. The fix will be part of 1.6.0.
NVIDIA Neural Modules 1.5.0
Features
- Megatron GPT pre-training with tensor model parallelism #2975
- NMT encoder and decoder with different hidden size #2856
- Logging timing of train/val/test steps #2936
- Logging NMT encoder and decoder timing #2956
- Logging timing per sentence length and tokenized text statistics #3004
- Upgrade to PyTorch Lightning 1.5.0, bfloat support #2975
- French Inverse Text Normalization #2921
- Bucketing of tarred datasets for ASR models #2999
- ASR with diarization #3007
- Adding parallel transcribe for ASR models - supports multi-gpu/multi-node #3017
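Bucketing (#2999) groups utterances of similar length so that batches contain less padding. The sketch below is minimal and assumes NeMo-style manifest entries that carry a "duration" field. The helper name and the equal-count splitting strategy are illustrative assumptions, not necessarily how NeMo's tarred-dataset tooling forms buckets:

```python
from typing import Dict, List


def bucket_by_duration(entries: List[Dict], num_buckets: int) -> List[List[Dict]]:
    """Sort manifest entries by duration and split them into contiguous buckets.

    Hypothetical helper for illustration; not NeMo's tarred-dataset script.
    """
    if num_buckets < 1:
        raise ValueError("num_buckets must be >= 1")
    # Shortest utterances first, so each bucket covers a narrow duration range.
    ordered = sorted(entries, key=lambda e: e["duration"])
    size, rem = divmod(len(ordered), num_buckets)
    buckets, start = [], 0
    for i in range(num_buckets):
        # The first `rem` buckets each take one extra entry.
        end = start + size + (1 if i < rem else 0)
        buckets.append(ordered[start:end])
        start = end
    return buckets
```

Each bucket can then be stored as its own tarred shard set and given its own batch size. Buckets of short utterances can afford larger batches.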
Documentation Updates
- RNNT
Contributors
@ericharper @michalivne @MaximumEntropy @VahidooX @titu1994 @blisc @okuchaiev @tango4j @erastorgueva-nv @fayejf @vadam5 @ekmb @yaoyu-33 @nithinraok @erhoo82 @tbartley94 @PeganovAnton @madhukarkm @yzhang123
(Please let us know if you have contributed to this release and we have missed you here.)
NVIDIA Neural Modules 1.4.0
Features
- Improved speaker clustering #2729
- Upgrade to NVIDIA PyTorch 21.08 container #2799
- RNNT mAES beam search support #2802
- Transfer learning for new speakers #2684
- Simplify speaker scripts #2777
- Perceiver-encoder architecture #2737
- Relative paths in tarred datasets #2776
- Torch only TTS package #2643
- Inverse text normalization for Spanish #2489
Tutorial Notebooks
- Duration and pitch control for TTS #2700
Bug fixes
Contributors
@tango4j @titu1994 @paarthneekhara @nithinraok @michalivne @erastorgueva-nv @borisfom @blisc
(some contributors may not be listed explicitly)
NVIDIA Neural Modules 1.3.0
Added
- RNNT Exportable to ONNX #2510
- Multi-batch inference support for speaker diarization #2522
- DALI Integration for char/subword ASR #2567
- VAD Postprocessing #2636
- Perceiver encoder for NMT #2621
- gRPC NMT server #2656
- German ITN #2486
- Russian TN and ITN #2519
- Save/restore connector #2592
- PTL 1.4+ #2600
Tutorial Notebooks
Bug Fixes
- NMESE clustering for very small audio files #2566
Contributors
@pasandi20 @ekmb @nithinraok @titu1994 @ryanleary @yzhang123 @ericharper @michalivne @MaximumEntropy @fayejf
(some contributors may not be listed explicitly)
NVIDIA Neural Modules 1.2.0
Added
- Improve performance of speaker clustering (#2445)
- Update Conformer for ONNX conversion (#2439)
- Mean and length normalization for better embeddings in speaker verification and diarization (#2397)
- FastEmit RNNT Loss Numba for reducing latency (#2374)
- Multiple datasets, right to left models, noisy channel re-ranking, ensembling for NMT (#2379)
- Byte level tokenization (#2365)
- Bottleneck with attention bridge for more efficient NMT training (#2390)
- Tutorial notebook for NMT data cleaning and preprocessing (#2467)
- Streaming Conformer inference script for long audio files (#2373)
- Res2Net Ecapa equivalent implementation for speaker verification and diarization (#2468)
- Update end-to-end tutorial notebook to use CitriNet (#2457)
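Item #2397 refers to mean and length normalization of speaker embeddings. The following is a minimal plain-Python sketch of that general technique, with hypothetical function names. For simplicity it takes the mean over the same batch. A real system would more likely estimate the mean on a separate reference set:

```python
import math
from typing import List


def mean_length_normalize(embeddings: List[List[float]]) -> List[List[float]]:
    """Subtract the mean embedding, then scale each vector to unit L2 length.

    Hypothetical helper for illustration; not NeMo's implementation.
    """
    if not embeddings:
        return []
    dim = len(embeddings[0])
    # Mean normalization: remove the component shared by all embeddings.
    mean = [sum(e[d] for e in embeddings) / len(embeddings) for d in range(dim)]
    out = []
    for e in embeddings:
        centered = [x - m for x, m in zip(e, mean)]
        norm = math.sqrt(sum(x * x for x in centered))
        # Length normalization: project onto the unit sphere (skip zero vectors).
        out.append([x / norm for x in centered] if norm > 0 else centered)
    return out


def cosine_score(a: List[float], b: List[float]) -> float:
    # For unit-length inputs, cosine similarity reduces to a dot product.
    return sum(x * y for x, y in zip(a, b))
```

After normalization, verification scores for different embedding pairs all fall on the same [-1, 1] cosine scale. This keeps a single decision threshold meaningful across speakers.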
Contributors
@nithinraok @tango4j @jbalam-nv @titu1994 @MaximumEntropy @mchrzanowski @michalivne @fayejf @okuchaiev
(some contributors may not be listed explicitly)
Known Issues
- Running import nemo.collections.nlp as nemo_nlp results in an error. This will be patched in the upcoming version. As a workaround, import the individual modules directly.
NVIDIA Neural Modules 1.1.0
NeMo 1.1.0 release is our first release in our new monthly release cadence. Monthly releases will focus on adding new features that enable new NeMo Models or improve existing ones.
Added
- Pretrained Megatron-LM encoders (including model parallel) for NMT (#2238)
- RNNT Numba loss (#1995)
- Enable multiple models to be restored (#2245)
- Audio based text normalization (#2285)
- Multilingual NMT (#2160)
- FastPitch export (#2355)
- ASR fine-tuning tutorial for other languages (#2346)
Bugfixes
Documentation
- ONNX export documentation (#2330)
Contributors
@borisfom @MaximumEntropy @ericharper @aklife97 @titu1994 @ekmb @yzhang123 @blisc
(some contributors may not be listed explicitly)
NVIDIA Neural Modules 1.0.2
Release 1.0.2
NeMo 1.0.2 is a minor change over 1.0.0 adding version checks for Hydra dependency.
NVIDIA Neural Modules 1.0.1
Release 1.0.1
NeMo 1.0.1 is a minor change over 1.0.0 adding proper version bounds for some external dependencies.