Releases: lhotse-speech/lhotse
v1.27.0 - Crispy Momo
New recipes
- [Recipe] Wenetspeech4tts by @yuekaizhang in #1384
- [Recipe] Spatial LibriSpeech by @JinZr in #1386
Other enhancements
- Cap the 'trng' random seeds to 2**31 avoiding numpy error by @pzelasko in #1379
- CutSet.prefetch() for background cuts loading during iteration by @pzelasko in #1380
- Include a copyright NOTICE listing major copyright holders by @pzelasko in #1381
- Added has_custom to MixedCut by @anteju in #1383
- Fix to fixed batch size bucketing and audio loading network connectio… by @pzelasko in #1387
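The background loading mentioned in the CutSet.prefetch() entry above can be pictured with a small pure-Python sketch. This is not Lhotse's implementation: the `prefetch` function and its `buffer_size` parameter are hypothetical names used only to show the general idea of reading ahead in a background thread while the consumer iterates.

```python
import queue
import threading
from typing import Iterable, Iterator, TypeVar

T = TypeVar("T")
_DONE = object()  # sentinel that marks the end of the stream


def prefetch(items: Iterable[T], buffer_size: int = 4) -> Iterator[T]:
    """Yield from `items` while a background thread reads ahead (hypothetical helper)."""
    buffer: queue.Queue = queue.Queue(maxsize=buffer_size)

    def producer() -> None:
        try:
            for item in items:
                buffer.put(item)  # blocks once `buffer_size` items are waiting
        finally:
            buffer.put(_DONE)  # always signal completion to the consumer

    threading.Thread(target=producer, daemon=True).start()
    while True:
        item = buffer.get()
        if item is _DONE:
            return
        yield item


# Usage: iterate as usual; loading overlaps with whatever the loop body does.
loaded = list(prefetch(range(5), buffer_size=2))
```

The sketch keeps the input order and bounds memory use through the queue size. It deliberately ignores error propagation from the producer thread and early termination by the consumer, which a real implementation would need to handle.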
New Contributors
Full Changelog: v1.26.0...v1.27.0
v1.26.0 - Uranium Fever
v1.25.0 - Himalayan Cat
What's Changed
- [feature] Add .narrowband() effect (mulaw, lpc10 codecs) by @rouseabout in #1348
- [feature/optimization] Support for pre-determined batch sizes in DynamicBucketingSampler by @pzelasko in #1372
- [bug] Fix MixedCut transforms serialization by @pzelasko in #1370
Full Changelog: v1.24.2...v1.25.0
v1.24.2
New recipes
New features
Several new APIs for manifest classes added in #1361:
- cut.iter_data(), which iterates over (key, manifest) pairs of all data items attached to a given cut (e.g., ("recording", Recording(...)), ("custom_features", TemporalArray(...)))
- is_in_memory property for all manifest types, to indicate if a manifest contains data that is held in memory
- is_placeholder for non-cut manifests, to indicate if a manifest is just a placeholder (has some metadata, but can't be used to load data)
- cut.drop_in_memory_data(), which converts manifests with in-memory data to placeholders (this is useful for manifests that live longer than just dataloading, to avoid blowing up CPU memory and/or slowing down the program)
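To make the in-memory versus placeholder distinction concrete, here is a toy model in plain Python. It is only an illustration of the concepts described above, not Lhotse code: the `ToyArray` class and its fields are hypothetical, and the real manifests carry much more metadata.

```python
from dataclasses import dataclass, replace
from typing import Optional, Tuple


@dataclass(frozen=True)
class ToyArray:
    """Hypothetical stand-in for a manifest that may hold its payload in memory."""

    shape: Tuple[int, ...]
    data: Optional[bytes] = None  # in-memory payload, if any
    path: Optional[str] = None  # on-disk location, if any

    @property
    def is_in_memory(self) -> bool:
        return self.data is not None

    @property
    def is_placeholder(self) -> bool:
        # Metadata only: nothing held in memory and nowhere to load from.
        return self.data is None and self.path is None

    def drop_in_memory_data(self) -> "ToyArray":
        # Keep the metadata, release the payload.
        return replace(self, data=None) if self.is_in_memory else self


in_memory = ToyArray(shape=(2,), data=b"\x00\x01")
placeholder = in_memory.drop_in_memory_data()
on_disk = ToyArray(shape=(2,), path="features.npy")
```

In this toy model, `placeholder` still reports its shape but can no longer be used to load data, while `on_disk` is returned unchanged by `drop_in_memory_data()` because it holds nothing in memory.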
Bug fixes
- Restoring smart open for local files if available by @pzelasko in #1360
- Fix Recording.to_dict() when transforms are dicts and transform pickling issues by @pzelasko in #1355
- Utils for discovering attached data and dropping in-memory data by @pzelasko in #1361
- Numpy 2.0 compatibility by @pzelasko in #1362
New Contributors
Full Changelog: v1.24.1...v1.24.2
v1.24.1
v1.24 - The World's Highest Wingsuit Jump
What's Changed
New features
Notably, there's a new optimization for the dynamic bucketing sampler in multi-GPU training: it will choose the same (or the closest possible) bucket on each DDP rank to keep the total training step times closer. The expected speedup depends on the model and the number of GPUs. We observed 8% and 13% speedups across two experiments compared to non-synchronized bucket selection. The new option is called sync_buckets and is enabled by default.
- Dynamic bucket selection RNG sync by @pzelasko in #1341
- Add new sampler: weighted sampler by @marcoyang1998 in #1344
- reverb_rir: support Cut input and in memory data by @pzelasko in #1332
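The bucket synchronization described above can be sketched in a few lines. This is a simplified illustration, assuming (from the PR title "Dynamic bucket selection RNG sync") that the ranks share the random state used for bucket selection; the `pick_buckets` helper, the seed value, and the bucket counts are all hypothetical, and the sketch does not model the "closest possible bucket" fallback.

```python
import random
from typing import List


def pick_buckets(num_buckets: int, num_steps: int, seed: int) -> List[int]:
    """Return the bucket index chosen at each training step (hypothetical helper)."""
    rng = random.Random(seed)  # dedicated RNG used only for bucket selection
    return [rng.randrange(num_buckets) for _ in range(num_steps)]


world_size = 4
shared_seed = 1234  # assumption: every rank receives the same seed

# Synchronized: all ranks seed the bucket RNG identically, so at every step
# they draw from the same bucket and process batches of similar duration.
synced = [pick_buckets(10, 5, shared_seed) for _rank in range(world_size)]

# Non-synchronized: each rank seeds differently and picks buckets independently.
unsynced = [pick_buckets(10, 5, shared_seed + rank) for rank in range(world_size)]
```

With identical seeds every rank produces the same sequence of bucket indices, which is what keeps per-step times close across GPUs. With per-rank seeds, one rank may draw a long-utterance bucket while another draws a short one, and the faster ranks wait at the gradient synchronization point.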
Recipes
Other improvements
- Missing 'subset' parameter by @daniel-dona in #1336
- Fix describe on cuts by @keeofkoo in #1340
- Use libsndfile in recording chunk dataset by @pzelasko in #1335
- Fix librispeech manifest caching by @haerski in #1343
- Fix one-off edge case in split_lazy by @pzelasko in #1347
- Increase the start diff tolerance for feature loading by @pzelasko in #1349
- More test coverage for lhotse subset by @pzelasko in #1345
New Contributors
- @keeofkoo made their first contribution in #1340
- @haerski made their first contribution in #1343
- @Triplecq made their first contribution in #1330
Full Changelog: v1.23...v1.24
v1.23 - Snowdrop
What's Changed
Recipes
- MDCC recipe by @JinZr in #1302
- Updated text_norm for aishell recipe by @JinZr in #1305
- Allow skipping missing files in AMI download by @pzelasko in #1318
- Add Chinese TTS dataset baker. by @csukuangfj in #1304
- In CommonVoice corpus, use .tsv headers to parse and not column index by @daniel-dona in #1328
Fixes to a regression in noise mixing augmentations
- Enhance CutSet.mix() randomness and data utilization by @pzelasko in #1315
- Fix randomness in CutMix transform by @pzelasko in #1316
- select a random sub-region of the noise based on the delta duration by @osadj in #1317
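The last entry above (selecting a random sub-region of the noise) can be illustrated with a short sketch. It assumes that "delta duration" means the difference between the noise length and the length actually needed for mixing; the `random_noise_offset` function is a hypothetical helper, not part of Lhotse.

```python
import random


def random_noise_offset(
    noise_duration: float, needed_duration: float, rng: random.Random
) -> float:
    """Pick a start offset (seconds) for a noise excerpt of `needed_duration`."""
    delta = noise_duration - needed_duration  # slack available for shifting the window
    if delta <= 0:
        # The noise is not longer than what is needed: use it from the start.
        return 0.0
    return rng.uniform(0.0, delta)


rng = random.Random(0)
offset = random_noise_offset(noise_duration=10.0, needed_duration=3.0, rng=rng)
# The excerpt [offset, offset + 3.0] always fits inside the 10-second noise.
```

Sampling the offset instead of always starting at zero means that different parts of a long noise recording get used across epochs, which matches the "data utilization" goal named in the first entry of this list.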
Other improvements
- Add dataset for audio tagging by @marcoyang1998 in #1241
- Fix _get_strided_batch device by @lifeiteng in #1303
- Fix typo in README.md by @yfyeung in #1308
- Fix export of features/array to shar by @pzelasko in #1323
- Fix trim_to_supervision_groups by @pzelasko in #1322
New Contributors
- @daniel-dona made their first contribution in #1328
Full Changelog: v1.22...v1.23
v1.22 - Sherpa's Paradise
What's Changed
New features
As an experimental feature, we are extending the API of Lhotse samplers to enable key sampling features for non-audio data such as text. That means text (and other) data can be dynamically multiplexed and bucketed in the same way as audio data with some lightweight wrappers. Please refer to new documentation here: https://lhotse.readthedocs.io/en/latest/datasets.html#customizing-sampling-constraints
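As a rough picture of what a sampling constraint for text could look like, the sketch below groups tokenized examples into mini-batches under a token budget, in the same spirit as a duration budget for audio. It is not the Lhotse sampler API (the linked documentation describes that); `batch_by_tokens` and `max_tokens` are hypothetical names.

```python
from typing import Iterable, Iterator, List


def batch_by_tokens(
    examples: Iterable[List[int]], max_tokens: int
) -> Iterator[List[List[int]]]:
    """Group token sequences so that each batch stays within `max_tokens` in total."""
    batch: List[List[int]] = []
    total = 0
    for example in examples:
        # Close the current batch if adding this example would exceed the budget.
        if batch and total + len(example) > max_tokens:
            yield batch
            batch, total = [], 0
        batch.append(example)
        total += len(example)
    if batch:
        yield batch  # emit the final, possibly smaller batch


examples = [[1] * 3, [1] * 4, [1] * 2, [1] * 6]
batch_lengths = [[len(e) for e in b] for b in batch_by_tokens(examples, max_tokens=7)]
```

Here the token count plays the role that audio duration plays for cuts: it is the quantity the sampler measures when deciding whether a mini-batch is full.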
- Multi-channel support improvements
  Lhotse MultiCuts:
  - are now exportable into Lhotse Shar format
  - gained a new method cut = cut.with_channels([0, 1, ...]) to modify the channels they refer to
  - can have multi-channel custom Recordings with channels selectable via a special custom key (e.g., if defining cut.target_recording, audio can be read via cut.load_target_recording() and channels will be auto-selected by looking up cut.target_recording_channel_selector).
Recipes
- Add new recipe: speechio by @yuekaizhang in #1297
- tedlium2 recipe by @JinZr in #1296
Other improvements
- Use audio backends and export custom fields in Lhotse Shar by @pzelasko in #1290
- Documentation for random seeds in lhotse + extended support of lazy r… by @pzelasko in #1291
- Cutconcat fixed max duration by @swigls in #1292
- Fix feature_dim of Spectrogram extractors. by @csukuangfj in #1294
- fix whisper for multi-channel data by @yuekaizhang in #1289
- Xfail flaky SileroVAD tests by @pzelasko in #1300
New Contributors
Full Changelog: v1.21...v1.22
v1.21 - Glaciology
What's Changed
This release patches lhotse to handle cases when libsox is not available for torchaudio. The audio backend code went through an additional round of refactoring, and libsndfile is now preferred as the default since it showed faster audio decoding performance in our testing. Going forward, when LHOTSE_AUDIO_BACKEND is set, we will use the same backend for audio loading, audio saving, and reading audio metadata (if possible). This release also adds support for Python 3.12 and PyTorch 2.2.
- Add VAD to Supervisions in LibriLight Recipe by @yfyeung in #1280
- Fixes for manifest validation and fixing by @pzelasko in #1284
- Handle error with cachedir creation gracefully by @pzelasko in #1287
- AudioBackend specific save_audio and info, managing missing SoX in torchaudio, Python 3.12 / PyTorch 2.2 support, using libsndfile as preferred audio backend by @pzelasko in #1288
Full Changelog: v1.20...v1.21
v1.20 - Pining for the Fjords
What's Changed
New features
- Extended the subset of lhotse that works without installing torchaudio by @pzelasko in #1253 #1255
- Ensure drop_last=False always returns an equal number of mini-batches by re-distributing and/or duplicating some data by @pzelasko in #1277
- Improved CPU memory usage and shuffling + bucketing in DynamicBucketingSampler by @pzelasko in #1276
- Enable seed randomization in dynamic samplers by @pzelasko in #1278
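The drop_last=False behavior in the list above can be pictured with a toy example. This sketch shows only the duplication half of "re-distributing and/or duplicating", and the `equalize_batches` helper is hypothetical rather than taken from Lhotse.

```python
from typing import List, Sequence


def equalize_batches(per_rank: Sequence[Sequence[str]]) -> List[List[str]]:
    """Pad every rank's batch list to the longest one by repeating its own batches."""
    target = max(len(batches) for batches in per_rank)
    equalized = []
    for batches in per_rank:
        padded = list(batches)
        i = 0
        # Cycle through this rank's existing batches until it has `target` of them.
        while len(padded) < target and batches:
            padded.append(batches[i % len(batches)])
            i += 1
        equalized.append(padded)
    return equalized


# Rank 0 has three mini-batches, rank 1 only two.
result = equalize_batches([["a", "b", "c"], ["d", "e"]])
```

After equalization both ranks run three steps, with rank 1 repeating batch "d". Equal step counts matter in distributed training because a rank that finishes early would otherwise leave the remaining ranks waiting in collective operations.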
Recipes
- Fluent Speech Commands dataset, SLU task by @HSTEHSTEHSTE in #1272
Other improvements
- Update docs with env vars used by Lhotse by @pzelasko in #1252
- support whisper large v3; deepspeed launcher rank world_size setting by @yuekaizhang in #1260
- Fix non-deterministic tests by @pzelasko in #1261
- Fix duplication issues in CutSet.mix() by @pzelasko in #1268
- Support controllable CutSet.mux weights in multiprocess dataloading by @pzelasko in #1266
- Fix distributed sampler initialization and exceeded sampler warning false positives by @pzelasko in #1270
- Install kaldi-native-io explicitly in the kaldi doc example. by @csukuangfj in #1275
- Allow duplicate cut IDs in a CutSet (CutSet is list-like instead of dict-like) by @pzelasko in #1279
New Contributors
- @HSTEHSTEHSTE made their first contribution in #1272
Full Changelog: v1.19...v1.20