Bug in CutPairsSampler with CutSet.from_files(): Fails to raise StopIteration at the end of dataset iteration, raises AttributeError: 'tuple' object has no attribute 'subset' #1396
Open
Aijohc opened this issue
Sep 20, 2024
· 3 comments
Hello, and thank you for the excellent work with Lhotse's data management features!
I encountered a bug when using CutPairsSampler. When I load my source_cuts and target_cuts using CutSet.from_files() (with a list of .jsonl.gz files), the expected StopIteration exception is not raised at the end of the dataset iteration. Instead, iteration fails with AttributeError: 'tuple' object has no attribute 'subset'. The cuts are loaded as follows:
def train_cuts(self) -> Tuple[CutSet, CutSet]:
    # Assumes `import logging`, `from typing import Tuple`, and `from lhotse import CutSet`.
    logging.info("About to get train cuts")
    prompt_files = list(
        sorted(self.args.manifest_dir.glob("train/*.cuts.prompts.jsonl.gz"))
    )
    target_files = list(
        sorted(self.args.manifest_dir.glob("train/*.cuts.targets.jsonl.gz"))
    )
    # Both manifests are read lazily, in a fixed file order (no shuffling of iterators).
    prompts = CutSet.from_files(prompt_files + target_files, shuffle_iters=False)
    targets = CutSet.from_files(target_files + prompt_files, shuffle_iters=False)
    return prompts, targets
lhotse version: 1.26.0
I believe this could be an issue with how the end of the dataset is handled when iterating over CutPairsSampler. Could you please investigate this?
Thanks again for your hard work!
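For reference, the end-of-iteration behavior expected above can be sketched in plain Python. This is only an illustration of the iterator protocol, not Lhotse code: `PairedBatchIterator` and its `batch_size` argument are hypothetical names, and the string items stand in for cuts.

```python
from typing import List, Sequence, Tuple


class PairedBatchIterator:
    """Hypothetical stand-in for a paired sampler; not Lhotse's implementation."""

    def __init__(self, sources: Sequence[str], targets: Sequence[str], batch_size: int) -> None:
        if len(sources) != len(targets):
            raise ValueError("sources and targets must have the same length")
        self.sources = sources
        self.targets = targets
        self.batch_size = batch_size
        self._pos = 0

    def __iter__(self) -> "PairedBatchIterator":
        self._pos = 0  # restart from the beginning for each new epoch
        return self

    def __next__(self) -> Tuple[List[str], List[str]]:
        if self._pos >= len(self.sources):
            # Exhausted: signal the end of the epoch the way `for` loops expect.
            raise StopIteration
        end = self._pos + self.batch_size
        batch = (list(self.sources[self._pos:end]), list(self.targets[self._pos:end]))
        self._pos = end
        return batch


# A `for` loop (or `list`) consumes the StopIteration silently and simply ends.
batches = list(PairedBatchIterator(["p1", "p2", "p3"], ["t1", "t2", "t3"], batch_size=2))
```

With a sampler that follows this protocol, a training loop ends cleanly after the last batch instead of surfacing an unrelated exception.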
Additional Question:
I also have a question regarding the CutPairsSampler. Is it possible to specify parameters like buffer_size and quadratic_duration similar to the DynamicBucketingSampler? These parameters are very important when working with the DynamicBucketingSampler, and I noticed they are not directly available in CutPairsSampler. Could you consider supporting such parameters?
Thank you!
Aijohc changed the title from "Bug in CutPairsSampler with CutSet.from_files(): Fails to raise StopIteration at the end of dataset iteration" to "Bug in CutPairsSampler with CutSet.from_files(): Fails to raise StopIteration at the end of dataset iteration, raises AttributeError: 'tuple' object has no attribute 'subset'" on Sep 20, 2024
Regarding the first issue, it looks like I haven't updated CutPairsSampler properly with the latest changes. I'll take a look.
Regarding the other question, you might want to use DynamicCutSampler or DynamicBucketingSampler instead; if you give them more than one CutSet, they act as CutPairsSampler (and support triples, quadruples, and so on as well). In fact, CutPairsSampler should be deprecated at this point.
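A minimal sketch of that suggestion, assuming the two CutSets are aligned item by item (the n-th prompt belongs with the n-th target). The helper name `build_paired_sampler` and the default values are hypothetical and only for illustration; `max_duration`, `num_buckets`, `shuffle`, `buffer_size`, and `quadratic_duration` are keyword arguments of DynamicBucketingSampler.

```python
def build_paired_sampler(
    prompts,
    targets,
    max_duration=200.0,       # illustrative value, tune for your hardware
    num_buckets=30,           # illustrative value
    buffer_size=20000,
    quadratic_duration=None,  # e.g. a duration in seconds, or None to disable
    shuffle=True,
):
    """Hypothetical helper: build one sampler over two aligned CutSets."""
    # Imported lazily so this helper can be defined without Lhotse installed.
    from lhotse.dataset import DynamicBucketingSampler

    # Passing several CutSets positionally makes the sampler draw them together.
    return DynamicBucketingSampler(
        prompts,
        targets,
        max_duration=max_duration,
        num_buckets=num_buckets,
        shuffle=shuffle,
        buffer_size=buffer_size,
        quadratic_duration=quadratic_duration,
    )
```

With the `prompts` and `targets` returned by `train_cuts()`, this would be called as `sampler = build_paired_sampler(prompts, targets)`. Each batch should then be a tuple with one CutSet per input, so a loop like `for prompt_batch, target_batch in sampler:` can unpack it directly.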