Open
Description
I noticed that LazyLLMDataset does not support packing. Is there a particular reason for this?
if args.lazy_tokenize:
    train_dataset = LazyLLMDataset(
        train_dataset, template.encode, strict=args.strict, random_state=args.data_seed)
    if val_dataset is not None and not args.predict_with_generate:
        val_dataset = LazyLLMDataset(
            val_dataset, template.encode, strict=args.strict, random_state=args.data_seed)
else:
    preprocessor_cls = PackingPreprocessor if args.packing else EncodePreprocessor
    preprocessor = preprocessor_cls(template=template)
    train_dataset = preprocessor(train_dataset, num_proc=args.dataset_num_proc, strict=args.strict)
    if val_dataset is not None and not args.predict_with_generate:
        val_dataset = preprocessor(val_dataset, num_proc=args.dataset_num_proc, strict=args.strict)
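To make the question concrete: my guess (an assumption, not something the snippet above confirms) is that packing needs every sample's token length before it can group samples, while lazy tokenization defers encoding until a sample is accessed. A toy first-fit sketch of that dependency follows. `pack_first_fit` and the word-count "tokenizer" are hypothetical illustrations and not part of the library:

```python
from typing import List, Sequence


def pack_first_fit(lengths: Sequence[int], max_length: int) -> List[List[int]]:
    """Group sample indices into bins whose total token count is <= max_length."""
    bins: List[List[int]] = []  # sample indices in each packed sequence
    totals: List[int] = []      # tokens already used in each bin
    for idx, n in enumerate(lengths):
        if n > max_length:
            continue  # hypothetical policy: drop samples that can never fit
        for b, used in enumerate(totals):
            if used + n <= max_length:
                bins[b].append(idx)
                totals[b] += n
                break
        else:
            # No existing bin has room, so open a new one.
            bins.append([idx])
            totals.append(n)
    return bins


# Toy stand-in for template.encode: one "token" per whitespace-separated word.
samples = ["a b c", "d e", "f g h i", "j"]
lengths = [len(s.split()) for s in samples]  # every sample is encoded up front
packed = pack_first_fit(lengths, max_length=5)
print(packed)
```

If this guess is right, combining the two would need either a cheap length estimate or packing done on the fly per batch, and I would like to know whether either was considered.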