LazyLLMDataset does not support bin packing. Is there a specific limitation? #2846

Open
@kehuanfeng

I noticed that LazyLLMDataset has no packing support. Is there a particular reason for this?

        if args.lazy_tokenize:
            train_dataset = LazyLLMDataset(
                train_dataset, template.encode, strict=args.strict, random_state=args.data_seed)
            if val_dataset is not None and not args.predict_with_generate:
                val_dataset = LazyLLMDataset(
                    val_dataset, template.encode, strict=args.strict, random_state=args.data_seed)
        else:
            preprocessor_cls = PackingPreprocessor if args.packing else EncodePreprocessor
            preprocessor = preprocessor_cls(template=template)
            train_dataset = preprocessor(train_dataset, num_proc=args.dataset_num_proc, strict=args.strict)
            if val_dataset is not None and not args.predict_with_generate:
                val_dataset = preprocessor(val_dataset, num_proc=args.dataset_num_proc, strict=args.strict)
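
For context on the question, here is a minimal sketch of why packing and lazy tokenization may be awkward to combine. It assumes a first-fit-decreasing bin-packing strategy, which needs every sample's token length before it can group anything, whereas a lazily tokenized dataset only learns a sample's length when that sample is fetched. This is an illustration under that assumption, not the actual `PackingPreprocessor` logic; `pack_first_fit_decreasing`, `lengths`, and `max_length` are hypothetical names.

```python
from typing import List


def pack_first_fit_decreasing(lengths: List[int], max_length: int) -> List[List[int]]:
    """Group sample indices into bins whose total token count is at most max_length."""
    # Sorting by length requires every sample's token count up front,
    # i.e. every sample must already have been tokenized.
    order = sorted(range(len(lengths)), key=lambda i: lengths[i], reverse=True)
    bins: List[List[int]] = []   # sample indices in each bin
    remaining: List[int] = []    # free token budget left in each bin
    for idx in order:
        n = lengths[idx]
        if n > max_length:
            continue  # a sample longer than the budget cannot be packed
        for b, free in enumerate(remaining):
            if n <= free:
                # First existing bin with enough room takes the sample.
                bins[b].append(idx)
                remaining[b] -= n
                break
        else:
            # No existing bin fits, so open a new one.
            bins.append([idx])
            remaining.append(max_length - n)
    return bins


# Hypothetical token lengths for five samples.
lengths = [700, 300, 512, 200, 100]
packed = pack_first_fit_decreasing(lengths, max_length=1024)
```

With lazy tokenization the `lengths` list is not available ahead of time, so a packing step like this would either have to tokenize everything first or fall back to an online heuristic.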
