[Request Impl] Apply Segment Serialization in Bundled Program #9771

Open
Gasoonjia opened this issue Mar 31, 2025 · 7 comments
@Gasoonjia
Contributor

Gasoonjia commented Mar 31, 2025

🚀 The feature, motivation and pitch

BundledProgram is a primary way for users to bundle representative inputs, and optionally their expected outputs, with a serialized ExecuTorch program (.pte file). It is a critical way to run representative inputs on device in a unified way.

However, we have noticed that serialization is currently quite slow. The key reason is that we reserialize the .pte file when building the bundled program, even though it has already been serialized, so the extra pass is unnecessary.

To make the .pte data part of the serialized bundled program without reserializing it, we can use the newly introduced NamedData mechanism, which stores such data as separate data blobs appended after the serialized flatbuffer data.

#8187 is the design doc for NamedData blobs, and the code pointer for .pte serialization, which includes the NamedData serialization, is the function that begins with

def serialize_pte_binary(
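
To make the idea above concrete, here is a minimal sketch of segment-style serialization in plain Python. It is not the ExecuTorch format or API: the container layout, `pack_with_segments`, and `unpack_segments` are hypothetical names invented for illustration, and a JSON header stands in for the flatbuffer. The point it tries to show is that only the small header is encoded, while the already-serialized payload is appended byte-for-byte and located by offset.

```python
import json
import struct
from typing import Dict, Tuple

# Hypothetical container layout (NOT the ExecuTorch format):
#   [8-byte little-endian header length][JSON header][blob 0][blob 1]...
_LEN = struct.Struct("<Q")


def pack_with_segments(metadata: dict, blobs: Dict[str, bytes]) -> bytes:
    """Encode only the small header; append each blob unchanged."""
    index = {}
    offset = 0  # offsets are relative to the end of the header
    for name, blob in blobs.items():
        index[name] = {"offset": offset, "size": len(blob)}
        offset += len(blob)
    header = json.dumps({"metadata": metadata, "segments": index}).encode("utf-8")
    return b"".join([_LEN.pack(len(header)), header, *blobs.values()])


def unpack_segments(data: bytes) -> Tuple[dict, Dict[str, bytes]]:
    """Read the header, then slice each blob out by its recorded offset."""
    (header_len,) = _LEN.unpack_from(data, 0)
    start = _LEN.size
    header = json.loads(data[start:start + header_len].decode("utf-8"))
    base = start + header_len
    view = memoryview(data)
    blobs = {
        name: bytes(view[base + seg["offset"]: base + seg["offset"] + seg["size"]])
        for name, seg in header["segments"].items()
    }
    return header["metadata"], blobs


# Stand-in for an already-serialized .pte payload.
pte_bytes = b"\x00\x01fake-pte-payload"
packed = pack_with_segments({"version": 1}, {"program": pte_bytes})
metadata, blobs = unpack_segments(packed)
```

In this sketch the cost of packing grows with the header size plus one copy of the payload, rather than with a second encoding pass over the payload.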

@Gasoonjia Gasoonjia added module: devtools Issues related to developer tools and code under devtools/ good first issue Good for newcomers labels Mar 31, 2025
@dillondesilva
Contributor

This is a good improvement! Could I please take this?

@Gasoonjia
Contributor Author

Yeah, sure, that would be a great help!

@Gasoonjia
Contributor Author

Hey @dillondesilva, how is the issue going so far?

This one is harder and more complex than other first issues. Please let me know if there is anything I can help with.

@dillondesilva
Contributor

Thanks @Gasoonjia ! I've just been busy with some mid-semester exams but should have time to start this in about a week. Will let you know how it goes 👍

@Gasoonjia
Contributor Author

Great, thanks for your hard work!

@dillondesilva
Contributor

Hi @Gasoonjia - I had a couple things I wanted to check:

  1. To avoid reserializing the .pte file, can we stop adding the program bytes to bp_schema.BundledProgram, and should we instead replace that field with NamedDataStoreOutput (or some other, more correct NamedData-related type)? For example:
from dataclasses import dataclass
from typing import List


@dataclass
class BundledProgram:
    """ExecuTorch program bundled with data for verification."""

    # Schema version.
    version: int

    # Test sets and other metadata to verify the whole program.
    # Each BundledMethodTestSuite contains the test cases for one of the Method's
    # present inside the ExecuTorchProgram of the same BundledProgram. The method_name
    # present inside the BundledMethodTestSuite is what is used to link to the appropriate Method.
    method_test_suites: List[BundledMethodTestSuite]

    # Replace program with NamedDataStoreOutput
    named_data: NamedDataStoreOutput

    # The binary data of a serialized ExecuTorchProgram.
    # program: bytes
  2. Within the serialize_to_schema method, should the NamedDataStoreOutput be accessed from self.executorch_program?
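
As a way to picture the first question, here is a small self-contained sketch, under the assumption that the schema object would hold a keyed reference to the program payload rather than a freshly re-encoded copy. `StandInNamedData` and `StandInBundledProgram` are hypothetical stand-ins written for this example; they do not reproduce the real NamedDataStoreOutput or bp_schema.BundledProgram definitions, and the thread has not confirmed that this is the intended design.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class StandInNamedData:
    """Hypothetical stand-in; NOT the real NamedDataStoreOutput."""

    buffers: List[bytes] = field(default_factory=list)
    key_to_buffer_index: Dict[str, int] = field(default_factory=dict)

    def add(self, key: str, data: bytes) -> None:
        # Record where the payload lives; the bytes are stored as-is.
        self.key_to_buffer_index[key] = len(self.buffers)
        self.buffers.append(data)

    def get(self, key: str) -> bytes:
        return self.buffers[self.key_to_buffer_index[key]]


@dataclass
class StandInBundledProgram:
    """Hypothetical stand-in for the proposed schema shape."""

    version: int
    method_test_suites: List[str]  # placeholder for BundledMethodTestSuite
    named_data: StandInNamedData


pte_bytes = b"already-serialized-pte"  # stand-in for a serialized .pte
store = StandInNamedData()
store.add("program", pte_bytes)
bundle = StandInBundledProgram(
    version=1, method_test_suites=["forward"], named_data=store
)

# The schema object refers to the original payload; nothing was re-encoded.
same_object = bundle.named_data.get("program") is pte_bytes
```

Whether the real store should be taken from self.executorch_program, as the second question asks, is left open here; the sketch only illustrates the shape of the data.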

@metascroy
Contributor

cc @Gasoonjia gentle ping here
