
[v2] Add support for chunking performance tests, and architectural changes #9485


Open
wants to merge 25 commits into base: v2

Conversation

@aemous (Contributor) commented May 9, 2025

Context:

The main purpose of this PR is to add 'chunking' support to the performance tests, enabling the test cases to be sharded across parallel workers in a distributed workflow. The secondary purpose is to refactor the architecture of the performance-testing code so that more complex performance tests are easier to integrate. The architectural decisions were guided by implementing more complex RPCv2 CBOR performance test cases and observing points of friction.

Description of changes:

  • Adds support for evenly chunking the performance test cases and running only a specified chunk (a sketch of one possible splitting strategy follows this list).
  • Implements a BaseBenchmarkSuite class that defines the abstract functions which must be implemented to define new performance tests. The default performance test cases were abstracted out into a JSONStubbedBenchmarkSuite implementation.
  • Modifies JSONStubbedBenchmarkSuite:
    • Changed the dimensions format to require name and value keys, reflecting the CloudWatch Metrics model.
    • Added support for writing binary content via a new mode key on the file_literals definition.
  • Abstracts the code for serializing benchmark results into its own BenchmarkResultsSerializer class, making it easier to override for one-off performance testing when a different output format is expected.
  • Architectural changes to performance testing:
    • Changed Summarizer to return metric results in a new Metric data model, which includes additional data such as description and unit (see the Metric sketch after this list).
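
For reference, here is a minimal sketch of the even-chunking idea; the helper name and the round-robin split are illustrative assumptions, not necessarily this PR's implementation:

```python
def get_chunk(cases, num_chunks, chunk_id):
    """Return the chunk_id-th of num_chunks near-even slices of cases.

    Hypothetical helper: a round-robin split keeps chunk sizes within
    one of each other even when len(cases) is not divisible by
    num_chunks, so parallel workers get balanced shards.
    """
    if not 0 <= chunk_id < num_chunks:
        raise ValueError('chunk_id must be in [0, num_chunks)')
    return cases[chunk_id::num_chunks]
```

For example, 10 cases split with num_chunks=4 yield chunks of sizes 3, 3, 2, and 2, and every case lands in exactly one chunk.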
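
Likewise, a hedged sketch of what the Metric model and the name/value dimension format could look like; any field or value below beyond name, value, unit, and description is an illustrative assumption:

```python
from dataclasses import dataclass, field

@dataclass
class Metric:
    # Hypothetical shape mirroring the CloudWatch Metrics model the
    # description references; not necessarily the PR's exact fields.
    name: str
    value: float
    unit: str
    description: str = ''
    # Dimensions as explicit name/value pairs, per the new format.
    dimensions: list = field(default_factory=list)

# Illustrative usage with a made-up metric and dimension:
metric = Metric(
    name='FirstByteLatency',
    value=12.5,
    unit='Milliseconds',
    description='Time until the first byte of output is observed.',
    dimensions=[{'name': 'Command', 'value': 's3.cp'}],
)
```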

Description of tests:

  • Ran the default JSON stubbed tests via `./scripts/performance/run-benchmarks --num-iterations 20` and observed the expected results.
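
A chunked run might look like the following; the exact flag spellings are inferred from the parsed_args.num_chunks / parsed_args.chunk_id names visible in the review below and should be treated as assumptions:

```
# Run chunk 0 of 4 shards (flag names assumed, not confirmed):
./scripts/performance/run-benchmarks --num-iterations 20 --num-chunks 4 --chunk-id 0
```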

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

@aemous aemous marked this pull request as ready for review June 4, 2025 20:37
@aemous aemous requested a review from a team June 4, 2025 20:38
@ashovlin (Member) left a comment


When I saw the new /tests directory, I thought those would be unit/integration/etc. tests for the performance framework itself. I'd rather keep these at the top level if it's still just implementation.

Comment on lines +260 to +262
- `mode` (string) **(optional)**: The write mode to use for writing the file contents.
  - Default: `w`
Member

Is anything actually using this? It looks hardcoded below in `_create_file_with_size` and `begin_iteration`.

Contributor Author

Ah, it's intended to be used in the JSONStubbedBenchmarkSuite. In particular, it was useful for one-off performance testing of the CBOR protocol for an operation with binary input. Will update in the next revision.
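
As a hedged illustration of how a `mode` key could be honored when writing file literals (the helper name and the call below are assumptions for this thread, not the PR's actual code):

```python
def write_file_literal(path, content, mode='w'):
    # Binary modes ('wb') require bytes; encode str content so one
    # helper can serve both text and binary file_literals entries.
    if 'b' in mode and isinstance(content, str):
        content = content.encode('utf-8')
    with open(path, mode) as f:
        f.write(content)

# e.g. a binary payload from a hypothetical file_literals entry:
write_file_literal('payload.bin', b'\x00\x01\x02', mode='wb')
```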

Comment on lines +77 to +83
if (
sum(
x is not None
for x in [parsed_args.num_chunks, parsed_args.chunk_id]
)
== 1
):
Member
Is this pattern common? For readability, summing bools seems odd.
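
For comparison, a more explicit spelling of "exactly one of the two was set" is an inequality on two booleans (boolean XOR); a minimal sketch, with an illustrative error message:

```python
has_num_chunks = parsed_args.num_chunks is not None
has_chunk_id = parsed_args.chunk_id is not None
# True when exactly one flag was supplied -- the same half-specified
# case the sum(...) == 1 check above detects.
if has_num_chunks != has_chunk_id:
    raise ValueError('num-chunks and chunk-id must be specified together.')
```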

@@ -0,0 +1,24 @@
class BaseBenchmarkSuite:
Member

Do we need a base class if there's only one implementation of it?

If keeping, could you add class-level comments on when someone would use this?

Contributor Author

I like having this base class because it forces us to implement our only test suite against a contract that we know extends to other use cases. I previously extended this base class to implement CBOR benchmarks for a couple of other services, and those tests fit nicely against the functions on this base class.

Even though we won't ship those CBOR tests, they served to show locally that the function contract on this class is extensible and reusable.

I will add those class-level comments in the next iteration.
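
For readers outside this thread, a minimal sketch of the kind of contract being discussed; the diff shows a plain class, and the abc usage and method names here (apart from begin_iteration, which appears earlier in this review) are hypothetical:

```python
from abc import ABC, abstractmethod

class BaseBenchmarkSuite(ABC):
    """Contract every benchmark suite implements (hypothetical sketch).

    Subclasses -- e.g. a JSON-stubbed suite or a one-off CBOR suite --
    supply per-iteration setup/teardown around the timed work.
    """

    @abstractmethod
    def begin_iteration(self, benchmark_case):
        """Prepare the environment before timing one iteration."""

    @abstractmethod
    def end_iteration(self, benchmark_case):
        """Clean up after one timed iteration."""
```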
