[RND-535] Technical design for write performance benchmarking (#256)

* [RND-535] Restructuring documents
* [RND-535] Creating script and instructions to execute bulk load performance
* [RND-535] Adding information for bulk load to test upsert performance
* [RND-535] Adding indications to run pipeclean and volume performance tests
* [RND-535] Adding notes to run performance test suite
* Fixing link to tickets
* [RND-535] Changes after review on setup performance suite
* [RND-535] Adjustments to write performance doc
* Removing assessment role information in documentation
* Load Bulk Load information
* Rename folder
* Removing duplicated folder
Ed-Fi-Exchange-OSS · Jun 19, 2023 · 821ba85
1 parent eb99965
commit 821ba85

Showing 13 changed files with 147 additions and 58 deletions.
diff --git a/docs/design/README.md b/docs/design/README.md
@@ -3,5 +3,5 @@
 ## Contents
 
 * [Support for Offline Cascading Updates](offline-cascading-updates/)
-* [Read Performance Benchmarking](read-performance-benchmarking/)
-* [Open Telemetry](open-telemetry/README.md)
+* [Performance Benchmarking](performance-benchmarking/)
+* [Open Telemetry](open-telemetry/)
diff --git a/docs/Design/open-telemetry/README.md → docs/design/open-telemetry/README.md b/docs/Design/open-telemetry/README.md → docs/design/open-telemetry/README.md
diff --git a/...n/read-performance-benchmarking/README.md → ...formance-benchmarking/READ-PERFORMANCE.md b/...n/read-performance-benchmarking/README.md → ...formance-benchmarking/READ-PERFORMANCE.md
@@ -7,11 +7,7 @@ To do the performance tests to retrieve endpoints, use the
 
 ### Setup
 
-- Configure Meadowlark and verify that it's running.
-- Load Sample data with the Invoke-LoadGrandBend or Invoke-LoadPartialGrandBend scripts.
-- Verify that the data has been loaded correctly through the API or from the database.
-- Clone the [Suite3-Performance-Testing](https://github.com/Ed-Fi-Exchange-OSS/Suite-3-Performance-Testing) repository.
-- Install [python](https://www.python.org/downloads/) and [poetry](https://python-poetry.org/docs/#installation).
+- Follow the steps to [set up the performance suite](./SETUP-PERFORMANCE-SUITE.md).
 - Go to /src/edfi_paging_test folder and run `poetry install`.
 - Create a .env file based on /src/edfi_paging_test/edfi_paging_test/.env.example with your endpoint, key and secret.
 
@@ -24,14 +20,16 @@ To do the performance tests to retrieve endpoints, use the
 ### Comparing performance of multiple runs
 
 To get a detailed comparison of Mean time and Standard Deviation, run the script
-[GetAll-Performance.ps1](../../../eng/performance/GetAll-Performance.ps1). This will print the details of the execution,
-additionally, it will generate a report per each run executed in CSV format, that can be analyzed.
+[GetAll-Performance.ps1](../../../eng/performance/GetAll-Performance.ps1).
 
 The script receives two parameters:
 
-PagingTestsPath: Path of the Suite3-Performance-Testing edfi-paging-tests location.
+**PagingTestsPath**: Path of the Suite3-Performance-Testing edfi-paging-tests location.
+
+**NumTrials**: Number of times to run the tests. Defaults to *5*.
 
-NumTrials: Number of times to run the tests. Defaults to *5*.
+The script prints the details of each execution.
+It also generates a CSV report for each run, which can then be analyzed.
 
 Example
 
@@ -76,49 +74,3 @@ Results example:
 
 Run the same tests against an ODS/API instance with the same data set, filtering out the tpdm, sample and homograph
 resources since those are not handled by Meadowlark. Change the URL and variables in the .env file inside edfi_paging_test.
-
-### Profiler
-
-<details>
-  <summary>Running MongoDB profiler</summary>
-
-MongoDB comes with a built in profiler, disabled by default.
-
-To enable, connect to the docker container with `mongosh` and execute `db.setProfilingLevel(2)` to track all traffic.
-
-This must be done before running the paging tests to track the next instructions. To see the latest tracked data, run `show
-profile`.
-
-This will display something similar to:
-
-```json
-query   meadowlark.documents 1ms Wed Jun 07 2023 15:20:33
-command:{
-  find: 'documents',
-  filter: {
-    aliasIds: {
-      '$in': [
-        'KcsqHWHlSrAHP0LyDuChFK-C3NuO_tH5NF2YRA',
-        'auET2M3A7eg92ChrMaFL6vkmjHtx83fCs3kt_w',
-        'h0E08by8zxQHVXAblfHfXX4gU4l2-0AKcLWbGA'
-      ]
-    }
-  },
-  projection: { _id: 1 },
-  txnNumber: Long("754"),
-  autocommit: false,
-  '$clusterTime': {
-    clusterTime: Timestamp({ t: 1686172829, i: 1 }),
-    signature: {
-      hash: "",
-      keyId: Long("7241292544405929986")
-    }
-  },
-  '$db': 'meadowlark'
-} keysExamined:5 docsExamined:2 cursorExhausted numYield:0 nreturned:2 locks:{} storage:{} responseLength:346 protocol:op_msg
-```
-
-From the results, you can analyze the timeStamp and the number of docs and keys examined to get the results. [Read
-more](https://www.mongodb.com/docs/manual/reference/database-profiler/).
-
-</details>
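
Note on the READ-PERFORMANCE.md changes above: the comparison of mean time and standard deviation across `NumTrials` runs can be sketched in Python, the language of the performance suite. This is a minimal illustration, not the logic of GetAll-Performance.ps1 itself. The function names `time_trials`, `summarize`, and `run_paging_tests` are hypothetical, and the use of the sample standard deviation is an assumption.

```python
import statistics
import subprocess
import time
from typing import Callable, Dict, List, Sequence


def time_trials(run_once: Callable[[], None], num_trials: int = 5) -> List[float]:
    """Call run_once num_trials times and return each duration in milliseconds."""
    durations_ms = []
    for _ in range(num_trials):
        start = time.perf_counter()
        run_once()
        durations_ms.append((time.perf_counter() - start) * 1000.0)
    return durations_ms


def summarize(durations_ms: Sequence[float]) -> Dict[str, float]:
    """Return the mean and sample standard deviation of the durations."""
    mean = statistics.mean(durations_ms)
    # statistics.stdev needs at least two samples; report 0.0 for a single trial.
    stdev = statistics.stdev(durations_ms) if len(durations_ms) > 1 else 0.0
    return {"mean_ms": mean, "stdev_ms": stdev}


def run_paging_tests() -> None:
    """Hypothetical wrapper around the command the PowerShell script times.

    Assumes the current directory is the edfi_paging_test folder.
    """
    subprocess.run(["poetry", "run", "python", "edfi_paging_test"], check=True)
```

Calling `summarize(time_trials(run_paging_tests, num_trials=5))` from the paging tests folder would mirror the default of five trials. Keeping the timing loop separate from the summary makes it possible to compare runs against Meadowlark and the ODS/API with the same helper.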
diff --git a/docs/design/performance-benchmarking/README.md b/docs/design/performance-benchmarking/README.md
@@ -0,0 +1,6 @@
+# Performance Benchmarking
+
+## Contents
+
+* [Write Performance Benchmarking](./WRITE-PERFORMANCE.md)
+* [Read Performance Benchmarking](./READ-PERFORMANCE.md)
diff --git a/docs/design/performance-benchmarking/SETUP-PERFORMANCE-SUITE.md b/docs/design/performance-benchmarking/SETUP-PERFORMANCE-SUITE.md
@@ -0,0 +1,31 @@
+# Setup Performance Testing Suite
+
+The performance testing suite was developed to support performance tuning of the
+ODS/API version 3.0 and newer. The ODS Platform team provides primary support
+for the tools in this test suite.
+
+Types of testing that can run with these tools:
+
+- Volume test: 30 minutes of Create, Update, and Delete requests distributed
+  across many different resources. Goal: measure throughput.
+- Load test: longer run of the volume tests, with a higher number of clients.
+  Goal: push to breaking point.
+- Soak test: 24 hours of volume tests. Goal: measure throughput over a long
+  period of time to detect if there is degradation over time.
+- Pipeclean: execute all API calls once with a single client (“user”) to
+  confirm that all test cases are functional and system components are running properly.
+- Get All Paging: retrieve all records from all (configured) resources.
+
+Steps to run against Meadowlark:
+
+- Configure Meadowlark and verify that it's running.
+- Load Sample data with the Invoke-LoadGrandBend or Invoke-LoadPartialGrandBend
+  scripts (otherwise, the tests will fail without some expected descriptors and
+  education organization Ids).
+- Verify that the data has been loaded correctly through the API or from the
+  database.
+- Clone the
+  [Suite3-Performance-Testing](https://github.com/Ed-Fi-Exchange-OSS/Suite-3-Performance-Testing)
+  repository.
+- Install [python](https://www.python.org/downloads/) and
+  [poetry](https://python-poetry.org/docs/#installation).
diff --git a/docs/design/performance-benchmarking/WRITE-PERFORMANCE.md b/docs/design/performance-benchmarking/WRITE-PERFORMANCE.md
@@ -0,0 +1,69 @@
+# Write Performance Benchmarking
+
+## Running Write Performance for Meadowlark
+
+There are two ways to analyze write performance for Meadowlark:
+
+1. Bulk Loading.
+2. Run Performance Testing Suite.
+
+## Bulk Loading
+
+> **Note** To run the bulk loading tests it's important to start with a clean
+> database.
+
+To load the data, there are functions to load the GrandBend and PartialGrandBend
+data sets into Meadowlark.
+
+To measure the execution time, run the script
+[BulkLoad-Performance.ps1](../../../eng/performance/BulkLoad-Performance.ps1).
+
+The script receives two parameters:
+
+- **Template**: the data set to load, `GrandBend` or `PartialGrandBend` (defaults to `GrandBend`).
+- **Update**: switch that measures updating existing resources instead of
+  creating them (defaults to false).
+
+This script will enter the data into Meadowlark and will print the execution
+time.
+
+> **Warning** The LoadGrandBend script is returning errors when processing some data ([RND-586](https://tracker.ed-fi.org/browse/RND-586)).
+
+List of issues:
+
+- [RND-583](https://tracker.ed-fi.org/browse/RND-583)
+
+## Performance Testing Suite
+
+Two performance test types cover write
+performance: Pipeclean Tests and Volume Tests.
+
+### Setup
+
+> **Warning**
+> _For now_: The tests will be executed with
+> [Suite3-Performance-Testing](https://github.com/Ed-Fi-Exchange-OSS/Suite-3-Performance-Testing)
+> on branch [meadowlark-updates](https://github.com/Ed-Fi-Exchange-OSS/Suite-3-Performance-Testing/tree/meadowlark-updates)
+> to have the changes required for Meadowlark. Vinaya and/or StephenF can verify when to remove
+> this temporary note and use the `main` branch again.
+
+- Follow the steps to [set up the performance suite](./SETUP-PERFORMANCE-SUITE.md).
+- Go to /src/edfi_performance_test folder and run `poetry install`.
+- Create a user in Meadowlark with the role of `vendor`. Save the key and secret.
+- Create a .env file based on
+  /src/edfi_paging_test/edfi_paging_test/.env.example with your endpoint, and the previously created key
+  and secret. Set the values required for Meadowlark.
+- Run `poetry run python edfi_performance_test -t "VALUE"` where value can be
+  "pipeclean" or "volume". [More details](https://github.com/Ed-Fi-Exchange-OSS/Suite-3-Performance-Testing/tree/main/src/edfi-performance-test)
+
+> **Warning** Several known problems currently prevent the performance tests from completing,
+> so the process will not finish and you must terminate the execution manually.
+
+List of issues:
+
+- [RND-583](https://tracker.ed-fi.org/browse/RND-583)
+- [RND-584](https://tracker.ed-fi.org/browse/RND-584)
+- [RND-585](https://tracker.ed-fi.org/browse/RND-585)
+- [PERF-298](https://tracker.ed-fi.org/browse/PERF-298)
+
+After fixing these errors, the tests should be executed as part of [RND-580](https://tracker.ed-fi.org/browse/RND-580).
diff --git a/eng/Invoke-LoadGrandBend.ps1 → eng/bulkLoad/Invoke-LoadGrandBend.ps1 b/eng/Invoke-LoadGrandBend.ps1 → eng/bulkLoad/Invoke-LoadGrandBend.ps1
diff --git a/eng/Invoke-LoadPartialGrandBend.ps1 → eng/bulkLoad/Invoke-LoadPartialGrandBend.ps1 b/eng/Invoke-LoadPartialGrandBend.ps1 → eng/bulkLoad/Invoke-LoadPartialGrandBend.ps1
diff --git a/eng/modules/BulkLoad.psm1 → eng/bulkLoad/modules/BulkLoad.psm1 b/eng/modules/BulkLoad.psm1 → eng/bulkLoad/modules/BulkLoad.psm1
diff --git a/eng/modules/Get-XSD.psm1 → eng/bulkLoad/modules/Get-XSD.psm1 b/eng/modules/Get-XSD.psm1 → eng/bulkLoad/modules/Get-XSD.psm1
diff --git a/eng/modules/Package-Management.psm1 → eng/bulkLoad/modules/Package-Management.psm1 b/eng/modules/Package-Management.psm1 → eng/bulkLoad/modules/Package-Management.psm1
diff --git a/eng/performance/BulkLoad-Performance.ps1 b/eng/performance/BulkLoad-Performance.ps1
@@ -0,0 +1,31 @@
+# SPDX-License-Identifier: Apache-2.0
+# Licensed to the Ed-Fi Alliance under one or more agreements.
+# The Ed-Fi Alliance licenses this file to you under the Apache License, Version 2.0.
+# See the LICENSE and NOTICES files in the project root for more information.
+<#
+.DESCRIPTION
+    Measure Bulk Load Performance
+#>
+param(
+    [ValidateSet('GrandBend', 'PartialGrandBend')]
+    $Template = "GrandBend",
+
+    [Switch]
+    $Update
+)
+
+$originalLocation = Get-Location
+Set-Location -Path "../bulkLoad"
+
+if($Update) {
+  # Run First to create the data (Without measuring)
+  Write-Host "Creating data"
+  Invoke-Expression "./Invoke-Load$Template.ps1"
+}
+
+Write-Host "Starting Measure for $Template..."
+$timing = Measure-Command { Invoke-Expression "./Invoke-Load$Template.ps1"  }
+
+Write-Output "Total Time: $timing"
+
+Set-Location -Path $originalLocation
diff --git a/eng/performance/GetAll-Performance.ps1 b/eng/performance/GetAll-Performance.ps1
@@ -19,8 +19,8 @@ param(
 $times = @()
 
 $originalLocation = Get-Location
-
 Set-Location -Path $PagingTestsPath
+
 for ($i = 0; $i -lt $NumTrials; $i++) {
     $timing = Measure-Command { poetry run python edfi_paging_test }
     $times += $timing.TotalMilliseconds