[RND-380] MongoDB Connection Pooling Experiments (#282)
* Support for bulk loading into ODS/API 5.3 for test comparisons

* Performance test results

* clarify the bulk load settings

* Updated main readme
stephenfuqua authored Aug 7, 2023
1 parent 0d4f7d6 commit 51c6673
Showing 12 changed files with 516 additions and 7 deletions.
12 changes: 5 additions & 7 deletions README.md
@@ -25,18 +25,16 @@ information on the background and design decisions for this project.
* [Configuration](https://github.com/Ed-Fi-Exchange-OSS/Meadowlark/blob/main/docs/CONFIGURATION.md)
* [Developer getting started notes](https://github.com/Ed-Fi-Exchange-OSS/Meadowlark/blob/main/docs/README.md)
* [Additional technical details](https://github.com/Ed-Fi-Exchange-OSS/Meadowlark/blob/main/docs/TECHNICAL.md)
* [Docker for Local Meadowlark Development](https://github.com/Ed-Fi-Exchange-OSS/Meadowlark/blob/main/docs/DOCKER-LOCAL-DEV.md)
* [How to Submit an Issue](https://techdocs.ed-fi.org/x/Y8uIBg) (Tech Docs)
* [How to Submit a Feature Request](https://techdocs.ed-fi.org/x/0YADAQ) (Tech
Docs)

### 😕 Cloud Deployment
## Deployment and Operations

You may be asking yourself, "where are the instructions for cloud deployment?"
We're working on it. Milestone 0.1.0 had serverless deploy to AWS built-in, but
there were several aspects that no longer fit well with our refined strategy in
milestone 0.2.0, so the current release does not have any intrinsic cloud
deployment support. The upcoming 0.3.0 release will have basic deployment
capabilities on Azure, the preferred platform for our first pilot project.
* [Using Docker with Meadowlark](https://github.com/Ed-Fi-Exchange-OSS/Meadowlark/blob/main/docs/DOCKER.md)
* [Azure Deployment](https://github.com/Ed-Fi-Exchange-OSS/Meadowlark/blob/main/eng/deploy/azure/)
* [Performance Testing](https://github.com/Ed-Fi-Exchange-OSS/Meadowlark/blob/main/docs/performance-testing/)

## Contributing

6 changes: 6 additions & 0 deletions docs/performance-testing/README.md
@@ -0,0 +1,6 @@
# Ed-Fi Meadowlark Performance Testing Results

* [RND-380: MongoDB Connection Pooling](mongo-connection-pooling.md). Summary:
in clustered mode, tuning MongoDB connection pooling showed no discernible
benefit while loading the "partial Grand Bend" dataset. Also includes a
comparison with ODS/API v5.3-patch4.
41 changes: 41 additions & 0 deletions docs/performance-testing/RND-380.csv
@@ -0,0 +1,41 @@
Scenario, Timing
8 threads pool size 1, 156.9
8 threads pool size 1, 196.12
8 threads pool size 1, 185.14
8 threads pool size 1, 198.87
8 threads pool size 1, 173.17
8 threads pool size 5, 177.76
8 threads pool size 5, 180.32
8 threads pool size 5, 159.98
8 threads pool size 5, 163.16
8 threads pool size 5, 166.52
8 threads pool size 100, 171.39
8 threads pool size 100, 166.76
8 threads pool size 100, 188.29
8 threads pool size 100, 165.87
8 threads pool size 100, 174.04
8 threads pool size 150, 156.92
8 threads pool size 150, 151.31
8 threads pool size 150, 180.37
8 threads pool size 150, 180.28
8 threads pool size 150, 162.29
1 threads pool size 1, 535.6
1 threads pool size 1, 526.85
1 threads pool size 1, 534.49
1 threads pool size 1, 534.49
1 threads pool size 1, 552.34
1 threads pool size 150, 379.5
1 threads pool size 150, 355.11
1 threads pool size 150, 365.01
1 threads pool size 150, 370.14
1 threads pool size 150, 365.91
4 threads pool size 150, 166.55
4 threads pool size 150, 160.14
4 threads pool size 150, 157.98
4 threads pool size 150, 183.37
4 threads pool size 150, 162.24
ODS/API, 92.23
ODS/API, 94.53
ODS/API, 89.43
ODS/API, 85.09
ODS/API, 84.73
153 changes: 153 additions & 0 deletions docs/performance-testing/mongo-connection-pooling.md
@@ -0,0 +1,153 @@
# RND-380: MongoDB Connection Pooling

## Goal

Experiment with [MongoDB connection
pooling](https://www.mongodb.com/docs/drivers/node/v4.15/fundamentals/connection/connection-options/)
to evaluate its impact on application performance.

## Methodology

1. Start Meadowlark fully in Docker, using MongoDB as the backend and OpenSearch
   as the search provider. (See below for environment settings.)

   ```pwsh
   cd Meadowlark-js
   ./reset-docker-compose.ps1
   ```

2. Bulk upload the "partial Grand Bend" data set, capturing the time taken.

   ```pwsh
   cd ../eng/bulkLoad
   Measure-Command { .\Invoke-LoadPartialGrandBend.ps1 }
   ```

3. Repeat for a total of five measurements with the same settings.
4. Tune the connection pooling via the `MONGO_URI` setting in the `.env` file.
5. Repeat the measurement process.
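
Step 4 comes down to changing the `maxPoolSize` option in the connection string.
The following is a minimal Python sketch of that edit, not part of Meadowlark:
the helper name `with_max_pool_size` is hypothetical, and the example URI omits
the credentials used in the real `.env` file.

```python
from urllib.parse import parse_qsl, urlencode


def with_max_pool_size(mongo_uri: str, max_pool_size: int) -> str:
    """Return mongo_uri with its maxPoolSize option set to max_pool_size."""
    # Everything before the first "?" (hosts, credentials, path) is left untouched.
    base, _, query = mongo_uri.partition("?")
    # Drop any existing maxPoolSize option, matching the name case-insensitively.
    options = [
        (key, value)
        for key, value in parse_qsl(query, keep_blank_values=True)
        if key.lower() != "maxpoolsize"
    ]
    options.append(("maxPoolSize", str(max_pool_size)))
    return base + "?" + urlencode(options)


if __name__ == "__main__":
    # Illustrative URI only; host names match the replica set shown in this document.
    baseline = "mongodb://mongo1:27017,mongo2:27018,mongo3:27019/?replicaSet=rs0&maxPoolSize=100"
    for size in (1, 5, 100, 150):
        print(with_max_pool_size(baseline, size))
```

The pool sizes in the loop are the ones measured in this experiment; the
resulting string would be pasted into `MONGO_URI` before restarting the
containers.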

An Ed-Fi ODS/API v5.3-patch4 environment was configured on the same VM to allow
a comparison between the two platforms. This repository's `eng/ods-api`
directory contains a PowerShell script, `reset.ps1`, that builds a fresh Docker
container environment running the two Ed-Fi database images and the API image.
Since this is for raw testing and head-to-head comparison, this solution does
not use NGINX or PgBouncer. To run against the ODS/API, alter step 2 above to
use `Invoke-LoadPartialGrandBend-ODSAPI.ps1`.

## Environment

All tests ran on a Windows Server 2019 virtual machine acting as the Docker
host, running the latest version of Docker Desktop with WSL2. The VM has 12
cores assigned (Intel Xeon Gold 6150 @ 2.70 GHz), 24.0 GB of memory, and plenty
of disk space. Docker is configured to use up to 8 CPUs, 12 GB of memory, 2 GB
of swap space, and a limit of 64 GB on the virtual disk.

Baseline `.env` configuration file:

```none
OAUTH_SIGNING_KEY=<omitted>
OWN_OAUTH_CLIENT_ID_FOR_CLIENT_AUTH=meadowlark_verify-only_key_1
OWN_OAUTH_CLIENT_SECRET_FOR_CLIENT_AUTH=meadowlark_verify-only_secret_1
OAUTH_SERVER_ENDPOINT_FOR_OWN_TOKEN_REQUEST=http://localhost:3000/local/oauth/token
OAUTH_SERVER_ENDPOINT_FOR_TOKEN_VERIFICATION=http://localhost:3000/local/oauth/verify
OAUTH_HARD_CODED_CREDENTIALS_ENABLED=true
OPENSEARCH_USERNAME=admin
OPENSEARCH_PASSWORD=admin
OPENSEARCH_ENDPOINT=http://localhost:9200
OPENSEARCH_REQUEST_TIMEOUT=10000
AUTHORIZATION_STORE_PLUGIN=@edfi/meadowlark-mongodb-backend
DOCUMENT_STORE_PLUGIN=@edfi/meadowlark-mongodb-backend
QUERY_HANDLER_PLUGIN=@edfi/meadowlark-opensearch-backend
LISTENER1_PLUGIN=@edfi/meadowlark-opensearch-backend
MONGODB_USER=mongo
MONGODB_PASS=<omitted>
MONGO_URI=mongodb://${MONGODB_USER}:${MONGODB_PASS}@mongo1:27017,mongo2:27018,mongo3:27019/?replicaSet=rs0&maxPoolSize=100
FASTIFY_RATE_LIMIT=false
FASTIFY_PORT=3000
# Next line commented out, therefore it will auto-cluster to match number of
# available CPUs.
# FASTIFY_NUM_THREADS=4
MEADOWLARK_STAGE=local
LOG_LEVEL=debug
IS_LOCAL=true
BEGIN_ALLOWED_SCHOOL_YEAR=2022
END_ALLOWED_SCHOOL_YEAR=2034
ALLOW_TYPE_COERCION=true
ALLOW__EXT_PROPERTY=true
SAVE_LOG_TO_FILE=true
LOG_FILE_LOCATION=c:/temp/
```

The API bulk client loader runs on the VM host, connecting to the Docker
network. It is configured to use a maximum of 100 connections, 50 buffered
tasks, and 500 maximum simultaneous requests. Retries are disabled. All of the
XML files load without error at this time.

## Results

Times below are given in seconds. In the default settings, there was one extreme
outlier that significantly impacted the average time, as seen by the high
standard deviation.

| Scenario | Avg | St Dev |
| ------------------------ | ------ | ------ |
| 8 threads, pool size 1 | 182.04 | 17.33 |
| 8 threads, pool size 5 | 169.55 | 9.01 |
| 8 threads, pool size 100 | 173.27 | 9.04 |
| 8 threads, pool size 150 | 166.23 | 13.44 |
| 1 threads, pool size 1 | 536.75 | 9.39 |
| 1 threads, pool size 150 | 367.13 | 8.84 |
| 4 threads, pool size 150 | 166.06 | 10.18 |
| ODS/API | 89.20 | 4.32 |

See [RND-380.csv](RND-380.csv) for raw data.
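
The averages and standard deviations in the table can be recomputed from the
raw data. The sketch below is one way to do that in Python; it assumes the
table uses the sample standard deviation (n - 1 denominator), which reproduces
the table values for the rows shown here. `summarize` and `SAMPLE` are
illustrative names, and `SAMPLE` holds only two of the eight scenarios.

```python
import csv
import io
from collections import defaultdict
from statistics import mean, stdev

# A few rows copied from RND-380.csv; the real file has 40 data rows.
SAMPLE = """Scenario, Timing
8 threads pool size 1, 156.9
8 threads pool size 1, 196.12
8 threads pool size 1, 185.14
8 threads pool size 1, 198.87
8 threads pool size 1, 173.17
ODS/API, 92.23
ODS/API, 94.53
ODS/API, 89.43
ODS/API, 85.09
ODS/API, 84.73
"""


def summarize(csv_file):
    """Map each scenario to (average, sample standard deviation), rounded to 2 places."""
    timings = defaultdict(list)
    # skipinitialspace handles the space that follows each comma in the file.
    for row in csv.DictReader(csv_file, skipinitialspace=True):
        timings[row["Scenario"]].append(float(row["Timing"]))
    return {
        name: (round(mean(values), 2), round(stdev(values), 2))
        for name, values in timings.items()
    }


if __name__ == "__main__":
    for name, (avg, sd) in summarize(io.StringIO(SAMPLE)).items():
        print(f"{name:<24} {avg:>7.2f} {sd:>6.2f}")
```

To process the full file instead of the sample, pass an open file handle, for
example `summarize(open("RND-380.csv", newline=""))`.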

## Analysis

In the default configuration, the Meadowlark API startup process forks itself as
many times as there are CPUs available. Thus, in default settings, there are
eight API processes running in parallel. Although these were initiated by the
same Node.js process, each process is isolated with respect to memory. Thus,
each of the eight processes has a separate pool of connections. Within each
forked process there is still potential for connection pool re-use, thanks to
the use of asynchronous processing. However, it is clear that the connection
pool settings have little impact compared to the threading. Even a pool size of
five proved adequate when running with eight CPUs. Interestingly, the pool size
of 150 with only four CPUs also yields results consistent with the tests that
used eight CPUs.
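
Because each forked process holds its own pool, the ceiling on open connections
scales with the process count. The sketch below works through that arithmetic.
It assumes one MongoDB client per API process and that `maxPoolSize` caps the
connections that client opens to a single MongoDB server; the function name is
illustrative only.

```python
def connection_ceiling(api_processes: int, max_pool_size: int) -> int:
    """Upper bound on connections from all API processes to one MongoDB server."""
    return api_processes * max_pool_size


if __name__ == "__main__":
    # (processes, maxPoolSize) pairs taken from the tested scenarios.
    scenarios = [(8, 1), (8, 5), (8, 100), (8, 150), (4, 150), (1, 1), (1, 150)]
    for processes, pool in scenarios:
        ceiling = connection_ceiling(processes, pool)
        print(f"{processes} processes x pool size {pool}: up to {ceiling} connections")
```

Under these assumptions, even the smallest eight-process scenario can hold
eight connections open at once, which may help explain why pool size mattered
so little there.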

The only time we see a discernible difference in results is when we reduce the
number of threads used by the API (`FASTIFY_NUM_THREADS`). For this data set,
the performance is discernibly worse with only one thread, whether using one or
150 connections in the pool. However, the connection pooling in such a low CPU
scenario does clearly yield an improved experience, reducing the average time to
complete the test by roughly 32% (from 536.75 to 367.13 seconds).
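
The size of the single-thread improvement can be checked directly from the
averages in the Results table. A short Python sketch, with an illustrative
function name:

```python
def percent_reduction(before: float, after: float) -> float:
    """Percentage by which `after` is smaller than `before`."""
    return (before - after) / before * 100


if __name__ == "__main__":
    one_thread_pool_1 = 536.75    # average seconds, 1 thread, pool size 1
    one_thread_pool_150 = 367.13  # average seconds, 1 thread, pool size 150

    reduction = percent_reduction(one_thread_pool_1, one_thread_pool_150)
    remaining = one_thread_pool_150 / one_thread_pool_1 * 100
    print(f"time reduced by {reduction:.1f}%")      # prints "time reduced by 31.6%"
    print(f"time is {remaining:.1f}% of baseline")  # prints "time is 68.4% of baseline"
```

The two figures are complementary: the pooled run takes about 68% of the
baseline time, which is a reduction of about 32%.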

> **Note** Five executions of each test appear to be useful, but where timings
> are very close to one another, the number of data points is insufficient to
> establish statistical significance.

The difference between Meadowlark and the ODS/API is clearly significant: the
ODS/API completes the load in a little over half the time (89.20 seconds on
average, versus 166.06 seconds for the fastest Meadowlark scenario).
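
One way to make the significance remarks above concrete is Welch's t-test,
which compares two means without assuming equal variances. The sketch below is
not part of the original analysis; it uses only the Python standard library, so
it reports the t statistic and approximate degrees of freedom rather than a
p-value. For the degrees of freedom that arise here (between 5 and 8), a
two-sided critical value at the 5% level lies roughly between 2.3 and 2.6.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(a, b):
    """Return (t statistic, approximate degrees of freedom) for two samples."""
    va = variance(a) / len(a)  # squared standard error of the mean of a
    vb = variance(b) / len(b)  # squared standard error of the mean of b
    t = (mean(a) - mean(b)) / sqrt(va + vb)
    # Welch-Satterthwaite approximation for the degrees of freedom.
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df


# Timings in seconds, copied from RND-380.csv.
POOL_1_8T = [156.9, 196.12, 185.14, 198.87, 173.17]
POOL_150_8T = [156.92, 151.31, 180.37, 180.28, 162.29]
POOL_150_4T = [166.55, 160.14, 157.98, 183.37, 162.24]
ODS_API = [92.23, 94.53, 89.43, 85.09, 84.73]

if __name__ == "__main__":
    comparisons = {
        "8 threads: pool 1 vs pool 150": (POOL_1_8T, POOL_150_8T),
        "4 threads pool 150 vs ODS/API": (POOL_150_4T, ODS_API),
    }
    for label, (a, b) in comparisons.items():
        t, df = welch_t(a, b)
        print(f"{label}: t = {t:.2f}, df = {df:.1f}")
```

With these samples, the pool-size comparison falls below the critical range
while the ODS/API comparison lies far above it, in line with the note and the
paragraph above.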

## Conclusions

Under the environment conditions described above, this research spike does not
find a significant benefit to tuning the size of the MongoDB connection pool,
provided that at least four API processes ("threads") are running.

If a Meadowlark API container has only one or two virtual CPUs available, then
tuning the connection pooling could theoretically be beneficial. However, out of
the box, the MongoDB client has a default value of 100 connections available,
which may be appropriate for many situations.

Those with expertise in MongoDB might find that there are other connection pool
settings, such as timeouts, that could be relevant for a given situation.
46 changes: 46 additions & 0 deletions eng/bulkLoad/Invoke-LoadPartialGrandBend-ODSAPI.ps1
@@ -0,0 +1,46 @@
# SPDX-License-Identifier: Apache-2.0
# Licensed to the Ed-Fi Alliance under one or more agreements.
# The Ed-Fi Alliance licenses this file to you under the Apache License, Version 2.0.
# See the LICENSE and NOTICES files in the project root for more information.

# Runs part of the bulk upload of the Grand Bend dataset, aka "populated
# template" - restricted to the data needed to run the performance testing kit.
# This enables a faster setup, at the expense of having less data in the system.

# Tuned for use with the ODS/API in shared instance mode. Before running this
# script, make sure that the Docker containers are running and that the
# bootstrap key/secret have been set up. Both of these steps are handled by
# running `./reset.ps1`.

#Requires -Version 7

param(
    [string]
    $Key = "sampleKey",

    [string]
    $Secret = "sampleSecret",

    [string]
    $BaseUrl = "http://localhost"
)

$ErrorActionPreference = "Stop"

Import-Module ./modules/Package-Management.psm1 -Force
Import-Module ./modules/Get-XSD.psm1 -Force
Import-Module ./modules/BulkLoad.psm1 -Force
$sampleDataVersion = "3.3.1-b"

$paths = Initialize-ToolsAndDirectories
$paths.SampleDataDirectory = Import-SampleData -Template "GrandBend" -Version $sampleDataVersion

$parameters = @{
    BaseUrl = $BaseUrl
    Key = $Key
    Secret = $Secret
    Paths = $paths
}

Write-Descriptors @parameters
Write-PartialGrandBend @parameters
2 changes: 2 additions & 0 deletions eng/ods-api/.env.example
@@ -0,0 +1,2 @@
POSTGRES_USER=postgres
POSTGRES_PASSWORD=fghjkyuiok3
9 changes: 9 additions & 0 deletions eng/ods-api/Dockerfile
@@ -0,0 +1,9 @@
# SPDX-License-Identifier: Apache-2.0
# Licensed to the Ed-Fi Alliance under one or more agreements.
# The Ed-Fi Alliance licenses this file to you under the Apache License, Version 2.0.
# See the LICENSE and NOTICES files in the project root for more information.

FROM edfialliance/ods-api-web-api:v2.1.5@sha256:2e6c04b1821f3584a58a993d65b62105b62a0323a4c99acbf1ee70f88f433c10
COPY appsettings.template.json /app/appsettings.template.json

ENTRYPOINT ["/app/run.sh"]
7 changes: 7 additions & 0 deletions eng/ods-api/README.md
@@ -0,0 +1,7 @@
# ods-api Directory

This directory supports starting the ODS/API v5.3-patch4 in sandbox mode, with
change queries, profiles, and composites disabled. Useful for head-to-head test
comparisons with Meadowlark.

> **Warning** do not publish to Docker Hub.
96 changes: 96 additions & 0 deletions eng/ods-api/appsettings.template.json
@@ -0,0 +1,96 @@
{
  "ApplicationInsights": {
    "InstrumentationKey": "",
    "LogLevel": {
      "Default": "Warning"
    }
  },
  "ConnectionStrings": {
    "EdFi_Ods": "host=${ODS_POSTGRES_HOST};port=${POSTGRES_PORT};username=${POSTGRES_USER};password=${POSTGRES_PASSWORD};database=EdFi_{0};pooling=false;application name=EdFi.Ods.WebApi",
    "EdFi_Security": "host=${ADMIN_POSTGRES_HOST};port=${POSTGRES_PORT};username=${POSTGRES_USER};password=${POSTGRES_PASSWORD};database=EdFi_Security;pooling=false;application name=EdFi.Ods.WebApi",
    "EdFi_Admin": "host=${ADMIN_POSTGRES_HOST};port=${POSTGRES_PORT};username=${POSTGRES_USER};password=${POSTGRES_PASSWORD};database=EdFi_Admin;pooling=false;application name=EdFi.Ods.WebApi",
    "EdFi_Master": "host=${ADMIN_POSTGRES_HOST};port=${POSTGRES_PORT};username=${POSTGRES_USER};password=${POSTGRES_PASSWORD};database=postgres;pooling=false;application name=EdFi.Ods.WebApi"
  },
  "BearerTokenTimeoutMinutes": "30",
  "DefaultPageSizeLimit": 500,
  "ApiSettings": {
    "Mode": "$API_MODE",
    "MinimalTemplateSuffix": "Ods_Minimal_Template",
    "UsePlugins": false,
    "PopulatedTemplateSuffix": "Ods_Populated_Template",
    "PlainTextSecrets": true,
    "MinimalTemplateScript": "PostgreSQLMinimalTemplate",
    "Engine": "PostgreSQL",
    "OdsTokens": [],
    "PopulatedTemplateScript": "PostgreSQLPopulatedTemplate",
    "UseReverseProxyHeaders": true,
    "Features": [
      {
        "Name": "OpenApiMetadata",
        "IsEnabled": true
      },
      {
        "Name": "AggregateDependencies",
        "IsEnabled": true
      },
      {
        "Name": "TokenInfo",
        "IsEnabled": true
      },
      {
        "Name": "Extensions",
        "IsEnabled": true
      },
      {
        "Name": "Composites",
        "IsEnabled": false
      },
      {
        "Name": "Profiles",
        "IsEnabled": false
      },
      {
        "Name": "ChangeQueries",
        "IsEnabled": false
      },
      {
        "Name": "IdentityManagement",
        "IsEnabled": false
      },
      {
        "Name": "OwnershipBasedAuthorization",
        "IsEnabled": false
      },
      {
        "Name": "UniqueIdValidation",
        "IsEnabled": false
      },
      {
        "Name": "XsdMetadata",
        "IsEnabled": true
      }
    ],
    "ExcludedExtensions": []
  },
  "Plugin": {
    "Folder": "./Plugin",
    "Scripts": [
      "tpdm"
    ]
  },
  "Caching": {
    "Descriptors": {
      "AbsoluteExpirationSeconds": 1800
    },
    "PersonUniqueIdToUsi": {
      "AbsoluteExpirationSeconds": 0,
      "SlidingExpirationSeconds": 14400
    }
  },
  "Logging": {
    "LogLevel": {
      "Default": "Information",
      "Microsoft": "Warning"
    }
  }
}