Experimental evaluation of MongoDB's FPTP query planner

This code evaluates the effectiveness of MongoDB's query optimizer. We define query optimizer effectiveness as the capability of the query optimizer to consistently choose the best execution plan among all alternatives. This tool is capable of:

Populating a MongoDB database with customizable workload.
Visualizing MongoDB's choices of query plan and their performance.
Visualizing ties (i.e. query plans that has similar execution time)
Visualizing the overhead of the FPTP query optimization.

Project Scope

This repository contains all code required to reproduce the results in our paper "First Past the Post: Evaluating Query Optimization in MongoDB". That includes server configuration, scripts to build multiple versions of MongoDB from source, experimental harnesses, applying additional workloads to installations of MongoDB on cloud-hosted services, measuring query execution time, visualizing ties of query plan choices, analysis overhead of the FPTP approach, storage of code data and outcomes in appropriate stores, and producing reports and charts on the results.

Code for all experiments and results

Instructions

Provision an AWS instance:
- m6id.large (or larger) recommended
- Attach at least 200GB storage
- Use the Amazon Linux 2023 AMI
- Make sure you have ssh access (keypair and ssh is permitted inbound from your client)
Log in and run:
```
$ wget -c 'https://github.com/michaelcahill/mongodb-fptp-paper/archive/refs/heads/main.zip'
$ unzip main.zip && rm main.zip

$ sudo bash ./mongodb-fptp-paper-main/prep.sh
$ ./mongodb-fptp-paper-main/build-all.sh
$ ./mongodb-fptp-paper-main/run-all.sh
```
Note that build-all.sh takes 6-7 hours on the minimum supported instance, and run-all.sh takes approximately 71 hours, or around 3 full days. The build will be faster on a larger instance, but the experiment runs will not. The experiment runs will not speed up on a larger instance, but can be parallelized across multiple instances by copying the binaries from the build step and running the run.sh script with parameters to cover the values in run-all.sh.
Data and charts will be generated in the results-cover, results-both and results-single directories, corresponding to different index choices.

The latex source for the paper is in ./vldb-submission-latex, and images from the results directories are copied as follows:

results-single/v7.0.1/processed-result/uniform/*.png -> images/results-single-index/v7.0.1
results-cover/v7.0.1/processed-result/uniform/*.png -> images/results-with-covering-index/v7.0.1
results-cover/v7.0.1-with-coll/processed-result/uniform/*.png -> images/results-with-covering-index/v7.0.1-with-coll
results-cover/v7.0.1-with-coll-with-fix/processed-result/uniform/*.png -> images/results-with-covering-index/v7.0.1-with-coll-with-fix
results-both/v7.0.1/processed-result/uniform/*.png -> images/results-without-covering-index/v7.0.1
results-both/v7.0.1-with-coll/processed-result/uniform/*.png -> images/results-without-covering-index/v7.0.1-with-coll
results-both/v7.0.1-with-coll-with-fix/processed-result/uniform/*.png -> images/results-without-covering-index/v7.0.1-with-coll-with-fix

Running the code manually

We also provide information about how to run the code without the scripts described above. Here are links for manual steps run the experiments. Please note that some steps cannot be skipped. It is strongly recommended to execute the code step by step according to the instructions. Otherwise, you may encounter errors.

Build MongoDB from source > Instructions
Deploy different versions of MongoDB on AWS instances. (Includes links to modified MongoDB source code)
> Instructions
Prerequisites and suggested settings.
> Instructions
Run the experiments that visualize query plans and their performance.
> Instructions
Visualize ties (i.e. query plans that has similar execution time).
> Instructions
Measure overhead and evaluate results.
> Instructions

Contributions

VLDB submission
- Overleaf Link
- Latex code and .pdf file

The results and findings of this project have the following contributions towards understanding the nature of the MongoDB query optimizer:

We proposed an innovative way to evaluate and visualize the impact on query performance of an optimizer’s choices. By using this approach, we identified places where the MongoDB query optimizer chooses sub-optimal query plans. Our approach could form the basis of an automated regression testing tool to verify that the query planner in MongoDB improves over time.
We identified causes of the preference bias of FPTP, in which index scans are systematically chosen even when a collection scan would run faster. We evaluate a fix for this problem. Changes to the MongoDB source code are included
The query plan visualization tool can visualize ties (i.e. query plans with similar execution time)

This work was originally based on Dawei Tao's honours thesis: Overleaf Link

Name		Name	Last commit message	Last commit date
Latest commit History 95 Commits
dashboard		dashboard
docs		docs
experiment		experiment
mongoversions		mongoversions
overhead_experiments		overhead_experiments
processing		processing
vldb-submission-latex		vldb-submission-latex
.gitattributes		.gitattributes
.gitignore		.gitignore
README.md		README.md
build-all.sh		build-all.sh
prep.sh		prep.sh
requirements.txt		requirements.txt
run-all.sh		run-all.sh
run.sh		run.sh

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Repository files navigation

Experimental evaluation of MongoDB's FPTP query planner

Project Scope

Instructions

Running the code manually

Contributions

About

Uh oh!

Releases

Packages

Languages

michaelcahill/mongodb-fptp-paper

Folders and files

Latest commit

History

Repository files navigation

Experimental evaluation of MongoDB's FPTP query planner

Project Scope

Instructions

Running the code manually

Contributions

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages