This code evaluates the effectiveness of MongoDB's query optimizer. We define query optimizer effectiveness as the capability of the query optimizer to consistently choose the best execution plan among all alternatives. This tool is capable of:
- Populating a MongoDB database with customizable workload.
- Visualizing MongoDB's choices of query plan and their performance.
- Visualizing ties (i.e. query plans that has similar execution time)
- Visualizing the overhead of the FPTP query optimization.
This repository contains all code required to reproduce the results in our paper "First Past the Post: Evaluating Query Optimization in MongoDB". That includes server configuration, scripts to build multiple versions of MongoDB from source, experimental harnesses, applying additional workloads to installations of MongoDB on cloud-hosted services, measuring query execution time, visualizing ties of query plan choices, analysis overhead of the FPTP approach, storage of code data and outcomes in appropriate stores, and producing reports and charts on the results.
Code for all experiments and results
-
Provision an AWS instance:
- m6id.large (or larger) recommended
- Attach at least 200GB storage
- Use the Amazon Linux 2023 AMI
- Make sure you have ssh access (keypair and ssh is permitted inbound from your client)
-
Log in and run:
$ wget -c 'https://github.com/michaelcahill/mongodb-fptp-paper/archive/refs/heads/main.zip' $ unzip main.zip && rm main.zip $ sudo bash ./mongodb-fptp-paper-main/prep.sh $ ./mongodb-fptp-paper-main/build-all.sh $ ./mongodb-fptp-paper-main/run-all.sh
Note that
build-all.sh
takes 6-7 hours on the minimum supported instance, andrun-all.sh
takes approximately 71 hours, or around 3 full days. The build will be faster on a larger instance, but the experiment runs will not. The experiment runs will not speed up on a larger instance, but can be parallelized across multiple instances by copying the binaries from the build step and running therun.sh
script with parameters to cover the values inrun-all.sh
. -
Data and charts will be generated in the
results-cover
,results-both
andresults-single
directories, corresponding to different index choices. -
The latex source for the paper is in ./vldb-submission-latex, and images from the results directories are copied as follows:
results-single/v7.0.1/processed-result/uniform/*.png -> images/results-single-index/v7.0.1 results-cover/v7.0.1/processed-result/uniform/*.png -> images/results-with-covering-index/v7.0.1 results-cover/v7.0.1-with-coll/processed-result/uniform/*.png -> images/results-with-covering-index/v7.0.1-with-coll results-cover/v7.0.1-with-coll-with-fix/processed-result/uniform/*.png -> images/results-with-covering-index/v7.0.1-with-coll-with-fix results-both/v7.0.1/processed-result/uniform/*.png -> images/results-without-covering-index/v7.0.1 results-both/v7.0.1-with-coll/processed-result/uniform/*.png -> images/results-without-covering-index/v7.0.1-with-coll results-both/v7.0.1-with-coll-with-fix/processed-result/uniform/*.png -> images/results-without-covering-index/v7.0.1-with-coll-with-fix
We also provide information about how to run the code without the scripts described above. Here are links for manual steps run the experiments. Please note that some steps cannot be skipped. It is strongly recommended to execute the code step by step according to the instructions. Otherwise, you may encounter errors.
- Build MongoDB from source > Instructions
- Deploy different versions of MongoDB on AWS instances. (Includes links to modified MongoDB source code)
> Instructions - Prerequisites and suggested settings.
> Instructions - Run the experiments that visualize query plans and their performance.
> Instructions - Visualize ties (i.e. query plans that has similar execution time).
> Instructions - Measure overhead and evaluate results.
> Instructions
- VLDB submission
The results and findings of this project have the following contributions towards understanding the nature of the MongoDB query optimizer:
- We proposed an innovative way to evaluate and visualize the impact on query performance of an optimizer’s choices. By using this approach, we identified places where the MongoDB query optimizer chooses sub-optimal query plans. Our approach could form the basis of an automated regression testing tool to verify that the query planner in MongoDB improves over time.
- We identified causes of the preference bias of FPTP, in which index scans are systematically chosen even when a collection scan would run faster. We evaluate a fix for this problem. Changes to the MongoDB source code are included
- The query plan visualization tool can visualize ties (i.e. query plans with similar execution time)
This work was originally based on Dawei Tao's honours thesis: Overleaf Link