Skip to content

cmu-db/QPPNet

This branch is 4 commits ahead of rabbit721/QPPNet:master.

Folders and files

NameName
Last commit message
Last commit date

Latest commit

2d337c1 · Jul 21, 2022

History

80 Commits
Jul 20, 2022
May 23, 2022
Jul 21, 2022
Oct 11, 2020
Jul 21, 2022
May 23, 2022
Jul 20, 2022
Jul 20, 2022
Jul 20, 2022
Oct 11, 2020

Repository files navigation

QPPNet in PyTorch DOI

The README below was written for the release tagged rabbit721.

This code contains a sample implementation for Plan-Structured Deep Neural Network Models for Query Performance Prediction presented at VLDB 2019, and the code for training/testing on

  • TPC-H queries generated using https://github.com/gregrahn/tpch-kit.git and benchmarked with Postgres

    The TPC-H data are generated with ./dbgen -s <scale factor>.

    The TPC-H queries are generated using ./qgen <query_id> -r <seed number> where varying seed random number is used to generate different queries from a template.

    Tables in PostgresSQL are created using dss.ddl and then have indexes created with tpch-postgres-index-ddl.sql.

    The query plan structure and query performance metrics are retrieved as json objects by running the generated queries above with the explain (analyze, format JSON, verbose) statement prepended.

  • TPC-H queries generated using https://github.com/gregrahn/tpch-kit.git and benchmarked with NoisePage

  • TPC-C queries and smallbank dataset generated using OLTP and benchmarked with NoisePage

Prerequisites

  • Linux or macOS
  • Python 3

Getting Started

Installation

  • Cloning the repo:

    git clone https://github.com/rabbit721/QPPNet.git
    cd QPPNet
    
  • Install the required python packages:

    • For pip: pip install -r requirements.txt
    • For conda: conda env create -f environment.yml

Getting the Datasets

  • TPC-H benchmarked with PostgresSQL

    # Under directory dataset/postgres_tpch_dataset
    wget http://www.andrew.cmu.edu/user/jiejiao/data/qpp/postgres/tpch/psqltpch0p1g.zip && unzip psqltpch0p1g.zip
    wget http://www.andrew.cmu.edu/user/jiejiao/data/qpp/postgres/tpch/psqltpch1g.zip && unzip psqltpch1g.zip
    wget http://www.andrew.cmu.edu/user/jiejiao/data/qpp/postgres/tpch/psqltpch10g.zip && unzip psqltpch10g.zip
    
  • TPC-H benchmarked with NoisePage:

    Data files already located under directory dataset/terrier_tpch_dataset as execution_0p1G.csv, execution_1G.csv, and execution_10G.csv

  • TPC-C and smallbank benchmarked with NoisePage:

    Data files already located under directory dataset/oltp_dataset as tpcc_pipeline.csv and sb_pipeline.csv

Examples for Training a model

  • On TPC-H dataset generated using https://github.com/gregrahn/tpch-kit.git with SF=1 and benchmarked with PostgresSQL

    python3 main.py --dataset PSQLTPCH -s 0 -t 250000 --batch_size 128 -epoch_freq 1000 --lr 2e-3 --step_size 1000 --SGD --scheduler --data_dir ./dataset/postgres_tpch_dataset/tpch1g/900-exp_res_by_temp/  
    
  • On TPC-H dataset generated using https://github.com/gregrahn/tpch-kit.git with SF=1 and benchmarked with NoisePage

    python3 main.py --dataset TerrierTPCH -s 0 -t 250000 --batch_size 512 -epoch_freq 1000 --lr 1e-3 --step_size 1000 --SGD --scheduler --data_dir ./dataset/terrier_dataset/execution_1G.csv
    
  • On TPC-C dataset generated using OLTP with SF=1 and benchmarked with NoisePage

    python3 main.py --dataset OLTP -s 0 -t 250000 --batch_size 512 -epoch_freq 1000 --lr 5e-3 --step_size 1000 --SGD --scheduler --data_dir ./dataset/oltp_dataset/tpcc_pipeline.csv
    

Using a pre-trained model

  • Getting a model trained for 4000 epochs on TPC-H SF=1 dataset benchmarked with PostgresSQL:

    wget http://www.andrew.cmu.edu/user/jiejiao/data/qpp/trained_models/psqltpch_epoch4000.zip
    
  • Getting a model trained for 20000 epochs on TPC-H SF=1 dataset benchmarked with NoisePage:

    wget http://www.andrew.cmu.edu/user/jiejiao/data/qpp/trained_models/terriertpch_epoch20000.zip
    
  • Getting a model trained for 10000 epochs on TPC-C dataset generated with OLTP and benchmarked with NoisePage:

    wget http://www.andrew.cmu.edu/user/jiejiao/data/qpp/trained_models/tpcc_epoch10000.zip
    

Examples for Testing a trained model

When testing, please make sure that trained models are saved in ./saved_model and the mean and range values of the train data are provided with the flag '--mean_range_dict' in order to normalize the input.

The '-s' flag is interpreted as an integer and is used to specify the epoch number of the saved model to be tested.

  • Testing a model trained for 4000 epochs on TPC-H SF=1 dataset benchmarked with PostgresSQL on TPC-H SF=10 dataset benchmarked with PostgresSQL.

    python3 main.py --test_time --dataset PSQLTPCH -s 4000 --mean_range_dict mean_range_dict.pickle --data_dir ./dataset/postgres_tpch_dataset/tpch10G/900-exp_res_by_temp/
    
  • Testing a model trained for 10000 epochs on the TPC-C benchmark on the smallbank dataset. Please make sure that models are saved in ./saved_model

    python3 main.py --test_time --dataset OLTP -s 10000 --data_dir ./dataset/oltp_dataset/sb_pipeline.csv
    

About

No description, website, or topics provided.

Resources

Stars

Watchers

Forks

Packages

No packages published

Languages

  • Python 100.0%