To foster large-scale research on vulnerability mitigation and to enable comparisons between detection approaches, we make PatchDB, the dataset from our DSN 2021 paper, publicly available.
PatchDB is a large-scale security patch dataset that contains around 12K security patches and 24K non-security patches collected from real-world projects. You can find more details on the dataset in the paper "PatchDB: A Large-Scale Security Patch Dataset". You can also visit the official PatchDB website for more information.
We are happy to share PatchDB and hope you find our dataset useful in your research.
Option 1: Download PatchDB in JSON format from Hugging Face
You can download the dataset from https://huggingface.co/datasets/sunlab/patch_db.
The repository is publicly accessible, but you must accept its access conditions before downloading: log in to Hugging Face, share your contact information (email and username), and agree to the terms and conditions (if any).
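As a minimal sketch, assuming you have already accepted the access conditions on the dataset page and installed the `huggingface_hub` Python package, the snippet below downloads a snapshot of the `sunlab/patch_db` repository and loads its JSON files into a single list. The helper names `download_patchdb` and `load_json_records` are hypothetical, and the per-file layout (either one JSON array or JSON Lines) is an assumption rather than a documented PatchDB schema.

```python
import json
from pathlib import Path

# Hugging Face dataset repository named in this README.
REPO_ID = "sunlab/patch_db"


def download_patchdb(local_dir=None, token=None):
    """Download a snapshot of the gated PatchDB repository and return its local path.

    Assumes the Hugging Face account behind `token` (or behind a token saved by
    `huggingface-cli login`) has already accepted the dataset's access conditions.
    """
    # Imported lazily so that load_json_records() works without huggingface_hub.
    from huggingface_hub import snapshot_download

    return snapshot_download(
        repo_id=REPO_ID,
        repo_type="dataset",
        local_dir=local_dir,
        token=token,
    )


def _read_json_file(path):
    """Parse one file as a JSON document, falling back to JSON Lines."""
    text = path.read_text(encoding="utf-8")
    try:
        data = json.loads(text)
    except json.JSONDecodeError:
        # Assumed fallback: one JSON object per non-empty line (JSON Lines).
        return [json.loads(line) for line in text.splitlines() if line.strip()]
    # A top-level array contributes its items; any other value counts as one record.
    return data if isinstance(data, list) else [data]


def load_json_records(root):
    """Collect records from every *.json file under `root`, in sorted path order."""
    root = Path(root)
    records = []
    for path in sorted(root.rglob("*.json")):
        # Skip hidden files and directories, such as a local .cache/ metadata folder.
        if any(part.startswith(".") for part in path.relative_to(root).parts):
            continue
        records.extend(_read_json_file(path))
    return records
```

For example, `records = load_json_records(download_patchdb(local_dir="patchdb"))` downloads the files into a `patchdb` folder and returns the parsed records. Authentication comes from the `token` argument or from a token previously saved with `huggingface-cli login`; if your account has not been granted access, the download fails with an access error.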
Option 2: Download the .zip dataset using the request form
You need to state your identity and research scope. We will verify them and then send you the download link for the PatchDB dataset.
Request Steps:
- Open the online request form in a browser.
  PatchDB Request Form: https://forms.gle/4CXnx9th1GcJAjC4A
  (If you are unable to access the page, please contact SunLab by email.)
- Sign in to your Google account.
  Since our request form and download link are hosted by Google, please use your Gmail address as the email for receiving the form response.
- In the request form, include your name, affiliation, work email, homepage, and the purpose of using PatchDB.
  This information is needed for verification. Note that your request may be ignored if we cannot determine your identity or affiliation. We do not share your personal information with any third parties.
- Confirm that all the information you provided is correct.
- Read and acknowledge the Disclaimer & Download Agreement for PatchDB.
- Submit the request form.
  A request receipt will be emailed to the address you provided. Once we verify your information, we will email you the download link as soon as possible.
If you are using PatchDB for work that will result in a publication (thesis, dissertation, paper, article), please use the following citation:
@inproceedings{wang2021PatchDB,
author={Wang, Xinda and Wang, Shu and Feng, Pengbin and Sun, Kun and Jajodia, Sushil},
booktitle={2021 51st Annual IEEE/IFIP International Conference on Dependable Systems and Networks (DSN)},
title={PatchDB: A Large-Scale Security Patch Dataset},
year={2021},
volume={},
number={},
pages={149-160},
doi={10.1109/DSN48987.2021.00030}
}
or
Xinda Wang, Shu Wang, Pengbin Feng, Kun Sun and Sushil Jajodia, "PatchDB: A Large-Scale Security Patch Dataset," 2021 51st Annual IEEE/IFIP International Conference on Dependable Systems and Networks (DSN 2021), 2021, pp. 149-160, doi: 10.1109/DSN48987.2021.00030.
To download the PatchDB dataset, you must agree to the terms of the following Disclaimer & Download Agreement. Please read these terms carefully before submitting the PatchDB request form.
- PatchDB was constructed and cross-checked by three experts who work in security patch research. Because subjective factors may lead to misclassification, the Sun Security Laboratory (SunLab) cannot guarantee 100% accuracy for the samples in the dataset.
- The copyright of the PatchDB dataset is owned by SunLab.
- PatchDB may be used only for non-commercial research and/or personal use. The dataset must not be used for commercial or other for-profit purposes.
- The PatchDB dataset must not be resold or redistributed. Anyone who has obtained PatchDB must not share the dataset with others without permission from SunLab.
The PatchDB dataset is built by Sun Security Laboratory (SunLab) at George Mason University, Fairfax, VA.
Last updated: July 2021