This repository has not been updated recently and is not complete, so please treat it only as a reference.
This repository contains time-series anomaly detection datasets, models, and their implementations. If you find any issues in this repository, such as errors in the code or better hyperparameters, please report them via the Issues channel or a pull request.
[20230522: deleted SMD, SMAP, MSL] There seems to be a problem in the preprocessing of the SMD, SMAP, and MSL datasets, so these will be updated later.
"0." denotes that experiment hasn't finished.
For an individual run:

```bash
python main.py dataset=${dataset_name} model=${model_name} {other arguments}
python main.py dataset=NeurIPS-TS-MUL model=USAD exp_id=default model.latent_dim=40 # example
```
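The per-run configuration is composed by hydra, so dotted overrides such as `model.latent_dim=40` are merged into the config object before training starts. Below is a minimal sketch of what such an entry point looks like; the `config_path`/`config_name` values are assumptions for illustration, not this repository's actual settings.

```python
import hydra
from omegaconf import DictConfig, OmegaConf

@hydra.main(version_base=None, config_path="conf", config_name="config")
def main(cfg: DictConfig) -> None:
    # dataset=..., model=..., and dotted overrides like model.latent_dim=40
    # from the command line are merged into cfg before main() runs.
    print(OmegaConf.to_yaml(cfg))

if __name__ == "__main__":
    main()
```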
If you want to run a model across multiple datasets:

```bash
sh scripts/data_loop.sh ${model_name} ${gpu_id}
sh scripts/data_loop.sh AnomalyTransformer 1 # example
```
If you want to compare models' performance on a specific dataset:

```bash
sh scripts/model_loop.sh ${data_name} ${gpu_id}
sh scripts/model_loop.sh SWaT 3 # example
```
For more script examples, take a look at the scripts directory.
This repo uses wandb and hydra for experiment tracking. You can tune hyperparameters via:

```bash
wandb sweep ${yaml_file}
wandb agent ${sweep_id}
```

For hyperparameter tuning examples, take a look at the hptune directory.
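As an alternative to the CLI above, a sweep can also be defined from Python with the wandb API. This is a minimal sketch only; the sweep method, metric name, parameter names, and project name below are illustrative placeholders, not values taken from this repository's hptune files.

```python
import wandb

# Placeholder sweep definition; replace metric/parameters with your own.
sweep_config = {
    "method": "bayes",
    "metric": {"name": "F1", "goal": "maximize"},
    "parameters": {
        "model.latent_dim": {"values": [20, 40, 80]},
        "trainer.lr": {"min": 1e-4, "max": 1e-2},
    },
}

sweep_id = wandb.sweep(sweep=sweep_config, project="TSAD")  # hypothetical project name
print(sweep_id)  # then launch workers with: wandb agent <entity>/<project>/<sweep_id>
```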
For dataset details, please refer to our Notion dataset page. You may also take a look at the EDA directory for exploratory data analysis.
All datasets are assumed to be in the "data" folder.
- Toy Dataset (toyUSW): We created a toy dataset to test algorithms quickly. train.npy contains periodic sine waves; test.npy contains abnormal situations (stopped signals), with anomalies labeled in test_label.npy.
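A minimal loading sketch for the toy dataset. The directory layout (data/toyUSW/) and the 0/1 per-timestep label convention are assumptions; adjust to your local setup.

```python
import numpy as np

# Paths assume the toyUSW files sit under data/toyUSW/.
train = np.load("data/toyUSW/train.npy")        # periodic sine waves (normal only)
test = np.load("data/toyUSW/test.npy")          # includes stopped-signal segments
labels = np.load("data/toyUSW/test_label.npy")  # per-timestep anomaly labels (assumed 0/1)

print(train.shape, test.shape, labels.shape)
```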
The NeurIPS-TS datasets are created using the principles in https://openreview.net/forum?id=r8IvOsnHchr. We prepared univariate and multivariate datasets, each of length 1000. For data generation, please refer to univariate_generator and multivariate_generator; a minimal generation sketch follows the list below.
- NeurIPS-TS-UNI
- NeurIPS-TS-MUL
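The sketch below is purely illustrative: a univariate series of length 1000 with a few injected global point anomalies, in the spirit of the taxonomy above. For the actual generation code, use univariate_generator / multivariate_generator in this repository.

```python
import numpy as np

rng = np.random.default_rng(0)
T = 1000                                                         # each dataset has length 1000
t = np.arange(T)
x = np.sin(2 * np.pi * t / 50) + 0.05 * rng.standard_normal(T)   # normal sine pattern plus noise
label = np.zeros(T, dtype=int)

idx = rng.choice(T, size=5, replace=False)                       # inject global point anomalies
x[idx] += rng.uniform(3.0, 5.0, size=5)
label[idx] = 1
```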
The SWaT and WADI datasets each have two parts: train (normal) and test (abnormal). The train set contains no anomalies; the test set contains anomalies driven by the researchers' attack scenarios. Request access via the guidelines in the link.
- SWaT (2022-10-25) : Secure Water Treatment Dataset
- With the Google Drive link shared after your request, refer to SWaT.A1 & A2_Dec 2015
- For the normal dataset, refer to ./Physical/SWaT_Dataset_Normal_v0.xlsx
- For the attack dataset, refer to ./Physical/SWaT_Dataset_Attack_v0.xlsx
- Convert the xlsx files to csv using read_xlsx_and_convert_to_csv in utils/tools.py (a conversion sketch follows the WADI entry below)
- WADI (2022-10-25) : Water Distribution Dataset
- With the Google Drive link shared after your request, refer to WADI.A2_19 Nov 2019
- For the normal dataset, refer to ./WADI_14days_new.csv
- For the attack dataset, refer to ./WADI_attackdataLABLE.csv
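A hedged sketch of the SWaT xlsx-to-csv conversion mentioned above. The actual helper is read_xlsx_and_convert_to_csv in utils/tools.py; its exact signature and header handling may differ from this reimplementation.

```python
import pandas as pd

def read_xlsx_and_convert_to_csv(xlsx_path: str, csv_path: str) -> None:
    """Hypothetical reimplementation; the repo's utils/tools.py version may differ."""
    # Some SWaT sheets put a title row above the column names; pass header=1 in that case.
    df = pd.read_excel(xlsx_path)
    df.to_csv(csv_path, index=False)

read_xlsx_and_convert_to_csv(
    "data/SWaT/Physical/SWaT_Dataset_Normal_v0.xlsx",
    "data/SWaT/Physical/SWaT_Dataset_Normal_v0.csv",
)
```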
- PSM : Pooled Server Metrics Dataset
SMD, SMAP, and MSL are provided at https://github.com/thuml/Anomaly-Transformer.
- SMD : Server Machine Dataset
- SMAP : Soil Moisture Active Passive satellite Dataset
- MSL : Mars Science Laboratory Dataset
There seems to be a problem in the preprocessing of these datasets (SMD, SMAP, MSL), so they will be updated later.
- To be updated
- RandomModel: no training; returns anomaly_score ~ Uniform[0, 1] (a minimal sketch follows the reference list below)
- OCSVM: David M. J. Tax, Robert P. W. Duin: "Support Vector Data Description." Mach. Learn. 54(1): 45-66 (2004)
- Isolation Forest: Fei Tony Liu, Kai Ming Ting, Zhi-Hua Zhou: Isolation Forest. ICDM 2008: 413-422
- LOF: Markus M. Breunig, Hans-Peter Kriegel, Raymond T. Ng, Jörg Sander: LOF: Identifying Density-Based Local Outliers. SIGMOD Conference 2000: 93-104
- LSTMEncDec: Malhotra, Pankaj, et al. "LSTM-based encoder-decoder for multi-sensor anomaly detection." (2016)
- LSTMVAE: Daehyung Park, Yuuna Hoshi, Charles C. Kemp: A Multimodal Anomaly Detector for Robot-Assisted Feeding Using an LSTM-Based Variational Autoencoder. IEEE Robotics Autom. Lett. 3(2): 1544-1551 (2018)
- OmniAnomaly: Su, Ya, et al. "Robust anomaly detection for multivariate time series through stochastic recurrent neural network." Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining. 2019.
- USAD: Audibert, Julien, et al. "USAD: Unsupervised anomaly detection on multivariate time series." Proceedings of the 26th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining. 2020.
- DeepSVDD: Lukas Ruff, Nico Görnitz, Lucas Deecke, Shoaib Ahmed Siddiqui, Robert A. Vandermeulen, Alexander Binder, Emmanuel Müller, Marius Kloft: Deep One-Class Classification. ICML 2018: 4390-4399
- DAGMM: Bo Zong, Qi Song, Martin Renqiang Min, Wei Cheng, Cristian Lumezanu, Dae-ki Cho, Haifeng Chen: Deep Autoencoding Gaussian Mixture Model for Unsupervised Anomaly Detection. ICLR (Poster) 2018
- THOC: Lifeng Shen, Zhuocong Li, James T. Kwok: Timeseries Anomaly Detection using Temporal Hierarchical One-Class Network. NeurIPS 2020
- Anomaly Transformer: Jiehui Xu, Haixu Wu, Jianmin Wang, Mingsheng Long: Anomaly Transformer: Time Series Anomaly Detection with Association Discrepancy. ICLR 2022
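A minimal sketch of the RandomModel baseline mentioned above. The fit/anomaly_score interface shown here is illustrative and not necessarily this repository's model API; the only substance is that scores are drawn i.i.d. from Uniform[0, 1] with no training.

```python
import numpy as np

class RandomModel:
    """Sketch of the random baseline: no training, uniform random anomaly scores."""

    def fit(self, train: np.ndarray) -> None:
        pass  # no training needed

    def anomaly_score(self, test: np.ndarray) -> np.ndarray:
        # One score per test timestep, drawn from Uniform[0, 1].
        return np.random.uniform(0.0, 1.0, size=len(test))
```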
- OCSVM (Sklearn): https://scikit-learn.org/stable/modules/generated/sklearn.svm.OneClassSVM.html
- IsolationForest (Sklearn): https://scikit-learn.org/stable/modules/generated/sklearn.ensemble.IsolationForest.html
- LOF (Sklearn): https://scikit-learn.org/stable/modules/generated/sklearn.neighbors.LocalOutlierFactor.html
- LSTMEncDec Implementation: https://joungheekim.github.io/2020/11/14/code-review/ (Explanations in Korean)
- LSTMVAE: https://github.com/carrtesy/LSTMVAE-Pytorch
- DeepSVDD (PyOD): https://pyod.readthedocs.io/en/latest/pyod.models.html#pyod.models.deep_svdd.DeepSVDD
- DAGMM : https://github.com/tnakae/DAGMM (Unofficial PyTorch)
- Anomaly Transformer Github: https://github.com/thuml/Anomaly-Transformer