HDBSCAN (Hierarchical Density-Based Spatial Clustering of Applications with Noise) is a clustering algorithm that extends DBSCAN into a hierarchical method, which lets it find clusters of varying densities. As a result, it is more robust to parameter selection than DBSCAN and often returns meaningful clusters with little or no parameter tuning; its main parameter is the minimum cluster size. It is particularly useful for exploratory data analysis: it scales to large datasets and produces clustering results quickly and reliably.
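To make the "varying densities" idea concrete, here is a minimal, single-threaded sketch of the first stages of HDBSCAN as described by Campello et al. (2013): core distances, the mutual reachability distance, and a minimum spanning tree over the mutual reachability graph. It is not this repository's code or API. The function names (`core_distances`, `mutual_reachability`, `mst_edges`) are hypothetical, and Euclidean distance plus the convention of counting a point as its own first neighbour are assumptions. The later stages (condensing the hierarchy by minimum cluster size and selecting clusters by stability) are omitted.

```python
import math


def core_distances(points, min_samples):
    # Core distance: Euclidean distance to the min_samples-th nearest neighbour,
    # counting the point itself as its own first neighbour (assumed convention).
    cores = []
    for p in points:
        dists = sorted(math.dist(p, q) for q in points)
        cores.append(dists[min_samples - 1])
    return cores


def mutual_reachability(points, cores, i, j):
    # d_mreach(a, b) = max(core(a), core(b), d(a, b)). Points in sparse regions
    # have large core distances, so they are pushed away from everything else.
    return max(cores[i], cores[j], math.dist(points[i], points[j]))


def mst_edges(points, min_samples):
    # Prim's algorithm on the complete mutual reachability graph, O(n^2).
    n = len(points)
    cores = core_distances(points, min_samples)
    in_tree = [False] * n
    best = [math.inf] * n  # cheapest known edge weight connecting v to the tree
    parent = [-1] * n
    best[0] = 0.0
    edges = []
    for _ in range(n):
        # Add the cheapest vertex that is not yet in the tree.
        u = min((v for v in range(n) if not in_tree[v]), key=lambda v: best[v])
        in_tree[u] = True
        if parent[u] != -1:
            edges.append((parent[u], u, best[u]))
        # Relax the edges from u to every vertex still outside the tree.
        for v in range(n):
            if not in_tree[v]:
                w = mutual_reachability(points, cores, u, v)
                if w < best[v]:
                    best[v] = w
                    parent[v] = u
    # Sorting by weight gives the merge order of the single-linkage hierarchy.
    return sorted(edges, key=lambda e: e[2])


if __name__ == "__main__":
    # Two small, well-separated groups of three points each.
    pts = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
    for a, b, w in mst_edges(pts, min_samples=2):
        print(a, b, round(w, 3))
```

On this toy input the four intra-group edges all have weight 1.0, and the single heaviest edge (about 13.45) is the one bridging the two groups, so cutting it yields the two clusters. HDBSCAN generalises this by examining every such cut in the hierarchy rather than a single global threshold, which is what allows clusters of different densities to coexist.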
This repository contains a parallel implementation of HDBSCAN built with OpenMP. Because the work is spread across multiple CPU cores, it is well suited to modern multicore systems and high-performance computing environments. It is also faster than the implementation provided by scikit-learn.
Requirements:
- Python (>=3.8)
- CMake
- OpenMP
- GCC or Clang (with OpenMP support)
Installation:
- Build the shared libraries with `make`.
- Install the required Python libraries with `pip install -r requirements.txt`.
Once the code is compiled, use the provided Python wrapper to run the parallel HDBSCAN implementation. A sample usage is given in `main.py`.
👤 Karthick T. Sharma
- GitHub: @Karthick47v2
- LinkedIn: @Karthick47
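Reference:
@inproceedings{campello2013density,
  author    = {Campello, Ricardo and Moulavi, Davoud and Sander, Joerg},
  title     = {Density-Based Clustering Based on Hierarchical Density Estimates},
  booktitle = {Advances in Knowledge Discovery and Data Mining (PAKDD 2013)},
  series    = {Lecture Notes in Computer Science},
  volume    = {7819},
  pages     = {160--172},
  year      = {2013},
  month     = {04},
  publisher = {Springer},
  isbn      = {978-3-642-37455-5},
  doi       = {10.1007/978-3-642-37456-2_14}
}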
Contributions, issues, and feature requests are welcome!
Feel free to check the issues page.
Give a ⭐️ if this project helped you!