database comparison

PRs from everyone are welcome!

OLAP databases

I use the term loosely. I consider OLAP databases to range from Trino/Presto/Athena (federated query engines that people use like databases) to Dremio, ClickHouse, Apache Druid, Apache Pinot, Snowflake, AWS Redshift, GCP BigQuery and others.

This is a good list to show who is in the OLAP database space: https://benchmark.clickhouse.com/

List of OLAP databases that utilize SIMD (Single Instruction Multiple Data)

What is a database that uses SIMD? https://atwong.medium.com/the-hottest-area-of-database-design-querying-billions-of-rows-per-second-with-simd-aa705fb5dfb6

Here is a list of databases that are known to utilize SIMD. As you can see from the ClickHouse benchmark, databases with SIMD dominate the leaderboard.

List of OLAP databases that support primary key to support batch and streaming upsert

Why is primary key support important for batch and streaming insert, update, delete and upsert scenarios? https://atwong.medium.com/list-of-olap-databases-that-support-primary-key-8e42a65fbee3
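
To make the upsert scenario concrete, here is a minimal Python sketch of a table keyed by primary key. The `apply_changes` helper, the `op` field and the sample rows are hypothetical and not taken from any database listed here. It only illustrates that a primary key lets a later record replace the earlier version of a row instead of being appended as a duplicate.

```python
from typing import Any

Row = dict[str, Any]


def apply_changes(table: dict[int, Row], changes: list[Row], key: str = "id") -> dict[int, Row]:
    """Apply a batch (or micro-batch of a stream) of change records to a keyed table."""
    for change in changes:
        pk = change[key]
        if change.get("op") == "delete":
            # Delete by primary key; ignore keys that are not present.
            table.pop(pk, None)
        else:
            # Upsert: insert a new row, or merge new column values into the existing row.
            row = {k: v for k, v in change.items() if k != "op"}
            table[pk] = {**table.get(pk, {}), **row}
    return table


# Hypothetical change stream: order 1 is updated, order 2 is deleted.
table: dict[int, Row] = {}
apply_changes(table, [
    {"op": "upsert", "id": 1, "status": "new"},
    {"op": "upsert", "id": 2, "status": "new"},
    {"op": "upsert", "id": 1, "status": "shipped"},
    {"op": "delete", "id": 2},
])
print(table)
```

Without a primary key, the same stream would leave three appended rows and no way to resolve which version of order 1 is current.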

List of OLAP databases that support the MySQL protocol

  • Dolt database (OLTP): Dolt is a SQL database you can fork, clone, branch, merge, push and pull just like a Git repository. Connect to Dolt just like any MySQL database to run SQL queries. Use the command line interface to import CSV files, commit your changes, push them to a remote, or merge your teammate’s changes. https://www.dolthub.com/
  • StarRocks (OLAP): StarRocks is a next-gen sub-second MPP database for full analytics scenarios, including multi-dimensional analytics, real-time analytics and ad-hoc query. https://www.starrocks.io/
  • TiDB (OLTP+OLAP): TiDB is an open-source distributed SQL database that supports Hybrid Transactional and Analytical Processing (HTAP) workloads. It is MySQL compatible and features horizontal scalability, strong consistency, and high availability. https://www.pingcap.com/
  • Sphinx Search (Search): Sphinx is a fulltext search engine that provides text search functionality to client applications. https://sphinxsearch.com/
  • SingleStore: SingleStore is a proprietary, cloud-native database designed for data-intensive applications. A distributed, relational, SQL database management system that features ANSI SQL support, it is known for speed in data ingest, transaction processing, and query processing. https://docs.singlestore.com/cloud/connect-to-your-workspace/connect-with-mysql/connect-with-mysql-client/
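
Because these systems speak the MySQL protocol, the stock `mysql` command-line client can usually connect to them. The sketch below only builds the command line and does not connect to anything. The port numbers are the commonly documented defaults as I understand them, and the `root` user and `127.0.0.1` host are placeholders, so treat all of them as assumptions to verify against your own deployment.

```python
import shlex
from typing import Optional

# Assumed default MySQL-protocol ports; check each project's docs and your own config.
DEFAULT_PORTS = {
    "Dolt": 3306,
    "StarRocks": 9030,      # FE query port
    "TiDB": 4000,
    "Sphinx Search": 9306,  # SphinxQL listener
    "SingleStore": 3306,
}


def mysql_command(system: str, host: str = "127.0.0.1", user: str = "root",
                  port: Optional[int] = None) -> list[str]:
    """Build the argv for the standard mysql client for one of the systems above."""
    chosen_port = port if port is not None else DEFAULT_PORTS[system]
    return ["mysql", "-h", host, "-P", str(chosen_port), "-u", user]


for name in DEFAULT_PORTS:
    print(f"{name}: {shlex.join(mysql_command(name))}")
```

The same idea applies to any MySQL driver or BI tool: point it at the system's MySQL-protocol port instead of a MySQL server.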

List of OLAP databases that use a cost-based optimizer (CBO) or rule-based optimizer for query execution

Data Processing Models of different OLAP databases

  • StarRocks: MPP
  • Apache Druid: Scatter-Gather
  • ClickHouse: MPP
  • SingleStore: Scatter-Gather
  • Apache Spark: Map-Reduce
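
As a rough illustration of the scatter-gather model, the toy Python sketch below has a coordinator send the same aggregation to every shard, collect the partial results, and merge them in one place. The function names, the `country` column and the sample shards are hypothetical, and threads stand in for remote nodes. Real engines are far more involved, and MPP engines, roughly speaking, can also exchange intermediate data between workers rather than merging everything on a single coordinator.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def scan_shard(rows: list[dict]) -> Counter:
    """Leaf node: compute a partial COUNT(*) GROUP BY country for one shard."""
    return Counter(row["country"] for row in rows)


def scatter_gather(shards: list[list[dict]]) -> dict[str, int]:
    """Coordinator: scatter the query to all shards, then gather and merge partials."""
    with ThreadPoolExecutor() as pool:
        partials = list(pool.map(scan_shard, shards))  # scatter
    total: Counter = Counter()
    for partial in partials:                           # gather
        total.update(partial)
    return dict(total)


# Hypothetical data spread over three shards.
shards = [
    [{"country": "US"}, {"country": "DE"}],
    [{"country": "US"}],
    [{"country": "JP"}, {"country": "US"}],
]
result = scatter_gather(shards)
print(result)
```

The single merge step on the coordinator is what tends to become the bottleneck for large joins and high-cardinality aggregations in this model.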

OLAP databases that can run TPC-H benchmarks

What is the difference between TPC-H and TPC-DS? TL;DR: TPC-DS has more difficult SQL queries, with different types of JOINs, compared to TPC-H. Many OLAP systems can't even complete the TPC-H benchmark, let alone the more difficult TPC-DS benchmark. https://atwong.medium.com/what-is-the-difference-between-tpc-h-and-tpc-ds-benchmarks-cb92fc104c32

OLAP databases that can run TPC-DS benchmarks

What is the difference between TPC-H and TPC-DS? TL;DR: TPC-DS has more difficult SQL queries, with different types of JOINs, compared to TPC-H. Many OLAP systems can't even complete the TPC-H benchmark, let alone the more difficult TPC-DS benchmark. https://atwong.medium.com/what-is-the-difference-between-tpc-h-and-tpc-ds-benchmarks-cb92fc104c32

OLAP databases that support separation of compute and storage

  • StarRocks: Yes
  • ClickHouse: Yes
  • Rockset: Yes
  • Apache Druid: Yes
  • SingleStore: Yes
  • Snowflake: Yes
  • Redshift: Only with Redshift Spectrum.