- 📝 Redshift = Data warehouse = Analytics / BI
- Columnar database e.g. stores columns of data instead of rows.
- Good for OLAP (online analytical processing), see RDBMS for OLTP (online transaction processing)
- Makes compression very easier
- For analytics and data warehousing
- Used to pull in very large and complex data sets.
- Has a SQL interface for performing the queries
- Heavily modified version of PostgreSQL
- Scaling
- Can scale to PBs of data
- From 1 node to 128 nodes, up to 160 GB of space per node
- Leader node: for query planning, results aggregation
- Compute node: for performing the queries, send results to leader
- It's a Massively Parallel Query Execution (MPP) database.
- Coordinated processing of a single task by multiple processors, each processor using its own OS and memory and communicating with each other using some form of messaging interface.
- Data and query is automatically distributed and balanced.
- Pay as you go based on the instances provisioned (not serverless)
- 💡 Choose right cluster size & use as much as you can
- 💡 Great to use with BI tools on top of it such as AWS Quicksight, SQL Server Reporting Services,Tableau integrate with it
- Data is loaded from S3, DynamoDB, DMS (Database Migration Services), other DBs...
- Good practice -> Create read replica from RDS and pull the data from the read replica and load it into Redshift
- Copy between regions: Take snapshot => Copy snapshot to new region => Create cluster from snapshot
- Redshift Spectrum: serverless service to perform queries directly against S3 (no need to load)
- Security: VPC / IAM / KMS & CloudWatch Monitoring
- Redshift Enhanced VPC Routing: COPY / UNLOAD goes through VPC
- Encrypted in transit using SSL
- Pricing: Pay per Compute Node hours (not leader node), backup and data transfer (only within a VPC)
- Cluster subnet group allows you to specify set of subnets in your VPC for your instances.
- No multi AZ, only available in 1 AZ
- Snapshots can be deployed to new AZ for e.g. DR.
- Backups are enabled by default, max 35 days.
- Attempts to maintain at least three copies of your data.
- Original, replica on the compute nodes, and a backup in S3.
- Have functionality for Automated Cross-Region Snapshot.
- Always keeps three copies of your data