- Pull out only the subset of data you need from an object by using simple SQL expressions
- Improve the performance by 400% and reduce the cost of applications that need to access data in S3.
- Peta-byte scale data transport solution
- Physical data transport between 80-50 TB to/from S3.
- Secure tamper resistant, use KMS 256 bit encryption
- Tracking using SNS and text messages. E-ink shipping label
- Pay per data transfer job
- Use cases: large data cloud migrations, DC decommissions, disaster recovery
- 💡 Use it if data takes longer than a week to upload e.g. > 2TB for 45 Mbps (T3, 269 days), >5TB 100 Mbps (120 days), >60TB 1000 Mbps (12 days).
- Flow
- Request snowball devices from the AWS console for delivery
- AWS Console -> Snowball service -> Plan your job (Import into S3, Export from S3, Local compute and storage only) -> Give shipping information & choose shipping speed -> Choose snowball or snowball edge variant you want -> Choose storage -> Load EC2 EMI -> IAM Role for Snowball -> Select AWS KMS Encryption key -> Set notification (SNS) & notification topics
- Install the snowball client on your servers
- Connect the snowball to your servers using ID and manifest file from Console and copy files using the client
- Ship back the device when you're done (goes to the right AWS facility)
- Data will be loaded into an S3 bucket
- Snowball is completely wiped
- Tracking is done using SNS, text messages and the AWS console
- Request snowball devices from the AWS console for delivery
- Snowball Edge
- Snowball Edge add computational capability (e.g. lambda) to the device
- You can use as local storage only if e.g. your region doesn't have lambda or for durability & capacity.
- 100 TB capacity with either:
- Storage optimized - 24 vCPU
- Compute optimized - 52 vCPU & optional GPU
- Supports
- A custom EC2 AMI so you can perform processing on the go
- Custom Lambda functions
- Very useful to pre-process the data while moving
- Use cases: data migration, image collation, IoT capture, machine learning
- Snowball Edge add computational capability (e.g. lambda) to the device
- Snowmobile
- Exabyte scale data transfer service - 45 foot long container truck
- Transfer exabytes of data (1 EB = 1,000 PB = 1,000,000 TBs)
- If you have more than 100 TB or PBs data
- Each Snowmobile has 100 PB of capacity (use multiple in parallel)
- Better than Snowball if you transfer more than 10 PB e.g. video libraries, image repositories, on-premise migration.
- 💡 Choose based on capacity:
- 50 TB Snowball has 42 TB usable free space
- 80 TB Snowball has 72 TB usable free space
- 100 TB Snowball Edge has 83 TB usable free space
- 📝Allows you to expose S3 to on-premises.
- Bridge between on-premise data and cloud data in S3
- 💡 Use cases: disaster recovery, backup & restore, tiered storage
- Flow
- Can be installed on Hyper-V and VMware.
- You then create gateway on Console
- 📝 Three types of Storage Gateway:
- File gateway -> S3
- Store & update files as objects in S3 asynchronously as you change them.
- Ownership/permissions and timestamps are stored in S# in the user-metadata of the object.
- Accessible using the NFS (network file system) and SMB protocol
- Supports S3 standard, & S3 IA, S3 One Zone IA & Glacier.
- Bucket access using IAM roles for each File Gateway
- Most recently used data is cached locally in the file gateway -> Low latency access
- Can be mounted on many servers
- 📝 File access / NFS => File Gateway (backed by S3)
- Volume gateway -> EBS
- Block storage in Amazon S3 with point-in-time backups as compressed Amazon EBS snapshot.
- Allows you to view S3 bucket as mountable volume
- Block storage using iSCSI protocol backed by S3
- Two options:
- Cached volumes: most frequently data are cached on premise on the volumes
- Stored volumes: entire dataset is on premise, scheduled backups to S3 as EBS snapshots
- 📝 Volumes / Block Storage / iSCSI => Volume gateway (backed by S3 with EBS snapshots)
- Tape Gateway -> Glacier
- Input: Your existing tape-based processes
- Output: As virtual tapes (using Virtual Tape Library, i.e. VTL) in S3 or Glacier.
- Uses iSCSI which is interface & networking standard for linking data storage facilities.
- 📝 VTL Tape solution / Backup with iSCSI => Tape gateway (backed by S3 and Glacier)
- File gateway -> S3
- Security: Your data is encrypted at rest by default in S3.
- Flow
- AWS Console -> Storage Gateway -> Choose gateway type -> Select host platform (VMware / hyper-v / EC2) -> IP address of gateway VM
- Serverless service to perform analytics directly against S3 files
- Uses SQL language to query the files
- Has a JDBC / ODBC driver that allows you to connect BI tools to it.
- Charged per query and amount of data scanned
- Supports CSV, JSON, ORC, Avro and Parquet (built on Presto)
- 💡Use cases: Business intelligence / analytics / reporting, analyze & query VPC Flow Logs, ELB Logs, CloudTrail trails, etc..
- 📝 Analyze data directly on S3 -> use Athena
- Usage
- Go to Athena on Console
- Create database with a query
- Create table & link to a bucket
- It's a link so you don't pay anything to have tables
- You can then write SQL queries against the table
- Allows you to easily query encrypted data stored in S3 and write encrypted results back.
- Both, server-side encryption and client-side encryption are supported