- Key / value store for objects
- 💡 Great for big objects, not so great for small objects (latency)
- It's object storage
- Object store => you must rewrite the entire object to change it
- Block store => you can send only the changed parts, as data is split into chunks and can be updated incrementally.
- S3 is a global service (not region specific)
- ❗ A bucket however is attached to a region
- Supports IPv6, IPv4.
- 💡 Use cases: Static files, Key value store for big files e.g. movies, Website hosting
- Objects (files) are stored in buckets (directories)
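- 💡 To illustrate the key/value model, a minimal boto3 (Python) sketch; the bucket and key names are hypothetical:

  ```python
  import boto3

  s3 = boto3.client("s3")

  # Write an object: the key is the full "path", the body is the value.
  s3.put_object(
      Bucket="my-example-bucket",
      Key="movies/trailer.mp4",
      Body=b"...binary content...",
  )

  # Read it back using the same key.
  response = s3.get_object(Bucket="my-example-bucket", Key="movies/trailer.mp4")
  data = response["Body"].read()
  ```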
- Buckets
- ❗ Bucket names must be unique across all existing bucket names in Amazon S3.
- Buckets are defined at the region level and must have a globally unique name, as the name is used in the following URLs:
  - `https://s3-<region>.amazonaws.com/<bucket>/<object>` (path-style addressing)
  - `https://<bucket>.s3.amazonaws.com/<object>` (virtual-host-style addressing)
- You can set default encryption
- Options are: None, AES-256 (SSE-S3) and AWS-KMS (SSE-KMS)
- 📝Popular question: How to ensure that all objects are encrypted at upload? Use default encryption, or a bucket policy that denies `PUT` requests without the `x-amz-server-side-encryption` header (see the sketch after this list)
- ❗ 100 S3 buckets per account -> soft limit
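- 💡 A minimal boto3 sketch of such a "deny unencrypted uploads" policy (bucket name is hypothetical; assumes SSE-S3/AES-256 is the required encryption):

  ```python
  import json

  import boto3

  s3 = boto3.client("s3")

  # Deny any PUT that doesn't request AES-256 server-side encryption.
  policy = {
      "Version": "2012-10-17",
      "Statement": [{
          "Sid": "DenyUnencryptedUploads",
          "Effect": "Deny",
          "Principal": "*",
          "Action": "s3:PutObject",
          "Resource": "arn:aws:s3:::my-example-bucket/*",
          "Condition": {
              "StringNotEquals": {"s3:x-amz-server-side-encryption": "AES256"}
          },
      }],
  }
  s3.put_bucket_policy(Bucket="my-example-bucket", Policy=json.dumps(policy))
  ```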
- Folders
- Can be created under buckets
- ❗ They are pseudo-folders and are treated as objects by S3
- S3 has a flat structure with no hierarchy
- But presentation in console gives the sense of "Directories"
- Version Control
- Can enable versioning at bucket level
- ❗📝 Once enabled, it cannot be disabled but just suspended.
- Same key overwrite will increment the version
- Best practice as it allows you to
- Roll back to a previous version
- Protection against accidental deletion i.e. unintended deletes.
- Deleting files does not really delete versioned files
- It only puts a delete marker, and files can still be downloaded.
- Deleting a delete marker -> Restores the object & adds a new delete marker.
- ❗ Only the owner of a bucket can permanently remove a delete marker e.g. delete a version.
- 💡 Best practice: Allow only specific users to delete & enforce MFA
- 📝Any file that is not versioned prior to enabling versioning will have the version `null`.
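- 💡 A minimal boto3 sketch for enabling versioning (bucket name is hypothetical):

  ```python
  import boto3

  s3 = boto3.client("s3")

  # Versioning is set at the bucket level; once enabled it can only
  # be suspended ("Suspended"), never fully disabled again.
  s3.put_bucket_versioning(
      Bucket="my-example-bucket",
      VersioningConfiguration={"Status": "Enabled"},
  )
  ```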
- Objects
- Have a key which is FULL path
  - E.g. `<my_bucket>/my_file.txt` or `<my_bucket>/my_folder1/another_folder/my_file.txt`
- ❗ There is no concept of directories within buckets, just key names with slashes.
- Object values are the contents of the file
- ❗ Max size is 5 TB, min size is 0 bytes (touched)
- ❗ If a file is more than 5 GB, you need to use multipart upload
- Metadata is a list of text key/value pairs.
- Can be system or user metadata.
- ❗ Cannot be modified after creation
- Tags are unicode key/value pairs.
- ❗ Up to 10
- 💡 Useful for security / lifecycle / replication rules
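- 💡 A minimal boto3 sketch for uploading an object with user metadata and tags (all names/values are hypothetical):

  ```python
  import boto3

  s3 = boto3.client("s3")

  s3.put_object(
      Bucket="my-example-bucket",
      Key="my_folder1/another_folder/my_file.txt",
      Body=b"hello",
      # User metadata: immutable after creation.
      Metadata={"department": "finance"},
      # Tags: URL-encoded key=value pairs, up to 10 per object.
      Tagging="project=tax&year=2019",
  )
  ```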
- Logging
- Object-level logging
- Can enable CloudWatch request metrics: e.g. total number of HTTP GET/PUT, 4XX etc.
- Can enable CloudTrail for auditing object-level API calls (logged in CloudTrail, not CloudWatch).
- 📝S3 Consistency Model
  - Read-after-write consistency for `PUT`s of new objects
    - E.g. `PUT 200 -> GET 200`
    - ❗ Eventually consistent if you try a `GET` before the object exists, because the 404 response gets cached
      - E.g. `GET 404 -> PUT 200 -> GET 404` (the earlier `GET 404` was cached)
  - Eventual consistency for `DELETE`s and `PUT`s of existing objects
    - Read after updating an object -> might get an older version
      - E.g. `PUT 200 -> PUT 200 -> GET 200` (might be the older version)
    - Delete -> can still retrieve the object for a short time
      - E.g. `DELETE 200 -> GET 200`
- Make an object publicly accessible
  - On object level
    - Set a predefined grant (also known as a canned ACL) for a bucket in Amazon S3:
      - In Console -> Bucket -> Permissions -> Public access settings -> Untick "Block new public bucket policies" and "Block public and cross-account access if the bucket has public policies"
      - Update the object's ACL -> Ensure your object is set to be publicly accessible.
  - On bucket level
    - Use a bucket policy that grants public read access to a specific object tag/prefix (see the sketch below)
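- 💡 A minimal boto3 sketch showing both approaches (bucket/key names are hypothetical; public access blocks must be lifted first):

  ```python
  import json

  import boto3

  s3 = boto3.client("s3")

  # Object level: canned ACL that makes a single object publicly readable.
  s3.put_object_acl(Bucket="my-example-bucket", Key="index.html", ACL="public-read")

  # Bucket level: policy granting public read on a prefix.
  policy = {
      "Version": "2012-10-17",
      "Statement": [{
          "Effect": "Allow",
          "Principal": "*",
          "Action": "s3:GetObject",
          "Resource": "arn:aws:s3:::my-example-bucket/public/*",
      }],
  }
  s3.put_bucket_policy(Bucket="my-example-bucket", Policy=json.dumps(policy))
  ```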
- Lifecycle Policies
- Set of rules to automate moving data between different tiers, to save storage cost.
- Transition actions
- When objects are transitioned to another storage class
- E.g. General Purpose --(60 days)--> Infrequent Access --(180 days)--> Glacier
- ❗ Objects must be stored at least 30 days in Standard before they can transition to Standard-IA or One Zone-IA
- Expiration actions
- Expire & delete an object after a certain time period.
- E.g. access log files can be set to delete after specified period of time
- You can combine transition & expiration actions.
- You can filter the resources by a key name prefix, e.g. `tax/` will only include `tax/doc1.txt`.
- You can configure it to
  - Clean up expired object delete markers
  - Affect the "current version" and/or "previous versions"
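- 💡 A minimal boto3 sketch combining transition & expiration actions with a prefix filter (bucket name and day counts are hypothetical):

  ```python
  import boto3

  s3 = boto3.client("s3")

  s3.put_bucket_lifecycle_configuration(
      Bucket="my-example-bucket",
      LifecycleConfiguration={
          "Rules": [{
              "ID": "archive-tax-docs",
              "Filter": {"Prefix": "tax/"},  # only keys under tax/
              "Status": "Enabled",
              # Transition actions: move to cheaper tiers over time.
              "Transitions": [
                  {"Days": 60, "StorageClass": "STANDARD_IA"},
                  {"Days": 180, "StorageClass": "GLACIER"},
              ],
              # Expiration action: delete after a year.
              "Expiration": {"Days": 365},
          }],
      },
  )
  ```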
- Multipart Upload
- Use if a file is larger than 100 MB (required for files ≥ 5 GB)
- Allows uploading parts concurrently
- Improved throughput
- Quick recovery from any network failures
- Pause and resume object uploads
- Begin an upload before you know the final object size.
- When a multipart upload is aborted, all storage consumed by its parts is freed & deleted.
- S3 supports a bucket lifecycle rule to abort multipart uploads that don't complete within a specified number of days.
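- 💡 A minimal boto3 sketch using the transfer manager, which switches to multipart upload above a threshold (file/bucket names are hypothetical):

  ```python
  import boto3
  from boto3.s3.transfer import TransferConfig

  s3 = boto3.client("s3")

  # Multipart upload kicks in above 100 MB; parts are uploaded concurrently.
  config = TransferConfig(
      multipart_threshold=100 * 1024 * 1024,
      multipart_chunksize=100 * 1024 * 1024,
      max_concurrency=4,
  )
  s3.upload_file(
      "big-video.mp4", "my-example-bucket", "videos/big-video.mp4",
      Config=config,
  )
  ```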
- Event Notifications
- Events
- Object created, object removed, object restored, object lost (i.e. an object in Reduced Redundancy Storage (RRS) was lost)
- Destinations
  - SNS topics, SQS queues, Lambda functions
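- 💡 A minimal boto3 sketch wiring "object created" events to a Lambda function (ARN is hypothetical; the function must already allow S3 to invoke it):

  ```python
  import boto3

  s3 = boto3.client("s3")

  s3.put_bucket_notification_configuration(
      Bucket="my-example-bucket",
      NotificationConfiguration={
          "LambdaFunctionConfigurations": [{
              "LambdaFunctionArn": "arn:aws:lambda:us-east-1:123456789012:function:on-upload",
              "Events": ["s3:ObjectCreated:*"],
          }],
      },
  )
  ```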
- Charged mainly based on
- Storage (per GB)
- Requests
- Fee for each request made to S3 APIs for managing objects.
- 💡 Usually very cheap, but can be a good idea to minimize the requests to optimize costs.
- See also Glacier for Glacier retrieval fees.
- Data transfer
- Optional fees for • region replication • transfer acceleration • management • analytics
- Requester pays
- Requester pays for requests + storage in bucket instead of bucket owner.
- Requester is from AWS (not anonymous), sends the `requester-pays` flag, and is billed by AWS.
- See S3 pricing page and AWS Pricing Calculator
- Storage class is set per object
- S3 Storage Tiers

  | Storage | Description | Use-cases | Retrieval latency | Cost |
  | ------- | ----------- | --------- | ----------------- | ---- |
  | Standard | General purpose | Big Data analytics, mobile & gaming applications, content distribution... | ms | 💲💲💲💲💲💲 |
  | Standard IA | Infrequent access | Data that's less frequently accessed but requires rapid access, e.g. a file you retrieve directly once a month. | ms | 💲💲💲💲💲 |
  | One Zone-Infrequent Access | Low latency & high throughput performance | Storing secondary backup copies of on-premise data, or storing data you can recreate over time | ms | 💲💲💲💲 |
  | Reduced Redundancy Storage (RRS) | Deprecated, don't use it; use One Zone-Infrequent Access | Noncritical, reproducible data at lower levels of redundancy than Amazon S3's standard storage | ms | 💲💲💲 |
  | Intelligent Tiering | ML moves objects between IA and Standard tiers based on changing access patterns | Unpredictable workloads, changing access patterns, lack of experience with storage optimization | ms | (small) monitoring + auto-tiering fee |
  | Glacier | Low cost cold storage | DR back-ups, media asset archival | Expedited: 1-5 min, Standard: 3-5 hr, Bulk: 5-12 hr | 💲💲 + retrieval fee |
  | Glacier Deep Archive | Lowest cost, coldest storage | Magnetic tapes / VTL / regulatory archiving | 12-48 hours | 💲 + retrieval fee |
- S3 Storage Tiers Reliability

  | | Standard | Reduced Redundancy Storage | Standard Infrequent Access | One Zone-Infrequent Access | S3 Intelligent Tiering | Glacier |
  | --- | --- | --- | --- | --- | --- | --- |
  | Durability | 99.999999999% | 99.99% | 99.999999999% | 99.999999999% | 99.999999999% | 99.999999999% |
  | Availability | 99.99% | 99.99% | 99.99% | 99.5% | 99.9% | No SLA |
  | AZ | ≥3 | ≥2 | ≥3 | 1 | ≥3 | ≥3 |
  | Concurrent facility fault tolerance | 2 | 1 | 2 | 0 | 1 | 1 |
- Glacier storage classes move objects to Amazon Glacier, which is part of S3.
- Concepts
- Archive: stores data (one or more files) with a unique archive ID.
  - 📝The content of an archive is immutable, meaning that after an archive is created it cannot be updated.
- Vault: collection of archives where you upload data to.
- Retrieval
- ❗ S3 objects stored through Glacier can only be retrieved with S3 APIs, as S3 maps the object path to the archive ID.
- Options

  | Attribute | Expedited | Standard | Bulk |
  | --------- | --------- | -------- | ---- |
  | Data | Subset of archives | Any archive | Large amounts |
  | Time | 1-5 minutes | 3-5 hours | 5-12 hours |
  | Pricing | 💲💲💲💲 | 💲💲 | 💲 |

  - Expedited: Access a subset of archives in 1-5 minutes.
    - On-demand: Accepted except for rare high-load situations.
    - Provisioned: Buy capacity to ensure that your retrieval capacity is available.
  - Standard: Access any archive in 3-5 hours.
  - Bulk: Access large amounts in 5-12 hours.
- ❗ Deep Archive has only one retrieval option, taking around 12-48 hours.
- ❗ You cannot upload directly from the console; it's possible only through the AWS CLI.
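- 💡 A minimal boto3 sketch for restoring an object from the Glacier storage class via the S3 API (bucket/key are hypothetical):

  ```python
  import boto3

  s3 = boto3.client("s3")

  # Initiates a temporary restore; the copy stays available for `Days` days.
  s3.restore_object(
      Bucket="my-example-bucket",
      Key="archives/2018-backup.tar",
      RestoreRequest={
          "Days": 2,
          "GlacierJobParameters": {"Tier": "Standard"},  # or "Expedited" / "Bulk"
      },
  )
  ```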
- Cross region replication (CRR)
  - Copying is asynchronous
- On bucket level (entire bucket) or object level (based on prefix/tags)
- ❗📝 Must enable versioning (source and destination)
- CRR is built on top of versioning functionality.
- 🤗 As the replication is asynchronous, it needs a 'non-changing' copy of the data during replication
  - Versioning makes this easy: files can still be modified (uploaded, deleted, etc.) while replication is in progress.
- Buckets must be in different AWS regions & can be in different accounts
- Requires IAM permissions to S3
- Delete markers, and deletions of individual versions or delete markers, are not replicated.
- Use cases: compliance, lower latency access, replication across accounts
- Flow:
- Create a bucket, and another new one as the replica
- Original bucket -> Management -> Replication -> Add rule
- Apply to entire bucket or "prefix or tags"
- Choose if objects encrypted with AWS KMS will also be encrypted
- The other account must also have access to AWS KMS
- For replicated objects you can change
- Storage class
- Ownership to destination bucket owner
- Set IAM role
- Role gets created automatically that gives replication permissions to the original bucket
- ❗ Does not replicate:
- Existing files: only files created after enabling replication are replicated (with the same metadata and ACLs).
- Replicas of other objects, lifecycle actions and configurations, objects without bucket-owner permissions, SSE-C & SSE-KMS encrypted objects (SSE-KMS can be configured to replicate)...
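- 💡 A minimal boto3 sketch of a replication rule (role ARN and bucket names are hypothetical; both buckets must already have versioning enabled):

  ```python
  import boto3

  s3 = boto3.client("s3")

  s3.put_bucket_replication(
      Bucket="my-source-bucket",
      ReplicationConfiguration={
          # IAM role that grants S3 permission to replicate on your behalf.
          "Role": "arn:aws:iam::123456789012:role/s3-replication-role",
          "Rules": [{
              "ID": "replicate-all",
              "Status": "Enabled",
              "Prefix": "",  # empty prefix = entire bucket
              "Destination": {
                  "Bucket": "arn:aws:s3:::my-replica-bucket",
                  "StorageClass": "STANDARD_IA",  # optional: change class for replicas
              },
          }],
      },
  )
  ```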
- S3 can host static websites and have them accessible on the www.
- 📝 The website URL will be => bucket name + `s3-website` + region + `amazonaws.com`:
  - `<bucket-name>.s3-website-<AWS-region>.amazonaws.com`
  - `<bucket-name>.s3-website.<AWS-region>.amazonaws.com`
- If you get 403 (Forbidden) error; make sure bucket policy allows public reads!
- Supports redirects through a metadata tag.
- In bucket -> Properties -> Static website hosting
  - Select `Use this bucket to host a website` (optionally specify default index and error files)
  - Another option is to redirect requests (to another place)
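- 💡 A minimal boto3 sketch for enabling static website hosting (bucket and document names are hypothetical):

  ```python
  import boto3

  s3 = boto3.client("s3")

  s3.put_bucket_website(
      Bucket="my-example-bucket",
      WebsiteConfiguration={
          "IndexDocument": {"Suffix": "index.html"},
          "ErrorDocument": {"Key": "error.html"},
      },
  )
  ```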
- S3 CORS (CORS = Cross-Origin Resource Sharing)
  - 📝 If you request data from another S3 bucket, you need to enable CORS
  - Allows you to limit which websites can request your files in S3, and this way limit costs.
  - E.g. a page in `bucket1` uses an image in `bucket2`; `bucket2` must then have CORS headers allowing `bucket1`.
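- 💡 A minimal boto3 sketch of the CORS rule `bucket2` would need (origin URL is hypothetical):

  ```python
  import boto3

  s3 = boto3.client("s3")

  # On bucket2: allow GET requests coming from the site hosted in bucket1.
  s3.put_bucket_cors(
      Bucket="bucket2",
      CORSConfiguration={
          "CORSRules": [{
              "AllowedOrigins": ["http://bucket1.s3-website-us-east-1.amazonaws.com"],
              "AllowedMethods": ["GET"],
              "AllowedHeaders": ["*"],
              "MaxAgeSeconds": 3000,
          }],
      },
  )
  ```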
- Prerequisites
- ❗ The bucket must have the same name as your domain or subdomain.
- (Optionally) Use CloudFront for SSL/TLS
- A registered domain name.
- Route 53 as DNS service for the domain to configure DNS.