Amazon S3

  • Key / value store for objects
    • 💡 Great for big objects, not so great for small objects (latency)
    • It's object storage
      • Object storage => Every change rewrites the entire object.
      • Block storage => You can send only the changed blocks; data is split into chunks, so it can be updated incrementally.
  • S3 is a global service (not region specific)
    • ❗ A bucket however is attached to a region
  • Supports IPv4 and IPv6.
  • 💡 Use cases: Static files, Key value store for big files e.g. movies, Website hosting
  • Objects (files) are stored in buckets (directories)
  • Buckets
    • ❗ Bucket names must be unique across all existing bucket names in Amazon S3.
    • Buckets are defined at the region level and must have a globally unique name, as the name appears in the following URLs:
      • https://s3-<region>.amazonaws.com/<bucket>/<object>
      • https://<bucket>.s3.amazonaws.com/<object> (virtual host style addressing)
    • You can set default encryption
      • Options are: None, AES-256 (SSE-S3) and AWS-KMS (SSE-KMS)
      • 📝Popular question: How to ensure that all objects are encrypted at upload?
        • Enable default encryption, or use a bucket policy that denies PUTs without an encryption header (see the encryption sketch after this list).
    • ❗ 100 S3 buckets per account -> soft limit
    • Folders
      • Can be created under buckets
      • ❗ They are pseudo-folders and are treated as objects by S3
        • S3 has a flat structure with no hierarchy
        • But presentation in console gives the sense of "Directories"
    • Version Control
      • Can enable versioning at bucket level (see the versioning sketch after this list)
        • ❗📝 Once enabled, it cannot be disabled, only suspended.
      • Overwriting the same key creates a new version
      • Best practice, as it allows you to
        • Roll back to a previous version
        • Protect against accidental i.e. unintended deletes
          • Deleting a file does not really delete versioned files
          • It only adds a delete marker; previous versions can still be downloaded.
          • Deleting a delete marker -> Restores the object & adds a new delete marker.
            • ❗ Only the owner of a bucket can permanently remove a delete marker e.g. delete a version.
          • 💡 Best practice: Allow only specific users to delete & enforce MFA
      • 📝Any file that is not versioned prior to enabling versioning will have the version null.
  • Objects
    • Have a key which is FULL path
      • e.g. <my_bucket>/my_file.txt or <my_bucket>/my_folder1_another_folder/my_file.txt
      • ❗ There is no concept of directories within buckets, just key names with slashes.
    • Object values are the contents of the file
      • ❗ Max size is 5 TB, min size is 0 bytes (an empty, "touched" file)
      • ❗ If file is more than 5 GB, you need to use multi part upload
      • Metadata is a list of text key/value pairs.
        • Can be system or user metadata.
        • ❗ Cannot be modified after creation
    • Tags are unicode key/value pairs.
      • ❗ Up to 10
      • 💡 Useful for security / lifecycle / replication rules
  • Logging
    • Object-level logging
    • Can enable CloudWatch request metrics: e.g. total number of HTTP GET/PUT, 4XX etc.
    • Can enable CloudTrail for auditing API calls (auditing is done by CloudTrail, not CloudWatch).
  • 📝S3 Consistency Model
    • Read after write consistency for PUTS of new objects
      • E.g. PUT 200 -> GET 200
      • ❗ Eventually consistent if you GET a key to check whether the object exists before creating it
        • E.g. GET 404 -> PUT 200 -> GET 404, because the initial 404 response gets cached
    • Eventual consistency for DELETEs and PUTS of existing objects
      • Read after updating object -> Might get an older version
        • E.g. PUT 200 -> PUT 200 -> GET 200 (might be older version)
      • Delete -> Can retrieve it for a short time
        • E.g. DELETE 200 -> GET 200
  • Make an object publicly accessible
    • On object level
      • Set a predefined grant (also known as a canned ACL) for a bucket in Amazon S3.
        • In Console -> Bucket -> Permissions -> Public access settings -> Untick "Block new public bucket policies" and "Block public and cross-account access if the bucket has public policies"
      • Update the object's ACL -> Ensure your object is set to be publicly accessible.
    • On bucket level
      • Use a bucket policy that grants public read access to a specific object tag/prefix
  • Lifecycle Policies
    • Set of rules to automate moving data between storage tiers, to save storage costs (see the lifecycle sketch after this list).
    • Transition actions
      • When objects are transitioned to another storage class
      • E.g. General Purpose --(60 days)--> Infrequent Access --(180 days)--> glacier
      • ❗ Objects must be stored at least 30 days in Standard before transitioning to Standard IA or One Zone-IA
    • Expiration actions
      • Expire & delete an object after a certain time period.
      • E.g. access log files can be set to delete after a specified period of time
    • You can combine transition & expiration actions.
    • You can filter the resources by a key name prefix, e.g. tax/ will only include tax/doc1.txt.
    • You can configure it to
      • Clean-up expired object delete markers
      • Affect the "Current version" and/or "Previous versions"
  • Multipart Upload
    • Recommended for files larger than 100 MB; required for files larger than 5 GB (see the multipart sketch after this list)
    • Allows uploading parts concurrently
      • Improved throughput
      • Quick recovery from any network failures
      • Pause and resume object uploads
      • Begin an upload before you know the final object size.
    • When a multipart upload is aborted, all storage consumed by its uploaded parts is freed & deleted.
      • S3 supports a bucket lifecycle rule to abort multipart uploads that don't complete within a specified number of days.
  • Event Notifications
    • Can notify SNS, SQS or Lambda when events occur, e.g. s3:ObjectCreated:*, s3:ObjectRemoved:*.
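
A minimal boto3 (Python) sketch of the two "encrypt at upload" answers above, assuming configured AWS credentials; `my-bucket` is a placeholder:

```python
import json

import boto3

s3 = boto3.client("s3")

# Option 1: default encryption - S3 encrypts new objects with SSE-S3 (AES-256).
s3.put_bucket_encryption(
    Bucket="my-bucket",  # placeholder bucket name
    ServerSideEncryptionConfiguration={
        "Rules": [{"ApplyServerSideEncryptionByDefault": {"SSEAlgorithm": "AES256"}}]
    },
)

# Option 2: deny any PutObject call that doesn't declare server-side encryption.
policy = {
    "Version": "2012-10-17",
    "Statement": [{
        "Sid": "DenyUnencryptedUploads",
        "Effect": "Deny",
        "Principal": "*",
        "Action": "s3:PutObject",
        "Resource": "arn:aws:s3:::my-bucket/*",
        "Condition": {"Null": {"s3:x-amz-server-side-encryption": "true"}},
    }],
}
s3.put_bucket_policy(Bucket="my-bucket", Policy=json.dumps(policy))
```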
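
A small sketch of the behavior described under Version Control; bucket and key names are placeholders:

```python
import boto3

s3 = boto3.client("s3")

# Enable versioning; afterwards it can only be suspended, never disabled.
s3.put_bucket_versioning(
    Bucket="my-bucket",
    VersioningConfiguration={"Status": "Enabled"},
)

# Overwriting the same key now creates a new version instead of replacing it.
s3.put_object(Bucket="my-bucket", Key="report.txt", Body=b"v1")
s3.put_object(Bucket="my-bucket", Key="report.txt", Body=b"v2")

# A plain delete only adds a delete marker; older versions stay retrievable.
s3.delete_object(Bucket="my-bucket", Key="report.txt")
response = s3.list_object_versions(Bucket="my-bucket", Prefix="report.txt")
for version in response.get("Versions", []):
    print(version["VersionId"], version["IsLatest"])
print(len(response.get("DeleteMarkers", [])), "delete marker(s)")
```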
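
A lifecycle-policy sketch matching the transition example above (Standard -> IA after 60 days -> Glacier after 180 days); the prefix and the expiration period are illustrative:

```python
import boto3

s3 = boto3.client("s3")

# Transition objects under tax/ through cheaper tiers, then expire them.
s3.put_bucket_lifecycle_configuration(
    Bucket="my-bucket",
    LifecycleConfiguration={
        "Rules": [{
            "ID": "archive-tax-docs",
            "Filter": {"Prefix": "tax/"},   # only keys like tax/doc1.txt
            "Status": "Enabled",
            "Transitions": [
                {"Days": 60, "StorageClass": "STANDARD_IA"},
                {"Days": 180, "StorageClass": "GLACIER"},
            ],
            "Expiration": {"Days": 365},    # illustrative expiration action
        }],
    },
)
```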
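
A multipart-upload sketch using boto3's transfer manager, which splits large files into concurrently uploaded parts; the thresholds below are illustrative:

```python
import boto3
from boto3.s3.transfer import TransferConfig

s3 = boto3.client("s3")

# Files above multipart_threshold are split into parts and uploaded in
# parallel; a failed part is retried without restarting the whole upload.
config = TransferConfig(
    multipart_threshold=100 * 1024 * 1024,  # switch to multipart above 100 MB
    multipart_chunksize=16 * 1024 * 1024,   # 16 MB parts
    max_concurrency=8,                      # parts uploaded concurrently
)
s3.upload_file("backup.tar", "my-bucket", "backups/backup.tar", Config=config)
```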

S3 Pricing

  • Charged mainly based on
    • Storage (per GB)
    • Requests
      • Fee for each request made to S3 APIs for managing objects.
        • 💡 Usually very cheap, but can be a good idea to minimize the requests to optimize costs.
      • See also Glacier for Glacier retrieval fees.
    • Data transfer
    • Optional fees for cross-region replication, transfer acceleration, management, and analytics
  • Requester pays
    • Requester pays for requests & data transfer out of the bucket instead of the bucket owner (the owner still pays for storage).
    • Requester must be an authenticated AWS identity (not anonymous) & must send the requester-pays flag; AWS bills them directly (see the sketch after this list).
  • See S3 pricing page and AWS Pricing Calculator
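
A requester-pays sketch with placeholder names; the bucket owner sets the flag once, and each requester must explicitly accept the charges:

```python
import boto3

s3 = boto3.client("s3")

# Bucket owner: shift request & data-transfer costs to requesters.
s3.put_bucket_request_payment(
    Bucket="my-bucket",
    RequestPaymentConfiguration={"Payer": "Requester"},
)

# Requester: must be authenticated and send the requester-pays flag,
# otherwise the request is rejected.
obj = s3.get_object(
    Bucket="my-bucket",
    Key="big-file.bin",
    RequestPayer="requester",
)
```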

S3 Storage Classes

  • Storage class is set per object (a minimal sketch follows the tables below)

  • S3 Storage Tiers

    | Storage | Description | Use-cases | Retrieval latency | Cost |
    | ------- | ----------- | --------- | ----------------- | ---- |
    | Standard | General purpose | Big data analytics, mobile & gaming applications, content distribution... | ms | 💲💲💲💲💲💲 |
    | Standard IA | Infrequent access | Data that's less frequently accessed but requires rapid access, e.g. a file you retrieve once a month | ms | 💲💲💲💲💲 |
    | One Zone-Infrequent Access | Low latency & high throughput performance | Storing secondary backup copies of on-premises data, or data you can recreate over time | ms | 💲💲💲💲 |
    | Reduced Redundancy Storage (RRS) | Deprecated, don't use it; use One Zone-Infrequent Access instead | Noncritical, reproducible data at lower levels of redundancy than Standard | ms | 💲💲💲 |
    | Intelligent Tiering | ML moves objects between IA and Standard tiers based on changing access patterns | Unpredictable workloads, changing access patterns, lack of experience with storage optimization | ms | (small) monitoring + auto-tiering fee |
    | Glacier | Low-cost cold storage | DR back-ups, media asset archival | Expedited: 1-5 min, Standard: 3-5 hr, Bulk: 5-12 hr | 💲💲 + retrieval fee |
    | Glacier Deep Archive | Lowest-cost, coldest storage | Magnetic tapes / VTL / regulatory archiving | 12-48 hours | 💲 + retrieval fee |
  • S3 Storage Tiers Reliability

    | | Standard | Reduced Redundancy Storage | Standard Infrequent Access | One Zone-Infrequent Access | S3 Intelligent Tiering | Glacier |
    | --- | --- | --- | --- | --- | --- | --- |
    | Durability | 99.999999999% | 99.99% | 99.999999999% | 99.999999999% | 99.999999999% | 99.999999999% |
    | Availability | 99.99% | 99.99% | 99.99% | 99.5% | 99.9% | No SLA |
    | AZ | ≥3 | ≥2 | ≥3 | 1 | ≥3 | ≥3 |
    | Concurrent facility fault tolerance | 2 | 1 | 2 | 0 | 1 | 1 |
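
Since the storage class is a per-object attribute, it can be set at write time or changed later by copying the object onto itself; a minimal sketch with placeholder names:

```python
import boto3

s3 = boto3.client("s3")

# Choose the storage class when writing the object...
s3.put_object(
    Bucket="my-bucket",
    Key="logs/2019-01.gz",
    Body=b"...",
    StorageClass="STANDARD_IA",
)

# ...or change it later with an in-place copy.
s3.copy_object(
    Bucket="my-bucket",
    Key="logs/2019-01.gz",
    CopySource={"Bucket": "my-bucket", "Key": "logs/2019-01.gz"},
    StorageClass="GLACIER",
)
```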

Glacier

  • Glacier storage classes move objects to Amazon Glacier, which is integrated with S3.
  • Concepts
    • Archive: stores data (one or more files) with a unique archive ID.
      • 📝The content of the archive is immutable, meaning that after an archive is created it cannot be updated.
    • Vault: collection of archives where you upload data to.
  • Retrieval
    • ❗ S3 objects stored through Glacier can only be retrieved with S3 APIs, as S3 maps the object path to an archive ID (see the restore sketch after this list).

    • Options

      | Attribute | Expedited | Standard | Bulk |
      | --------- | --------- | -------- | ---- |
      | Data | Subset of archives | Any archive | Large amounts |
      | Time | 1-5 minutes | 3-5 hours | 5-12 hours |
      | Pricing | 💲💲💲💲 | 💲💲 | 💲 |
      • Expedited: Access subset of archives in 1-5 minutes.
        • On-demand: Accepted except for rare high load situations.
        • Provisioned: Buy capacity to ensure that your retrieval capacity is available.
      • Standard: Access any archive in 3-5 hours.
      • Bulk: Access large amounts in 5-12 hours
    • ❗ Deep Archive has a single retrieval option, taking around 12-48 hours.

  • ❗ You cannot upload to Glacier directly from the console; it's possible only through the AWS CLI.
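
A restore sketch for a Glacier-class object through the S3 API; the names and the 7-day window are placeholders:

```python
import boto3

s3 = boto3.client("s3")

# Ask S3 to restore a temporary copy of a Glacier-class object for 7 days.
s3.restore_object(
    Bucket="my-bucket",
    Key="archive/old-report.pdf",
    RestoreRequest={
        "Days": 7,
        "GlacierJobParameters": {"Tier": "Standard"},  # or "Expedited" / "Bulk"
    },
)

# The "Restore" field of head_object shows whether the restore is in
# progress or completed.
head = s3.head_object(Bucket="my-bucket", Key="archive/old-report.pdf")
print(head.get("Restore"))
```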

Availability

  • Cross region replication (CRR)
    • Copying is asynchronous
    • On bucket level (entire bucket) or object level (based on prefix/tags)
    • ❗📝 Must enable versioning (source and destination)
      • CRR is built on top of versioning functionality.
      • 🤗 As the replication is asynchronous, it needs a 'non-changing' copy of the data during replication
        • Versioning makes this easy: files can still be modified (upload, delete, etc.) while replication is in progress.
    • Buckets must be in different AWS regions & can be in different accounts
    • Requires IAM permissions to S3
    • Delete markers & deletions of individual versions / delete markers are not replicated.
    • Use cases: compliance, lower latency access, replication across accounts
    • Flow (see the replication sketch after this list):
      1. Create a bucket, and another new one as the replica
      2. Original bucket -> Management -> Replication -> Add rule
        • Apply to entire bucket or "prefix or tags"
      3. Choose if objects encrypted with AWS KMS will also be encrypted
        • The other account must also have access to AWS KMS
      4. For replicated objects you can change
        • Storage class
        • Ownership to destination bucket owner
      5. Set IAM rule
        • Role gets created automatically that gives replication permissions to the original bucket
      • ❗ Does not replicate:
        • Existing files (only new files are replicated, with the same metadata and ACLs)
        • Replicas of other objects, lifecycle actions and configurations, objects without bucket owner permissions, SSE-C & SSE-KMS encrypted objects (can be configured to replicate)...
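
A replication sketch mirroring the flow above; both buckets must already have versioning enabled, and the role ARN / bucket names are placeholders:

```python
import boto3

s3 = boto3.client("s3")

# Versioning must be enabled on both buckets before this call succeeds.
s3.put_bucket_replication(
    Bucket="source-bucket",
    ReplicationConfiguration={
        # Role granting S3 permission to read the source & write the replica.
        "Role": "arn:aws:iam::123456789012:role/s3-crr-role",
        "Rules": [{
            "ID": "replicate-all",
            "Status": "Enabled",
            "Prefix": "",  # empty prefix = entire bucket
            "Destination": {
                "Bucket": "arn:aws:s3:::destination-bucket",
                "StorageClass": "STANDARD_IA",  # optional: change class on replicas
            },
        }],
    },
)
```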

S3 Websites

  • S3 can host static websites and make them accessible on the web (see the setup sketch after this list).
    • 📝 The website URL will be => bucket name + s3-website + region + amazonaws.com.
      • <bucket-name>.s3-website-<AWS-region>.amazonaws.com
      • <bucket-name>.s3-website.<AWS-region>.amazonaws.com
  • If you get 403 (Forbidden) error; make sure bucket policy allows public reads!
  • Supports redirects through a metadata tag.
  • In bucket -> Properties -> Static website hosting
    • Select Use this bucket to host a website (optionally specify default index and error files)
    • Another option is to redirect requests (to another place)
    • S3 CORS (CORS=Cross Origin Resource Sharing)
      • 📝 If you request data from another S3 bucket, you need to enable CORS
      • Allows you to limit which websites can request your files in S3 (and this way limit costs).
      • E.g. a website in bucket1 loads an image from bucket2; bucket2 must then return CORS headers allowing bucket1.
  • Prerequisites
    • ❗ The bucket must have the same name as your domain or subdomain.
    • (Optionally) Use CloudFront for SSL/TLS
    • A registered domain name.
    • Route 53 as DNS service for the domain to configure DNS.
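
A static-website sketch covering hosting, the public-read policy (avoids the 403), and CORS; `example.com` and the assets bucket are hypothetical:

```python
import json

import boto3

s3 = boto3.client("s3")

# Enable static website hosting with default index / error documents.
s3.put_bucket_website(
    Bucket="example.com",  # must match the domain name
    WebsiteConfiguration={
        "IndexDocument": {"Suffix": "index.html"},
        "ErrorDocument": {"Key": "error.html"},
    },
)

# Public-read policy so the website doesn't answer with 403 (Forbidden).
policy = {
    "Version": "2012-10-17",
    "Statement": [{
        "Effect": "Allow",
        "Principal": "*",
        "Action": "s3:GetObject",
        "Resource": "arn:aws:s3:::example.com/*",
    }],
}
s3.put_bucket_policy(Bucket="example.com", Policy=json.dumps(policy))

# CORS on a second (hypothetical) assets bucket so only the website's
# origin may fetch files from it.
s3.put_bucket_cors(
    Bucket="example-assets",
    CORSConfiguration={
        "CORSRules": [{
            "AllowedOrigins": ["http://example.com.s3-website-us-east-1.amazonaws.com"],
            "AllowedMethods": ["GET"],
            "AllowedHeaders": ["*"],
        }],
    },
)
```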