S3
S3 Basics
Key-Value Store
Value
(The data itself, which is made of a sequence of bytes)
Metadata
(Data about the data you are storing, e.g. content-type, last-modified, owner, etc.)
Safe place to store files
(data is spread across multiple devices and facilities to ensure Availability and Durability)
Availability
99.5% - 99.99% depending on the S3 storage class
Durability
S3 Standard is designed for 99.999999999% (11 9's) durability for data stored in S3
Lifecycle Management
(define rules to automatically transition objects to a cheaper storage tier or delete objects that are no longer needed)
Versioning
(if enabled, all versions of an object are stored and can be retrieved, including deleted objects)
The Amazon S3 notification feature enables you to send notifications when certain events happen in your bucket
With Requester Pays buckets, the requester (instead of the bucket owner) pays the cost of the request and the data download from the bucket. The bucket owner always pays the cost of storing the data. When you enable Requester Pays on a bucket, anonymous access to that bucket is no longer allowed
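A minimal CLI sketch of the Requester Pays workflow (bucket and key names are made up):

    # Enable Requester Pays on the bucket (run as the bucket owner)
    aws s3api put-bucket-request-payment \
        --bucket my-shared-data \
        --request-payment-configuration Payer=Requester

    # Download as a requester from another account: you must explicitly
    # acknowledge that you will pay for the request and the data transfer
    aws s3api get-object \
        --bucket my-shared-data --key reports/2023.csv \
        --request-payer requester 2023.csv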
CloudTrail logs provide a record of actions taken by a user, role, or an AWS service in Amazon S3
- CloudTrail does not deliver logs for requests that fail authentication (invalid credential)
- It does include logs for requests in which authorization fails (Access Denied) and requests that are made by anonymous users
- Recommended for bucket-level logging
- Differentiators:
- Forward to other services (EventBridge, CloudWatch Logs)
- more than one destination
- Subset of objects (prefix)
- Cross-account delivery
- Integrity Validation
- Fields for Object Lock parameters, Amazon S3 Select properties for log records
S3 Server Access Logs provide detailed records of the requests made to an S3 bucket, for both object operations and bucket operations. Logs are stored in a user-defined S3 bucket. Useful in security and access audits; they also help you learn about your customer base and understand your Amazon S3 bill. Recommended for object-level logging.
Differentiators:
- Fields for Object Size, Total Time, Turn-Around Time, and HTTP Referer for log records
- Lifecycle transitions, expirations, restores
- Logging of keys in a batch delete operation
- Authentication failures
Checking object integrity - S3 uses checksum values to verify the integrity of data that you upload or download. You can select one of several checksum algorithms to use when uploading or copying your data. S3 computes the checksum and stores it as part of the object metadata
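A rough sketch of uploading with an additional checksum and reading it back (bucket and key are placeholders):

    # Ask S3 to compute and store a SHA-256 checksum alongside the object
    aws s3api put-object \
        --bucket my-bucket --key data/file.bin --body file.bin \
        --checksum-algorithm SHA256

    # Read the stored checksum back from the object metadata
    aws s3api head-object \
        --bucket my-bucket --key data/file.bin --checksum-mode ENABLED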
You can delete a bucket that still contains objects using the AWS CLI (see the sketch below) only if:
- The bucket does not have versioning enabled
- The bucket is completely empty of objects and versioned objects
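For example, with the high-level CLI against a non-versioned bucket (placeholder name):

    # Deletes every object in the bucket, then the bucket itself.
    # Fails on versioned buckets because object versions and delete
    # markers are left behind.
    aws s3 rb s3://my-old-bucket --force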
- All requests to a bucket are either authenticated or unauthenticated
- Authenticated requests must include a signature value that authenticates the request sender
- Unauthenticated requests do not; all unauthenticated requests are made by the anonymous user
- If an object is uploaded to a bucket through an unauthenticated request, the anonymous user owns the object
S3 Object Lock
S3 Object Lock stores objects using the WORM model (Write Once Read Many). It prevents objects from being deleted or modified for a fixed amount of time or indefinitely
Use S3 Object Lock to meet regulatory requirements or to add an extra layer of protection against object changes and deletion
Retention Modes
Governance Mode
Users can't overwrite or delete an object version or alter lock setting unless they have special permissions.
You can still grant some users permission to alter the retention setting or delete the object if necessary.
Compliance Mode
A protected object version can't be overwritten or deleted by any user including the AWS account root user.
Object retention mode can't be changed and its retention period can't be shortened.
Object can't be overwritten or deleted for the duration of the retention period.
Retention Period
Protects an object version for a fixed amount of time. S3 adds a timestamp to the object version's metadata to indicate when the retention period expires.
Once the retention period expires, the object version can be overwritten or deleted, unless you also placed a legal hold on the object version.
Legal Hold
It works along with S3 Object Lock. Like a retention period, a legal hold prevents an object version from being overwritten or deleted.
Legal hold doesn't have an associated retention period and remains in effect until removed. Legal holds can be placed and removed by any user with s3:PutObjectLegalHold permission
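A sketch of protecting a single object version (bucket, key, version ID and date are placeholders; the bucket must have Object Lock enabled):

    # Retain one object version until a fixed date (Governance mode)
    aws s3api put-object-retention \
        --bucket my-locked-bucket --key invoices/inv-001.pdf \
        --version-id <version-id> \
        --retention '{"Mode": "GOVERNANCE", "RetainUntilDate": "2026-01-01T00:00:00Z"}'

    # Place a legal hold (no expiry; removed later with Status=OFF)
    aws s3api put-object-legal-hold \
        --bucket my-locked-bucket --key invoices/inv-001.pdf \
        --version-id <version-id> \
        --legal-hold Status=ON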
Glacier Vault Lock Policy
Enforce compliance controls for individual S3 Glacier vaults with a vault lock policy. You can specify controls (e.g. WORM) in a vault lock policy and lock the policy from future edits. Once locked, the policy can no longer be edited and objects cannot be deleted or modified. Locking a vault takes two steps:
- attach a Vault Lock policy to your vault which returns a lock ID (in-progress state)
- use the lock ID to complete the lock process within 24 hours. If the Vault Lock doesn't work as expected, you can stop the process and restart
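The two steps map onto two Glacier CLI calls; a sketch with placeholder names ("-" means the current account):

    # Step 1: attach the policy; the vault lock enters the InProgress state.
    # vault-lock-policy.json wraps the policy document as a string, e.g. {"Policy": "<policy JSON>"}
    aws glacier initiate-vault-lock \
        --account-id - --vault-name my-vault \
        --policy file://vault-lock-policy.json
    # Returns a lockId

    # Step 2: complete the lock within 24 hours using that lockId
    aws glacier complete-vault-lock \
        --account-id - --vault-name my-vault --lock-id <lock-id>
    # To start over instead: aws glacier abort-vault-lock --account-id - --vault-name my-vault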
S3 Object Lock works only on versioned buckets.
It can be applied to an individual object version or across the whole bucket (a default retention mode and period that applies to new objects placed in the bucket).
Changing a bucket's default retention period doesn't change the existing retention period of any objects already in that bucket.
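A sketch of a bucket-wide default retention (Object Lock must have been enabled when the bucket was created; names and days are placeholders):

    # New objects get a 30-day Governance-mode retention by default
    aws s3api put-object-lock-configuration \
        --bucket my-locked-bucket \
        --object-lock-configuration '{
            "ObjectLockEnabled": "Enabled",
            "Rule": {"DefaultRetention": {"Mode": "GOVERNANCE", "Days": 30}}
        }'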
Performance
Limitation with KMS Request Rates
- Using SSE-KMS there is a built-in KMS request limit. It is Region-specific: either 5,500, 10,000 or 30,000 requests per second
- Uploads and downloads count toward the KMS quota
- You can request a quota increase
- upload -> GenerateDataKey call to the KMS API
- download -> Decrypt call to the KMS API (see the sketch after this list)
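A sketch of an SSE-KMS upload, assuming a customer managed key (key ID, bucket and file are placeholders):

    # Each upload like this triggers a GenerateDataKey call to KMS, and each
    # download triggers a Decrypt call - both count toward the per-Region KMS quota
    aws s3 cp big-report.parquet s3://my-bucket/reports/ \
        --sse aws:kms --sse-kms-key-id <kms-key-id>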
Uploads
Multi-Part Upload (aka Parallel Upload)
- Recommended for files over 100MB
- Required for files over 5GB
- Parallelize uploads (increase efficiency)
- How it works:
- Calculate the file's MD5 checksum value
- Prepare the data breaking data into reasonably sized pieces (only for low-level aws s3api)
- Move the Pieces: perform the multipart upload steps to move all the data into your S3 bucket
- S3 Puts It Together: let S3 know the upload is complete, and S3 reassembles the data
- Use a dedicated Lifecycle policy to abort and delete incomplete multipart uploads after X days
- Scenario:
- a user/app initiates a multipart upload
- the multipart upload stays incomplete for some reason
- an S3 Lifecycle policy aborts and deletes incomplete multipart uploads after X days
- use the low-level aws s3api CLI to list incomplete multipart uploads (see the sketch after this list)
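A sketch of the cleanup side (bucket name, rule ID and day count are placeholders):

    # List multipart uploads that were started but never completed
    aws s3api list-multipart-uploads --bucket my-bucket

    abort-mpu.json:
    {
      "Rules": [{
        "ID": "abort-incomplete-mpu",
        "Status": "Enabled",
        "Filter": {"Prefix": ""},
        "AbortIncompleteMultipartUpload": {"DaysAfterInitiation": 7}
      }]
    }

    # Abort and delete incomplete multipart uploads after 7 days
    aws s3api put-bucket-lifecycle-configuration \
        --bucket my-bucket --lifecycle-configuration file://abort-mpu.json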
S3 Prefixes
A prefix is just a folder (and subfolders) in a bucket
E.g. for MyBucketName/Folder1/SubFolder1/MyFile.txt the prefix is /Folder1/SubFolder1
The more prefixes you have in your bucket, the higher the performance you can get. Spread your reads across prefixes: e.g. using 2 prefixes you get 2 x 5,500 = 11,000 GET requests per second
Downloads
S3 Byte-Range Fetches
- Allows you to fetch a byte range from an object, transferring only the specified portion
- You can use concurrent connections to Amazon S3 to fetch different byte ranges from within the same object
- Parallelize: achieve higher aggregate throughput versus a single whole-object request
- If there is a failure in the download, it's only for a specific byte range
- Typical sizes for byte-range requests are 8 MB or 16 MB
- If objects are PUT using a multipart upload, it's a good practice to GET them in the same part sizes (or at least aligned to part boundaries) for best performance (see the sketch after this list)
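A sketch fetching the first two 8 MB ranges of an object in parallel (names are placeholders):

    # Each command downloads only the requested byte range
    aws s3api get-object --bucket my-bucket --key big/file.bin \
        --range bytes=0-8388607 part-0.bin &
    aws s3api get-object --bucket my-bucket --key big/file.bin \
        --range bytes=8388608-16777215 part-1.bin &
    wait
    cat part-0.bin part-1.bin > first-16mb.bin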
Latency: first byte out of S3 in 100-200 ms
3,500 PUT/COPY/POST/DELETE requests per second per prefix -> 3,500 requests/s to add data
5,500 GET/HEAD requests per second per prefix -> 5,500 requests/s to retrieve data
Transfer Acceleration
It is a bucket-level feature that enables fast, easy, and secure transfers of files over long distances between your client and an S3 bucket by taking advantage of AWS Edge Locations. Allows for 50-500% speedup
Transfer Acceleration takes advantage of the globally distributed edge locations in Amazon CloudFront. As the data arrives at an edge location, the data is routed to Amazon S3 over an optimized network path
Use Cases:
- Your customers upload to a centralized bucket from all over the world
- You transfer GBs to TBs of data on a regular basis across continents
- You can't use all of your available bandwidth over the internet when uploading to Amazon S3
Key Components
- Use CloudFront Edge Locations to accelerate transfers to and from S3
- Edge Location: instead of uploading directly to your S3 bucket, you use a special URL to upload to a nearby Edge Location, which then transfers the data to S3
- Distinct URL: you get a distinct URL to upload to, e.g. example.s3-accelerate.amazonaws.com. You can continue to use the regular endpoint in addition to the accelerate endpoint (but without acceleration). See the sketch after this list
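A sketch of enabling and using acceleration from the CLI (bucket name is a placeholder):

    # Enable Transfer Acceleration on the bucket
    aws s3api put-bucket-accelerate-configuration \
        --bucket example --accelerate-configuration Status=Enabled

    # Tell the CLI to use the <bucket>.s3-accelerate.amazonaws.com endpoint
    aws configure set default.s3.use_accelerate_endpoint true
    aws s3 cp big-dataset.tar.gz s3://example/uploads/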
aws s3 cli
- Use the high-level aws s3 commands (e.g. aws s3 cp automatically performs multipart uploads and downloads based on the file size) for multipart uploads and downloads
- Use the low-level/API-level aws s3api commands (e.g. aws s3api create-multipart-upload) only when the aws s3 commands don't support a specific upload (e.g. the multipart upload involves multiple servers, or you manually stop a multipart upload and resume it later); see the sketch below
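For comparison, a rough sketch of the low-level flow (placeholder names; the high-level aws s3 cp does all of this for you):

    # 1. Start the upload and note the returned UploadId
    aws s3api create-multipart-upload --bucket my-bucket --key backups/db.dump

    # 2. Upload each piece (repeat per part, keeping the returned ETags)
    aws s3api upload-part --bucket my-bucket --key backups/db.dump \
        --part-number 1 --body db.dump.part1 --upload-id <upload-id>

    # 3. Tell S3 to stitch the parts together
    aws s3api complete-multipart-upload --bucket my-bucket --key backups/db.dump \
        --upload-id <upload-id> \
        --multipart-upload '{"Parts": [{"PartNumber": 1, "ETag": "<etag-1>"}]}'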
Object Storage
A file is treated as a single object. If you need to make a change to a file, a new copy of the whole object is created (vs. block storage, which splits a file into multiple blocks, so a change affects only the relevant blocks)
Storage Classes
S3 Standard
High Availability and Durability
Data stored across multiple devices in multiple AZs (>= 3)
99.99% availability
99.999999999% (11 9's) durability
Default Storage Class
(suitable for most workloads: websites, content distribution, mobile and gaming, big data analytics ...)
S3 Intelligent Tiering
Automatically moves your data to the most cost-effective data tier based on how frequently you access your objects
- 30 Days -> Infrequent Access tier (automatic)
- 90 Days -> Archive Instant Access tier (automatic)
- 90-730 days (configurable) -> Archive Access tier (optional)
- 180-730 days (configurable) -> Deep Archive Access tier (optional); see the sketch below
If an object in the Infrequent Access tier or Archive Instant Access tier is accessed later, it is automatically moved back to the Frequent Access tier
No retrieval charges
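A sketch of storing an object in Intelligent-Tiering and opting in to the optional archive tiers (names and day counts are placeholders):

    aws s3 cp logs.tar.gz s3://my-bucket/archive/ --storage-class INTELLIGENT_TIERING

    # Opt in to the optional Archive Access / Deep Archive Access tiers
    aws s3api put-bucket-intelligent-tiering-configuration \
        --bucket my-bucket --id archive-tiers \
        --intelligent-tiering-configuration '{
            "Id": "archive-tiers",
            "Status": "Enabled",
            "Tierings": [
                {"Days": 90,  "AccessTier": "ARCHIVE_ACCESS"},
                {"Days": 180, "AccessTier": "DEEP_ARCHIVE_ACCESS"}
            ]
        }'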
Storage Tiers
- S3 Glacier Flexible Retrieval (formerly S3 Glacier)
- Expedited (1 to 5 minutes)
- Standard (3 to 5 hours)
- Bulk (5 to 12 hours) – free
- Restore requests at a rate of up to 1,000 requests/s (see the restore sketch after this list)
- This is ideal for data that does not require instant retrieval but needs the flexibility to retrieve large sets of data at no cost, e.g. backup or DR for non-critical systems
- Data accessed once or twice a year
- Use Case: Ideal for backup and disaster recovery use cases when large sets of data occasionally need to be retrieved in minutes, without concern for costs
- Minimum duration 90 days
- 99.99% availability
- Glacier Deep Archive
- Standard retrieval time is 12 hours
- Bulk retrieval time is 48 hours
- Restore requests at a rate of up to 1,000 request/s
- Cheapest storage class, designed for companies that need to retain data for 7-10 years or more for regulatory compliance (e.g. finance)
- Data accessed less than once a year
- Minimum duration 180 days
- 99.99% availability
- Glacier Instant Retrieval
- Provides long-term data archiving with instant retrieval time for your data
- Data accessed once a quarter
- Minimum duration 90 days
- 99.9% availability
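A sketch of restoring an object archived in Glacier Flexible Retrieval (names are placeholders):

    # Make a temporary copy available for 7 days using the Expedited tier
    aws s3api restore-object \
        --bucket my-archive-bucket --key backups/2019.tar \
        --restore-request '{"Days": 7, "GlacierJobParameters": {"Tier": "Expedited"}}'

    # head-object shows whether the restore is ongoing or completed
    aws s3api head-object --bucket my-archive-bucket --key backups/2019.tar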
Common Features
You cannot assign a key name to the archives that you upload. S3 Glacier returns a response with an archive ID that is unique in the Region in which the archive is stored
Provisioned Capacity
Pay a fixed upfront fee for a given month to ensure the availability of retrieval capacity for expedited retrievals from Amazon S3 Glacier vaults
If you require access to Expedited retrievals under all circumstances, you must purchase provisioned retrieval capacity
Securing Data
ACL - individual object or bucket
(Define which AWS accounts or groups are granted access and the type of access. You can attach S3 ACLs to individual objects within a bucket)
ACL Disabled (recommended)
(All objects in this bucket are owned by this account. Access to this bucket and its objects is specified using only policies)
ACL Enabled
(Objects in this bucket can be owned by other AWS accounts. Access to this bucket and its objects can be specified using ACLs.)
ACL
- You use ACLs to grant basic read/write permissions to other AWS accounts
- There are limits to managing permissions using ACLs:
- grant only to other AWS accounts; not to users in your account
- ACLs are suitable for specific scenarios:
- if a bucket owner allows other AWS accounts to upload objects, permissions to these objects can only be managed using object ACL by the AWS account that owns the object
Permissions
- When granted on a bucket
- READ: Allows grantee to list the objects in the bucket
- WRITE: Allows grantee to create new objects in the bucket. For the bucket and object owners of existing objects, also allows deletions and overwrites of those objects
- READ_ACP: Allows grantee to read the bucket ACL
- WRITE_ACP: Allows grantee to write the ACL for the applicable bucket
- FULL_CONTROL: Allows grantee the READ, WRITE, READ_ACP, and WRITE_ACP permissions on the bucket
- When granted on an object
- READ: Allows grantee to read the object data and its metadata
- WRITE: Not applicable
- READ_ACP: Allows grantee to read the object ACL
- WRITE_ACP: Allows grantee to write the ACL for the applicable object
- FULL_CONTROL: Allows grantee the READ, READ_ACP, and WRITE_ACP permissions on the object
Bucket Policies - bucket-wide (see the example below)
(Specify what actions are allowed or denied, e.g. user Alice can PUT but not DELETE objects in the bucket)
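A sketch of the "Alice can PUT but not DELETE" example (account ID, user and bucket name are placeholders):

    policy.json:
    {
      "Version": "2012-10-17",
      "Statement": [
        {"Effect": "Allow",
         "Principal": {"AWS": "arn:aws:iam::111122223333:user/Alice"},
         "Action": "s3:PutObject",
         "Resource": "arn:aws:s3:::my-bucket/*"},
        {"Effect": "Deny",
         "Principal": {"AWS": "arn:aws:iam::111122223333:user/Alice"},
         "Action": "s3:DeleteObject",
         "Resource": "arn:aws:s3:::my-bucket/*"}
      ]
    }

    aws s3api put-bucket-policy --bucket my-bucket --policy file://policy.json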
Block Public Access
Buckets are private by default
(New buckets are private by default, including their objects. You have to allow public access on both the bucket and the objects in order to make the bucket public)
Encrypting S3 Objects
Type of Encryptions
Encryption at Rest - Server-Side Encryption
(You can set default encryption on a bucket to encrypt all new objects as they are stored in the bucket)
Encrypts only the object data, not the object metadata
SSE-S3:
- S3 Managed Keys (no log / no policy)
- use AES 256-bit
- each object is encrypted using its own unique key that is encrypted with a master key automatically rotated
SSE-KMS:
- AWS KMS keys, choose between:
- AWS managed keys (aws/kms) (log / no policy / no cost)
- customer managed keys (log / policy /cost)
- you get separate permissions for the use of an additional key (the envelope key) that encrypts your data's encryption key
- you also get an audit trail of encryption key usage via CloudTrail
- objects made public can never be read (anonymous users cannot have access to your KMS key); see the sketch after this list
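A sketch of setting SSE-KMS with a customer managed key as the bucket default (bucket and key ARN are placeholders):

    aws s3api put-bucket-encryption \
        --bucket my-bucket \
        --server-side-encryption-configuration '{
            "Rules": [{
                "ApplyServerSideEncryptionByDefault": {
                    "SSEAlgorithm": "aws:kms",
                    "KMSMasterKeyID": "arn:aws:kms:eu-west-1:111122223333:key/<key-id>"
                }
            }]
        }'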
SSE-C:
- Customer provided key (you fully administer your keys, useful in highly regulated environments)
- When you upload an object, S3 uses the encryption key that you provide to apply AES-256 encryption to your data. S3 then removes the encryption key from memory
- When you retrieve an object, you must provide the same encryption key as part of your request
- Amazon S3 first verifies that the encryption key that you provided matches, and then it decrypts the object before returning the object data to you
- HTTPS is mandatory for SSE-C API calls (to enforce HTTPS, use a Bucket Policy with aws:SecureTransport); see the sketch after this list
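A sketch of an SSE-C upload and download with the high-level CLI (key file and names are placeholders; the calls must go over HTTPS):

    # Generate a 256-bit key that you manage yourself
    openssl rand -out sse-c.key 32

    # Upload: S3 encrypts with your key, then discards it
    aws s3 cp secrets.db s3://my-bucket/secrets.db \
        --sse-c AES256 --sse-c-key fileb://sse-c.key

    # Download: you must supply the same key again
    aws s3 cp s3://my-bucket/secrets.db secrets.db \
        --sse-c AES256 --sse-c-key fileb://sse-c.key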
DSSE-KMS:
- dual-layer server-side encryption with keys stored in AWS KMS
- applies two layers of encryption to objects when they are uploaded to S3
- designed to meet NSA CNSSP 15 for FIPS compliance and Data-at-Rest Capability Package (DAR CP) Version 5.0 guidance for two layers of CNSA encryption
Encryption in Transit
SSL/TLS
HTTPS
To enforce HTTPS, use a Bucket Policy with aws:SecureTransport
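A sketch of such a policy (bucket name is a placeholder):

    deny-http.json:
    {
      "Version": "2012-10-17",
      "Statement": [{
        "Sid": "DenyInsecureTransport",
        "Effect": "Deny",
        "Principal": "*",
        "Action": "s3:*",
        "Resource": ["arn:aws:s3:::my-bucket", "arn:aws:s3:::my-bucket/*"],
        "Condition": {"Bool": {"aws:SecureTransport": "false"}}
      }]
    }

    aws s3api put-bucket-policy --bucket my-bucket --policy file://deny-http.json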
Glacier Encryption
- data is AES-256 encrypted, key under AWS control
- you can use client-side encryption before uploading data
Lifecycle Management
- Rules that automate moving your objects between different storage tiers, maximising cost effectiveness
- Minimum 30 days in S3 Standard before objects can be moved to Standard-IA or One Zone-IA
Example of a Lifecycle Management rule (see the sketch after this list):
- S3 Standard (e.g. Keep 30 Days after creation)
- S3 IA (e.g. After 30 Days after creation)
- Glacier (e.g. After 90 Days after creation)
- Delete (aka expire) (e.g. After 365 Days after creation)
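The example rule above could look roughly like this (bucket, prefix and day counts are placeholders):

    lifecycle.json:
    {
      "Rules": [{
        "ID": "tier-then-expire",
        "Status": "Enabled",
        "Filter": {"Prefix": "logs/"},
        "Transitions": [
          {"Days": 30, "StorageClass": "STANDARD_IA"},
          {"Days": 90, "StorageClass": "GLACIER"}
        ],
        "Expiration": {"Days": 365}
      }]
    }

    aws s3api put-bucket-lifecycle-configuration \
        --bucket my-bucket --lifecycle-configuration file://lifecycle.json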
Combine Lifecycle Management with Versioning
Move different versions of objects to different storage tiers
Constraints
- STANDARD -> STANDARD_IA or ONEZONE_IA or GLACIER
- STANDARD_IA -> ONEZONE_IA or GLACIER
- ONEZONE_IA -> GLACIER
- object size > 128KB
- for STANDARD_IA or ONEZONE_IA object stored >= 30 days in the current storage class
- noncurrent objects (in versioned buckets) >= 30 days noncurrent
Expiration actions—Define when objects expire with an Expiration Date. S3 deletes expired objects on your behalf
S3 Inventory
Key Concepts
Supported file formats:
- CSV
- Apache ORC (Optimized Row Columnar)
- Apache Parquet
It helps you understand how you are using your S3 storage. You decide which metadata to include in the report (e.g. object size, last modified, multipart upload, replication status, encryption status, ...)
Use Cases:
- Find objects in the bucket encrypted with SSE-KMS
- Find which objects in the bucket were replicated successfully and which failed replication
- Which storage class is used for each object in the buckets
- In general you can report on anything that can be stored in the object metadata
How it works:
- Given a my-finance-data bucket
- Create a my-inventory-data bucket that will hold the S3 Inventory files
- Configure the S3 inventory file to be created and stored in the my-inventory-data bucket
- You can optionally have the S3 inventory file encrypted with SSE-S3 or SSE-KMS
How to create (see the sketch after this list):
- Create a bucket to hold your S3 inventory file
- In the original bucket go to Management > Inventory configurations
- Select the Inventory scope (prefix, current versions vs all versions)
- Select the Destination bucket in this account vs. another account
- Set Frequency Daily vs Weekly
- Select file format CSV vs ORC vs Parquet
- Select Server Side Encryption
- (optional) Additional metadata fields - Choose the metadata that should be included for each listed object in the report
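A sketch of the equivalent CLI call (bucket names, configuration ID and optional fields are placeholders):

    aws s3api put-bucket-inventory-configuration \
        --bucket my-finance-data \
        --id daily-inventory \
        --inventory-configuration '{
            "Id": "daily-inventory",
            "IsEnabled": true,
            "IncludedObjectVersions": "Current",
            "Schedule": {"Frequency": "Daily"},
            "Destination": {"S3BucketDestination": {
                "Bucket": "arn:aws:s3:::my-inventory-data",
                "Format": "CSV"
            }},
            "OptionalFields": ["Size", "LastModifiedDate", "EncryptionStatus", "ReplicationStatus"]
        }'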
S3 Storage Lens
- Understand, analyze, and optimize storage across entire AWS Organization
- Discover anomalies, identify cost efficiencies, and apply data protection best practices across entire AWS Organization
- Aggregate data for Organization, specific accounts, regions, buckets, or prefixes
- Can be configured to export metrics daily to an S3 bucket (CSV, Parquet)
How it works
- Configure aggregation: Organization, Accounts, Regions, Buckets
- Aggregate & Analyze (Dashboard)
- Optimize: Summary Insight, Data Protections, Cost Efficiency
Dashboard
- Default & Custom dashboards
- Default dashboard can’t be deleted, but can be disabled
Metrics
- Free and Advanced metrics
- Free Metrics
- Automatically available for all customers
- Data is available for queries for 14 days
- Advanced Metrics and Recommendations
- Advanced Metrics – Activity, Advanced Cost Optimization, Advanced Data Protection, Status Code
- CloudWatch Publishing
- Prefix Aggregation
- Data is available for queries for 15 months
Versioning
Advantages
Cannot Be Disabled
- once enabled, versioning cannot be disabled, only suspended
- IAM Administrator account can suspend Versioning
Lifecycle Rules
- Can be integrated with lifecycle rules
Backup
- can be a great backup tool
Supports MFA delete
- owner must include two forms of authentication in requests to delete a version or change the versioning state of the bucket
- only the bucket owner can enable/suspend MFA Delete on the bucket
All Versions
- all versions of an object are kept in S3; this includes all writes and even deletes. Protects objects from accidental overwrite and deletion (see the sketch below)
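A sketch of turning both on (bucket, MFA device ARN and code are placeholders):

    # Enable versioning
    aws s3api put-bucket-versioning \
        --bucket my-bucket \
        --versioning-configuration Status=Enabled

    # Enable MFA Delete as well (root credentials and the root MFA device are required)
    aws s3api put-bucket-versioning \
        --bucket my-bucket \
        --versioning-configuration Status=Enabled,MFADelete=Enabled \
        --mfa "arn:aws:iam::111122223333:mfa/root-device 123456"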
Delete
A simple DELETE request adds a delete marker with a new version ID, but all the object versions are still there. Current version = delete marker; the actual object becomes the previous version
To restore an object you just delete the delete marker (requires s3:DeleteObjectVersion). If the bucket is MFA Delete-enabled, you must use the designated MFA device to remove the delete marker
You can permanently delete a specific object version by providing the version ID in the DELETE request (need s3:DeleteObjectVersion)
You can still access all the previous versions of the object by specifying the required version ID in your GET request (need s3:GetObjectVersion)
S3 responds to requests for an object whose current version is a delete marker as if the object was deleted, e.g. a GET request for the object returns a 404 error
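A sketch of deleting and then restoring a versioned object (names are placeholders):

    # Simple DELETE: adds a delete marker, keeps all versions
    aws s3api delete-object --bucket my-bucket --key docs/report.txt

    # Find the delete marker's version ID
    aws s3api list-object-versions --bucket my-bucket --prefix docs/report.txt

    # Restore: remove the delete marker (requires s3:DeleteObjectVersion)
    aws s3api delete-object --bucket my-bucket --key docs/report.txt \
        --version-id <delete-marker-version-id>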
Simple DELETE in a versioning-suspended bucket
version = null ---> a simple DELETE (no version ID specified) actually removes the object version whose version ID is null and inserts a delete marker in its place with version ID = null (a delete marker has no content, so you lose the content of the null version when the delete marker replaces it)
version ≠ null ---> a simple DELETE for an object that doesn't have a null version removes nothing and just inserts a delete marker for the object with version ID = null
Event Notification
Allows you to receive notifications when certain events happen in your S3 bucket (see the sketch below). To enable notifications:
- add a notification configuration that identifies the events that you want Amazon S3 to publish
- set the destinations where you want Amazon S3 to send the notifications (you can only add 1 SQS or SNS at a time for Amazon S3 events notification)
Possible destinations are:
- SNS Topic
- SQS queues
- Lambda function
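A sketch of a notification configuration pointing at an SQS queue (ARN, bucket and prefix are placeholders):

    notification.json:
    {
      "QueueConfigurations": [{
        "QueueArn": "arn:aws:sqs:eu-west-1:111122223333:new-uploads",
        "Events": ["s3:ObjectCreated:*"],
        "Filter": {"Key": {"FilterRules": [{"Name": "prefix", "Value": "uploads/"}]}}
      }]
    }

    aws s3api put-bucket-notification-configuration \
        --bucket my-bucket --notification-configuration file://notification.json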
EventBridge
- Amazon S3 can send events to Amazon EventBridge whenever certain events happen in your bucket
- Unlike the other destinations, you don't need to select which event types you want to deliver
- After EventBridge is enabled, all supported S3 event types are delivered to EventBridge (see the sketch after this list)
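A minimal sketch of enabling EventBridge delivery (bucket is a placeholder):

    aws s3api put-bucket-notification-configuration \
        --bucket my-bucket \
        --notification-configuration '{"EventBridgeConfiguration": {}}'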
Select
- Use SQL statements to filter the contents of an S3 object and retrieve only the subset of data that you need
- Reduces the amount of data S3 transfers, which reduces cost and latency (see the sketch after this list)
- Works with:
- CSV, JSON, or Apache Parquet format
- Compressed objects
- Server-Side Encryption
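A sketch of a Select query against a CSV object (bucket, key and column names are placeholders):

    aws s3api select-object-content \
        --bucket my-bucket --key sales/2023.csv \
        --expression "SELECT s.product, s.amount FROM S3Object s WHERE s.country = 'IT'" \
        --expression-type SQL \
        --input-serialization '{"CSV": {"FileHeaderInfo": "USE"}, "CompressionType": "NONE"}' \
        --output-serialization '{"CSV": {}}' \
        results.csv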
S3 Access Point
Helps when you have large data sets on a bucket that are accessed by different users and/or applications
Each user can have a dedicated access point on a bucket and each of these access points has its own access point policy
Network origin
- VPC: no internet access; requests are made over the specified VPC only (not accessible from the AWS Console); requires a VPC endpoint
- Internet: if you want users outside your VPC to have access (see the sketch below)
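A sketch of creating an internet-facing and a VPC-restricted access point (account ID, names and VPC ID are placeholders):

    # Internet-facing access point
    aws s3control create-access-point \
        --account-id 111122223333 --name analytics-ap --bucket my-finance-data

    # VPC-only access point (reachable only through the given VPC)
    aws s3control create-access-point \
        --account-id 111122223333 --name etl-vpc-ap --bucket my-finance-data \
        --vpc-configuration VpcId=vpc-0abc1234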
Analytics
Storage Lens
Basics
- Understand, analyze, and optimize storage across entire AWS Organization
- Discover anomalies, identify cost efficiencies, and apply data protection best practices across entire AWS Organization (30 days usage & activity metrics)
- Aggregate data for Organization, specific accounts, regions, buckets, or prefixes
- Default dashboard or create your own dashboards
- Can be configured to export metrics daily to an S3 bucket (CSV, Parquet)
- Anomalies, cost efficiencies and data protection
Default Dashboard
- Visualize summarized insights and trends for both free and advanced metrics
- Default dashboard shows Multi-Region and Multi-Account data
- Preconfigured by Amazon S3
- Can’t be deleted, but can be disabled
Metrics
- Free
- Automatically available for all customers
- Data is available for queries for 14 days
- Advanced
- Additional paid metrics and features
- CloudWatch Publishing
- Prefix Aggregation
- Data is available for queries for 15 months
Pricing
You pay for all bandwidth into and out of Amazon S3 except for:
- Data transferred in from the Internet
- Data transferred out to an Amazon EC2 instance, when the instance is in the same AWS Region as the S3 bucket (including to a different account in the same AWS region)
- Data transferred out to Amazon CloudFront