S3 Object Lock
S3 Object lock
⚡
Use it to store objects using a write once, read many (WORM) model
It can help you prevent objects from being deleted or modified for a fixed amount of time or indefinitely
⚡
Use it to meet regulatory requirements that require WORM storage, or to add an extra layer of protection against object changes and deletion
⚡
Governance Mode
⚡
⭐ Users can't overwrite or delete an object version or alter its lock settings unless they have special permissions
⚡
Protects objects against being deleted by most users, but you can still grant some users permission to alter the retention settings or delete the object if necessary
⚡
Compliance Mode
⚡
⭐ A protected object version can't be overwritten or deleted by any user, including the root user in your AWS account.
⚡
When an object is locked in compliance mode, its retention mode can't be changed and its retention period can't be shortened.
⚡
Compliance mode ensures an object version can't be overwritten or deleted for the duration of the retention period
⚡
⭐ Retention Periods
⚡
A retention period protects an object version for a fixed amount of time. When you place a retention period on an object version, Amazon S3 stores a timestamp in the object version's metadata to indicate when the retention period expires
⚡
After the retention period expires, the object version can be overwritten or deleted unless you also placed a legal hold on the object version
⚡
⭐ Legal Holds
⚡
S3 Object Lock also enables you to place a legal hold on an object version
⚡
Like a retention period, a legal hold prevents an object version from being overwritten or deleted
⚡
However, a legal hold doesn't have an associated retention period and remains in effect until removed. Legal holds can be freely placed and removed by any user who has the s3:PutObjectLegalHold permission
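The rules above for modes, retention periods, and legal holds can be sketched as a small decision function. This is only a toy model of the behaviour described in these notes, not the S3 API: `LockState` and `can_delete_version` are hypothetical names, and `has_bypass_permission` stands in for the "special permissions" (such as `s3:BypassGovernanceRetention`) mentioned under Governance Mode.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass
class LockState:
    """Hypothetical model of the lock settings on one object version."""
    mode: Optional[str] = None              # "GOVERNANCE", "COMPLIANCE", or None
    retain_until: Optional[datetime] = None
    legal_hold: bool = False


def can_delete_version(state: LockState, now: datetime,
                       has_bypass_permission: bool = False) -> bool:
    """Return True if the rules in these notes would allow deleting the version."""
    # A legal hold blocks deletion until it is removed, regardless of dates.
    if state.legal_hold:
        return False
    # No retention period, or one that has already expired: deletion is allowed.
    if state.mode is None or state.retain_until is None or now >= state.retain_until:
        return True
    # Governance mode: only users with special permissions can bypass the lock.
    if state.mode == "GOVERNANCE":
        return has_bypass_permission
    # Compliance mode: nobody can delete, not even the root user.
    return False


if __name__ == "__main__":
    locked_until = datetime(2030, 1, 1, tzinfo=timezone.utc)
    today = datetime(2025, 1, 1, tzinfo=timezone.utc)
    print(can_delete_version(LockState("COMPLIANCE", locked_until), today))
```

An expired retention period with a legal hold still in place returns False, which matches the note that a legal hold remains in effect until removed.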
Glacier Vault Lock
⚡
Allows you to easily deploy and enforce compliance controls for individual S3 Glacier vaults with a Vault Lock policy
⚡
You can specify controls, such as WORM, in a Vault Lock policy and lock the policy from future edits. Once locked, the policy can no longer be changed
Exam tips
⚡
Use S3 Object Lock to store objects using a write once, read many (WORM) model
⚡
Object locks can be applied to individual objects or across the bucket as a whole
⚡
Object Lock comes in two modes
Governance mode
Compliance mode
S3 Performance
What is a prefix with S3
ex: mybucketname/folder1/subfolder/myfile.jpg
prefix: /folder1/subfolder
S3 Performance
⚡
S3 has extremely low latency. You can get the first byte out of S3 within 100-200 ms
⚡
You can also achieve a high number of requests
3,500 PUT/COPY/POST/DELETE requests per second per prefix
5,500 GET/HEAD requests per second per prefix
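Because the request limits apply per prefix, spreading reads across prefixes raises the total read rate. A minimal sketch of that arithmetic, assuming all paths are in one bucket and using the notes' `bucket/prefix/object` notation (`s3_prefix` and `max_read_rate` are illustrative helpers, not AWS functions):

```python
GET_RATE_PER_PREFIX = 5_500   # GET/HEAD requests per second per prefix


def s3_prefix(path: str) -> str:
    """Return the prefix of 'bucket/dir/.../object', in the notes' notation."""
    _bucket, _, key = path.partition("/")
    # Everything in the key before the final "/" is the prefix.
    prefix, sep, _name = key.rpartition("/")
    return "/" + prefix if sep else "/"


def max_read_rate(paths) -> int:
    """Upper bound on GET/HEAD requests per second across the distinct prefixes."""
    return GET_RATE_PER_PREFIX * len({s3_prefix(p) for p in paths})


if __name__ == "__main__":
    print(s3_prefix("mybucketname/folder1/subfolder/myfile.jpg"))
    print(max_read_rate(["mybucketname/folder1/a.jpg", "mybucketname/folder2/b.jpg"]))
```

Two objects under the same prefix share one 5,500 requests-per-second budget, while two objects under different prefixes get 11,000 between them.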
KMS Request Rates
⚡
Limitations
⚡
If you are using SSE-KMS to encrypt your objects in S3, you must keep in mind the KMS limits
⚡
when you upload a file, you will call GenerateDataKey in the KMS API
⚡
When you download a file, you will call Decrypt in the KMS API
⚡
S3 Limitation when using KMS
⚡
Uploading/downloading will count toward the KMS quota
⚡
The quota is Region-specific: either 5,500, 10,000, or 30,000 requests per second
⚡
Currently, you cannot request a quota increase for KMS
⚡
Multipart Uploads
Recommended for files over 100 MB
Required for files over 5 GB
Parallelize uploads (increases efficiency)
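To illustrate how a large file is split for a multipart upload, here is a small planning sketch. It only computes part boundaries and does not call S3. The 100 MiB default part size is an arbitrary choice for the example, while the 5 MiB minimum part size and the 10,000-part maximum are S3 limits. `plan_parts` is a hypothetical helper name.

```python
from typing import List, Tuple

MIB = 1024 * 1024
MIN_PART_SIZE = 5 * MIB      # S3 minimum for every part except the last
MAX_PARTS = 10_000           # S3 maximum number of parts per upload


def plan_parts(total_size: int, part_size: int = 100 * MIB) -> List[Tuple[int, int, int]]:
    """Return (part_number, offset, length) tuples that cover total_size bytes."""
    if part_size < MIN_PART_SIZE:
        raise ValueError("part_size is below the 5 MiB minimum")
    parts = []
    offset = 0
    while offset < total_size:
        # The final part may be shorter than part_size.
        length = min(part_size, total_size - offset)
        parts.append((len(parts) + 1, offset, length))
        offset += length
    if len(parts) > MAX_PARTS:
        raise ValueError("too many parts; use a larger part_size")
    return parts


if __name__ == "__main__":
    for number, offset, length in plan_parts(250 * MIB):
        print(number, offset, length)
```

Each tuple can be uploaded independently, which is what makes the parallel upload (and retrying a single failed part) possible.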
⚡
Downloads
S3 Byte-Range Fetches
Parallelize downloads by specifying byte ranges
If there's failure in the download, it's only for a specific byte range
Can be used to speed up downloads
Can be used to just download partial amounts of the file (e.g., header information)
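A byte-range fetch is an ordinary GET with an HTTP `Range` header. The sketch below only builds the header values for a file of known size; `range_headers` is an illustrative helper, and the chunk size is whatever the caller chooses.

```python
from typing import List


def range_headers(total_size: int, chunk_size: int) -> List[str]:
    """Build HTTP Range header values that together cover total_size bytes."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    headers = []
    for start in range(0, total_size, chunk_size):
        # Range ends are inclusive, hence the "- 1".
        end = min(start + chunk_size, total_size) - 1
        headers.append(f"bytes={start}-{end}")
    return headers


if __name__ == "__main__":
    print(range_headers(10, 4))
```

Each range can be requested in parallel, and a failed range can be retried on its own. Requesting only the first range is one way to read header information without downloading the whole file.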
Exam tips
⚡
What is a prefix with S3
ex: mybucketname/folder1/subfolder/myfile.jpg
prefix: /folder1/subfolder
⚡
You can also achieve a high number of requests
3,500 PUT/COPY/POST/DELETE requests per second per prefix
5,500 GET/HEAD requests per second per prefix
⚡
You can get better performance by spreading your reads across different prefixes
Example: if you are using two prefixes, you can achieve 11,000 GET/HEAD requests per second
⚡
If using SSE-KMS to encrypt your objects in S3, you must keep in mind the KMS limits
⚡
Uploading/downloading will count toward the KMS quota
⚡
The quota is Region-specific: either 5,500, 10,000, or 30,000 requests per second
⚡
Currently, you cannot request a quota increase for KMS
⚡
Use multipart uploads to increase performance when uploading files to S3
Should be used for any file over 100 MB and must be used for any file over 5 GB
Use S3 byte-range fetches to increase performance when downloading files from S3
S3 Select
What is it?
Enables applications to retrieve only a subset of data from an object by using simple SQL expressions
By using S3 Select to retrieve only the data your application needs, you can achieve drastic performance increases. In many cases, you can get as much as a 400% improvement
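As a rough illustration of what an S3 Select request looks like, the sketch below builds the keyword arguments for boto3's `select_object_content` call for a CSV object with a header row. It does not contact AWS. The bucket, key, and column names are made up for the example, and `select_request` is a hypothetical helper.

```python
def select_request(bucket: str, key: str, sql: str) -> dict:
    """Build keyword arguments for an S3 Select query over a CSV object."""
    return {
        "Bucket": bucket,
        "Key": key,
        "ExpressionType": "SQL",
        "Expression": sql,
        # Treat the first CSV line as column names so the SQL can use them.
        "InputSerialization": {"CSV": {"FileHeaderInfo": "USE"}},
        "OutputSerialization": {"CSV": {}},
    }


if __name__ == "__main__":
    # Hypothetical object and columns: only two columns of matching rows come back.
    request = select_request(
        "my-example-bucket",
        "reports/customers.csv",
        "SELECT s.name, s.city FROM S3Object s WHERE s.country = 'ZA'",
    )
    print(request["Expression"])
```

With boto3 installed and credentials configured, the dictionary would be passed as `s3_client.select_object_content(**request)`. Filtering happens on the S3 side, so only the selected rows and columns are transferred.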
Glacier Select
⚡
Some companies in highly regulated industries (e.g., financial services, healthcare, and others) write data directly to Amazon Glacier to satisfy compliance needs like SEC Rule 17a-4 or HIPAA
⚡
Many S3 users have lifecycle policies designed to save on storage costs by moving their data into Glacier when they no longer need to access it on a regular basis
⚡
🏅 Glacier Select allows you to run SQL queries against Glacier directly
Exam Tips
Remember that S3 Select is used to retrieve only a subset of data from an object by using simple SQL expressions
Get data by Rows or Columns using simple SQL expressions
Save money on data transfer and increase speed
AWS Organizations & Consolidated Billing
What is AWS Organizations
AWS Organizations is an account management service that enables you to consolidate multiple AWS accounts into an organization that you create and centrally manage
Advantages of Consolidated Billing
One bill for multiple AWS accounts
Very easy to track charges and allocate costs
Volume pricing discount
⚡
Some best practices with AWS Organizations
Always enable multi-factor authentication on root account
Always use a strong and complex password on root account
Paying account should be used for billing purpose only. Do not deploy resources into the paying account
Enable/Disable AWS services using Service Control Policies (SCPs), either on an OU or on individual accounts
S3 - Cross Account Access
3 different ways to share S3 buckets across accounts
Using Bucket Policies & IAM (applies across the entire bucket). Programmatic Access Only
Using Bucket ACLs & IAM (individual objects). Programmatic Access Only
Cross-account IAM Roles. Programmatic AND Console access
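For the bucket-policy approach, a policy on the bucket names the other account as the principal. The following is a minimal sketch that builds such a policy document, assuming read-only access is wanted. The bucket name and the account ID `111122223333` are placeholders, and `cross_account_read_policy` is a hypothetical helper.

```python
import json


def cross_account_read_policy(bucket: str, account_id: str) -> dict:
    """Build a bucket policy that lets another AWS account list and read a bucket."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "CrossAccountRead",
                "Effect": "Allow",
                # The other account; its own IAM policies must also allow the access.
                "Principal": {"AWS": f"arn:aws:iam::{account_id}:root"},
                "Action": ["s3:GetObject", "s3:ListBucket"],
                "Resource": [
                    f"arn:aws:s3:::{bucket}",      # needed for s3:ListBucket
                    f"arn:aws:s3:::{bucket}/*",    # needed for s3:GetObject
                ],
            }
        ],
    }


if __name__ == "__main__":
    print(json.dumps(cross_account_read_policy("my-example-bucket", "111122223333"), indent=2))
```

The "& IAM" part of the option still applies: users in the other account also need an IAM policy in their own account that permits these actions.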
Cross Region Replication
Versioning must be enabled on both the source and destination buckets
Files in an existing bucket are not replicated automatically
All subsequent updated files will be replicated automatically
Delete markers are not replicated
Deleting individual versions or delete markers will not be replicated
Understand what Cross Region Replication is at a high level
S3 Transfer Acceleration
What is it?
Utilises the CloudFront Edge Network to accelerate your uploads to S3
Instead of uploading directly to your S3 bucket, you can use a distinct URL to upload directly to an edge location which will then transfer that file to S3. You will get a distinct URL to upload to : abc.s3-accelerate.amazonaws.com
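The distinct URL follows a fixed pattern based on the bucket name. A small sketch, assuming the bucket already has Transfer Acceleration enabled (`accelerate_endpoint` is an illustrative helper, and the check for periods reflects the requirement that accelerated bucket names contain no dots):

```python
def accelerate_endpoint(bucket: str) -> str:
    """Return the Transfer Acceleration endpoint URL for a bucket."""
    if "." in bucket:
        # Bucket names with periods cannot be used with Transfer Acceleration.
        raise ValueError("bucket names containing '.' cannot be accelerated")
    return f"https://{bucket}.s3-accelerate.amazonaws.com"


if __name__ == "__main__":
    print(accelerate_endpoint("abc"))
```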
AWS DataSync
Used to move large amounts of data from on-premises to AWS
Used with NFS and SMB compatible file systems
Replication can be done hourly, daily, or weekly
Install the DataSync agent to start the replication
Can be used to replicate EFS to EFS
CloudFront
What is it?
A content delivery network (CDN) is a system of distributed servers (network) that delivers webpages and other web content to a user based on the geographic location of the user, the origin of the webpage, and a content delivery server
Key Terminology
⚡
Edge Location
This is the location where content will be cached. This is separate from an AWS Region/AZ
⚡
Origin
This is the origin of all the files that the CDN will distribute. This can be an S3 Bucket, an EC2 Instance, an Elastic Load Balancer or Route 53
⚡
Distribution
This is the name given to the CDN, which consists of a collection of Edge Locations
Can be used to deliver your entire website, including dynamic, static, streaming, and interactive content, using a global network of edge locations. Requests for your content are automatically routed to the nearest edge location, so content is delivered with the best possible performance
⚡
Web Distribution
Typically used for Websites
⚡
RTMP
Used for Media Streaming
⚡
⚡
Edge locations are not just READ only
you can write to them too
⚡
Objects are cached for the life of the TTL (Time to Live)
⚡
You can clear cached objects, but you will be charged
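The TTL behaviour can be pictured with a toy cache. This is a conceptual sketch only, not how CloudFront is implemented: `EdgeCache`, its `fetch_from_origin` callback, and the injectable clock are all made up for illustration.

```python
import time
from typing import Callable, Dict, Tuple


class EdgeCache:
    """Toy edge location: serves cached bodies until their TTL runs out."""

    def __init__(self, ttl_seconds: float, fetch_from_origin: Callable[[str], str],
                 clock: Callable[[], float] = time.monotonic):
        self._ttl = ttl_seconds
        self._fetch = fetch_from_origin
        self._clock = clock
        self._entries: Dict[str, Tuple[str, float]] = {}  # path -> (body, expires_at)

    def get(self, path: str) -> str:
        now = self._clock()
        entry = self._entries.get(path)
        if entry is not None and entry[1] > now:
            return entry[0]                      # cache hit, still within TTL
        body = self._fetch(path)                 # miss or expired: go to the origin
        self._entries[path] = (body, now + self._ttl)
        return body

    def invalidate(self, path: str) -> None:
        """Clear a cached object before its TTL expires."""
        self._entries.pop(path, None)


if __name__ == "__main__":
    cache = EdgeCache(60, lambda path: f"content of {path}")
    print(cache.get("/index.html"))
    print(cache.get("/index.html"))  # served from the cache this time
```

The second `get` within the TTL never reaches the origin, and `invalidate` corresponds to clearing a cached object early, which is the operation the notes say is charged for.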
CloudFront Signed URLs and Cookies
CloudFront Signed URLs
Signed Cookies
A signed cookie is for multiple files
1 cookie = multiple files
Policy
can include
URL expiration
IP ranges
Trusted signers (which AWS accounts can create signed URLs)
⚡
A signed URL is for individual files
1 file = 1 URL
⚡
Can have different origins. Does not have to be EC2
⚡
Key-pair is account wide and managed by the root user
⚡
Can utilize caching features
⚡
Can filter by date, path, IP address, expiration, etc.
S3 Signed URL
Issues a request as the IAM user who creates the presigned URL
Limited life time
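The general idea behind a time-limited signed URL can be shown with a toy HMAC scheme. This is deliberately not the real AWS Signature Version 4 or CloudFront signing algorithm; the query parameter names, the shared `secret`, and both helper functions are assumptions made for illustration only.

```python
import hashlib
import hmac
from urllib.parse import parse_qs, urlencode, urlparse


def sign_url(url: str, secret: bytes, expires_at: int) -> str:
    """Append an expiry time and an HMAC signature to a URL (toy scheme)."""
    payload = f"{url}?Expires={expires_at}".encode()
    signature = hmac.new(secret, payload, hashlib.sha256).hexdigest()
    return f"{url}?{urlencode({'Expires': expires_at, 'Signature': signature})}"


def verify_url(signed_url: str, secret: bytes, now: int) -> bool:
    """Accept the URL only if the signature matches and it has not expired."""
    parsed = urlparse(signed_url)
    params = parse_qs(parsed.query)
    try:
        expires_at = int(params["Expires"][0])
        signature = params["Signature"][0]
    except (KeyError, ValueError):
        return False
    base = f"{parsed.scheme}://{parsed.netloc}{parsed.path}"
    payload = f"{base}?Expires={expires_at}".encode()
    expected = hmac.new(secret, payload, hashlib.sha256).hexdigest()
    # Constant-time comparison, then the lifetime check.
    return hmac.compare_digest(signature, expected) and now < expires_at


if __name__ == "__main__":
    link = sign_url("https://example.com/private/report.pdf", b"demo-secret", 1_700_000_000)
    print(verify_url(link, b"demo-secret", now=1_699_999_999))
```

The signature covers both the path and the expiry, so changing either one invalidates the link. That is the property that makes one URL grant access to exactly one file for a limited time.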
Exam tips
Use signed URLs/cookies when you want to secure content so that only the people you authorize are able to access it
A signed URL is for individual files. 1 file = 1 URL
A signed cookie is for multiple files. 1 cookie = multiple files
If your origin is EC2, then use CloudFront
Snowball
Import to S3
Export from S3
S3 Versioning
Stores all versions of an object (including all writes and even if you delete an object)
Great backup tool
Once enabled, Versioning cannot be disabled, only suspended
Integrates with Lifecycle rules
Versioning's MFA Delete capability, which uses multi factor authentication, can be used to provide an additional layer of security
Lifecycle Management
Automates moving your objects between the different storage tiers
Can be used in conjunction with versioning
Can be applied to current versions and previous versions
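A lifecycle rule that covers both current and previous versions might look like the sketch below. It builds the configuration dictionary in the shape boto3's `put_bucket_lifecycle_configuration` expects, without calling AWS. The rule ID, prefix, and day counts are arbitrary example values, and `lifecycle_config` is a hypothetical helper.

```python
def lifecycle_config(prefix: str, ia_after_days: int, glacier_after_days: int,
                     noncurrent_glacier_after_days: int) -> dict:
    """Build a lifecycle configuration that tiers current and previous versions."""
    if glacier_after_days <= ia_after_days:
        raise ValueError("the Glacier transition must come after the IA transition")
    return {
        "Rules": [
            {
                "ID": "tier-down-example",          # arbitrary rule name
                "Filter": {"Prefix": prefix},
                "Status": "Enabled",
                # Applies to the current version of each object.
                "Transitions": [
                    {"Days": ia_after_days, "StorageClass": "STANDARD_IA"},
                    {"Days": glacier_after_days, "StorageClass": "GLACIER"},
                ],
                # Applies to previous (noncurrent) versions when versioning is on.
                "NoncurrentVersionTransitions": [
                    {"NoncurrentDays": noncurrent_glacier_after_days,
                     "StorageClass": "GLACIER"},
                ],
            }
        ]
    }


if __name__ == "__main__":
    print(lifecycle_config("logs/", 30, 90, 30))
```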
S3 Security & Encryption
⚡
Can set up access control to your buckets using
Bucket Policies
Access Control Lists
⚡
by default
all newly created buckets are PRIVATE
⚡
S3 bucket
can be configured to create access logs which log all requests made to the S3 bucket
This can be sent to another bucket and even another bucket in another account
The Basics
⚡
Encryption At Rest (Server Side) is achieved by
⚡
S3 Managed Keys - SSE - S3
⚡
AWS Key Management Service, Managed Keys - SSE - KMS
⚡
Server Side Encryption With Customer Provided Keys - SSE - C
⚡
Client Side Encryption
⚡
Encryption In Transit is achieved by
SSL/TLS
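For the server-side options, the choice between SSE-S3 and SSE-KMS is made per upload request. The sketch below builds the keyword arguments for boto3's `put_object` for those two cases; it does not contact AWS, and `sse_put_args` plus the example key ID are illustrative assumptions. SSE-C and client-side encryption are left out.

```python
from typing import Optional


def sse_put_args(bucket: str, key: str, body: bytes,
                 kms_key_id: Optional[str] = None) -> dict:
    """Build put_object arguments that request server-side encryption."""
    args = {"Bucket": bucket, "Key": key, "Body": body}
    if kms_key_id is None:
        # SSE-S3: keys are managed by S3.
        args["ServerSideEncryption"] = "AES256"
    else:
        # SSE-KMS: keys are managed in AWS KMS; requests count toward KMS quotas.
        args["ServerSideEncryption"] = "aws:kms"
        args["SSEKMSKeyId"] = kms_key_id
    return args


if __name__ == "__main__":
    print(sse_put_args("my-example-bucket", "notes.txt", b"hello"))
    print(sse_put_args("my-example-bucket", "notes.txt", b"hello", "example-key-id"))
```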
S3 Pricing Tiers
⚡
What are the different Tiers
S3 Standard
S3 - IA
S3 One Zone - IA
S3 Intelligent Tiering
S3 Glacier
S3 Glacier Deep Archive
⚡
What makes up the cost of S3
⚡
Storage
⚡
Request and Data Retrievals
⚡
Data Transfer
⚡
Management & Replication
Storage Gateway
What is it?
Storage Gateway is a service that connects an on-premises software appliance with cloud-based storage to provide seamless and secure integration between an organization's on-premises IT environment and AWS's storage infrastructure. The service enables you to securely store data in the AWS cloud for scalable and cost-effective storage
The AWS Storage Gateway software appliance is available for download as a virtual machine (VM) image that you install on a host in your datacenter. Storage Gateway supports either VMware ESXi or Microsoft Hyper-V. Once you've installed your gateway and associated it with your AWS account through the activation process, you can use the AWS Management Console to create the storage gateway option that is right for you
Types of Storage Gateway
💫
⚡
Volume Gateway (iSCSI)
File Gateway (NFS & SMB)
⚡
Tape Gateway (VTL)
⚡
Files are stored as objects in your S3 buckets, accessed through a Network File System (NFS) mount point
⚡
Ownership, permissions, and timestamps are durably stored in S3 in the user-metadata of the object associated with the file
⚡
Once objects are transferred to S3, they can be managed as native S3 objects, and bucket policies such as versioning, lifecycle management, and cross-region replication apply directly to objects stored in your bucket
🇿🇦
What is it?
⚡
The volume interface presents your applications with disk volumes using the iSCSI block protocol
⚡
Data written to these volumes can be asynchronously backed up as point-in-time snapshots of your volumes, and stored in the cloud as Amazon EBS snapshots
⚡
Snapshots are incremental backups that capture only changed blocks. All snapshot storage is also compressed to minimize your storage charges
types
Stored Volumes
Cached Volumes
⚡
Let you store your primary data locally, while asynchronously backing up that data to AWS.
⚡
Stored volumes provide your on-premises applications with low-latency access to their entire datasets, while providing durable, off-site backups.
⚡
You can create storage volumes and mount them as iSCSI devices from your on-premises application servers
⚡
Data written to your stored volumes is stored on your on-premises storage hardware.
⚡
This data is asynchronously backed up to Amazon Simple Storage Service (Amazon S3) in the form of Amazon Elastic Block Store (Amazon EBS) snapshots.
⚡
1GB - 16TB in size for Stored Volumes
⚡
Let you use Amazon S3 as your primary data storage while retaining frequently accessed data locally in your storage gateway
⚡
Cached volumes minimize the need to scale your on-premises storage infrastructure, while still providing your applications with low-latency access to their frequently accessed data.
⚡
You can create storage volumes up to 32 TB in size and attach to them as iSCSI devices from your on-premises application servers.
⚡
Your gateway stores data that you write to these volumes in Amazon S3 and retains recently read data in your on-premises storage gateway's cache and upload buffer storage.
⚡
1 GB - 32TB in size for Cached Volumes
What is it?
⚡
Offers a durable, cost-effective solution to archive your data in the AWS cloud.
⚡
The VTL interface it provides lets you leverage your existing tape-based backup application infrastructure to store data on virtual tape cartridges that you create on your tape gateway.
⚡
Each tape gateway is preconfigured with a media changer and tape drives, which are available to your existing client backup application as iSCSI devices.
⚡
You add tape cartridges as you need to archive your data. Supported by NetBackup, Backup Exec, Veeam ...
Exam tips
⚡
File Gateway
For flat files, stored directly on S3
⚡
Volume Gateway
⚡
Stored Volumes
Entire Dataset is stored on site and is asynchronously backed up to S3
⚡
Cached Volumes
Entire Dataset is stored on S3 and the most frequently accessed data is cached on site
⚡
Gateway Virtual Tape Library
Athena vs Macie
Athena ?
⚡
Interactive query service which enables you to analyse and query data located in S3 using standard SQL
⚡
Serverless, nothing to provision, pay per query / per TB scanned
⚡
No need to set up complex Extract/Transform/Load (ETL) process
⚡
Works directly with data stored in S3
⚡
What can Athena be used for?
⚡
Can be used to query log files stored in S3, e.g. ELB logs, S3 access logs etc
⚡
Generate business reports on data stored in S3
⚡
Analyse AWS cost and usage reports
⚡
Run queries on click-stream data
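To make the "pay per TB scanned" point concrete, here is a hedged sketch that builds the arguments for boto3's Athena `start_query_execution` call and estimates the scan cost of a query. The table, database, and output location are hypothetical, the price per TB is a caller-supplied value rather than a quoted AWS price, and treating a TB as 10^12 bytes is an assumption to check against the current billing definition.

```python
TB = 10**12  # assumption: decimal terabyte


def athena_query_request(sql: str, database: str, output_location: str) -> dict:
    """Build keyword arguments for starting an Athena query."""
    return {
        "QueryString": sql,
        "QueryExecutionContext": {"Database": database},
        # Athena writes query results to this S3 location.
        "ResultConfiguration": {"OutputLocation": output_location},
    }


def estimate_scan_cost(bytes_scanned: int, price_per_tb: float) -> float:
    """Estimate the cost of a query from the amount of data it scans."""
    return bytes_scanned / TB * price_per_tb


if __name__ == "__main__":
    request = athena_query_request(
        "SELECT elb_status_code, COUNT(*) FROM elb_logs GROUP BY elb_status_code",
        "example_logs_db",
        "s3://my-example-bucket/athena-results/",
    )
    print(request["QueryString"])
    print(estimate_scan_cost(2 * TB, price_per_tb=5.0))
```

Because the charge depends on bytes scanned rather than rows returned, queries that read fewer columns or partitions of the S3 data cost less.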
Macie ?
⚡
What is PII
⚡
Personal data used to establish an individual's identity
⚡
This data could be exploited by criminals, used in identity theft and financial fraud
⚡
Home address, email address, SSN
⚡
Passport number, driver's license number
⚡
D.O.B, phone number, bank account, credit card number
⚡
What is Macie
⚡
Security service which uses Machine Learning and NLP (Natural Language Processing) to discover, classify and protect sensitive data stored in S3
⚡
Uses AI to recognize if your S3 objects contain sensitive data such as PII
⚡
Dashboards, reporting and alerts
⚡
Works directly with data stored in S3
⚡
Can also analyze CloudTrail logs
⚡
Great for PCI-DSS and preventing ID theft
S3 & IAM Summary
IAM
⚡
consists
⚡
Users
⚡
Groups
⚡
Roles
⚡
Policies
⚡
So far
IAM is universal. It does not apply to regions at this time
The "root account" is simply the account created when you first set up your AWS account. It has complete Admin access
New Users have NO permissions when first created
New Users are assigned Access Key ID & Secret Access Keys when first created
These are not the same as a password. You cannot use the Access Key ID & Secret Access Key to log in to the console. You can, however, use them to access AWS via the APIs and command line
You only get to view these once. If you lose them, you have to regenerate them. So, save them in a secure location.
Always setup Multifactor Authentication on your root account
You can create and customize your own password rotation policies
S3
⚡
S3 is Object-based: i.e. allows you to upload files
⚡
Files can be from 0 bytes to 5TB
⚡
There is unlimited storage
⚡
Files are stored in Buckets
⚡
S3 is a universal namespace. That is, names must be unique globally
⚡
Not suitable for installing an operating system on
⚡
Successful uploads will generate an HTTP 200 status code
⚡
By default, all newly created buckets are PRIVATE. You can set up access control to your buckets using
⚡
☣
⚡
S3 buckets can be configured to create access logs which log all requests made to the S3 bucket. This can be sent to another bucket and even another bucket in another account
Bucket Policies
⚡
Access Control Lists
⚡
The Key fundamentals of S3 are
☣
Key (This is simply the name of the object)
⚡
Value (This is simply the data and is made up of a sequence of bytes)
⚡
Version ID (Important for versioning)
⚡
Metadata (Data about data you are storing)
🏮
Sub resources
⚡
Access Control Lists
⚡
Torrent
⚡
.....
S3・101
⚡
guarantees
⚡
Built for 99.99% availability for the S3 platform
⚡
Amazon guarantees 99.9% availability
⚡
Amazon guarantees 11x9s durability for S3 information
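To get a feel for what eleven nines of durability means, the arithmetic below treats the figure as an average annual per-object loss rate. That interpretation is a simplification for illustration, and `expected_annual_losses` and `years_per_lost_object` are hypothetical helper names. Exact fractions are used to avoid floating-point rounding.

```python
from fractions import Fraction

DURABILITY = 1 - Fraction(1, 10**11)   # 99.999999999%, i.e. eleven nines


def expected_annual_losses(object_count: int) -> Fraction:
    """Expected number of objects lost per year at eleven nines of durability."""
    return object_count * (1 - DURABILITY)


def years_per_lost_object(object_count: int) -> Fraction:
    """Average number of years between single-object losses."""
    return 1 / expected_annual_losses(object_count)


if __name__ == "__main__":
    print(expected_annual_losses(10_000_000))   # 1/10000 of an object per year
    print(years_per_lost_object(10_000_000))    # 10000 years
```

Under this reading, storing ten million objects gives an expected loss of one object roughly every 10,000 years.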
⚡
Features
⚡
⭐Tiered Storage Available
⚡
Lifecycle Management
⚡
Versioning
⚡
Encryption
⚡
MFA Delete
⚡
Secure your data
⚡
Access Control List
⚡
Bucket Policies
⚡
Data consistency
⚡
Read after Write consistency for PUTs of new Object
⚡
Eventual Consistency for overwrite PUTS and DELETE
⚡
Ways you are charged for S3
Storage
Requests
Storage Management Pricing
Data Transfer Pricing
Transfer Acceleration
Cross Region Replication Pricing
⚡
S3 Transfer Acceleration
⚡
enables
fast
easy
secure transfers of files over long distances between your end users and an S3 bucket
⚡
takes advantage of Amazon CloudFront's globally distributed edge locations
When data arrives at an edge location, it is routed to Amazon S3 over an optimized network path
⚡
Restricting Bucket Access
⚡
Bucket Policies
Applies across the whole bucket
⚡
Object Policies
Applies to individual files
⚡
IAM Policies for Users & Groups
Applies to Users & Groups
⚡
S3 Storage Classes
⚡
S3 Glacier Deep Archive
is Amazon S3's lowest-cost storage class where a retrieval time of 12 hours is acceptable
⚡
S3 Glacier
can reliably store any amount of data at costs that are competitive with or cheaper than on-premises solutions. Retrieval times are configurable from minutes to hours
a secure, durable and low-cost storage class for data archiving
⚡
S3 Intelligent Tiering
Designed to optimize cost by automatically moving data to the most cost-effective access tier
without performance impact or operational overhead
⚡
S3 One Zone - IA
for customers who want a lower-cost option for infrequently accessed data
and do not require multiple Availability Zone data resilience
⚡
S3 - IA
⚡
Lower fee than S3
but you are charged a retrieval fee
⚡
for data that is accessed less frequently, but requires rapid access when needed
⚡
S3 Standard
⚡
is designed to sustain the loss of 2 facilities concurrently
stored redundantly across multiple devices in multiple facilities
99.999999999% durability (11 nines)
⚡
99.99% availability
basics
⚡
Not suitable for installing an operating system on
⚡
Bucket names
⚡
S3 is a universal namespace
names must be unique globally
Unlimited storage
Object based
⚡
allow you to upload files
⚡
file can be
5 TB
0 bytes
⚡
Files are stored in Buckets
⚡
when you upload a file
successful
HTTP 200 code
⚡
consist
⚡
Sub resources
Torrent
Access Control List
⚡
Metadata
Data about data you are storing
⚡
Version Id
important for versioning
⚡
Value
made up of sequence of bytes
simply the data
⚡
Key
name of the object
⚡
Think of objects just as files
What is S3
The data is spread across multiple devices and facilities
Object based storage
safe place to store files
IAM
Key Features
Centralised control of your AWS account
Shared Access to your AWS account
Granular Permissions
Identity Federation (Active Directory, Facebook, Linked ...)
Multifactor Authentication
Provide temporary access for users/devices and Service where necessary
Allows you to set up your own password rotation policy
Integrates with many different AWS services
Supports PCI DSS Compliance
Key Terminology
Users
End users
people
employees of organization
Groups
A collection of users
Users in a group inherit the permissions of the group
Policies
made up of documents
called Policy documents
format called JSON
give permissions as to what a User/Group/Role is able to do
Roles
Create roles
assign them to AWS resources
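As an example of the Policy documents mentioned under Key Terminology, the sketch below builds a small identity policy and renders it as JSON. The bucket name and the choice of read-only S3 actions are assumptions for illustration, and `read_only_s3_policy` is a hypothetical helper. A policy like this would be attached to a User, Group, or Role.

```python
import json


def read_only_s3_policy(bucket: str) -> dict:
    """Build an IAM policy document that allows listing and reading one bucket."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": ["s3:ListBucket", "s3:GetObject"],
                "Resource": [
                    f"arn:aws:s3:::{bucket}",
                    f"arn:aws:s3:::{bucket}/*",
                ],
            }
        ],
    }


if __name__ == "__main__":
    # Policy documents are stored and exchanged as JSON text.
    print(json.dumps(read_only_s3_policy("my-example-bucket"), indent=2))
```

Unlike a bucket policy, an identity policy has no Principal element: the principal is whichever User, Group, or Role the policy is attached to.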