Storage
Instance Store
- Temporary storage for running EC2 instances
- Data is lost when the disk drive fails, or when the instance stops, hibernates, or terminates
- Data survives an instance reboot
- Superb I/O, since the storage is physically attached to the instance
Elastic Block Store (EBS)
- Persistent network drive that can be attached to EC2 instances
- Bound to a single AZ
- Attached to only one instance at a time (see Multi-attach for the io1/io2 exception)
- Can be resized; full throughput requires an EBS-optimized EC2 instance
Types
io1, io2 (SSD), io2 Block Express
- Highest performance: provisioned IOPS for I/O-intensive workloads
- io1 and io2 can be used as boot volumes
st1
- Low-cost HDD for frequently accessed, throughput-intensive workloads
sc1
- Lowest-cost HDD for less frequently accessed data
gp2, gp3 (SSD)
- Price/performance balanced
- Can be used as boot volume
Encryption
- Once encryption by default is enabled for a region, every new EBS volume is encrypted
Snapshot
- Incremental: only backup blocks that are changed
- Recommended to detach the volume (or stop I/O) first; avoid snapshots during heavy usage
- Restored volumes need 'pre-warming' (initialization) to reach full performance
- Data Lifecycle Manager (DLM) can automate snapshot creation and retention
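The DLM bullet can be made concrete; a minimal sketch of a policy document in the shape boto3's `dlm.create_lifecycle_policy` expects. The tag, schedule time, and retention count are illustrative assumptions, not recommendations.

```python
# Sketch of a DLM policy document (boto3-style dict). Tag values and
# timings below are illustrative assumptions.
policy_details = {
    "ResourceTypes": ["VOLUME"],                  # snapshot EBS volumes
    "TargetTags": [{"Key": "Backup", "Value": "true"}],
    "Schedules": [{
        "Name": "DailySnapshots",
        "CreateRule": {"Interval": 24, "IntervalUnit": "HOURS",
                       "Times": ["03:00"]},       # daily at 03:00 UTC
        "RetainRule": {"Count": 7},               # keep the last 7 snapshots
    }],
}
```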
Multi-attach
- Exception for io1 and io2: multiple instances can attach the same volume simultaneously
- Volume must be formatted with a cluster-aware file system
- Instances must manage concurrent writes themselves
Elastic File System (EFS)
- Managed Network File System (NFS)
- Bound to a VPC; mounted via mount targets in its AZs
- Can be attached to many instances at once (within the same region)
- Highly available and scalable, but at a higher cost
- Storage scales automatically with usage
- POSIX-compliant; works with Linux AMIs, but not with Windows
modes
Performance
General Purpose (default)
- Low latency; suits most workloads
Max I/O
- Higher latency, but higher throughput and parallelism
Throughput
Bursting (default)
- Burstable up to 100 MB/s per TB stored
Provisioned
- Throughput can be set regardless of the storage size assigned
classes
storage
Standard
- frequently accessed files
Infrequent Access (EFS-IA)
- Higher cost for retrieval
- Lower cost to store
Access Point
- Path based permissions
- Enforce POSIX user/group permissions
Cross-region Replication
- Replicates the files in one EFS file system to a file system in another AWS region
Simple Storage Service (S3)
- Serverless, unlimited object storage
- Charged only for usage
By Attributes
Purpose
General Purpose (Standard)
- Default class; frequently accessed data with low latency and high throughput
Infrequent Access
Standard-IA
- Infrequent access but rapid retrieval if needed
One Zone-IA
- Same as Standard-IA except data lives in a single AZ, which is vulnerable to AZ failure
Archive
Glacier Instant Retrieval
- Rarely accessed, but rapid retrieval when needed
Glacier Flexible Retrieval
- Access once or twice a year
Glacier Deep Archive
- Same as Glacier Flexible Retrieval, but lower cost with slower retrieval
Intelligent-Tiering
- Automatic data transfer to most cost-effective tier
- Effective on data with unknown usage pattern
Infrequent Access
- After 30 days without access
- Low latency
- High throughput
Archive Access
- Optionally enabled
- After 90 ~ 730 days (in IA) without access
- Retrieval can take 3 ~ 5 hours
Archive Instant Access
- Automatic after 90 days without access
- Low latency
- High throughput
Deep Archive Access
- Optionally enabled
- After 180 ~ 730 days without access
- Retrieval can take up to 12 hours
Frequent Access
- Beginning of file lifecycle
- Low latency
- High throughput
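Transitions between classes can also be scheduled explicitly rather than left to Intelligent-Tiering; a sketch of a lifecycle rule in the shape boto3's `put_bucket_lifecycle_configuration` expects (the prefix and day counts are illustrative assumptions).

```python
# Sketch of an S3 lifecycle rule that steps objects down through the
# storage classes. Prefix and day counts are illustrative.
lifecycle_configuration = {
    "Rules": [{
        "ID": "archive-logs",
        "Filter": {"Prefix": "logs/"},
        "Status": "Enabled",
        "Transitions": [
            {"Days": 30, "StorageClass": "STANDARD_IA"},    # infrequent access
            {"Days": 90, "StorageClass": "GLACIER"},        # flexible retrieval
            {"Days": 365, "StorageClass": "DEEP_ARCHIVE"},  # deep archive
        ],
        "Expiration": {"Days": 730},  # delete after two years
    }],
}
```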
Availability
- One Zone-IA stores data in a single AZ; every other class spans at least 3 AZs
Charges
Minimum Storage Duration
- Standard: none; Standard-IA / One Zone-IA: 30 days; Glacier Instant & Flexible Retrieval: 90 days; Deep Archive: 180 days
Minimum Billable Object Size
- Standard-IA / One Zone-IA: 128 KB
Retrieval
- Standard: free; IA and Glacier classes charge per GB retrieved
Latency
- Milliseconds for Standard, IA, and Glacier Instant Retrieval; minutes to hours for Glacier Flexible Retrieval; up to 12+ hours for Deep Archive
utilities
S3 Replication Time Control (RTC)
- SLA-backed replication: 99.99% of new objects replicate within 15 minutes
- Available for both cross-region and same-region replication
Cross-Region Replication (CRR)
- Asynchronously copies objects to a bucket in another region
- Standard data-transfer charges apply
Same-Region Replication (SRR)
- Asynchronously copies objects to another bucket in the same region
S3 Event Notification
- S3 can emit events on bucket activity (e.g. object created/removed)
- Events can be forwarded to EventBridge, SNS, SQS, or Lambda
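A sketch of what such a notification configuration looks like (boto3 `put_bucket_notification_configuration` shape; the queue ARN and suffix filter are placeholders):

```python
# Sketch of an S3 bucket notification configuration. The queue ARN and
# the ".jpg" filter are illustrative placeholders.
notification_configuration = {
    "QueueConfigurations": [{
        "QueueArn": "arn:aws:sqs:us-east-1:123456789012:example-queue",
        "Events": ["s3:ObjectCreated:*"],         # fire on any object creation
        "Filter": {"Key": {"FilterRules": [
            {"Name": "suffix", "Value": ".jpg"},  # only for .jpg objects
        ]}},
    }],
}
```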
Amazon S3 Select
- Retrieve a subset of an object's data server-side with a SQL expression
- Use the ScanRange parameter to limit the byte range scanned
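A sketch of the parameters such a call takes (boto3 `select_object_content` shape; the bucket, key, and SQL expression are illustrative):

```python
# Sketch of S3 Select call parameters. Bucket, key, and the expression
# are illustrative; ScanRange asks S3 to scan only the first 1 MiB.
select_params = {
    "Bucket": "example-bucket",
    "Key": "data.csv",
    "ExpressionType": "SQL",
    "Expression": "SELECT s.name FROM S3Object s WHERE s.age > '30'",
    "InputSerialization": {"CSV": {"FileHeaderInfo": "USE"}},
    "OutputSerialization": {"CSV": {}},
    "ScanRange": {"Start": 0, "End": 1048576},  # bytes 0 .. 1 MiB
}
```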
Object Lock
- Object Lock can only be enabled when the bucket is created
- Objects written are locked: they cannot be overwritten or deleted for the retention period
- The bucket itself can still be deleted if no objects have been written
modes
governance
- Most users cannot overwrite/delete objects
- Users with special permission (e.g. the bucket owner) can still overwrite/delete or change retention
compliance
- No user, including the account owner, can overwrite/delete objects during retention
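A sketch of writing an object under governance-mode retention (boto3 `put_object` argument shape; the bucket, key, and 30-day window are illustrative assumptions):

```python
from datetime import datetime, timedelta, timezone

# Sketch of uploading an object under Object Lock governance-mode
# retention. Bucket, key, and the 30-day window are placeholders.
retain_until = datetime.now(timezone.utc) + timedelta(days=30)
put_object_args = {
    "Bucket": "example-locked-bucket",
    "Key": "audit/report.pdf",
    "Body": b"...",
    "ObjectLockMode": "GOVERNANCE",  # "COMPLIANCE" would bind everyone
    "ObjectLockRetainUntilDate": retain_until,
}
```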
Prefix
- Every object in S3 bucket is grouped by their prefix
- e.g. bucket/1/file is in prefix /1/, and bucket/2/file is in prefix /2/
Throughput
- S3 has built-in scaling to maintain 100-200ms latency
- 3500 PUT/COPY/POST/DELETE per sec per prefix
- 5500 GET/HEAD per sec per prefix
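The per-prefix limits scale linearly with the number of prefixes; a small illustration (pure arithmetic, not an AWS API):

```python
# S3 request-rate limits apply per prefix, so spreading objects over
# more prefixes multiplies the achievable aggregate request rate.
PUT_PER_PREFIX = 3500   # PUT/COPY/POST/DELETE per second per prefix
GET_PER_PREFIX = 5500   # GET/HEAD per second per prefix

def max_get_rate(num_prefixes: int) -> int:
    """Aggregate GET/HEAD requests per second across prefixes."""
    return GET_PER_PREFIX * num_prefixes

# e.g. objects spread across 4 prefixes (/a/, /b/, /c/, /d/)
# support up to 4 * 5500 = 22000 GET/s in aggregate
```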
upload
Transfer Acceleration
- Increases upload speed: files are first uploaded to a nearby Edge Location
- The Edge Location then forwards them to the S3 bucket over the AWS private network
Multi-Part Upload
- Large files can be divided and uploaded to S3 in parallel
- If a part fails to upload, only that part is retried; S3 assembles the parts once all have arrived
- A lifecycle policy can abort incomplete uploads and delete the orphaned parts after a set time
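A local illustration of the split-and-reassemble idea (real S3 parts must be at least 5 MiB except the last; the tiny part size here is only for demonstration):

```python
# Local illustration of how multipart upload divides a payload into
# independent parts and reassembles them. Real uploads use
# CreateMultipartUpload / UploadPart / CompleteMultipartUpload.
def split_into_parts(data: bytes, part_size: int) -> list[bytes]:
    return [data[i:i + part_size] for i in range(0, len(data), part_size)]

payload = bytes(range(256)) * 100          # 25600-byte payload
parts = split_into_parts(payload, 4096)    # 4 KiB "parts"

# Each part can be uploaded (and retried) independently; S3 joins them
# in part-number order once the upload completes.
assert b"".join(parts) == payload
assert len(parts) == 7                     # 6 full parts + 1 remainder
```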
download
Byte-Range Fetches
- File can be requested by range of bytes and downloaded in parallel
- Can download only specific range without downloading all
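The parallel ranges can be computed from the object's Content-Length; a small helper (hypothetical, not an AWS API):

```python
# Compute HTTP Range headers for downloading an object of known size in
# parallel chunks. S3 byte ranges are inclusive on both ends.
def range_headers(content_length: int, chunk_size: int) -> list[str]:
    return [
        f"bytes={start}-{min(start + chunk_size, content_length) - 1}"
        for start in range(0, content_length, chunk_size)
    ]

# e.g. a 100-byte object fetched in 40-byte chunks
ranges = range_headers(100, 40)
assert ranges == ["bytes=0-39", "bytes=40-79", "bytes=80-99"]
```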
S3 Access Points
- A unique hostname that exposes a subset of a bucket's objects
- The objects that can be accessed are controlled by an Access Point Policy
filtering
- Use SQL-like expression to filter before retrieving data
Amazon FSx
- High performance, automatic capacity scaling
- Capacity does not scale down, however
FSx for Windows File Server
- Windows-native shared file system: SMB protocol and NTFS
- Supports Active Directory, ACLs, DFS Namespaces and user quotas
- EC2 instance can mount it directly
FSx for Lustre (Linux & Cluster)
- Parallel distributed file system for large-scale computing
- ML, HPC, Video Processing, Financial Modeling, Electronic Design Automation, etc.
- Optionally, S3 can be used as the storage
Scratch File System
- Temporary storage with high throughput
- Lower cost
- Data is lost when file server fails
Persistent File System
- Long term storage with replication
- Lost or corrupted data is replaced from replicas upon failure
Lazy Data Loading
- Selectively load data from S3 for the data requested
- Once loaded, it stays in Lustre
Multi-AZ deployment
- Provisions a standby file server in a different AZ
- When throughput capacity is changed, FSx fails over to the standby; once the updated server is ready, it becomes the primary again
Storage Gateway
- Hybrid storage service that lets on-premises environments extend storage into AWS seamlessly
- The gateway is installed on an on-premises virtual machine or a dedicated hardware appliance
- Supports NFS, SMB, iSCSI, or iSCSI-VTL
- Supports caching; transfers are optimized and secured
Gateways
S3 File Gateway
- File interface into S3
- Accessed on-premises via NFS/SMB; objects are stored in S3
- Changes are synced to S3 via HTTPS
FSx File Gateway
- Low latency and efficient access to in-cloud FSx for Windows File Server
- Useful if on-premises workloads need a native Windows file share
Tape Gateway
- Cloud-backed virtual tape storage
- The actual backend is still S3
Volume Gateway
- Cloud-backed Internet Small Computer System Interface (iSCSI) that can be attached to on-premise servers
- The actual backend is still S3
- Cached volumes store recently accessed data locally, so access to S3 is minimized
- Asynchronously backs up point-in-time snapshots to S3
Stored mode
- Data is stored on premises
- Asynchronous backups are stored in S3
Cached Mode
- Frequently accessed data is cached on premises
- The full data set lives in S3