Elasticity: the ability to stretch and retract infrastructure based on demand, so you pay as you go; typically applies over a short period of time (hours or days)
Scalability: the ability to build infrastructure to meet demand over the long term (days, weeks, months)
Elasticity: configure an Auto Scaling Group. Scalability: increase the instance size, use Reserved Instances
Elasticity: increase/decrease IOPS based on traffic spikes. Scalability: unlimited storage (scales automatically)
Elasticity: can't scale on demand. Scalability: change the instance type, add more instances
Elasticity: auto scale up/down to meet varying demand. Scalability: change the instance type, add more instances
AWS Auto Scaling, Amazon ElastiCache [Caching], Aurora [Database], RDS [Database], DynamoDB (Streams / Global Tables), Elastic IPs [VPC], S3 Cross-Region Replication [S3], SQS [Decoupling], DR, DLM
AWS Auto Scaling is a service that lets you define Scaling Plans to automatically scale your resources. Scaling Plans are then implemented by: EC2 Auto Scaling Groups and Application Auto Scaling
AWS Auto Scaling is a simplified option for scaling multiple AWS services based on utilization targets. Amazon EC2 Auto Scaling focuses strictly on EC2 instances and lets developers configure more detailed scaling behaviors. AWS Auto Scaling uses Amazon EC2 Auto Scaling to adjust the capacity of Auto Scaling Groups in response to changes in traffic or workload.
Lets you configure and manage scaling for your resources through a Scaling Plan. Uses dynamic and predictive scaling to scale your resources. Useful for scaling applications across several AWS services:
EC2 Auto Scaling Groups - increase/decrease desired capacity
ECS Services - increase/decrease desired task count
EC2 Spot Fleet requests - increase/decrease target capacity
Aurora Replicas - increase/decrease the number of read replicas
DynamoDB tables and global secondary indexes (GSIs) - increase/decrease provisioned read/write capacity
Application Auto Scaling automatically scales resources for individual AWS services beyond Amazon EC2.
Application Auto Scaling supports: target tracking scaling (CloudWatch metric), step scaling (CloudWatch alarm), and scheduled scaling.
There is no dedicated AWS Console for Application Auto Scaling: for the supported services you work in each service's own console, and console access is not available for all resources.
AWS Auto Scaling uses Application Auto Scaling to adjust the capacity of scalable resources in response to changes in traffic or workload.
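Step scaling reacts to how far a metric has breached an alarm threshold. The sketch below is a minimal model of that idea, assuming a scale-out policy whose step bounds are offsets from the threshold (lower bound inclusive, upper bound exclusive). The class, function names and the 60% CPU policy are hypothetical illustrations, not the Application Auto Scaling API.

```python
import math
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class StepAdjustment:
    # Bounds are offsets from the alarm threshold (breach = metric - threshold).
    lower: Optional[float]  # None means no lower bound (-infinity)
    upper: Optional[float]  # None means no upper bound (+infinity)
    adjustment: int         # capacity units to add (negative to remove)


def step_scaling_delta(metric: float, threshold: float,
                       steps: Sequence[StepAdjustment]) -> int:
    """Return the capacity change of the step whose interval contains the breach."""
    breach = metric - threshold
    for step in steps:
        lo = -math.inf if step.lower is None else step.lower
        hi = math.inf if step.upper is None else step.upper
        # Scale-out convention assumed here: lower inclusive, upper exclusive.
        if lo <= breach < hi:
            return step.adjustment
    return 0  # no matching step (e.g. metric below the threshold)


def clamp_capacity(current: int, delta: int, min_cap: int, max_cap: int) -> int:
    """Apply the adjustment while respecting the min/max capacity."""
    return max(min_cap, min(max_cap, current + delta))


# Hypothetical scale-out policy for an alarm at 60% average CPU.
STEPS = [
    StepAdjustment(lower=0, upper=10, adjustment=1),     # 60% to <70%
    StepAdjustment(lower=10, upper=20, adjustment=2),    # 70% to <80%
    StepAdjustment(lower=20, upper=None, adjustment=3),  # 80% and above
]

if __name__ == "__main__":
    delta = step_scaling_delta(metric=75.0, threshold=60.0, steps=STEPS)
    print(delta, clamp_capacity(current=4, delta=delta, min_cap=1, max_cap=10))
```

Larger breaches trigger larger steps, which is the main difference from target tracking, where you only state the value to hold.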
Optimize for availability - keep the average CPU utilization of your Auto Scaling groups at 40% to provide high availability and ensure capacity to absorb spikes in demand
Balance availability and cost - keep the average CPU utilization of your Auto Scaling groups at 50% to provide optimal availability and reduce costs
Optimize for cost - keep the average CPU utilization of your Auto Scaling groups at 70% to ensure lower costs
Custom - choose your own scaling metric, target value, and other settings
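To see why a lower target costs more, the sketch below applies a simplified proportional rule: if load is spread evenly, the instance count that brings average CPU back to the target is current × avg_cpu / target, rounded up. This is an assumption for illustration; AWS does not expose its exact target tracking algorithm, and the real service adds logic (such as instance warm-up) that this ignores.

```python
import math

# Target average CPU utilization for each predefined strategy listed above.
STRATEGY_TARGETS = {
    "optimize_for_availability": 40.0,
    "balance_availability_and_cost": 50.0,
    "optimize_for_cost": 70.0,
}


def desired_capacity(current: int, avg_cpu: float, target: float,
                     min_cap: int, max_cap: int) -> int:
    """Simplified proportional target tracking (not the exact AWS algorithm).

    Assumes total load is spread evenly, so the instance count that brings
    average CPU back to `target` is current * avg_cpu / target, rounded up,
    then clamped to the group's min/max capacity.
    """
    raw = math.ceil(current * avg_cpu / target)
    return max(min_cap, min(max_cap, raw))


if __name__ == "__main__":
    # 4 instances running at 80% average CPU under each strategy.
    for name, target in STRATEGY_TARGETS.items():
        print(name, desired_capacity(4, 80.0, target, min_cap=1, max_cap=20))
```

With the same load, the availability strategy asks for the most instances and the cost strategy for the fewest, which is the trade-off the three presets encode.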
CloudFormation - goes through your CloudFormation stacks and finds scalable resources defined in existing CloudFormation templates
EC2 Auto Scaling Groups - select one or more existing EC2 Auto Scaling Groups to include in your Scaling Plan; the Scaling Plan can override some of the EC2 Auto Scaling Group settings
Tagged Resources - search for scalable resources using the tags applied to them
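A Scaling Plan that discovers resources by tag can be expressed as the request parameters of the `autoscaling-plans` `create_scaling_plan` call. The sketch only builds the dictionary (no AWS call is made); the plan name, tag, group name and capacity limits are hypothetical, and the 50% CPU target matches the "balance availability and cost" preset.

```python
from typing import Any


def build_scaling_plan(plan_name: str, tag_key: str, tag_value: str,
                       asg_name: str) -> dict[str, Any]:
    """Build create_scaling_plan parameters using a tag-based application source."""
    return {
        "ScalingPlanName": plan_name,
        # Tagged Resources: discover scalable resources by their tags.
        "ApplicationSource": {
            "TagFilters": [{"Key": tag_key, "Values": [tag_value]}],
        },
        "ScalingInstructions": [
            {
                "ServiceNamespace": "autoscaling",
                "ResourceId": f"autoScalingGroup/{asg_name}",
                "ScalableDimension": "autoscaling:autoScalingGroup:DesiredCapacity",
                "MinCapacity": 2,   # hypothetical limits
                "MaxCapacity": 10,
                "TargetTrackingConfigurations": [
                    {
                        "PredefinedScalingMetricSpecification": {
                            "PredefinedScalingMetricType": "ASGAverageCPUUtilization"
                        },
                        "TargetValue": 50.0,  # balance availability and cost
                    }
                ],
            }
        ],
    }


if __name__ == "__main__":
    print(build_scaling_plan("web-plan", "env", "prod", "web-asg"))
```

With credentials configured, the result could be passed as `boto3.client("autoscaling-plans").create_scaling_plan(**plan)`.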
EC2: maintain an Auto Scaling Group by launching/terminating instances
DynamoDB: enable tables or secondary indexes to increase/decrease read/write capacity
ECS: adjust ECS Services and Tasks in response to load variations
Aurora: automatically adjust the number of read replicas in the Aurora DB cluster (e.g. with a target metric of average number of connections)
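For the Aurora case, the two Application Auto Scaling requests (register the scalable target, then attach a target tracking policy on average reader connections) can be sketched as plain dictionaries. The cluster identifier, policy name and target of 200 connections are hypothetical; no AWS call is made here.

```python
from typing import Any


def aurora_replica_scaling_requests(
    cluster_id: str,
    target_connections: float,
    min_replicas: int = 1,
    max_replicas: int = 15,
) -> tuple[dict[str, Any], dict[str, Any]]:
    """Build register_scalable_target and put_scaling_policy parameters."""
    resource_id = f"cluster:{cluster_id}"
    dimension = "rds:cluster:ReadReplicaCount"
    register = {
        "ServiceNamespace": "rds",
        "ResourceId": resource_id,
        "ScalableDimension": dimension,
        "MinCapacity": min_replicas,
        "MaxCapacity": max_replicas,
    }
    policy = {
        "PolicyName": f"{cluster_id}-avg-connections",  # hypothetical name
        "ServiceNamespace": "rds",
        "ResourceId": resource_id,
        "ScalableDimension": dimension,
        "PolicyType": "TargetTrackingScaling",
        "TargetTrackingScalingPolicyConfiguration": {
            "TargetValue": target_connections,
            "PredefinedMetricSpecification": {
                # Average number of connections across Aurora readers.
                "PredefinedMetricType": "RDSReaderAverageDatabaseConnections"
            },
        },
    }
    return register, policy


if __name__ == "__main__":
    reg, pol = aurora_replica_scaling_requests("orders-db", 200.0)
    print(reg)
    print(pol)
```

With credentials configured, these would be sent with `client = boto3.client("application-autoscaling")`, then `client.register_scalable_target(**reg)` followed by `client.put_scaling_policy(**pol)`.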
Auto Scaling Group not found
Auto Scaling service not enabled in the account (common in accounts enrolled in AWS Organizations, where an active SCP may prevent you from using it)
Auto Scaling configuration not working correctly
Invalid EBS device mapping
Instance type not supported in the AZ (check that your instance type is offered in that AZ)
Attempting to attach an EBS block device to an instance-store AMI
Associated key pair doesn't exist
Security group doesn't exist
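Several of the launch failures above can be caught before creating the group. The sketch below is a hypothetical pre-flight check, not an AWS tool: the inputs are assumed to have been collected beforehand (for example from the EC2 `describe_key_pairs`, `describe_security_groups` and `describe_instance_type_offerings` calls), and it does not model invalid EBS device mappings.

```python
from dataclasses import dataclass, field


@dataclass
class LaunchSpec:
    # Hypothetical, simplified view of a launch template.
    key_name: str
    security_group_ids: list[str]
    instance_type: str
    availability_zone: str
    ami_root_device_type: str  # "ebs" or "instance-store"
    ebs_device_names: list[str] = field(default_factory=list)


def preflight_errors(spec: LaunchSpec,
                     existing_key_pairs: set[str],
                     existing_sg_ids: set[str],
                     offered_types_by_az: dict[str, set[str]]) -> list[str]:
    """Return human-readable problems that would make instance launches fail."""
    errors = []
    if spec.key_name not in existing_key_pairs:
        errors.append(f"Key pair '{spec.key_name}' does not exist")
    for sg in spec.security_group_ids:
        if sg not in existing_sg_ids:
            errors.append(f"Security group '{sg}' does not exist")
    if spec.instance_type not in offered_types_by_az.get(spec.availability_zone, set()):
        errors.append(f"{spec.instance_type} is not offered in {spec.availability_zone}")
    if spec.ami_root_device_type == "instance-store" and spec.ebs_device_names:
        errors.append("Cannot attach EBS block devices to an instance-store AMI")
    return errors


if __name__ == "__main__":
    spec = LaunchSpec("prod-key", ["sg-1"], "m5.large", "us-east-1a", "ebs")
    print(preflight_errors(spec, {"prod-key"}, {"sg-1"}, {"us-east-1a": {"m5.large"}}))
```

An empty list means none of the modelled problems were found; it does not guarantee the launch will succeed.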
Troubleshoot EC2 Auto Scaling Groups in the EC2 console
Monitor Scaling Plans in AWS Auto Scaling
Fault tolerance - should components of my app exist in other AZs in case of failure?
Availability - should my app be spread across multiple AZs to meet traffic demand?
Cost - could I save money by limiting the app to a single AZ?
RDS, ElastiCache, Redshift, Neptune, DocumentDB
Create an EBS volume and apply a tag
Create a new lifecycle policy:
  Policy types: EBS snapshot policy, EBS-backed AMI policy, Cross-account copy event policy
  Target resource types (use tags for selection): Volume (single volume), Instance (multi-volume)
  Schedule details
  Retention type: Count, Age
  Encrypt using KMS (default key or CMK)
  Copy tags from source (Y/N)
  Snapshot archiving
  Fast snapshot restore
  Cross-Region copy
  Cross-account sharing
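The console steps above correspond to the parameters of the DLM `create_lifecycle_policy` call. The sketch builds (but does not send) an EBS snapshot policy that targets volumes by tag, takes one snapshot every 24 hours and uses count-based retention. The role ARN, tag, time of day and retention count are hypothetical values.

```python
from typing import Any


def build_dlm_snapshot_policy(role_arn: str, tag_key: str, tag_value: str,
                              keep: int = 7) -> dict[str, Any]:
    """Build create_lifecycle_policy parameters for a daily EBS snapshot policy."""
    return {
        "ExecutionRoleArn": role_arn,
        "Description": "Daily EBS snapshots",
        "State": "ENABLED",
        "PolicyDetails": {
            "PolicyType": "EBS_SNAPSHOT_MANAGEMENT",  # EBS snapshot policy
            "ResourceTypes": ["VOLUME"],               # single-volume targets
            "TargetTags": [{"Key": tag_key, "Value": tag_value}],
            "Schedules": [
                {
                    "Name": "daily",
                    "CopyTags": True,  # copy tags from source
                    "CreateRule": {
                        "Interval": 24,
                        "IntervalUnit": "HOURS",
                        "Times": ["03:00"],  # UTC start time
                    },
                    "RetainRule": {"Count": keep},  # count-based retention
                }
            ],
        },
    }


if __name__ == "__main__":
    print(build_dlm_snapshot_policy(
        "arn:aws:iam::123456789012:role/AWSDataLifecycleManagerDefaultRole",
        "backup", "daily"))
```

For age-based retention, the `RetainRule` would instead carry an interval, e.g. `{"Interval": 14, "IntervalUnit": "DAYS"}`. With credentials configured, the dictionary could be passed as `boto3.client("dlm").create_lifecycle_policy(**policy)`.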
Encryption using KMS (default key or CMK), Cross-Region copy, Cross-account sharing, Snapshot archiving
Does not support instance store-backed AMIs
Can't be used to manage snapshots/AMIs created outside DLM
Default Policies (account-level)
  Meant to be a universal safety net for your account
  Act as a "catch-all" targeting all EBS volumes or EC2 instances in a specific Region, and only create a backup if the resource doesn't already have a recent snapshot from another source (e.g. a manual snapshot or a custom policy)
  Provide only basic snapshot features
  Retention: 2 to 14 days (default 7)
Custom Policies (tag-based)
  Precise tool for specific workloads
  Support pre/post scripts, archiving, and cross-account copy
  Age-based retention: 1 day to 100 years
  Count-based retention: 1 to 1000 snapshots
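The difference between count-based and age-based retention can be shown with a small model of which snapshots would be pruned. This is an illustrative sketch, not DLM's internal logic; the function names and dates are hypothetical, and only the 1 to 1000 count range is taken from the notes above.

```python
from datetime import datetime, timedelta

MAX_COUNT = 1000  # custom-policy count-based retention upper limit


def expired_by_count(created: list[datetime], keep: int) -> list[datetime]:
    """Count-based retention: keep the `keep` newest snapshots, return the rest."""
    if not 1 <= keep <= MAX_COUNT:
        raise ValueError("count-based retention must be between 1 and 1000")
    newest_first = sorted(created, reverse=True)
    return sorted(newest_first[keep:])


def expired_by_age(created: list[datetime], max_age: timedelta,
                   now: datetime) -> list[datetime]:
    """Age-based retention: return snapshots older than `max_age`."""
    return sorted(c for c in created if now - c > max_age)


if __name__ == "__main__":
    # Ten daily snapshots, 1 to 10 January 2024.
    snaps = [datetime(2024, 1, d) for d in range(1, 11)]
    print(expired_by_count(snaps, keep=7))
    print(expired_by_age(snaps, timedelta(days=5), now=datetime(2024, 1, 11)))
```

Count-based retention keeps a fixed number of restore points regardless of schedule gaps, while age-based retention keeps a fixed time window regardless of how many snapshots fall inside it.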
Aurora, RDS, EBS, Storage Gateway
FSx, EFS, DynamoDB, EC2
Create a new EBS volume, then copy the data from your instance store volume to the EBS volume
Back up the individual files stored on an Amazon EBS volume or S3 bucket
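Once the new EBS volume is attached and mounted, the copy step is an ordinary file-tree copy on the instance. The sketch assumes both volumes are already mounted; the mount points in the usage line (`/media/ephemeral0`, `/mnt/ebs-data`) are hypothetical.

```python
import shutil
from pathlib import Path


def copy_instance_store_to_ebs(src: Path, dst: Path) -> int:
    """Copy everything under `src` (instance store mount) into `dst` (EBS mount).

    shutil.copytree uses copy2 by default, so file timestamps are preserved.
    Returns the number of files found in the source tree.
    """
    shutil.copytree(src, dst, dirs_exist_ok=True)
    return sum(1 for p in Path(src).rglob("*") if p.is_file())


if __name__ == "__main__":
    # Hypothetical mount points for the instance store and the new EBS volume.
    print(copy_instance_store_to_ebs(Path("/media/ephemeral0"), Path("/mnt/ebs-data")))
```

Because instance store data is lost when the instance stops or terminates, the copy has to happen while the instance is still running.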
Readiness Checks (the "audit") - instead of just checking whether a server is "up," ARC audits your entire environment to ensure your standby site can actually handle traffic
  Continuous Monitoring: scans resource quotas (e.g. service limits), capacity (e.g. EC2 instance counts), and network configurations
  Drift Detection: if you scale up your primary Region but forget to scale your standby, ARC marks the standby as "Not Ready"
  Resource Sets: group related resources (e.g. ALBs or Auto Scaling Groups) across Regions so they are checked as a single item
Routing Controls (the "kill switch") - manual on/off switches that let you redirect traffic during an outage without waiting for DNS propagation or standard health check timeouts
  Extreme Reliability: the cluster that serves these switches (the data plane) consists of redundant endpoints in five AWS Regions, so you can still flip a switch even if some AWS Regions are unavailable
  Manual Override: unlike standard DNS failover, which is fully automated, Routing Controls give you a human-in-the-loop "big red button" to trigger failover
  Reroute users between AZs/Regions via DNS and health checks
Safety Rules (the "guardrails") - automated failovers can sometimes cause more harm than good (e.g. "flapping" or failing over to an empty Region); safety rules prevent this
  Pre-conditions: create rules such as "Don't allow failover if the target Region is marked Not Ready" or "Never turn off both Regions at the same time"
  Quorum Rules: reject any traffic shift that would leave fewer than, for example, 2 out of 3 zones active
Zonal Shift & Zonal Autoshift
  Zonal Shift: manually move traffic away from a specific Availability Zone (AZ) when you notice localized issues (such as a bad deployment or a cooling failure in one data center)
  Zonal Autoshift: AWS automatically shifts your traffic away from an AZ when AWS telemetry detects a potential failure (such as a power or networking event) affecting that zone
Region Switch - orchestrate full Region-level failover
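Quorum-style safety rules work by evaluating the *proposed* routing control states and rejecting the whole change if the rule would be violated. The sketch loosely mirrors the idea behind ARC's "at least N controls On" assertion rules; it is a local model with hypothetical control names, not the ARC API.

```python
def at_least_rule_holds(states: dict[str, bool], controls: list[str],
                        threshold: int) -> bool:
    """At least `threshold` of the listed routing controls must be On."""
    return sum(1 for c in controls if states.get(c, False)) >= threshold


def try_update(states: dict[str, bool], changes: dict[str, bool],
               controls: list[str], threshold: int) -> tuple[dict[str, bool], bool]:
    """Apply `changes` only if the rule still holds on the proposed state."""
    proposed = {**states, **changes}
    if not at_least_rule_holds(proposed, controls, threshold):
        return states, False  # rejected: keep the current states unchanged
    return proposed, True


ZONES = ["az1", "az2", "az3"]  # hypothetical zonal routing controls

if __name__ == "__main__":
    states = {z: True for z in ZONES}
    states, ok = try_update(states, {"az1": False}, ZONES, threshold=2)
    print(ok, states)  # first shift leaves 2 of 3 On, so it is accepted
    states, ok = try_update(states, {"az2": False}, ZONES, threshold=2)
    print(ok, states)  # second shift would leave 1 On, so it is rejected
```

The "never turn off both Regions at the same time" rule is the same check with two Regional controls and a threshold of 1.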