Containers - Coggle Diagram
Containers
ECS
Cluster
Infrastructure
- Fargate Only (Common workloads)
- Fargate and Managed Instances (Advanced workloads)
- ECS manages patching and scaling
- Configurable instance types
- Key settings:
- Instance profile: IAM Role required to access AWS services (ECS, ECR, CloudWatch Logs, SSM)
- Infrastructure role: IAM Role to manage the Amazon ECS Managed Instances lifecycle (launch/terminate instance, apply patches, etc.)
- Instance Selection:
- ECS Default (Recommended): ECS chooses instance types based on the ECS Task Definition and ECS Service requirements
- Use custom: you specify instance attributes (vCPU, RAM, etc.) or exact instance types
- Fargate and Self-managed instances
- you have full control over the instances
- you patch and scale instances
- Create / Use existing ASG (with on-demand or spot instances)
- EC2 instance role
Scaling
ASG
- You manage standard ASG (not container aware)
- EC2 only
ECS Cluster Capacity Provider
- Controls underlying ASG
- Uses the specialized CapacityProviderReservation metric to make scaling decisions
- Prevents the ASG from terminating an instance that is still running active ECS tasks during a scale-in event
- Types:
- FARGATE
- FARGATE_SPOT (AWS can terminate your tasks with a 2-minute warning if they need the capacity back)
- ASG
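The CapacityProviderReservation behavior above can be sketched in a few lines. This is a simplified model of the published formula (required instances / running instances * 100), not the exact ECS implementation:

```python
import math

def capacity_provider_reservation(required_instances: int, running_instances: int) -> float:
    """CapacityProviderReservation = required / running * 100.
    Above 100: not enough instances, scale out.
    Below 100: idle capacity, scale in."""
    if running_instances == 0:
        # Per the documented edge case, the metric reports 200 when
        # tasks need capacity but no instances are running.
        return 200.0 if required_instances > 0 else 100.0
    return required_instances / running_instances * 100

def desired_instances(required_instances: int, target_capacity: int = 100) -> int:
    # Target tracking drives the metric toward targetCapacity:
    # 100 = pack instances fully, 80 = keep ~20% spare headroom.
    return math.ceil(required_instances * 100 / target_capacity)
```

For example, if 12 instances are required to place all tasks but only 10 are running, the metric reads 120 and target tracking (at a target of 100) grows the ASG to 12.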
Storage
- Host-Centric (legacy):
- attach EBS volume / mount EFS to EC2 instance, and then use "Bind Mounts" in your task to point to a specific folder on that host
- Problem:
- Your task became "locked" to that specific EBS/instance. If the instance died, the task couldn't move easily
- If you had 10 different EFS filesystems for 10 different apps, your EC2 hosts became cluttered with mount points
- Task-Centric:
- You define the volume inside the Task Definition
- When ECS schedules the task, it handles the "heavy lifting" of attaching and mounting the storage to whichever EC2 instance happens to be running the task
Monitoring
CloudWatch Agent
CW Agent Deployment Models
- Sidecar (only option for Fargate Launch)
- Fargate/EC2
- Every task has its own agent container. Best for task-level metrics and Fargate workloads
- Daemon (best for EC2 Launch)
- EC2 Only
- One agent per EC2 instance. It collects metrics from all tasks on that host.
- Most cost-effective for large EC2 clusters.
Sidecar:
- IAM Role:
- Task Role: Needs CloudWatchAgentServerPolicy to allow the agent to send metrics and logs while running
- Task Execution Role: Needs CloudWatchAgentServerPolicy and ssm:GetParameters to pull agent config from SSM Parameter Store
- Store agent config in SSM Parameter Store
- Update the Task Definition
- Add the CloudWatch agent as a second container in your Task Definition
Daemon:
- IAM Role:
- Task Role: Attach the managed policy CloudWatchAgentServerPolicy. This allows the agent to push data to CloudWatch
- Task Execution Role: Needs CloudWatchAgentServerPolicy and ssm:GetParameters to pull agent config from SSM Parameter Store
- Store agent config in SSM Parameter Store
- Create the "Daemon" Task Definition
- Deploy as an ECS Service with "Daemon" Strategy (instead of Replica)
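The Daemon deployment step above maps to a create_service call with the DAEMON scheduling strategy. A minimal hedged sketch of the boto3 parameters (cluster, service, and task definition names are hypothetical placeholders):

```python
# Parameters for ecs.create_service(**daemon_service); with the
# DAEMON strategy ECS runs exactly one agent task per container
# instance, so no desiredCount is specified.
daemon_service = {
    "cluster": "my-cluster",                 # hypothetical cluster name
    "serviceName": "cwagent-daemon",         # hypothetical service name
    "taskDefinition": "cwagent-daemon-task", # the Daemon task definition
    "schedulingStrategy": "DAEMON",          # one task per EC2 instance
    "launchType": "EC2",                     # DAEMON is EC2-only
}
```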
Task
Task Definition - key configurations:
- Launch Type (Fargate, Managed Instances, self-managed EC2 instances); you can select multiple
- OS/Architecture
- Network Mode:
- awsvpc:
- Provides the Task with an ENI, you need to specify VPC, Subnet, SG
- Required for Fargate
- bridge:
- Uses Docker's built-in virtual network
- The bridge is an internal network namespace that allows containers connected to the same bridge to communicate.
- The bridge provides isolation from containers that aren't connected to the same bridge network
- You use static or dynamic port mappings to map container ports to host ports
- host:
- Task bypass Docker's built-in virtual network
- Maps container ports directly to the ENI of EC2 instance
- Can't run multiple instances of the same task on a single EC2 instance when port mapping is enabled
- none:
- Task has no external network connectivity
- default:
- Windows only
- Uses Docker's built-in virtual network mode on Windows
- Task size (CPU and Memory)
- Task role: IAM Role used by Task to access AWS Services
- Task execution role: IAM Role used by the ECS Agent (on EC2) or Fargate to make API's call on your behalf
- Task placement constraints
- Container details:
- Image URI
- Port Mapping:
- awsvpc: Container Port, Protocol (TCP/UDP), App Protocol (HTTP, GRPC, ...)
- bridge: Host Port, Container Port, Protocol, App Protocol
- host: Container Port, Protocol (TCP/UDP), App Protocol
- Environment variables:
- plain text, SSM Parameter Store, Secrets Manager
- Add from file - environment variables in bulk from environment file in S3
- Logging
- awslogs (default): sends container logs to CloudWatch Logs group /ecs/<task-definition-name>
- awsfirelens (modern choice)
- Route to: S3, Kinesis, Firehose, OpenSearch and custom destinations
- Provides JSON transformation, filtering, route to multiple destination, buffering
- Need to add a Fluent Bit (recommended) or Fluentd container in your Task definition
- With Fluent Bit (recommended) / Fluentd, the ECS Agent automatically generates a configuration file and "links" your app and log-router containers
- sidecar that intercepts your container's stdout and stderr
- splunk
- Third-Party & Legacy Drivers (EC2 Only):
- syslog / journald / gelf / fluentd / json-file
- Storage
- Ephemeral storage:
- Fargate and EC2 launch type
- Fargate provides 20GiB (default) up to 200GiB
- Data Volume
- Configure at task definition creation
- Bind Mount: backed by host storage (e.g. the instance's EBS volume); with Fargate you cannot configure the Host Path; allows sharing among a Task's containers
- EFS
- Configure at deployment
- Configure 1 EBS Volume when creating or updating a service
Bind Mount
- Bind Mount is a way to map a specific directory from the "host" into your container
- EC2 Launch:
- maps a path in EC2 to a path in container
- persist on EC2 after Task end
- Use cases: shared storage among containers, sending logs, fast local caching
- Fargate Launch:
- Temporary shared "scratch pad" that exists only for the life of the task
- Sharing files between containers in the same task
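The Task Definition settings above come together in the RegisterTaskDefinition payload. A minimal hedged sketch for a Fargate task (family, image URI, role ARNs, and log group are hypothetical placeholders):

```python
import json

# Illustrative parameters for ecs.register_task_definition(**task_definition).
task_definition = {
    "family": "web-app",                     # hypothetical family name
    "requiresCompatibilities": ["FARGATE"],  # launch type
    "networkMode": "awsvpc",                 # required for Fargate
    "cpu": "256",                            # task size: 0.25 vCPU
    "memory": "512",                         # task size: 512 MiB
    "taskRoleArn": "arn:aws:iam::123456789012:role/app-task-role",
    "executionRoleArn": "arn:aws:iam::123456789012:role/ecs-exec-role",
    "containerDefinitions": [{
        "name": "web",
        "image": "123456789012.dkr.ecr.us-east-1.amazonaws.com/web:latest",
        # awsvpc mode: only the container port is mapped
        "portMappings": [{"containerPort": 80, "protocol": "tcp"}],
        "logConfiguration": {
            "logDriver": "awslogs",  # ship stdout/stderr to CloudWatch Logs
            "options": {
                "awslogs-group": "/ecs/web-app",
                "awslogs-region": "us-east-1",
                "awslogs-stream-prefix": "web",
            },
        },
    }],
}

print(json.dumps(task_definition, indent=2))
```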
Troubleshooting
Task Issues
You run the task and the task displays a PENDING status and then disappears
- Application or configuration errors
- Check ECS service events: ECS stops and replaces a task when its containers have:
- Stopped running
- Failed too many load balancer health checks
Task is stuck in the PENDING state
- Container agent cannot download the Docker image from ECR or Docker Hub
- No Route to ECR: If your task is in a private subnet, it needs a NAT Gateway or ECR VPC Endpoints to pull images
- Security Groups: Ensure your task's security group allows outbound HTTPS (Port 443) traffic to reach the image repository
- IAM Permission Issues (Task Execution Role)
- Missing ECR Permissions: The role must have ecr:GetDownloadUrlForLayer and ecr:BatchGetImage
- Secrets Manager or SSM Parameter: If Task definition pulls environment variables from Secrets Manager or SSM Parameter Store, the Execution Role must have permissions to read those specific secrets. If it can't, the task stays PENDING
Service Auto Scaling
- Increasing or decreasing the desired_count of tasks in your ECS Service.
- It uses Application Auto Scaling and offers three main strategies:
- Target Tracking
- Step Scaling
- Scheduled Scaling
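Because ECS Service Auto Scaling rides on Application Auto Scaling, enabling Target Tracking is two API calls: register the service's DesiredCount as a scalable target, then attach a policy. A hedged sketch of the boto3 parameter dicts (cluster/service/policy names are hypothetical):

```python
# Parameters for application_autoscaling.register_scalable_target(**scalable_target)
scalable_target = {
    "ServiceNamespace": "ecs",
    "ResourceId": "service/my-cluster/my-service",  # hypothetical names
    "ScalableDimension": "ecs:service:DesiredCount",
    "MinCapacity": 2,
    "MaxCapacity": 10,
}

# Parameters for application_autoscaling.put_scaling_policy(**target_tracking_policy)
target_tracking_policy = {
    "PolicyName": "cpu-target-tracking",
    "ServiceNamespace": "ecs",
    "ResourceId": "service/my-cluster/my-service",
    "ScalableDimension": "ecs:service:DesiredCount",
    "PolicyType": "TargetTrackingScaling",
    "TargetTrackingScalingPolicyConfiguration": {
        "TargetValue": 60.0,  # keep average service CPU near 60%
        "PredefinedMetricSpecification": {
            "PredefinedMetricType": "ECSServiceAverageCPUUtilization"
        },
        "ScaleOutCooldown": 60,   # seconds before another scale-out
        "ScaleInCooldown": 120,   # scale in more conservatively
    },
}
```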
EKS
Infrastructure
EKS Cluster (Control Plane)
- Cluster IAM Role
- Node IAM Role
- VPC
EKS Nodes
EKS Auto Mode
- Auto Mode handles the entire lifecycle of EC2 instances
- Based on integrated AWS managed Karpenter (no ASG)
- Bottlerocket is the only supported OS
Managed Node Groups
- One or more Amazon EC2 instances running the latest EKS-optimized AMIs
- All nodes provisioned as part of an ASG
- Use launch templates to customize the configuration
Self-Managed Nodes
- Nodes created by you and registered to the EKS cluster and managed by an ASG
- You can use prebuilt AMI - Amazon EKS Optimized AMI
Fargate
- Each Pod runs in its own isolated compute environment (a micro-VM)
- You cannot run DaemonSets
Data Volume
- Leverages a Container Storage Interface (CSI) compliant driver
- The CSI driver acts as the "plug-in" between EKS and AWS storage
- Need to apply a StorageClass manifest to your EKS cluster
Supports for:
- EBS
- EFS
- FSx for Lustre (HPC/AI)
- FSx for NetApp ONTAP (Enterprise)
- S3 (S3 CSI Driver)
How Volumes are Provisioned - K8s uses three main abstractions to manage these volumes:
- PersistentVolume (PV): The physical storage resource in your AWS account
- PersistentVolumeClaim (PVC): A "request" for storage made by a pod. Specify size and access mode, and K8s finds or creates a PV to match
- StorageClass (SC): The "template" for storage. It defines which CSI driver to use and parameters like EBS volume type (gp3, io2)
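The SC/PVC pairing above can be sketched as two manifests, here written as Python dicts of the kind you would pass to the Kubernetes Python client (names and parameters are illustrative, not from the source):

```python
# StorageClass: tells K8s to provision gp3 EBS volumes via the EBS CSI driver.
storage_class = {
    "apiVersion": "storage.k8s.io/v1",
    "kind": "StorageClass",
    "metadata": {"name": "gp3-sc"},          # hypothetical name
    "provisioner": "ebs.csi.aws.com",        # EBS CSI driver
    "parameters": {"type": "gp3"},
    # Delay volume creation until a pod is scheduled, so the EBS
    # volume lands in the same AZ as the node running the pod.
    "volumeBindingMode": "WaitForFirstConsumer",
}

# PersistentVolumeClaim: a pod-side "request" that K8s satisfies by
# dynamically creating a matching PV from the StorageClass.
pvc = {
    "apiVersion": "v1",
    "kind": "PersistentVolumeClaim",
    "metadata": {"name": "data-claim"},      # hypothetical name
    "spec": {
        "accessModes": ["ReadWriteOnce"],    # EBS: single-node access
        "storageClassName": "gp3-sc",
        "resources": {"requests": {"storage": "10Gi"}},
    },
}
```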
EKS Auto Mode (storage management is greatly simplified)
- Built-in CSI Drivers: (EBS or EFS CSI drivers) You don't need to manually install or patch CSI drivers; AWS manages them as part of the cluster lifecycle
- Managed EBS: When your pod needs an EBS volume, EKS Auto Mode ensures the node it launches is in the same Availability Zone as the volume, preventing the common "Multi-AZ scheduling" error
Storage options for Fargate:
- Ephemeral
- EFS
- S3 (S3 CSI Driver)
AutoScaling Modes
- Cluster AutoScaler (legacy)
- Automatically adjusts the number of nodes (same size) in your cluster using ASG
- Karpenter
- Launching right-sized compute resources (EC2 instances, Fargate) in response to load changes in < 1 minute
- Changes instance-types
- Auto Mode (AWS-managed Karpenter)
- Automatically scales cluster compute resources, with the ability to consolidate workloads and delete nodes
- Changes instance-types
Monitoring
Control Plane Logging
- Logs are not enabled by default (except in Auto Mode)
- When you enable control plane logging, the logs are sent to CloudWatch Logs
- EKS Control Plane Log Types
- API Server (api)
- Audit (audit)
- Authenticator (authenticator)
- Controller Manager (controllerManager)
- Scheduler (scheduler)
- Ability to select the exact log types to send to CloudWatch Logs
Nodes Logging
- Node logs provide visibility into the health of your EC2 instances (Kubelet, Container Runtime, OS system logs)
- Managed & Self-Managed Nodes (EC2):
- Use the CloudWatch Agent or Fluent Bit deployed as a DaemonSet
- EKS Auto Mode:
- AWS automatically collects system logs from the Bottlerocket OS and sends them to CloudWatch
- No need to install or manage an agent
- Fargate:
- Node logs are not accessible because the nodes are fully managed and abstracted away by AWS
Container Logging
EC2/Auto Mode (Recommended)
- AWS for Fluent Bit
- The industry standard is to run Fluent Bit as a DaemonSet
Fargate
- Fargate Built-in Log Router
- You cannot run DaemonSets on Fargate
- AWS provides a built-in log router based on Fluent Bit
- No need of sidecar container in your Pods
CloudWatch Observability Add-on
- AWS offers a consolidated EKS Add-on that installs the CloudWatch Agent and Fluent Bit
- It sets up "Container Insights," which gives you pre-built dashboards in CloudWatch that correlate your logs with CPU/Memory
- EKS Cluster Insights – detect issues and provide recommendations
- Configuration Insights – identifies misconfiguration in your EKS Cluster (hybrid)
- Upgrade Insights – identifies issues that could impact your ability to upgrade to new Kubernetes version
Upgrade
1) Review Upgrade Insights in EKS Cluster Insights to identify any issues
2) Update Cluster Control Plane
3) Update Cluster Components (e.g. Nodes)
Workload Scalability
Horizontal Pod Autoscaler (HPA)
- Adds or removes Pod replicas
- Monitors CPU/Memory (via Metrics Server) and scales out when usage hits a threshold
- Best for: Most stateless web applications and microservices
Requirements:
- Kubernetes Metrics Server to provide CPU and memory data
- You must define "resources: requests" in your container's deployment manifest
- Configured kubectl and API Access
- RBAC Permissions: user or service account creating HPA needs permissions to read metrics and update the scale subresource of the target (Deployment, StatefulSet, etc.)
- API Version: ensure you are using autoscaling/v2 API version in your manifests for scaling based on memory or multiple metrics
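The HPA's core decision (why it needs the Metrics Server and `resources: requests`) is a single documented formula, sketched here as a simplified model without the controller's tolerance window:

```python
import math

def hpa_desired_replicas(current_replicas: int,
                         current_metric: float,
                         target_metric: float) -> int:
    """Core HPA formula:
    desired = ceil(current_replicas * currentMetric / targetMetric).
    Utilization is measured against the container's resource requests,
    which is why the manifest must define them."""
    return math.ceil(current_replicas * (current_metric / target_metric))
```

For example, 3 replicas averaging 80% CPU against a 50% target give ceil(3 * 1.6) = 5 replicas.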
Vertical Pod Autoscaler (VPA)
- Increases or decreases the CPU/Memory requests of existing Pods
- Watches actual usage over time and "rightsizes" Pods
- Best for: Stateful apps (databases) or "monolithic" apps that cannot be easily cloned
- Warning: by default, VPA will restart Pods to apply changes
Requirements:
- Metric Source
- Metrics Server: This is the minimum requirement. It provides short-term metrics (CPU/Memory usage)
- Prometheus (Optional but Recommended): VPA can use Prometheus data for longer-term historical analysis and more precise recommendations
- Deploy VPA as an add-on (VPA Recommender, VPA Updater, VPA Admission Controller)
- Cluster Configuration
- MutatingAdmissionWebhook: the K8s API server must have this admission controller enabled (it is by default)
- Resource Requests: Pods must have some initial "resources: requests" defined in their YAML. VPA uses these as a starting point
- Warnings:
- You cannot use VPA and HPA on the same resource (e.g., both scaling on CPU)
Kubernetes Event-Driven Autoscaling (KEDA)
- Scales based on external events, not just CPU
- Scale Pods based on the number of messages in SQS, Kafka lag, or specific time of day (Cron)
- Best for: Background workers, queue processing, and event-driven architectures
Requisites:
- Cluster & Tools
- kubectl Configuration: your local machine must have kubectl installed and configured to communicate with your cluster
- Helm v3
- Target Resource: Deployment or StatefulSet
- Like HPA, you should define "resources: requests" in your Pod spec (KEDA scales on external events, but it creates an HPA that needs this)
- Network Connectivity to event sources
- IAM / Secret Permissions to access event source
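Putting the SQS example above together, a KEDA ScaledObject ties a Deployment to a queue-length trigger. A hedged sketch as a Python dict (queue URL, deployment name, and replica bounds are hypothetical):

```python
# A KEDA ScaledObject manifest: scale the "sqs-worker" Deployment on
# SQS queue depth, targeting ~5 messages per replica.
scaled_object = {
    "apiVersion": "keda.sh/v1alpha1",
    "kind": "ScaledObject",
    "metadata": {"name": "sqs-worker-scaler"},   # hypothetical name
    "spec": {
        "scaleTargetRef": {"name": "sqs-worker"},  # Deployment to scale
        "minReplicaCount": 0,   # KEDA can scale idle workers to zero
        "maxReplicaCount": 20,
        "triggers": [{
            "type": "aws-sqs-queue",
            "metadata": {
                "queueURL": "https://sqs.us-east-1.amazonaws.com/123456789012/jobs",
                "queueLength": "5",    # target messages per replica
                "awsRegion": "us-east-1",
            },
        }],
    },
}
```

Under the hood KEDA turns this into an HPA on the target Deployment, which is why the Pod spec still needs `resources: requests`.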