Monitoring

CloudWatch Metrics

Provides metrics for every AWS service

A metric is a variable to monitor

CPUUtilization

NetworkIn

Metrics belong to namespaces
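
To make the namespace / metric relationship concrete, here is a minimal Python sketch of one data point shaped like the input to the CloudWatch PutMetricData API. The namespace MyApp/Orders, the metric name PendingOrders and the Environment dimension are hypothetical names chosen for illustration; AWS services publish under their own namespaces such as AWS/EC2.

```python
from datetime import datetime, timezone


def build_metric_datum(name, value, unit="Count", dimensions=None):
    """Return one entry for the MetricData list of a PutMetricData request."""
    return {
        "MetricName": name,
        "Dimensions": [{"Name": k, "Value": v} for k, v in (dimensions or {}).items()],
        "Timestamp": datetime.now(timezone.utc),
        "Value": float(value),
        "Unit": unit,
    }


# Hypothetical custom namespace and metric, for illustration only.
payload = {
    "Namespace": "MyApp/Orders",
    "MetricData": [
        build_metric_datum("PendingOrders", 12, dimensions={"Environment": "dev"})
    ],
}
print(payload["Namespace"], payload["MetricData"][0]["MetricName"])
```

With boto3 installed and credentials configured, the same dictionary could be passed as boto3.client("cloudwatch").put_metric_data(**payload); the sketch stops short of that call so it runs without AWS access.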

CloudWatch Metric Streams

Continually stream CloudWatch metrics to a destination of your choice, with near-real-time delivery and low latency

Kinesis Data Firehose

S3

Redshift

OpenSearch

3rd party

Datadog

Splunk

...

CloudWatch Logs

Log Group: arbitrary name, usually representing an application

Log Stream: instances within an application / log files / containers

Can send logs to

S3 (export)

Kinesis Data Streams

Kinesis Data Firehose

Lambda

OpenSearch

Security

Encrypted by default

Can set up KMS-based encryption with your own keys

CloudWatch Logs - Sources

SDK, CloudWatch Logs Agent, CloudWatch Unified Agent

Elastic Beanstalk

ECS

Lambda

VPC Flow Logs

API Gateway

CloudTrail based on filter

Route 53

CloudWatch Logs Insights

Search and analyze log data stored in CloudWatch Logs

Query Engine
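
As an illustration of the query engine, the sketch below builds the arguments for a Logs Insights query that returns the most recent log lines matching a pattern. The keys mirror the CloudWatch Logs StartQuery API; the log group name /my-app/production and the helper build_insights_query are hypothetical.

```python
from datetime import datetime, timedelta, timezone


def build_insights_query(log_group, pattern, hours=1, limit=20):
    """Build keyword arguments shaped like a CloudWatch Logs StartQuery request."""
    end = datetime.now(timezone.utc)
    start = end - timedelta(hours=hours)
    # Logs Insights queries are pipelines of commands separated by "|".
    query = (
        "fields @timestamp, @message "
        f"| filter @message like /{pattern}/ "
        "| sort @timestamp desc "
        f"| limit {limit}"
    )
    return {
        "logGroupName": log_group,
        "startTime": int(start.timestamp()),  # epoch seconds
        "endTime": int(end.timestamp()),
        "queryString": query,
    }


params = build_insights_query("/my-app/production", "ERROR")
print(params["queryString"])
```

With boto3, these arguments could be passed to the logs client's start_query call, and the results fetched later with get_query_results; that part is left out so the sketch runs offline.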

CloudWatch Logs Subscriptions

Get real-time log events from CloudWatch Logs for processing and analysis

Can send to

Kinesis Data Streams

Kinesis Data Firehose

Lambda

Subscription Filter

Cross-Account Subscription

Send log events to resources in a different AWS account

KDS

KDF
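
When the subscription destination is Lambda, the function receives the log events as a base64-encoded, gzip-compressed JSON document under event["awslogs"]["data"]. The following is a minimal sketch of decoding it; the sample payload is built locally with hypothetical values so that the code runs without AWS.

```python
import base64
import gzip
import json


def decode_subscription_event(event):
    """Decode the base64 + gzip payload that CloudWatch Logs delivers to Lambda."""
    compressed = base64.b64decode(event["awslogs"]["data"])
    return json.loads(gzip.decompress(compressed))


def handler(event, context=None):
    """Return the message of every log event in one delivery."""
    payload = decode_subscription_event(event)
    return [log_event["message"] for log_event in payload["logEvents"]]


# Hypothetical delivery, encoded the same way CloudWatch Logs would encode it.
sample = {
    "messageType": "DATA_MESSAGE",
    "logGroup": "/my-app/production",
    "logStream": "instance-1",
    "logEvents": [
        {"id": "1", "timestamp": 1700000000000, "message": "ERROR payment failed"}
    ],
}
encoded = base64.b64encode(gzip.compress(json.dumps(sample).encode("utf-8"))).decode("ascii")
messages = handler({"awslogs": {"data": encoded}})
print(messages)
```

Which events reach the function is decided by the subscription filter pattern, so the handler itself does not need to filter again.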

CloudWatch Logs Agent & Unified Agent

CloudWatch Logs Agent

Old version of the agent

Can only send to CloudWatch Logs

CloudWatch Unified Agent

Collects additional system-level metrics such as RAM, processes, etc.

Collects logs to send to CloudWatch Logs

Centralized configuration using SSM Parameter Store

CloudWatch Unified Agent – Metrics

Collected directly on Linux server / EC2 instance

CPU

Disk Metrics

RAM

Netstat

Processes

Swap Space
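
A sketch of what a Unified Agent configuration covering some of these metrics plus one log file might look like, written as a Python dictionary and serialized to the JSON the agent reads. The file path and log group name are hypothetical, and only a small subset of the available measurements is shown.

```python
import json

agent_config = {
    "metrics": {
        "metrics_collected": {
            "mem": {"measurement": ["mem_used_percent"]},
            "swap": {"measurement": ["swap_used_percent"]},
            "disk": {"measurement": ["used_percent"], "resources": ["/"]},
        }
    },
    "logs": {
        "logs_collected": {
            "files": {
                "collect_list": [
                    {
                        # Hypothetical application log file and log group.
                        "file_path": "/var/log/my-app/app.log",
                        "log_group_name": "/my-app/production",
                        # Placeholder the agent replaces with the EC2 instance ID.
                        "log_stream_name": "{instance_id}",
                    }
                ]
            }
        }
    },
}

config_json = json.dumps(agent_config, indent=2)
print(config_json)
```

Storing this JSON once in SSM Parameter Store is what allows many instances to share the same agent configuration.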

CloudWatch Alarms

Alarms are used to trigger notifications for any metric

Various options (sampling, %, max, min, etc.)

Alarm State

OK

INSUFFICIENT_DATA

ALARM

Period

Length of time in seconds to evaluate the metric

High resolution custom metrics: 10 sec, 30 sec or multiples of 60 sec
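
The three alarm states can be illustrated with a deliberately simplified evaluation function. It assumes one datapoint per period, a "greater than threshold" comparison, and that every one of the last evaluation periods must breach; the real service offers more options (for example, how missing data is treated), which this sketch ignores.

```python
def evaluate_alarm(datapoints, threshold, evaluation_periods):
    """Return OK, ALARM or INSUFFICIENT_DATA for the most recent periods.

    datapoints holds one value per period, oldest first; None marks a
    period for which no data was reported.
    """
    recent = datapoints[-evaluation_periods:]
    present = [d for d in recent if d is not None]
    if len(recent) < evaluation_periods or not present:
        return "INSUFFICIENT_DATA"
    if len(present) == evaluation_periods and all(d > threshold for d in present):
        return "ALARM"
    return "OK"


# CPUUtilization samples (percent), one per period.
print(evaluate_alarm([40, 85, 90, 95], threshold=80, evaluation_periods=3))  # ALARM
print(evaluate_alarm([40, 85, 60], threshold=80, evaluation_periods=3))      # OK
print(evaluate_alarm([None, None, None], threshold=80, evaluation_periods=3))  # INSUFFICIENT_DATA
```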

CloudWatch Alarm Targets

Stop, Terminate, Reboot, or Recover an EC2 Instance

Trigger Auto Scaling Action

Send notification to SNS

CloudWatch Alarms – Composite Alarms

Composite Alarms monitor the states of multiple other alarms

AND and OR conditions

Helpful to reduce “alarm noise” by creating complex composite alarms
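
How AND and OR conditions reduce noise can be sketched in a few lines. The function below is a local illustration, not the service's rule engine, and the child alarm names cpu-high and iops-high are hypothetical; in CloudWatch the equivalent rule would be written as an expression such as ALARM("cpu-high") AND ALARM("iops-high").

```python
def composite_state(child_states, mode="AND"):
    """Combine child alarm states: ALARM if all (AND) or any (OR) child is in ALARM."""
    in_alarm = [state == "ALARM" for state in child_states.values()]
    if not in_alarm:
        return "OK"
    fired = all(in_alarm) if mode == "AND" else any(in_alarm)
    return "ALARM" if fired else "OK"


children = {"cpu-high": "ALARM", "iops-high": "OK"}
print(composite_state(children, mode="AND"))  # OK: only one child is in ALARM
print(composite_state(children, mode="OR"))   # ALARM: at least one child is in ALARM
```

With the AND condition, a CPU spike alone does not notify anyone; the composite alarm fires only when both symptoms occur together.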

EventBridge

EventBridge

Schedule: run rules on a cron or rate expression

Event Pattern: event rules to react to a service doing something

Trigger Lambda functions, send SQS/SNS messages…
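
An event pattern is a JSON document whose fields list the values an event must have to match. The sketch below shows a pattern for terminated EC2 instances together with a simplified local matcher; the matcher only handles exact-value lists and nested objects, whereas the real service supports further operators that are not modeled here. The instance ID is a made-up example.

```python
def matches(pattern, event):
    """Simplified matcher: every pattern key must be present in the event,
    and a list in the pattern means 'any of these values'."""
    for key, expected in pattern.items():
        if key not in event:
            return False
        if isinstance(expected, dict):
            if not isinstance(event[key], dict) or not matches(expected, event[key]):
                return False
        elif event[key] not in expected:
            return False
    return True


pattern = {
    "source": ["aws.ec2"],
    "detail-type": ["EC2 Instance State-change Notification"],
    "detail": {"state": ["terminated"]},
}
event = {
    "source": "aws.ec2",
    "detail-type": "EC2 Instance State-change Notification",
    "detail": {"instance-id": "i-0123456789abcdef0", "state": "terminated"},
}
print(matches(pattern, event))  # True
```

Fields that the pattern does not mention (such as instance-id here) are ignored, which is why a short pattern can match a large event.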

EventBridge

Event buses can be accessed by other AWS accounts using resource-based policies

Can archive events (all or filtered) sent to an event bus (indefinitely or for a set period)

Ability to replay archived events

Amazon EventBridge – Schema Registry

EventBridge can analyze the events in your bus and infer the schema

The Schema Registry allows you to generate code for your application that knows in advance how data is structured in the event bus

Schema can be versioned

Amazon EventBridge – Resource-based Policy

Manage permissions for a specific Event Bus

Example: allow/deny events from another AWS account or AWS region

Use case: aggregate all events from your AWS Organization in a single AWS account or AWS region
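
A resource-based policy for the aggregation use case might look like the following sketch, which lets one other account put events onto a central bus. The account IDs, region and bus name are placeholders, and the helper allow_put_events is hypothetical.

```python
import json


def allow_put_events(bus_arn, source_account_id):
    """Build a policy document allowing another account to send events to a bus."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": f"AllowPutEventsFrom{source_account_id}",
                "Effect": "Allow",
                "Principal": {"AWS": f"arn:aws:iam::{source_account_id}:root"},
                "Action": "events:PutEvents",
                "Resource": bus_arn,
            }
        ],
    }


# Placeholder ARN and account ID, for illustration only.
policy = allow_put_events(
    "arn:aws:events:us-east-1:123456789012:event-bus/central-event-bus",
    "111122223333",
)
print(json.dumps(policy, indent=2))
```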

CloudWatch Container Insights

Collect, aggregate, summarize metrics and logs from containers

Available on

ECS

EKS

Kubernetes platforms on EC2

Fargate

CloudWatch Lambda Insights

Collects, aggregates, and summarizes system-level metrics including CPU time, memory, disk, and network

Collects, aggregates, and summarizes diagnostic information such as cold starts and Lambda worker shutdowns

Lambda Insights is provided as a Lambda Layer

CloudWatch Application Insights

Provides automated dashboards that show potential problems with monitored applications, to help isolate ongoing issues

Powered by SageMaker

Enhanced visibility into your application health to reduce the time it will take you to troubleshoot and repair your applications

CloudTrail

CloudTrail

Provides governance, compliance and audit for your AWS account

Enabled by default!

Can put logs from CloudTrail into CloudWatch Logs or S3

A trail can be applied to All Regions (default) or a single Region

CloudTrail Events

Management Events

Operations that are performed on resources in your AWS account

By default, trails are configured to log management events

Can separate Read Events (that don’t modify resources) from Write Events (that may modify resources)

Data Events

By default, data events are not logged (because they are high-volume operations)

Amazon S3 object-level activity: can separate Read and Write Events

AWS Lambda function execution activity (the Invoke API)

CloudTrail Insights Events

Enable CloudTrail Insights to detect unusual activity

Inaccurate resource provisioning

Hitting service limits

Bursts of AWS IAM actions

Gaps in periodic maintenance activity

CloudTrail Insights analyzes normal management events to create a baseline

Continuously analyzes write events to detect unusual patterns

Anomalies appear in the CloudTrail console

Event is sent to Amazon S3

An EventBridge event is generated (for automation needs)

CloudTrail Events Retention

Events are stored for 90 days in CloudTrail

To keep events beyond this period, log them to S3 and use Athena
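
Log files that CloudTrail delivers to S3 are JSON documents with a top-level Records array, which is what makes them queryable afterwards. As a small local illustration, the sketch below separates read events from write events using the readOnly field of each record; the two sample records are trimmed, hypothetical examples, and real delivered files are gzip-compressed and carry many more fields.

```python
import json


def split_read_write(log_file_text):
    """Split the event names in a CloudTrail log file into read and write events."""
    records = json.loads(log_file_text)["Records"]
    reads = [r["eventName"] for r in records if r.get("readOnly")]
    writes = [r["eventName"] for r in records if not r.get("readOnly")]
    return reads, writes


# Trimmed, hypothetical records for illustration.
sample_log = json.dumps({
    "Records": [
        {"eventSource": "ec2.amazonaws.com", "eventName": "DescribeInstances", "readOnly": True},
        {"eventSource": "ec2.amazonaws.com", "eventName": "TerminateInstances", "readOnly": False},
    ]
})
reads, writes = split_read_write(sample_log)
print(reads, writes)
```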

AWS Config

AWS Config

Helps with auditing and recording compliance of AWS resources

Helps record configurations and changes over time

Can receive alerts (SNS notifications) for any changes

AWS Config is a per-region service

Can be aggregated across regions and accounts

Can store the configuration data in S3 (to be analyzed with Athena)

Config Rules

Can use AWS managed config rules (over 75)

Can make custom config rules (must be defined in AWS Lambda)

Example: evaluate whether each EBS disk is of type gp2

Example: evaluate whether each EC2 instance is t2.micro

Rules can be evaluated / triggered

For each config change

And / or: at regular time intervals

Config Rules do not prevent actions from happening (no deny)
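
The gp2 example can be sketched as the core of a custom rule's Lambda function. This is a simplified sketch under stated assumptions: it assumes the configuration item for a volume exposes its type under configuration.volumeType, the sample event is hand-built with a made-up volume ID, and the step where a real rule reports its result back to AWS Config is omitted.

```python
import json


def evaluate_volume(configuration_item, desired_type="gp2"):
    """Return the compliance type for one configuration item."""
    if configuration_item["resourceType"] != "AWS::EC2::Volume":
        return "NOT_APPLICABLE"
    actual = configuration_item["configuration"].get("volumeType")
    return "COMPLIANT" if actual == desired_type else "NON_COMPLIANT"


def handler(event, context=None):
    """Evaluate the resource described in the rule's invoking event."""
    item = json.loads(event["invokingEvent"])["configurationItem"]
    return {
        "ComplianceResourceType": item["resourceType"],
        "ComplianceResourceId": item["resourceId"],
        "ComplianceType": evaluate_volume(item),
    }


# Hand-built sample event for a configuration change on an io1 volume.
sample_event = {
    "invokingEvent": json.dumps({
        "configurationItem": {
            "resourceType": "AWS::EC2::Volume",
            "resourceId": "vol-0123456789abcdef0",
            "configuration": {"volumeType": "io1"},
        }
    })
}
result = handler(sample_event)
print(result["ComplianceType"])
```

The function only reports that the volume is non-compliant; nothing stops the volume from being created, which is the "no deny" point above.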

Config Rules – Remediations

Automate remediation of non-compliant resources using SSM Automation Documents

Use AWS-Managed Automation Documents or create custom Automation Documents

Can set Remediation Retries if the resource is still non-compliant after auto-remediation

Config Rules – Notifications

Use EventBridge to trigger notifications when AWS resources are noncompliant

Ability to send configuration changes and compliance state notifications to SNS (all events: use SNS Filtering or filter at client-side)
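
For the EventBridge route, a rule needs an event pattern that selects only non-compliant results. The following sketch shows what such a pattern could look like; the field names follow the compliance change events AWS Config emits as far as this sketch assumes, so verify them against a real event before relying on the pattern.

```python
import json

# Assumed shape of a pattern matching Config rule evaluations that turn non-compliant.
noncompliant_pattern = {
    "source": ["aws.config"],
    "detail-type": ["Config Rules Compliance Change"],
    "detail": {
        "newEvaluationResult": {"complianceType": ["NON_COMPLIANT"]}
    },
}

pattern_json = json.dumps(noncompliant_pattern, indent=2)
print(pattern_json)
```

Filtering in the rule itself keeps compliant evaluations away from the notification target, in contrast to the SNS route where all events are sent and filtering happens afterwards.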