CloudWatch Metrics
Provides metrics for every AWS service
A metric is a variable to monitor
CPUUtilization
NetworkIn
Metrics belong to namespaces
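A metric is identified by its namespace, its name, and optional dimensions. A minimal sketch of a custom-metric payload, shaped like the parameters of the PutMetricData API; the namespace `MyApp/Orders`, the metric name, and the dimension are hypothetical, and the helper `build_metric_payload` is not part of any library.

```python
from datetime import datetime, timezone

def build_metric_payload(namespace, name, value, unit="Count", dimensions=None):
    """Build keyword arguments shaped like a PutMetricData request."""
    return {
        "Namespace": namespace,
        "MetricData": [
            {
                "MetricName": name,
                "Dimensions": [
                    {"Name": k, "Value": v} for k, v in (dimensions or {}).items()
                ],
                "Timestamp": datetime.now(timezone.utc),
                "Value": float(value),
                "Unit": unit,
            }
        ],
    }

# Hypothetical custom metric in a hypothetical namespace.
payload = build_metric_payload(
    "MyApp/Orders", "OrdersPlaced", 3, dimensions={"Environment": "prod"}
)
# With boto3 installed and credentials configured, this could be sent with:
#   boto3.client("cloudwatch").put_metric_data(**payload)
print(payload["Namespace"], payload["MetricData"][0]["MetricName"])
```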
CloudWatch Metric Streams
Continually streams CloudWatch metrics with near-real-time delivery and low latency
Kinesis Data Firehose
S3
Redshift
OpenSearch
3rd party
Datadog
Splunk
...
CloudWatch Logs
Log Group: arbitrary name, usually representing an application
Log Stream: instances within an application / log files / containers
Can send logs to
S3 (export)
Kinesis Data Streams
Kinesis Data Firehose
Lambda
OpenSearch
Security
Encrypted by default
Can set up KMS-based encryption
CloudWatch Logs - Sources
SDK, CloudWatch Logs Agent, CloudWatch Unified Agent
Elastic Beanstalk
ECS
Lambda
VPC Flow Logs
API Gateway
CloudTrail (based on a filter)
Route 53
CloudWatch Logs Insights
Search and analyze log data stored in CloudWatch Logs
Query Engine
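A minimal sketch of a Logs Insights query, built as keyword arguments shaped like a StartQuery request. The log group name `/my-app/production` and the helper `build_insights_query` are hypothetical; the query uses the `fields`, `filter`, `sort`, and `limit` commands of the Logs Insights query language.

```python
import time

def build_insights_query(log_group, pattern, minutes=60, limit=20):
    """Build keyword arguments shaped like a CloudWatch Logs StartQuery request."""
    end = int(time.time())
    query = (
        "fields @timestamp, @message"
        f" | filter @message like /{pattern}/"
        " | sort @timestamp desc"
        f" | limit {limit}"
    )
    return {
        "logGroupName": log_group,
        "startTime": end - minutes * 60,  # epoch seconds
        "endTime": end,
        "queryString": query,
    }

request = build_insights_query("/my-app/production", "ERROR")
# With boto3 this could be submitted with:
#   boto3.client("logs").start_query(**request)
print(request["queryString"])
```

The query runs asynchronously: StartQuery returns a query ID, and the results are fetched separately once the query completes.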
CloudWatch Logs Subscriptions
Get real-time log events from CloudWatch Logs for processing and analysis
Can send to
Kinesis Data Streams
Kinesis Data Firehose
Lambda
Subscription Filter
Cross-Account Subscription
Send log events to resources in a different AWS account
Kinesis Data Streams (KDS)
Kinesis Data Firehose (KDF)
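When a subscription filter targets Lambda, the log events arrive base64-encoded and gzip-compressed under `awslogs.data`. A minimal sketch of decoding such a delivery; the sample log group, stream, and message are made up so the handler can be exercised locally.

```python
import base64
import gzip
import json

def decode_subscription_event(event):
    """Decode the base64-encoded, gzip-compressed payload sent to Lambda."""
    compressed = base64.b64decode(event["awslogs"]["data"])
    return json.loads(gzip.decompress(compressed))

def handler(event, context=None):
    """Return the messages of all log events in one delivery."""
    payload = decode_subscription_event(event)
    return [e["message"] for e in payload["logEvents"]]

# Build a fake delivery locally (all sample values are made up).
sample = {
    "messageType": "DATA_MESSAGE",
    "logGroup": "/my-app/production",
    "logStream": "instance-1",
    "logEvents": [
        {"id": "1", "timestamp": 1700000000000, "message": "ERROR disk full"}
    ],
}
encoded = base64.b64encode(gzip.compress(json.dumps(sample).encode())).decode()
fake_event = {"awslogs": {"data": encoded}}

messages = handler(fake_event)
print(messages)
```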
CloudWatch Logs Agent & Unified Agent
CloudWatch Logs Agent
Old version of the agent
Can only send to CloudWatch Logs
CloudWatch Unified Agent
Collects additional system-level metrics such as RAM, processes, etc.
Collects logs to send to CloudWatch Logs
Centralized configuration using SSM Parameter Store
CloudWatch Unified Agent – Metrics
Collected directly on Linux server / EC2 instance
CPU
Disk Metrics
RAM
Netstat
Processes
Swap Space
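The Unified Agent is driven by a JSON configuration file, which is what would be stored centrally in SSM Parameter Store. A minimal sketch of such a configuration, generated from Python; the file path and log group name are hypothetical, and only a few of the available measurements are shown.

```python
import json

agent_config = {
    "agent": {"metrics_collection_interval": 60},  # seconds
    "metrics": {
        "metrics_collected": {
            "mem": {"measurement": ["mem_used_percent"]},
            "swap": {"measurement": ["swap_used_percent"]},
            "disk": {"measurement": ["used_percent"], "resources": ["*"]},
            "netstat": {"measurement": ["tcp_established"]},
        }
    },
    "logs": {
        "logs_collected": {
            "files": {
                "collect_list": [
                    {
                        # Hypothetical application log file and log group.
                        "file_path": "/var/log/my-app/app.log",
                        "log_group_name": "/my-app/production",
                        "log_stream_name": "{instance_id}",
                    }
                ]
            }
        }
    },
}

config_json = json.dumps(agent_config, indent=2)
print(config_json)
```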
CloudWatch Alarms
Alarms are used to trigger notifications for any metric
Various options (sampling, %, max, min, etc.)
Alarm State
OK
INSUFFICIENT_DATA
ALARM
Period
Length of time in seconds to evaluate the metric
High-resolution custom metrics: 10 sec, 30 sec, or multiples of 60 sec
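How the three alarm states relate to the period can be sketched with a simplified evaluator. This is an illustration only, assuming a "greater than threshold" alarm where every recent period must breach; the real service also supports "M out of N" datapoints and configurable treatment of missing data, which are not modelled here.

```python
def evaluate_alarm(datapoints, threshold, evaluation_periods):
    """Return a simplified alarm state for a 'greater than threshold' alarm.

    datapoints: one aggregated value per period, oldest first;
    None marks a period with no data.
    """
    recent = datapoints[-evaluation_periods:]
    if len(recent) < evaluation_periods or any(v is None for v in recent):
        return "INSUFFICIENT_DATA"
    if all(v > threshold for v in recent):
        return "ALARM"
    return "OK"

# CPUUtilization averaged per period, alarm when above 80% for 3 periods.
print(evaluate_alarm([40, 85, 90, 95], threshold=80, evaluation_periods=3))  # ALARM
print(evaluate_alarm([40, 85, 60, 95], threshold=80, evaluation_periods=3))  # OK
print(evaluate_alarm([95], threshold=80, evaluation_periods=3))  # INSUFFICIENT_DATA
```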
CloudWatch Alarm Targets
Stop, Terminate, Reboot, or Recover an EC2 Instance
Trigger Auto Scaling Action
Send notification to SNS
CloudWatch Alarms – Composite Alarms
Composite Alarms monitor the states of multiple other alarms
AND and OR conditions
Helpful to reduce “alarm noise” by creating complex composite alarms
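The AND / OR combination can be sketched as a small rule tree that is evaluated against the states of the child alarms. The alarm names are hypothetical, and the helpers below are an illustration, not a CloudWatch API; `to_rule_text` renders the tree in the `ALARM("name") AND ...` style used for composite alarm rules.

```python
def alarm(name):
    return ("ALARM", name)

def all_of(*rules):
    return ("AND", rules)

def any_of(*rules):
    return ("OR", rules)

def is_firing(rule, states):
    """Evaluate a rule tree against a mapping of child alarm name -> state."""
    kind, arg = rule
    if kind == "ALARM":
        return states.get(arg) == "ALARM"
    results = [is_firing(r, states) for r in arg]
    return all(results) if kind == "AND" else any(results)

def to_rule_text(rule):
    """Render the tree as a rule expression string."""
    kind, arg = rule
    if kind == "ALARM":
        return f'ALARM("{arg}")'
    return "(" + f" {kind} ".join(to_rule_text(r) for r in arg) + ")"

# Fire only when CPU is high AND at least one I/O symptom is present.
rule = all_of(alarm("cpu-high"), any_of(alarm("iops-high"), alarm("latency-high")))
states = {"cpu-high": "ALARM", "iops-high": "OK", "latency-high": "ALARM"}

print(to_rule_text(rule))
print(is_firing(rule, states))
```

A single noisy child alarm (for example `cpu-high` alone) does not trigger the composite alarm, which is how the noise reduction comes about.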
EventBridge
Schedule
Event Pattern
Event rules to react to a service doing something
Trigger Lambda functions, send SQS/SNS messages…
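An event pattern is a JSON document whose leaves list the values a rule accepts. A minimal sketch of a pattern for stopped or terminated EC2 instances, with a simplified matcher that models exact-value matching only (the real service supports more operators, such as prefix and numeric matching). The instance ID is made up. A schedule rule would instead carry an expression such as `rate(5 minutes)`.

```python
def matches(pattern, event):
    """Return True if every field in the pattern matches the event.

    Only exact-value matching is modelled: a pattern leaf is a list of
    allowed values, and a nested dict is matched recursively.
    """
    for key, expected in pattern.items():
        if key not in event:
            return False
        if isinstance(expected, dict):
            if not isinstance(event[key], dict) or not matches(expected, event[key]):
                return False
        elif event[key] not in expected:
            return False
    return True

pattern = {
    "source": ["aws.ec2"],
    "detail-type": ["EC2 Instance State-change Notification"],
    "detail": {"state": ["stopped", "terminated"]},
}
event = {
    "source": "aws.ec2",
    "detail-type": "EC2 Instance State-change Notification",
    "detail": {"instance-id": "i-0123456789abcdef0", "state": "stopped"},
}
print(matches(pattern, event))
```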
EventBridge
Can be accessed by other AWS accounts using Resource-based Policies
Can archive events (all or filtered) sent to an event bus (indefinitely or for a set period)
Ability to replay archived events
Amazon EventBridge – Schema Registry
EventBridge can analyze the events in your bus and infer the schema
The Schema Registry allows you to generate code for your application that knows in advance how data is structured in the event bus
Schema can be versioned
Amazon EventBridge – Resource-based Policy
Manage permissions for a specific Event Bus
Example: allow/deny events from another AWS account or AWS region
Use case: aggregate all events from your AWS Organization in a single AWS account or AWS region
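A minimal sketch of a resource-based policy that lets another account send events to a central event bus. The account IDs, Region, and bus name are placeholders, and the helper `allow_put_events` is hypothetical.

```python
import json

def allow_put_events(bus_arn, source_account_id):
    """Build a resource-based policy letting another account send events to a bus."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "AllowCrossAccountPutEvents",
                "Effect": "Allow",
                "Principal": {"AWS": f"arn:aws:iam::{source_account_id}:root"},
                "Action": "events:PutEvents",
                "Resource": bus_arn,
            }
        ],
    }

# Placeholder ARN and account ID.
policy = allow_put_events(
    "arn:aws:events:us-east-1:123456789012:event-bus/central-event-bus",
    "111122223333",
)
policy_json = json.dumps(policy, indent=2)
print(policy_json)
```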
CloudWatch Container Insights
Collect, aggregate, summarize metrics and logs from containers
Available on
ECS
EKS
Kubernetes platforms on EC2
Fargate
CloudWatch Lambda Insights
Collects, aggregates, and summarizes system-level metrics including CPU time, memory, disk, and network
Collects, aggregates, and summarizes diagnostic information such as cold starts and Lambda worker shutdowns
Lambda Insights is provided as a Lambda Layer
CloudWatch Application Insights
Provides automated dashboards that show potential problems with monitored applications, to help isolate ongoing issues
Powered by SageMaker
Enhanced visibility into your application health to reduce the time it will take you to troubleshoot and repair your applications
CloudTrail
Provides governance, compliance, and audit for your AWS account
Enabled by default!
Can put logs from CloudTrail into CloudWatch Logs or S3
A trail can be applied to All Regions (default) or a single Region
CloudTrail Events
Management Events
Data Events
CloudTrail Insights Events
Management Events: operations that are performed on resources in your AWS account
By default, trails are configured to log management events
Can separate Read Events (that don’t modify resources) from Write Events (that may modify resources)
Data Events: not logged by default (because they are high-volume operations)
Amazon S3 object-level activity (can separate Read and Write Events)
AWS Lambda function execution activity (the Invoke API)
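Because data events are off by default, they have to be switched on per trail through event selectors. A minimal sketch of a selector list, shaped like the `EventSelectors` parameter of the PutEventSelectors API, that keeps management events and adds write-only data events for one S3 bucket and one Lambda function. The bucket name, function ARN, and helper name are placeholders.

```python
def build_event_selectors(bucket_name, function_arn):
    """Build a selector list shaped like the EventSelectors parameter."""
    return [
        {
            "ReadWriteType": "WriteOnly",  # "ReadOnly", "WriteOnly", or "All"
            "IncludeManagementEvents": True,
            "DataResources": [
                # Trailing slash: all objects in the bucket.
                {"Type": "AWS::S3::Object", "Values": [f"arn:aws:s3:::{bucket_name}/"]},
                {"Type": "AWS::Lambda::Function", "Values": [function_arn]},
            ],
        }
    ]

selectors = build_event_selectors(
    "my-bucket",
    "arn:aws:lambda:us-east-1:123456789012:function:my-function",
)
# With boto3 this could be applied with (trail name is a placeholder):
#   boto3.client("cloudtrail").put_event_selectors(
#       TrailName="my-trail", EventSelectors=selectors)
print(selectors[0]["DataResources"])
```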
Enable CloudTrail Insights to detect unusual activity
Inaccurate resource provisioning
Hitting service limits
Bursts of AWS IAM actions
Gaps in periodic maintenance activity
CloudTrail Insights analyzes normal management events to create a baseline
Continuously analyzes write events to detect unusual patterns
Anomalies appear in the CloudTrail console
Event is sent to Amazon S3
An EventBridge event is generated (for automation needs)
CloudTrail Events Retention
Events are stored for 90 days in CloudTrail
To keep events beyond this period, log them to S3 and use Athena
AWS Config
Helps with auditing and recording compliance of AWS resources
Helps record configurations and changes over time
Can receive alerts (SNS notifications) for any changes
AWS Config is a per-region service
Can be aggregated across regions and accounts
Can store the configuration data in S3 (to be analyzed with Athena)
Config Rules
Can use AWS managed config rules (over 75)
Can make custom config rules (must be defined in AWS Lambda)
Example: evaluate if each EBS disk is of type gp2
Example: evaluate if each EC2 instance is t2.micro
Rules can be evaluated / triggered
For each config change
And / or: at regular time intervals
Config Rules do not prevent actions from happening (no deny)
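The gp2 example can be sketched as the evaluation logic of a custom rule running in Lambda. This is a minimal sketch, assuming the configuration item for an `AWS::EC2::Volume` carries the volume type under `configuration.volumeType`; the volume ID is made up. A deployed rule would also report the verdict back to AWS Config with `put_evaluations` and the `resultToken` from the event, which is omitted here. Note that the rule only reports compliance; it does not block the change.

```python
import json

def evaluate_volume(configuration_item, desired_type="gp2"):
    """Return a compliance verdict for one configuration item."""
    if configuration_item["resourceType"] != "AWS::EC2::Volume":
        return "NOT_APPLICABLE"
    actual = configuration_item["configuration"].get("volumeType")
    return "COMPLIANT" if actual == desired_type else "NON_COMPLIANT"

def handler(event, context=None):
    """Lambda-style entry point: parse the invoking event and evaluate the item."""
    invoking_event = json.loads(event["invokingEvent"])  # delivered as a JSON string
    item = invoking_event["configurationItem"]
    return {
        "ComplianceResourceType": item["resourceType"],
        "ComplianceResourceId": item["resourceId"],
        "ComplianceType": evaluate_volume(item),
    }

# Made-up sample event for local testing.
sample_event = {
    "invokingEvent": json.dumps(
        {
            "configurationItem": {
                "resourceType": "AWS::EC2::Volume",
                "resourceId": "vol-0123456789abcdef0",
                "configuration": {"volumeType": "gp3"},
            }
        }
    )
}
result = handler(sample_event)
print(result)
```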
Config Rules – Remediations
Automate remediation of non-compliant resources using SSM Automation Documents
Use AWS-Managed Automation Documents or create custom Automation Documents
Can set Remediation Retries if the resource is still non-compliant after auto-remediation
Config Rules – Notifications
Use EventBridge to trigger notifications when AWS resources are noncompliant
Ability to send configuration changes and compliance state notifications to SNS (all events; use SNS Filtering or filter at client-side)
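For the EventBridge route, a rule can be limited to non-compliant results with an event pattern. A minimal sketch of such a pattern, assuming the compliance-change events that AWS Config publishes carry the new result under `detail.newEvaluationResult.complianceType`; the rule target (for example an SNS topic) is not shown.

```python
import json

# Pattern matching only evaluations that turned NON_COMPLIANT.
noncompliant_pattern = {
    "source": ["aws.config"],
    "detail-type": ["Config Rules Compliance Change"],
    "detail": {
        "newEvaluationResult": {"complianceType": ["NON_COMPLIANT"]},
    },
}

pattern_json = json.dumps(noncompliant_pattern, indent=2)
print(pattern_json)
```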