Database - Coggle Diagram
Database
RDS
OLAP
This is all about data analytics on large amounts of data, with complex queries that take a long time to complete
RDS is not suitable for analysing large amounts of data. Use a data warehouse (DWH) like Redshift instead, which is optimised for OLAP
RDS Advantages
- Up and Running in Minutes
- Multi-AZ
- Failover capability
- Automated backups
Multi-AZ
RDS automatically fails over to the standby instance during a failure, so database operations can resume quickly without admin intervention
Multi-AZ is for HA, not for improving performance, so you cannot connect to the standby while the primary database is active
Multi-AZ DB instance deployment has 1 standby DB instance that provides failover support, but doesn't serve read traffic
Multi-AZ DB cluster deployment - NEW March 2022 - available for MySQL and PostgreSQL only - has 2 readable standby DB instances that provide failover support and can also serve read traffic
MySQL, Oracle and PostgreSQL use synchronous physical replication to keep standby instance data current with the primary instance
SQL Server uses synchronous logical replication and SQL Server's native Mirroring technology to ensure the standby is up to date
Advantages:
- High Availability
- Backup and Restores taken from secondary avoids I/O suspension on the primary
- You can force a failover from one AZ to another by rebooting the instance (console, API, CLI)
The failover mechanism automatically changes the DNS record of the DB instance to point to the standby DB instance. The old primary becomes the standby while the issue is addressed; if it's not easily fixable, that instance is replaced
Relational Database
- SQL Server
- PostgreSQL
- Oracle
- MariaDB
- MySQL
- Aurora
A Read Replica is a read-only copy of your primary database that is based on asynchronous replication
Great for:
- Read-heavy workload
- Takes the load off your primary database. (e.g. BI, Reporting, ETL, ...)
- Scaling beyond the compute or I/O capacity of a single DB instance for read-heavy workloads
- Scheduled Maintenance / Backup serving read traffic while the source DB is unavailable
- Same AZ
- Cross-AZ
- Cross-Region
Key Facts
Multiple Read Replicas are supported: MySQL, MariaDB and PostgreSQL allow up to 15 read replicas per DB instance, Oracle and SQL Server up to 5
Primarily used for scaling read performance, not for DR (though you may use Read Replicas as part of a DR strategy)
Differences between read replicas for different DB engines
- MySQL, PostgreSQL, MariaDB and SQL Server use the engine's native asynchronous replication
- Aurora uses a virtualized storage layer based on SSDs designed for databases; Replicas share the same storage as the source instance, so there is no need to copy data to replica nodes
You can create a Read Replica from an existing Read Replica, but be aware of the added replication latency
Depending on the engine, you might not be able to take DB snapshots or enable automated backups on a Read Replica
Creating Read Replicas:
- When creating a Read Replica, AWS takes a snapshot of your database
- With Multi-AZ: the snapshot is taken from your secondary instance, with no performance impact on the primary
- Without Multi-AZ: the snapshot is taken from your primary and can cause a brief I/O suspension of about 1 minute
- Once the new Read Replica is created, you can connect to it using its new DNS endpoint
RDS Backup
Automated backups enable point-in-time recovery of your DB instance. When turned on, RDS automatically performs a full daily snapshot and captures transaction logs every 5 minutes. The default backup retention period is 7 days, and it can be set to up to 35 days
DB Snapshots are user-initiated and enable you to back up your DB instance in a known state as frequently as you wish, and then restore to that specific state at any time. DB Manual Snapshots can be shared with other AWS Accounts
- You can disable automatic backups immediately by setting the retention period to 0
- If you set the value to 0 and receive a message saying that the retention period must be between 1 and 35, check that you haven't set up a read replica for the instance
See the Latest restorable time for your DB:
- In the RDS Console > Automated backups (along with Earliest restorable time)
- Use the AWS CLI describe-db-instances command and look for LatestRestorableTime
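The CLI route above returns JSON; a minimal sketch of pulling `LatestRestorableTime` out of it (the sample payload below is illustrative, not real CLI output):

```python
import json

# Hypothetical sample shaped like `aws rds describe-db-instances` output
sample_output = json.dumps({
    "DBInstances": [
        {
            "DBInstanceIdentifier": "mydb",
            "LatestRestorableTime": "2024-05-01T12:35:00+00:00",
        }
    ]
})

def latest_restorable_times(cli_json: str) -> dict:
    """Map each DB instance identifier to its latest restorable time."""
    data = json.loads(cli_json)
    return {
        db["DBInstanceIdentifier"]: db.get("LatestRestorableTime")
        for db in data["DBInstances"]
    }

print(latest_restorable_times(sample_output))
# → {'mydb': '2024-05-01T12:35:00+00:00'}
```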
You can enable encryption at rest when you create your RDS instance. Data that is encrypted at rest includes: storage for DB instances, automated backups, read replicas and snapshots.
For encrypted and unencrypted DB instances, data that is in transit between the source and the read replicas is encrypted, even when replicating across AWS Regions
To encrypt your DB instance you can choose a customer managed key or the AWS managed key for Amazon RDS. Both rely on AWS KMS
Once you have created an encrypted DB instance, you can't change the KMS key used by that DB instance
You can't have an encrypted read replica of an unencrypted DB instance or an unencrypted read replica of an encrypted DB instance
To share encrypted snapshots across accounts you must share the customer managed KMS key (snapshots encrypted with the default key cannot be shared)
To copy an encrypted snapshot from one AWS Region to another, you must specify the KMS key in the destination AWS Region. This is because KMS keys are specific to the AWS Region that they are created in
You can't remove encryption from an encrypted DB instance. However, you can export data from an encrypted DB instance and import it into an unencrypted DB instance
Encrypted read replicas must be encrypted with the same KMS key as the source DB instance when both are in the same AWS Region
TDE (Transparent Data Encryption), supported for Oracle and SQL Server, automatically encrypts data before it is written to storage and decrypts data when it is read from storage
Amazon RDS Proxy
Makes applications more resilient to database failures by automatically connecting to a standby DB instance while preserving application connections
Allows you to handle unpredictable surges in database traffic (which would otherwise oversubscribe connections or create new connections at a fast rate)
Allows your applications to pool and share database connections to improve their ability to scale; also helps with availability by queuing requests during workload peaks
Supports: MariaDB, MySQL, PostgreSQL and SQL Server
Queues / throttles app connections that can't be served immediately from the pool of connections. Although latencies might increase, your app can continue to scale
Detects failover and routes requests to standby instance quickly, up to 66% faster failover time
Use Cases:
- app with unpredictable workloads
- app which open/close database connections frequently
- app which require availability through transient failures
- app with open but idle connections
Key Features
RDS events provide operational events such as DB instance events, DB parameter group events, DB security group events and DB snapshot events. They are not suitable for capturing CRUD operations. You can subscribe to events and trigger a Lambda function.
Parameter Groups: database parameters specify how the database is configured (e.g. memory allocation). Manage your DB configuration by associating DB instances and Multi-AZ DB clusters with parameter groups. RDS defines parameter groups with default settings; you can also create your own
Viewing pending maintenance - use the RDS console, the AWS CLI, or the RDS API. If an update is available, it is indicated in the Maintenance column for the DB instance on the Amazon RDS console
- none
- required – The maintenance action will be applied to the resource and can't be deferred indefinitely
- available – The maintenance action is available, but it will not be applied to the resource automatically. You can apply it manually
- next window – The maintenance action will be applied to the resource during the next maintenance window
- in progress – The maintenance action is in the process of being applied to the resource
Enhanced Monitoring
Monitor how the different processes or threads on a DB instance use the CPU, including the percentage of the CPU bandwidth and total memory consumed by each process
View the metrics for your DB instance using the console, or consume the Enhanced Monitoring JSON output from CloudWatch Logs in a monitoring system of your choice. By default, Enhanced Monitoring metrics are stored in the CloudWatch Logs for 30 days
Gathers its metrics from an agent on the instance. As a result, you might find differences between Enhanced Monitoring and CloudWatch measurements, because the hypervisor layer performs a small amount of work
Snapshots
When you share an encrypted snapshot across accounts you must share the custom (non-default) KMS key, as the default KMS key (aws/rds) is managed by RDS and cannot be shared
You cannot share automated snapshots. To share an automated DB cluster snapshot, create a manual DB cluster snapshot by copying the automated snapshot, and then share that copy
IAM DB Authentication
Works with MariaDB, MySQL, and PostgreSQL
An authentication token is a unique string of characters that Amazon RDS generates on request. There is no need to store user credentials in the database (authentication is managed in IAM). You can still use standard database authentication, and you can leverage IAM Roles and EC2 instance profiles for easy integration
Authenticate to your DB instance using an IAM identity (user or role). No password is needed to connect to the DB instance; instead, you get an authentication token from the RDS service and use it to authenticate
Use when your applications create fewer than 200 connections per second, and you don't want to manage usernames and passwords directly in your application code. Only works over SSL
Storage Autoscaling
- You can enable storage autoscaling for an Amazon RDS DB instance (Console, API, or the AWS CLI)
- The additional storage is in increments of whichever of the following is greater:
- 10GiB
- 10% of currently allocated storage
- Predicted storage growth exceeding the current allocated storage size in the next 7 hours
- The maximum storage threshold is the limit that you set for autoscaling the DB instance (recommended: at least 26% above the allocated storage)
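The increment rule above can be sketched as a quick calculation (the 7-hour growth prediction is omitted here for simplicity):

```python
import math

# Sketch: RDS adds whichever is greater of 10 GiB and 10% of
# currently allocated storage when autoscaling kicks in.
def autoscaling_increment_gib(allocated_gib: int) -> int:
    return max(10, math.ceil(allocated_gib * 0.10))

# For a 50 GiB volume the 10 GiB floor wins; at 200 GiB the 10% rule wins.
```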
Limitations:
- Autoscaling doesn't occur if the maximum storage threshold would be equaled or exceeded by the storage increment
- Autoscaling can't completely prevent storage-full situations for large data loads
- Autoscaling operations aren't logged by AWS CloudTrail
Performance Insights
- Expands on existing Amazon RDS monitoring features to illustrate and help you analyze your database performance
- Visualize the database load on your Amazon RDS DB instance. You can also filter the load by waits, SQL statements, hosts, or users
- Turning Performance Insights on or off does not require a reboot of the DB instance
- Performance Schema is an optional performance tool used by Amazon RDS for MariaDB or MySQL. Turning Performance Schema on or off does require a reboot
- Retention – The amount of time to retain Performance Insights data. The retention setting in the free tier is Default (7 days). To retain your performance data for longer, specify 1–24 months
- AWS KMS key – Specify your AWS KMS key. Performance Insights encrypts all potentially sensitive data using your KMS key
Basics
It is a MySQL- and PostgreSQL-compatible relational database engine that combines the speed and availability of high-end commercial databases with the simplicity and cost-effectiveness of open-source databases
Performance:
- 5 x better than MySQL
- 3 x better than PostgreSQL
- 2 copies of your data are contained in each AZ
- with a minimum of 3 AZs
- at least 6 copies of your data
Storage Auto Scaling starts with 10 GB of storage and scales in 10 GB increments up to 128 TiB
Scaling
Transparently handle the loss of up to:
- 2 copies of data without affecting DB write availability
- 3 copies of data without affecting DB read availability
Aurora storage is self-healing with data blocks and disk continually scanned for errors and repaired automatically
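The loss-tolerance bullets above follow from Aurora's documented quorums: writes need 4 of the 6 copies, reads need 3. A minimal sketch of that arithmetic:

```python
# Aurora keeps 6 copies of each data block; writes use a 4-of-6
# quorum and reads a 3-of-6 quorum (documented Aurora behaviour).
TOTAL_COPIES = 6
WRITE_QUORUM = 4
READ_QUORUM = 3

def availability(lost_copies: int) -> dict:
    """Which operations remain available after losing some copies."""
    healthy = TOTAL_COPIES - lost_copies
    return {
        "writes_available": healthy >= WRITE_QUORUM,
        "reads_available": healthy >= READ_QUORUM,
    }

# Losing 2 copies leaves 4: both quorums hold.
# Losing 3 copies leaves 3: reads only, matching the bullets above.
```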
Cache Warming pre-populates the buffer pool with pages for common queries, kept in an in-memory page cache, while other instances are being provisioned
Aurora Auto Scaling
Add or Remove Aurora Replicas automatically, based on metrics you define
Auto Scaling Policy
Auto Scaling decisions are based on CloudWatch metrics; you define the target values
Target values: Replicas are added or removed as necessary to keep the metric as close as possible to the target value
Target Metric:
- Average CPU utilization % of Aurora Replicas
- Average connections to Aurora Replicas
Default 300s cooldown period between scaling actions, you can adjust it for both scale-in and scale-out
Types of Replicas
MySQL Read Replicas: up to 5 with Aurora MySQL (DB engine, transaction-based replication); asynchronous, with seconds of lag
PostgreSQL Read Replicas: up to 5 with Aurora PostgreSQL (DB engine, transaction-based replication); asynchronous, with seconds of lag
Aurora Replicas: up to 15 read replicas. Automated failover is only available with Aurora Replicas (storage-based replication); asynchronous, with millisecond lag
Backups
Automated backups are always enabled on Amazon Aurora DB instances. Backups do not impact database performance
Aurora Serverless
Automatically starts up, shuts down, and scales capacity up or down based on your application's needs
Simple and cost-effective for infrequent, intermittent and unpredictable workloads
Based on the minimum and maximum Aurora Capacity Units (ACU) that you define. An ACU is a combination of processing and memory capacity
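A quick sketch of the ACU arithmetic; the roughly-2-GiB-per-ACU ratio is the documented Aurora Serverless v2 figure:

```python
# Each ACU corresponds to roughly 2 GiB of memory, with matching
# CPU and networking (Aurora Serverless v2 documented ratio).
GIB_PER_ACU = 2

def memory_range_gib(min_acu: float, max_acu: float) -> tuple:
    """Approximate memory range for a cluster's ACU scaling bounds."""
    return (min_acu * GIB_PER_ACU, max_acu * GIB_PER_ACU)

# A cluster scaling between 0.5 and 16 ACUs ranges from about 1 GiB to 32 GiB.
```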
Aurora Serverless v2 supports all features of provisioned Aurora, including read replica, multi-AZ configuration, Global Database, RDS proxy, and Performance Insights
You can restore a snapshot taken from an existing Aurora Provisioned cluster into an Aurora Serverless DB Cluster
✓ Provisioned restore to Serverless
✗ Serverless restore to Provisioned
Data API - access Serverless DB with an API endpoint (no JDBC connection)
- Secure HTTPS endpoint to run SQL
- No persistent DB connection management
Users must be granted permission to use the Data API and Secrets Manager
Aurora Global Databases
- 1 primary DB cluster in one Region (Writer/Reader)
- up to 5 secondary DB clusters in different Regions (Read-only)
Span multiple AWS Regions, enabling low latency global reads and providing fast recovery from the rare outage that might affect an entire AWS Region
Manage the RPO in Aurora PostgreSQL (at least one of the secondary clusters must stay within the target RPO window)
Write Forwarding - allows Secondary DB Clusters to forward write operations to the Primary DB Cluster. Reduces the number of endpoints to manage
- Amazon RDS Proxy for Aurora
- RDS Proxy reduces Aurora MySQL failover time by up to 80%
- RDS Proxy reduces RDS MySQL failover time by up to 30%
Aurora Cloning
Create a new cluster that initially shares the same data pages as the original, but is a separate and independent volume
Creating a clone is faster and more space-efficient than physically copying the data using other techniques, such as restoring a snapshot
Aurora DB Cluster
- Primary instance in one AZ. Read/Write: reads come from local data copies, while writes go to all data copies
- Replica instances in other AZs; multiple Replicas per AZ are allowed. Reads come from local data copies
- 2 data copies in each AZ, forming the Cluster Volume
100% CPU Utilization
- if it is caused by writes -> scale up the instance type
- if it is caused by reads -> add Read Replicas
Resetting the DB instance owner password
- If you get locked out of your DB cluster
- Log in as the master user
- Reset the credentials for other administrative users or roles
- If you can't log in as the master user
- the AWS account owner can reset the master user password
Freeable memory issues
- Aurora (Provisioned)
- Freeable memory is the total RAM on a DB instance that can be made available to the database engine (free OS memory + available buffer/page memory)
- If consistently low on freeable memory or uses swap space, consider scaling up to a larger DB instance class
- Aurora Serverless
- In Aurora Serverless v2, FreeableMemory is the unused memory that's available when the Aurora Serverless v2 DB instance is scaled to its maximum capacity
- If this metric approaches a value of 0, the DB instance has scaled up as much as it can. It's nearing the limit of its available memory. Consider increasing the maximum ACU setting for the cluster
Backtracking
"Rewinds" the DB cluster to the time you specify. Backtracking is not a replacement for DB cluster backup
Advantages:
- Doesn't require a new DB cluster
- Rewinds the DB cluster in minutes
- You can backtrack backward and then forward again
DynamoDB
Capacity Mode
On-Demand Capacity
- Pay more per request than with provisioned capacity
- Use for new product launches
- Unpredictable application traffic
- Pay-per-request reads/writes pricing
- Balance cost and performance
- No minimum capacity
Provisioned Capacity
Enable Auto scaling to automatically adjust your table’s capacity based on the specified utilization rate
- Write Capacity Units (WCU)
- WCU = 1 => 1 standard write/s for item up to 1KB
- WCU = 2 => 1 transactional write/s for item up to 1KB
- Read Capacity Units (RCU)
- RCU = 1 => 2 eventually consistent read/s for item up to 4KB
- RCU = 1 => 1 strongly consistent read/s for item up to 4KB
- RCU = 2 => 1 transactional read/s for item up to 4KB
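The bullets above translate into simple arithmetic; a minimal sketch of the capacity-unit math (helper names are my own):

```python
import math

# WCU: 1 standard write/s per 1KB of item size; transactional writes cost double.
# RCU: 1 strongly consistent read/s per 4KB; eventually consistent reads cost
# half; transactional reads cost double.
def wcu(item_kb: float, writes_per_s: int, transactional: bool = False) -> int:
    units = math.ceil(item_kb / 1) * writes_per_s
    return units * 2 if transactional else units

def rcu(item_kb: float, reads_per_s: int, mode: str = "strong") -> int:
    total = math.ceil(item_kb / 4) * reads_per_s
    if mode == "eventual":
        return math.ceil(total / 2)
    if mode == "transactional":
        return total * 2
    return total  # strongly consistent

# e.g. 100 strongly consistent reads/s of 6KB items:
# ceil(6/4) = 2 RCU per read -> 200 RCU
```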
Read Consistency
Eventually consistent reads (the default): consistency across all copies of data is usually reached within a second. Best read performance
Strongly consistent reads return a result that reflects all writes that received a successful response prior to the read
Security
- Encryption at rest using KMS
- Site-to-site VPN
- Direct Connect (DX)
- IAM Policies and Roles
- Fine-grained access (dynamodb:LeadingKeys)
- CloudWatch and CloudTrail
- VPC endpoints (allow you to connect directly to DynamoDB without leaving the AWS backbone)
- Amazon DynamoDB is a fast and flexible NoSQL database service for all applications that need consistent, single-digit millisecond latency at any scale
- A fully managed database and supports both document and key-value pair data models
- Flexible data model and reliable performance make it a great fit for:
- mobile
- web
- gaming
- ad-tech
- IoT
Key Facts
Keys:
- Primary keys to uniquely identify each item in a table
- simple (a partition key only)
- composite (a partition key combined with a sort key)
- Secondary indexes are essential to support the query patterns your application requires
- A local secondary index has the same partition key as the base table, but a different sort key (its scope is limited to a single partition key value)
- A global secondary index has a partition key and sort key that can be different from those on the base table (its scope spans all data in the table)
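As a sketch of the LSI/GSI distinction, here is the shape of a table definition as passed to CreateTable; the table and attribute names are hypothetical:

```python
# LSI reuses the base table's partition key; GSI can use entirely new keys.
table_definition = {
    "TableName": "Orders",
    "KeySchema": [
        {"AttributeName": "CustomerId", "KeyType": "HASH"},   # partition key
        {"AttributeName": "OrderId", "KeyType": "RANGE"},     # sort key
    ],
    "LocalSecondaryIndexes": [{
        "IndexName": "ByOrderDate",
        # Same partition key as the base table, different sort key
        "KeySchema": [
            {"AttributeName": "CustomerId", "KeyType": "HASH"},
            {"AttributeName": "OrderDate", "KeyType": "RANGE"},
        ],
        "Projection": {"ProjectionType": "ALL"},
    }],
    "GlobalSecondaryIndexes": [{
        "IndexName": "ByStatus",
        # Partition and sort keys may differ entirely from the base table
        "KeySchema": [
            {"AttributeName": "Status", "KeyType": "HASH"},
            {"AttributeName": "OrderDate", "KeyType": "RANGE"},
        ],
        "Projection": {"ProjectionType": "KEYS_ONLY"},
    }],
}
```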
Transactions
DynamoDB Transactions provide ACID guarantees across 1 or more tables within a single AWS account and region. Use transactions when building apps that require coordinated inserts, deletes or updates to multiple items as part of a single logical business operation
Up to 25 items or 4MB of data
ACID = All or Nothing
- All = transaction succeeds across 1 or more tables
- Nothing = transaction fails
ACID:
- Atomicity all changes to the data must be performed successfully or not at all
- Consistency data must be in a consistent state before and after the transaction
- Isolation no other process can change the data while the transaction is running
- Durability the changes made by a transaction must persist
Use cases:
- Processing financial transactions
- Fulfilling and managing orders
- Building multiplayer game engines
- Coordinating actions across distributed components and services
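For the order-fulfillment case above, a sketch of the payload for a TransactWriteItems call that decrements stock and records an order atomically; table and attribute names are hypothetical:

```python
# All-or-nothing: both operations succeed, or neither is applied.
transact_items = [
    {
        "Update": {
            "TableName": "Inventory",
            "Key": {"ProductId": {"S": "p-123"}},
            "UpdateExpression": "SET Stock = Stock - :one",
            # Fail the whole transaction if no stock remains
            "ConditionExpression": "Stock >= :one",
            "ExpressionAttributeValues": {":one": {"N": "1"}},
        }
    },
    {
        "Put": {
            "TableName": "Orders",
            "Item": {
                "OrderId": {"S": "o-456"},
                "ProductId": {"S": "p-123"},
            },
        }
    },
]
# With boto3 this list would be passed as:
# client.transact_write_items(TransactItems=transact_items)
```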
Reads
- Eventual consistency
- Strong consistency
- Transactional
DynamoDB Backup
Using DynamoDB with AWS Backup: you can copy your on-demand backups across AWS accounts and Regions, add cost allocation tags to on-demand backups, and transition on-demand backups to cold storage for lower costs
DynamoDB Streams
- Time-ordered sequence of item-level changes in a table
- Stored for 24 hours (per shard)
- Insert, Update and Delete
- View type - which versions of the changed items to push to the stream:
- Key attributes only
- New image
- Old image
- New and old images
- Combine with Lambda to get stored procedure like feature
- DynamoDB Streams Kinesis Adapter allows you to create applications via the Kinesis Client Library, with the API calls seamlessly directed at the DynamoDB Streams endpoint
Global Tables
Managed Multi-Master, Multi-Region Replication
- Globally distributed app
- Based on DynamoDB streams
- Multi-Region redundancy for DR and HA
- No app rewrites
- Replication latency under 1 second
To create a replica, you must set your table's and indexes':
- throughput capacity to auto scaling (provisioned capacity mode), or
- capacity mode to on-demand
For replication to work, DynamoDB Streams is activated automatically with the new and old images view type
Performance
Use Partition keys with high-cardinality attributes, which have a large number of distinct values for each item
Design application for uniform activity across all logical partition keys in the table and its secondary indexes
Sort keys:
- gather related information together in one place where it can be queried efficiently
- composite sort keys let you define hierarchical (one-to-many) relationships in your data that you can query at any level of the hierarchy
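The hierarchical sort-key idea can be sketched with a composite key like "country#state#city"; key and attribute names here are hypothetical:

```python
# Build a sort-key prefix for querying at any level of the hierarchy.
def location_prefix(*levels: str) -> str:
    return "#".join(levels)

# Query everything in Italy:  sort key begins_with "IT"
# Query one region:           sort key begins_with "IT#TO"
# Query a single city:        sort key begins_with "IT#TO#Turin"
# With boto3's Table.query this would look like:
# KeyConditionExpression=Key("Location").begins_with(location_prefix("IT", "TO"))
```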
DocumentDB
(MongoDB)
Amazon DocumentDB allows you to run MongoDB workloads on the AWS cloud. It is a managed database service that scales with your workload and safely and durably stores your database information
It removes operational overhead: with DocumentDB you no longer have to worry about the manual tasks of running MongoDB, such as cluster management software, configuring backups or monitoring
Key Features
- Highly available with replication across 3 AZs
- Storage automatically grows in increments of 10GB
Cluster
- A cluster can have up to 16 instances
- Primary instance: supports read and write operations (1 per cluster)
- Replica instance: supports only read operations
- HA using replicas as failover targets for the primary instance
MemoryDB for Redis
- Redis-compatible, durable, in-memory database service
- Ultra-fast performance with over 160 million requests per second
- Durable in-memory data storage with Multi-AZ transactional log
- Scale seamlessly from tens of GBs to hundreds of TBs of storage
- Use cases: web and mobile apps, online gaming, media streaming, ...