DB Specialty
RDS Aurora
High Level Architecture
6 copies of our data are stored across 3 AZs, using the shared cluster storage with self-healing
Read nodes are promoted to the Write node if the Write node fails and the DNS records are automatically updated without you having to worry about any of it
When you create an Aurora cluster, you get two endpoints - one for the writer node and the other for the read replicas
You don't have to specify storage when you use Aurora. It has auto-scaling, self-healing cluster volume storage, which is automatically increased in 10GB increments to accommodate our demand
Auto-scales to handle I/O requirements
Extremely easy to use
Supports up to 15 read nodes
Getting Started
Aurora Serverless
On-Demand autoscaling, good for intermittent or unpredictable workloads, configurable capacity rated in ACUs (Aurora Capacity Units). 1 ACU = a combination of approximately 2 GB of memory, corresponding CPU, and networking
When the users connect, they are actually connecting to the Proxy Fleet, which then routes the requests to one of the Aurora instances.
As the workload changes, DB instances are added and returned to the designated DB Pool automatically
Aurora Global DB
Have 1 primary region for writes and up to 5 secondary read-only regions
Data is replicated automatically from the primary to the secondary region using Aurora's dedicated infrastructure
Low latency reads in other regions
The easiest way for SaaS applications to integrate with Aurora is via Aurora's Data API
Aurora Read Replicas
Adding reader nodes provides us with horizontal scalability. The latency is minimal since cluster storage is shared among the reader nodes, which means that a newly added node does not have to wait for the storage to be provisioned
Create Aurora Replica from RDS Source (e.g. RDS MySQL -> Aurora MySQL)
Read Replicas can be used as a near-zero downtime migration option. Not really used for ongoing replication.
An option with some downtime is to stop the writes to the primary RDS MySQL instance, take a snapshot and restore the snapshot to an Aurora instance
Replication from on-prem MySQL DB to Aurora MySQL (a sketch of the setup follows these steps)
Enable binary logging (binlog)
Export data from source instance
Import the dump to Aurora cluster
Configure replication between self-hosted source and Aurora target
When replication lag reaches 0, stop write activity to self-hosted db
Update endpoint in application for the cutover
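A minimal sketch of the replication setup against the Aurora target, assuming the dump has already been imported and that the RDS-provided stored procedures (mysql.rds_set_external_master, mysql.rds_start_replication) are available; the hostnames, credentials and binlog coordinates are placeholders.

```python
# Sketch: configure an Aurora MySQL cluster to replicate from a self-hosted source.
# Assumes pymysql is installed; endpoints, credentials and binlog coordinates are placeholders.
import pymysql

conn = pymysql.connect(
    host="my-aurora-cluster.cluster-xxxxxxxx.us-east-1.rds.amazonaws.com",  # hypothetical writer endpoint
    user="admin",
    password="********",
)

with conn.cursor() as cur:
    # Point Aurora at the self-hosted source using the binlog file/position
    # captured when the dump was taken (e.g. from SHOW MASTER STATUS).
    cur.execute(
        "CALL mysql.rds_set_external_master("
        "'onprem-mysql.example.com', 3306, "
        "'repl_user', 'repl_password', "
        "'mysql-bin-changelog.000123', 154, 0)"
    )
    # Start pulling changes from the source.
    cur.execute("CALL mysql.rds_start_replication")
conn.commit()

# Monitor replication lag (Seconds_Behind_Master); when it reaches 0,
# stop writes on the source and cut the application over to Aurora.
```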
Replication Lag
In highly concurrent or lock-heavy environments, you may find replication lag to be a concern.
Occurs in RDS to Aurora replication and cross-region Aurora replication
Can be used as failover targets in Aurora. Instances can be allocated across multiple AZs.
Configurable failover priority between tier 0 and tier 15, which assigns specific priority to replicas in the event of a failover. The highest priority replicas (e.g. tier 0) will get promoted to primary.
While Aurora MySQL uses binlog replication, Aurora PostgreSQL uses replication slots.
Replication in cluster is done at storage level and it's only subject to AuroraReplicaLag, which is generally less than 10ms
Managing an Aurora Cluster
Storage Cluster Metrics i.e. for the Cluster volumes (persistent data), not the ebs volumes of the Aurora nodes (temporary data)
VolumeBytesUsed
VolumeWriteIOPS
VolumeReadIOPS
DB Instance Metrics
Various Utilization Metrics (CPU, freeable memory, etc.)
AbortedClients
AuroraReplicaLag
Blocked Transactions
Commit latency
Free local storage
DML and DDL latency provided in Aurora MySQL
MaximumUsedTransactionIDs - important in PostgreSQL
When you see a Low Free Storage notification, it means that the logs or temp tables have consumed a lot of your temp storage (ebs volumes of the db nodes)
Fault Injection Queries - a unique feature in Aurora that allow us to crash the DB, overwhelm it or simulate failure of various components to test the resilience
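For illustration, a hedged sketch of issuing a fault injection query against an Aurora MySQL instance; the endpoint is a placeholder and the exact statement syntax should be checked against the Aurora documentation for your engine version.

```python
# Sketch: run an Aurora MySQL fault injection query to test application resilience.
# Assumes pymysql is installed; connection details are placeholders.
import pymysql

conn = pymysql.connect(
    host="my-aurora-instance.xxxxxxxx.us-east-1.rds.amazonaws.com",  # hypothetical instance endpoint
    user="admin",
    password="********",
)
with conn.cursor() as cur:
    # Crash the DB instance (it restarts automatically; the connection will drop).
    cur.execute("ALTER SYSTEM CRASH INSTANCE")
    # Or simulate a read replica failure for a short interval instead:
    # cur.execute("ALTER SYSTEM SIMULATE 100 PERCENT READ REPLICA FAILURE FOR INTERVAL 1 MINUTE")
```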
Backups are not taken while a cluster is stopped.
Multi-Master or Global Database clusters cannot be stopped
If we delete the last instance in the compute cluster from the console, the entire cluster is deleted. In contrast, if we do that via CLI or SDK, the compute cluster will still exist but there will be no instances allocated to it and the storage cluster will still remain. Think of this storage as cold storage that is not used, but we are getting billed for it. When we want to continue using it, we can just add a node to the compute cluster and have immediate access to our data in the storage cluster
Backup and Restore
Backup
Backup retention period is 1 to 35 days
Automatically taken backups
Unlike in RDS, the backups are taken continuously
These automated backups cannot be disabled
Backups are created from the Storage Cluster
PITR
Mostly works like RDS
Latest Restorable Time (LRT)
New cluster is created (instead of an existing cluster being rolled back)
Can be time intensive if we are restoring to a point with a ton of data
Backtrack
Feature that allows us to roll the cluster back to a previous state (rewinds the cluster storage in place, instead of a new cluster being created)
Faster than PITR
Can move both backward and forward. E.g. backtrack to 2hrs ago and then go forward by an hour
It interrupts the cluster operation
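A minimal boto3 sketch of rewinding a cluster with Backtrack; it assumes the cluster was created with a backtrack window enabled, and the identifier and timestamp are placeholders.

```python
# Sketch: backtrack an Aurora MySQL cluster to an earlier point in time (in place).
import boto3
from datetime import datetime, timezone

rds = boto3.client("rds")

rds.backtrack_db_cluster(
    DBClusterIdentifier="my-aurora-cluster",                        # placeholder cluster name
    BacktrackTo=datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc),   # target point in time
)

# Check the progress of the backtrack operation.
for bt in rds.describe_db_cluster_backtracks(
    DBClusterIdentifier="my-aurora-cluster"
)["DBClusterBacktracks"]:
    print(bt["BacktrackIdentifier"], bt["Status"])
```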
Cluster Cloning
Creates a new cluster. Useful for testing purposes
Faster creation than PITR
Connects to source storage cluster and uses copy on write protocol where both clusters share the cluster volume until the data starts to change
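A hedged boto3 sketch of creating a clone via the copy-on-write restore type; cluster names are placeholders, and a DB instance still has to be added to the new cluster before it can serve queries.

```python
# Sketch: clone an Aurora cluster using copy-on-write (much faster than a snapshot restore).
import boto3

rds = boto3.client("rds")

# Create the clone cluster that shares the source's storage volume.
rds.restore_db_cluster_to_point_in_time(
    SourceDBClusterIdentifier="prod-cluster",    # placeholder source
    DBClusterIdentifier="prod-cluster-clone",    # placeholder clone name
    RestoreType="copy-on-write",
    UseLatestRestorableTime=True,
)

# The clone has no compute yet - add at least one instance to query it.
rds.create_db_instance(
    DBInstanceIdentifier="prod-cluster-clone-1",
    DBClusterIdentifier="prod-cluster-clone",
    DBInstanceClass="db.r5.large",
    Engine="aurora-mysql",
)
```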
MySQL Tools
Native MySQL tools
Inefficient for data larger than 10 GB
Requires source DB interruption
We can use MySQL native mysqldump and mysqlimport to migrate data from any MySQL instance to Aurora
Percona XtraBackup
Uses innobackupex to create the backup
Backup can be loaded directly from S3
Intended for new cluster target
Load data from S3
Load flat files like CSV-formatted data from S3 (see the sketch after this list)
Must already have a running cluster to be able to run this command
Data can be unloaded from Aurora to S3 in a similar fashion
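A sketch of the Aurora MySQL LOAD DATA FROM S3 statement run through a client; it assumes the cluster already has an IAM role associated that allows reading from the bucket, and all names are placeholders.

```python
# Sketch: load a CSV flat file from S3 into an Aurora MySQL table.
# Assumes the cluster has an S3-read IAM role associated (e.g. via the
# aurora_load_from_s3_role / aws_default_s3_role cluster parameters).
import pymysql

conn = pymysql.connect(
    host="my-aurora-cluster.cluster-xxxxxxxx.us-east-1.rds.amazonaws.com",  # placeholder
    user="admin",
    password="********",
    database="sales",
)
with conn.cursor() as cur:
    cur.execute(
        """
        LOAD DATA FROM S3 's3://my-bucket/exports/orders.csv'
        INTO TABLE orders
        FIELDS TERMINATED BY ','
        LINES TERMINATED BY '\\n'
        IGNORE 1 LINES
        """
    )
conn.commit()
```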
PostgreSQL Tools
Native PostgreSQL
pg_dump and pg_restore
Handles large datasets
Requires source DB interruption
Also has the ability to import csv-formatted flat files from S3
VPC
After DB is created, you cannot change the VPC. This is different from RDS, where we can perform instance modification to migrate resources to a different VPC
Read Replica VPC Migration
Create a new Aurora cluster in the VPC that we want to use
When the Replication Lag = 0, the replica has caught up with the source (in our old Cluster 1 hosted in the VPC we don't want to use anymore)
At that point, we drain the connections on the old cluster and update the DNS record our application uses so that it points to the new Aurora cluster. That's it - you migrated the Aurora cluster from one VPC to another with little to no downtime.
Endpoints
Cluster Endpoint - always resolves to the writer node
Reader Endpoint - always resolves to one of the read replicas
Custom Endpoints - allows us to define load balancing between the instances ourselves (e.g. when we use different instance types and we want to customize load distribution). You can create up to 5 custom endpoints.
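A boto3 sketch of defining a custom endpoint that groups specific reader instances (for example, the larger instances reserved for analytics); all identifiers are placeholders.

```python
# Sketch: create a custom Aurora endpoint that only routes to selected readers.
import boto3

rds = boto3.client("rds")

resp = rds.create_db_cluster_endpoint(
    DBClusterIdentifier="my-aurora-cluster",           # placeholder cluster
    DBClusterEndpointIdentifier="analytics-readers",   # placeholder endpoint name
    EndpointType="READER",
    StaticMembers=["my-aurora-reader-3", "my-aurora-reader-4"],  # the big reader instances
)

# The endpoint address to hand to the analytics application:
print(resp["Endpoint"])
```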
Aurora Global Database
Primary writer region and up to 5 secondary read-only regions
Dedicated fully managed infrastructure is used for replication
Low replication latency (typically less than a second)
RPO of 1s and RTO of 1min, a huge perk for DR planning
Data transfer fees for replication to the read-only secondary regions do not apply for Aurora Global Database
Write forwarding - when an application connects to the secondary region and submits write actions (INSERT, DELETE, UPDATE), the secondary region actually forwards these queries in the background to the primary region. These changes, when the primary region applies them, will then get replicated to the secondary region.
Limitations
Can't use cloning
Can't do Backtrack
No support for parallel query (specific to MySQL)
Can't use Aurora Serverless
Can't Stop/Start the Aurora DB
Aurora Serverless
When you connect to the Aurora Serverless endpoint, you are actually connecting to the Proxy Fleet, which then routes the request to the compute cluster whose capacity is measured in ACUs (Aurora Capacity Units)
Define min and max ACU for autoscaling purposes. Autoscaling uses this info to scale based on CPU Utilization and DB connections. The instances are added from the Warm DB pool.
There is no cooldown for scaling up, but there is cooldown for scale down operations
Scale down cooldown period after a Scale Up is 15mins. If you just scaled up and the utilization drops, you cannot scale down in the next 15mins. If the last operation was Scale Down, then we need to wait only 310 sec in order to scale down again.
Scaling Timeout - It happens when Aurora cannot decide on the right scaling point to meet the required capacity. This can happen due to a long running query or an open transaction. Solution: either wait and retry or activate "Force scaling" (can cause interruptions).
Pause compute capacity after # of consecutive minutes of inactivity. When this is activated, you only get charged for storage. A great solution for infrequent/sporadic workloads.
Created within a VPC and without a public IP address. If you need to connect to it via Lambda, you will need an elastic network interface to connect Lambda to your private VPC.
Security
We can force SSL to ensure it's used on every connection to DB
Can't disable or enable encryption on an existing instance. But, we can restore a snapshot and then enable encryption.
When sharing an encrypted snapshot, we need to have access to the KMS key used for encryption in the target region.
Replica always uses the same encryption config as the source. E.g. if the source is encrypted, the replica must be encrypted, as well.
Aurora Cloning
When we create a clone cluster, the clone shares all data that exists up to that point with the source cluster (they both have access to the same storage volumes). As soon as the source starts receiving data changes, a new storage volume is created for it to capture any changes that occur after cloning. Similarly, if the clone receives changes, it also gets its own storage volume to capture its post-cloning changes. That way, neither the source nor the clone has access to the other's changes made after the cloning operation.
Clones cannot be created cross-region and the clone has to be created in the same AZ as the source
We can create up to 15 clones in total
It is possible to clone a clone
Clones can be created in other VPCs
Because it uses the same storage volume as the source, provisioning a clone is much faster than snapshot restore operation where all the data has to be loaded into a new volume
Clones can be shared across accounts
Use cases
Test schema changes by cloning a cluster
Parameter Group Experimentation
Disruptive Workloads - if you need to run heavy queries, clone your prod cluster and run queries on the clone
Cross Account
Outside Data Access - Provide access to contractors using AWS Resource Access Management (RAM) so they can clone the prod cluster instead of accessing it directly.
Faster than snapshots - Use cloning to more quickly move data between AWS Accounts
Cloned clusters can be backtracked
Aurora Troubleshooting
Connection Remains Open (The application still shows a connection, but the DB has gone down) - Without a TCP keepalive configured, our application will continue waiting for the response for the submitted query.
Application Write Failures (Application continues to connect to the old primary after a failover) - Somewhere in our networking layer we are caching DNS or have excess TTL, which results in stale records. This can impact the time to complete failovers.
Poor Performance after Failover (While the failover is fast, there is a significant performance degradation after the instance begins accepting connections) - Possible stale buffer cache on newly promoted primary. The cache must be rebuilt to reduce queries retrieving data from disk. Enable Cluster Cache Management (CCM) in PostgreSQL.
Optimize JDBC connections for fast failover
Aurora OOM Crashes (Crashes happen when we try to import a mysqldump, and during peak hours) - There is a low amount of freeable memory and there is an increase in the "swap" metric.
You can share an Aurora DB cluster with another AWS account or AWS organization. By sharing this way, you can clone the DB cluster and access the clone from the other account or organization.
In MySQL Aurora, you can create a stored procedure that gets triggered on each new row insert and that calls mysql.lambda_async function to invoke a Lambda function. Lambda function can have Aurora as a trigger and it can send a notification to an SNS topic on each record creation.
Disaster Recovery
Objectives
RPO (Recovery Point Objective) - The maximum amount of data or time that we can lose without impacting the business operations
RTO (Recovery Time Objective) - The maximum amount of time that the recovery process can take, i.e. "For how long can we be down?"
Monitoring - How will you monitor for failures/problems?
Backups - How will you set up your backup to meet your RPO and RTO requirements?
Resiliency - How many AZs or regions do you use?
Reliability Design Principles
Automatically recover from failure
Scale Horizontally and use smaller resources, rather than huge monoliths
Identify Capacity Requirements to have a clear understanding of your system's requirements.
Perform changes through automation (e.g. via CloudFormation) to have change management, etc.
Test your recovery strategy
Migration
Methods
RDS Snapshot Migration - very common for homogeneous migrations
Managed point and click service available through the AWS Console
Best migration speed and ease
Can be used with binary log replication for near zero migration downtime
Percona XtraBackup - popular in migration from MariaDB to MySQL and On-prem MySQL to RDS
Managed backup ingestion from Percona XtraBackup files stored in an S3 bucket
High Performance and can be used with binary log replication for near zero migration downtime
Other self-managed Export/Import options
Schemas can be migrated as-is without conversion
Data migration can be performed manually using existing, well documented command line utilities
Can be used with binary log replication for near-zero migration downtime
DMS
Generally, you don't want to use DMS for homogeneous migrations as your first option since it's a bit more complex than the simple tools like snapshot migration and others listed above
Managed point and click data migration service
Schemas must be migrated separately (using SCT). The two steps in migration are Schema Migration and Data Migration.
Supports CDC replication for near-zero migration downtime
The best tool for heterogeneous migrations to Aurora (e.g. DB2 on prem to Aurora MySQL)
DMS
Can be used to consolidate multiple DBs into a single one
At least 1 endpoint in the migration must reside in AWS
Provides automatic failover for the replication server
Ensures data security at rest and in transit
Steps (a boto3 sketch follows this list)
Create a replication server
Create source and target endpoint that have your data stores
Create one or more migration tasks to migrate data between the source and target data store
Extra Connection Attributes (ECA) can be specified to customize the behavior of the endpoints (e.g. whether or not to enable supplemental logging)
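A condensed boto3 sketch of the steps above; the instance class, endpoint connection details and identifiers are placeholders, and in practice you would wait for each resource to become available before creating the task.

```python
# Sketch: create a DMS replication instance, source/target endpoints and a full-load + CDC task.
import json
import boto3

dms = boto3.client("dms")

instance = dms.create_replication_instance(
    ReplicationInstanceIdentifier="mig-instance",       # placeholder
    ReplicationInstanceClass="dms.c5.large",
    AllocatedStorage=100,
)["ReplicationInstance"]

source = dms.create_endpoint(
    EndpointIdentifier="onprem-mysql-source",
    EndpointType="source",
    EngineName="mysql",
    ServerName="onprem-mysql.example.com",              # placeholder
    Port=3306,
    Username="dms_user",
    Password="********",
)["Endpoint"]

target = dms.create_endpoint(
    EndpointIdentifier="aurora-target",
    EndpointType="target",
    EngineName="aurora",
    ServerName="my-aurora.cluster-xxxx.us-east-1.rds.amazonaws.com",  # placeholder
    Port=3306,
    Username="admin",
    Password="********",
)["Endpoint"]

# Migrate every table in every schema, then keep replicating changes (CDC).
table_mappings = {
    "rules": [{
        "rule-type": "selection", "rule-id": "1", "rule-name": "all",
        "object-locator": {"schema-name": "%", "table-name": "%"},
        "rule-action": "include",
    }]
}

dms.create_replication_task(
    ReplicationTaskIdentifier="onprem-to-aurora",
    SourceEndpointArn=source["EndpointArn"],
    TargetEndpointArn=target["EndpointArn"],
    ReplicationInstanceArn=instance["ReplicationInstanceArn"],
    MigrationType="full-load-and-cdc",
    TableMappings=json.dumps(table_mappings),
)
```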
Migration Modes
Full load only
Full load followed by CDC
CDC replication only (e.g. if you've already restored a snapshot yourself)
Table Preparation mode - configured for the tasks to perform
Do nothing mode - assumes that the tables already exist in the target and that the task should just dump the data there
Drop tables on target mode - drops and recreates the tables in the target
Truncate mode - data deleted in the target table before migration begins
LOB (Large Objects) - If possible, you want to know what the largest object in the migration will be, so that you can optimize the migration by configuring Max LOB size
Full LOB mode - DMS migrates all LOBs from source to target regardless of size
Limited LOB mode - you set a maximum size LOB that AWS DMS should accept
Sources
On-prem: Oracle, SQL Server, MySQL, MariaDB, PostgreSQL, MongoDB, SAP Sybase ASE, IBM DB2 for LUW
Azure: Azure SQL Database
RDS and S3: Oracle, SQL Server, MySQL, MariaDB, PostgreSQL, Aurora with MySQL or PostgreSQL capability, S3
DocumentDB
Targets
On-prem: Oracle, SQL Server, MySQL, MariaDB, PostgreSQL, SAP Sybase ASE and several others
Redshift, DynamoDB, Kinesis Data Streams, Kafka, ElasticSearch, DocumentDB and Neptune
Data Validation
DMS has support for data validation to ensure your data was migrated accurately.
When enabled, validation begins immediately after a full load is done.
For CDC-enabled tasks, Data Validation feature compares the incremental changes.
During data validation, DMS compares each row in the source against the target, verifies same data is present and reports any mismatches.
There are a ton of data validation settings that we can configure (e.g. SkipLobColumns, FailureMaxCount, etc.)
Various metrics can be monitored in CloudWatch during Data Validation (e.g. The number of validated rows per minute, The number of failed validations per minute, etc.)
DMS creates a table on the target endpoint (awsdms_validation_failures_v1), where it writes diagnostic information about failures. Helpful table in troubleshooting.
Cross-Account Migration
Set up VPC peering
Homogeneous migration - Create an RDS snapshot, share it with the other account and restore it there. Then, enable CDC-only tasks in DMS to replicate ongoing changes.
Heterogeneous migration - Use SCT to convert schema and then create DMS task to perform both the Full Load and ongoing CDC replication
Optimizing DMS Tasks
Slow DMS Tasks
The task instance may be too small, especially in scenarios when we have multiple tasks running on the task instance
The load from the reads on the source db
The load from the writes on the target db. If you have Data Validation enabled, your target will also be hit with read requests
If we need to, we can break our large migration tasks into multiple tasks on multiple instances.
Recommendations
For an RDS DB instance, disable Multi-AZ for the target instance and transaction logging.
Turn off automatic backups
Use Provisioned IOPS if available
Make sure the task is optimized for LOB migration (use Limited LOB mode, which is the default, and ensure you correctly configure it)
DMS Network Issues
Make sure you use security groups that permit traffic on the required ports to and from the replication instance
If your target is in a VPC, but the source is somewhere else (on-prem, another VPC), then the route table for your replication instance needs to have a route to the source
Initial Load of a Schema Fails
Make sure the user account used by the DMS to connect to the source endpoint has the right permissions
FKs and secondary indexes missing
Note that DMS does not create secondary indexes, non PK constraints or data defaults. Use DB-native tools or SCT for that.
CDC stuck after Full Load
DMS settings can conflict with each other, which could cause slow or stuck replication.
If you haven't created PK on the target tables, full scans will occur, which are very resource heavy
SCT
Can convert from OLTP or data warehouse schema
Setup
Download SCT installer
Extract the installer for your OS
Run the installer
Install the jdbc drivers for the source and target DB engines
DB Migration Assessment Report - summarizes all the schema conversion tasks and details the action items for schema that can't be converted to the DB engine of your target instance (e.g. licensing evaluation, feature comparison, assess current hardware, makes recommendations for backups, etc.)
Usage
Create mapping rules in the SCT
Convert the schema using the SCT
Create migration assessment reports
Handle manual conversions in the SCT
Update and refresh the converted schema
Save and apply the converted schema
Data Extraction Agent
Useful in scenarios where the source and target are very different and require additional data transformations.
It's an external program that's integrated with SCT, but performs data transformation elsewhere (such as an EC2 instance or on Snowball Edge)
For very large DB migrations, you can use a SCT replication agent to copy data from your on-premises DB to S3 or Snowball Edge Device. The replication agent works in conjunction with DMS.
You can use an SCT data extraction agent to extract data from Apache Cassandra and migrate it to DynamoDB. The agent runs on an EC2 instance, where it extracts data from Cassandra, writes it to the local file system, and uploads it to an S3 bucket. You can then use AWS SCT to copy the data to DynamoDB.
AWS Workload Qualification Framework (WQF) is a module within SCT that helps us analyze the entire process of migrating customer enterprise infrastructures and makes recommendations on a migration strategy and proper migration tools
Glue
Billed on per second basis for crawlers and ETL jobs
Storage for the first million objects is free
Encryption in transit and at rest supported and can be configured manually
Snowball Edge
Move TBs or PBs of data without paying for network data transfer costs and delays
Use Cases
When network bandwidth is the limiting factor or there is no Internet access
Massive amount of data to be migrated (e.g. 100 TB)
The process
Use SCT to extract data locally and load it to Edge device
Ship the Edge device or devices back to AWS
At AWS, Edge device automatically loads data to an S3 bucket
DMS takes the flat files from S3 and migrates data to the target
Monitoring and Optimization
Trusted Advisor
Real-time guidance on saving money and following best practices.
Cost Optimization
Underutilized EBS volumes
RDS idle DB instances
Underutilized Redshift clusters
ElastiCache Reserved Node optimization
Redshift Reserved Node optimization
RDS Reserved Node Optimization
Fault Tolerance
Helps increase the availability and redundancy of an application, by taking advantage of auto scaling, health checks, multi AZ and backup capabilities
Reports on the existence of backups and checks on the automated backups of RDS DB instances. Backups are enabled with the retention period of 1 day by default.
Checks if encryption is enabled on S3 bucket, if security groups are too wide open, whether MFA is enabled on root account, etc.
You can subscribe to Trusted Advisor's notifications, which are sent out weekly and include a report for your environment over the past 7 days
CloudWatch Application Insights
Powered by ML (SageMaker), lets you monitor for problems with your resources by detecting anomalies and generating dashboards
Monitors .NET and SQL Server applications: App discovery and config, data preprocessing, intelligent problem detection.
Supported DB engines: SQL Server, MySQL and DynamoDB
Advanced Audit Logging
Records database events such as connections, disconnections, tables queried or types of queries on Aurora MySQL DB cluster, RDS MySQL or RDS MariaDB
Audit log files are comma-delimited, and include the following info in rows: timestamp, serverhost, username, host, connectionid, queryid, operation, database, object.
Notable Parameters - you can set these params in the parameter group used by your DB cluster to configure Advanced Auditing
server_audit_logging - enable/disable Advanced Auditing
server_audit_events - specify what events to log
server_audit_excl_users and server_audit_incl_users - specify who gets audited
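A boto3 sketch of setting these parameters in a custom Aurora MySQL cluster parameter group; the group name, event list and user list are placeholders.

```python
# Sketch: enable Advanced Auditing on an Aurora MySQL cluster via its parameter group.
import boto3

rds = boto3.client("rds")

rds.modify_db_cluster_parameter_group(
    DBClusterParameterGroupName="my-aurora-audit-params",   # custom (non-default) group, placeholder name
    Parameters=[
        {"ParameterName": "server_audit_logging",
         "ParameterValue": "1", "ApplyMethod": "immediate"},
        {"ParameterName": "server_audit_events",
         "ParameterValue": "CONNECT,QUERY_DDL,QUERY_DML", "ApplyMethod": "immediate"},
        {"ParameterName": "server_audit_excl_users",
         "ParameterValue": "rdsadmin", "ApplyMethod": "immediate"},
    ],
)
```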
CloudWatch Contributor Insights
Analyzes log data and determines top contributors influencing system performance
Helpful in determining frequently accessed attribute keys in a DynamoDB table
Direct Connect does not encrypt the data in transit and it doesn't offer an option to do it. You have to encrypt it yourself.
Graph
Pros
Easily relate sets of data
Very flexible schema
Great for social networking applications
Cons
Performs poorly on high-volume transactions
Design
Data is broken down into nodes (like records in RDBMS), edges (links that connect the nodes) and properties (related information attached to the nodes).
Services
Neptune
Supports Gremlin and SPARQL (uses RDF data model)
Uses cluster volume, similar to Aurora
Supports Multi-AZ and up to 15 replica
Uses self-healing, fault-tolerant cluster volume
Most efficient method of loading data is by using Neptune's Loader to load data from S3, which can be invoked via HTTP call & supports loading data for both Gremlin & SPARQL (rdf format)
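A sketch of invoking the Neptune bulk loader over HTTP from inside the VPC; the cluster endpoint, S3 prefix and IAM role ARN are placeholders, and the request must originate from a host that can reach the cluster.

```python
# Sketch: kick off a Neptune bulk load from S3 and poll its status.
# Must run from within the VPC (Neptune has no public endpoint). Uses the requests library.
import requests

LOADER = "https://my-neptune.cluster-xxxxxxxx.us-east-1.neptune.amazonaws.com:8182/loader"  # placeholder

resp = requests.post(LOADER, json={
    "source": "s3://my-bucket/graph-data/",                             # placeholder prefix
    "format": "csv",                                                    # Gremlin CSV; use ntriples/turtle/rdfxml for SPARQL
    "iamRoleArn": "arn:aws:iam::123456789012:role/NeptuneLoadFromS3",   # placeholder role
    "region": "us-east-1",
    "failOnError": "FALSE",
})
load_id = resp.json()["payload"]["loadId"]

# Poll the load status.
status = requests.get(f"{LOADER}/{load_id}").json()
print(status["payload"]["overallStatus"]["status"])
```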
Relational
Pros
Best with structured data
Atomic operations (completes successfully or fails, it's all or nothing)
Cons
Bad for semi-structured or sparse data
Services
RDS
AWS manages failure detection, backups, recovery
Engines
Oracle
PostgreSQL
SQL Server
MariaDB
MySQL
Storage Engines
InnoDB - fully supported in RDS. Point-In-Time restore and snapshot restore features require a recoverable storage engine and are supported for the InnoDB storage engine only
MyISAM - does not support reliable recovery and can result in lost or corrupt data when MySQL is restarted after a recovery
Federated Storage Engine is currently not supported by Amazon RDS for MySQL.
Aurora
Automated backups
Automatically scales in 10GB increments
Uses cluster storage, rather than the EBS volumes. This is one of the key differentiators between Aurora and RDS. Cluster volume is shared between the Primary DB and the Read Replicas, so we are not limited by logical replication.
Supports Multi-Master
Supports up to 15 read replicas
Multi-AZ - standby instance in another AZ
We can convert an existing single-AZ instance to a multi-AZ instance. When we kick off this option, RDS starts to take a snapshot of the primary instance and, when done, copies the snapshot to another AZ, where it starts to spin up a secondary active instance. However, note that there could be some performance hits during the process of taking the snapshot of the primary instance.
Unlike in read replicas where the changes are made asynchronously, with a delay, in Multi-AZ mode, since both instances are considered active, the changes are made synchronously. The data must be replicated to the secondary active instance before the primary instance considers the transaction complete.
Note that Multi-AZ gives us availability, while Read replicas give us scalability.
When a disruption of primary instance happens, RDS automatically promotes the secondary instance to be the primary and sets the endpoint to point to the new primary instance.
Recovery for single-AZ failure can take 4-5mins. Recovery from Multi-AZ failure is typically achieved in 60s or less.
The customer is responsible for managing performance bottlenecks with queries. RDS automates recovery from Storage Failure, Network Interruptions, Loss of Availability (e.g. hardware failure)
You can choose to convert an instance to Multi-AZ either immediately or schedule it to happen during the next maintenance period.
When you enable Multi-AZ, you get charged for 2 instances since they are both active.
Read-replicas - can be promoted to a primary instance as part of the disaster recovery
SQL Server and Oracle don't support manual snapshots
You can have up to 5 read replicas for each database instance
SQL Server requires you to enable multi-AZ with always on availability groups on primary instance. Also, it only supports read replicas in its Enterprise edition.
Oracle also requires you to purchase enterprise edition in order to enable read replicas
Global read replicas are supported (up to 5 per instance), where you can create replicas in different regions. Only SQL Server doesn't support global read replicas.
You can't use a read replica as a source for you cross-region replica
The read replicas communicate with the primary instance via an encrypted channel fully managed by RDS.
How replication is done varies engine-to-engine. MySQL and MariaDB both use Logical replication, while the other engines use physical replication.
Replica lag can be risk in some applications since there is a delay between the primary instance writing data to its db and the time that the same data gets replicated to the secondary read replicas.
Asynchronous replication used to replicate changes from the master to the replicas
For MariaDB, MySQL and Oracle RDS, when a DB is deleted, all the read replicas are promoted. For PostgreSQL, when a DB is deleted, only the read replicas in the same region are promoted, while the cross-region read replicas get a replication status of "terminated".
Backups
Automated
Enabled by default. You can disable them, but it's not advisable. You disable automated backups by setting the retention period to 0
Occur in 30min backup windows
Configurable backup retention period 1-35 days (default is 7 days)
Contain system-wide snapshots (the entire volume, not just the DB) as well as transaction logs. Snapshots are stored in S3.
Automated backups don't capture any information about the parameter or option groups. If you restore an automated snapshot, you have to specify these yourself.
When you delete an instance, by default, automated snapshots also get deleted. You can change this to retain them.
If you want to share a snapshot, you must first copy it to a new Snapshot and then share it
Up to 40 automated backups can be retained per region
Point In Time Recovery (PITR) - restore to any specific time within the configured backup retention period
LRT (Latest Restorable Time) - generally within 5mins of current time since transaction logs are sent to S3 every 5mins.
When sharing snapshots, don't delete the source snapshot before the transfer completes.
Snapshots can be copied to other regions, but this process may be slow.
If a snapshot is encrypted and shared with your account, you cannot directly launch an instance from this snapshot. Instead, you have to copy it and that copy also needs to be encrypted.
Oracle and SQL Server snapshots that use Transparent Data Encryption (TDE) can't be shared.
Manual
Stored in S3 indefinitely. There until we delete them.
First snapshot will take the snapshot of the entire DB. The later ones will be incremental. Therefore, the first snapshot may have a strong hit on I/O latency on single AZ instances
The more data you intend to restore, the longer the restoration process will take.
Allows you to restore to a different storage type
When restoring from manual snapshot, the default parameter and option groups will be automatically set to the same as the ones used by the instance from which the manual snapshot was taken. During restoration, you can override these with different values.
You can share snapshots as-is. You can't share a snapshot with other accounts if it was encrypted with the default KMS key. If you used a customer-managed KMS key (CMK), you can share the snapshot.
Monitoring
Events and Event notifications
SNS notification includes SourceType (such as db-instance) and SourceIdentifier (such as MyExampleDB)
SNS topic get request on failovers, config changes (like parameter groups or option groups), etc. Use SNS notification to trigger Lambda and automate any actions.
RDS Event Notification is a native feature on RDS that enables you to receive notifications on various DB events, such as when the master password has changed.
Enhanced Monitoring
Access to real-time metrics from the underlying OS, including Free Memory. Collection granularity can be adjusted down to a 1 sec refresh, as opposed to the standard CW metrics that refresh only once every minute.
CW retrieves its metric from the hypervisor, rather than the underlying OS. So, our view is limited to the metrics that are exposed to the hypervisor. This is why enhanced monitoring is helpful, because it gives us view into the OS metrics, as well. Enhanced Monitoring Agent, running on the managed EC2 instance hosting the DB collects the OS metrics and sends them to CW.
CW shows us Freeable Memory (how much memory could be freed up, currently used by cache and buffer pools), but in order to see the Free Memory metric, we need Enhanced Monitoring. Important to track in production as we don't want any spills to disk that slow us down.
Performance Insights - Provides helpful insights into workload performance such as db load, top queries, etc. Helpful to teams that don't have a dedicated DBA.
Generates visualizations of metrics
Can be enabled on creation or during the modification process. No downtime, reboot nor failover will be required.
The PI agent is lightweight and consumes limited CPU and memory on the DB host.
If DB load is high, the PI agent collects data less frequently
PI Dashboard - contains visualizations of various load metrics and you can drill in based on a particular wait state, SQL Query, host or user.
Automatically publishes insights to CloudWatch, via metrics like DBLoad, DBLoadCPU, DBLoadNonCPU
PI has its own set of APIs that you can use to retrieve metrics
Important Dashboard Metrics
Counter Metrics - Can be viewed via Performance Insights. Monitor specific performance metrics. Varies depending on which DB engine is used. e.g. Memory, SwapSpace, AbortedConnections, etc.
DB Load - compares DB load (measured in average active sessions i.e. opened connection that send a request to DB and are waiting on a response) to the maximum instance capacity. If AAS is near 1, the DB is fully utilized. If AAS is near 0, the DB is idle.
Top Items - Slice the metrics by top waits, SQL, hosts and users.
Instance Statuses
failed, restore-error and incompatible statuses (there are 3 - network, parameters, restore) can result in severe outages. You are not billed when an instance is in incompatible or failed state.
10 statuses in total
If you get incompatible-parameters state, copy your parameter group, make changes and then apply that new parameter group.
restore-error status indicates that PITR has failed.
Logs
MySQL & MariaDB
Binary logs used to track query activity. Configurable retention using rds_set_configuration stored procedure
Types
Error
General_log - granular record of all activity in the DB
Slow_query - capture any query that takes longer than, say, 5s to execute
Audit - requires MariaDB Audit plugin
Logs Output
FILE
Required setting if you want to interact with the logs via console
by default, logs older than 24 hours are deleted
If the size of the logs is greater than 2% of all storage, then RDS will continually remove the oldest logs until they take up less than 2%.
TABLE
stored and viewable in the engine
records older than 24hrs are moved to the backup table
The rotation occurs if the size of the logs is more than 20% of the storage or if the size exceeds 10GB.
PostgreSQL
postgres engine combines most of the logs into the postgres.log file
a feature to export postgres logs to CW logs
The logs are centralized in the postgres log. To enable additional logging, you just need to make changes in the parameter group
log_statement - defines what query activity we want to log in the DB
log_min_duration - threshold for when the query is logged.
log_retention_period - configured to retain the logs for up to seven days. Measured in mins. Be careful with how much you log as it could impact DB's I/O.
Oracle
Types
Audit
Alert
Trace - diagnose or resolve operational issues
Listener - track connections made to the DB
Supplemental Logging and Online Log Files - might be required when using log miner or when you want to track changes across multiple DBs.
Force Logging - used to log all changes except those in temporary tablespaces
Most Oracle logs are retained for a default of 7 days
SQL Server
Types
Error
Trace
Agent
Dump
Retained for 7 days
Error and Agent logs can be exported to CloudWatch Logs and then viewed from the RDS Console
Can be viewed via console, CLI/API or via stored procedure executed on the DB engine
Security
Access resource using Roles
Assume Role - User assuming cross-account role must be identified as a principal in the role's trust policy
STS GetFederationToken used to get temporary credentials
Service-Linked Roles - RDS uses AWSServiceRoleForRDS role that allows it to call other AWS services. It's created for you when you create a DB instance.
AWS Managed Policies
AmazonRDSReadOnlyAccess
AmazonRDSFullAccess
External Authentication
Support for Kerberos and Microsoft AD
Supported only by MySQL, SQL Server, Oracle and PostgreSQL
You can use AWS Directory Service for Microsoft AD or your on-prem AD. You can connect your existing AD domain to your RDS SQL Server DB using AWS Directory Service.
IAM Authentication
Supported by only MySQL and PostgreSQL
Can be enabled during instance creation or modification
MySQL limited to 200 new connections per second
PostgreSQL does not have limits, you just have to set SSL parameter to 1
Use a generated authentication token. The tokens expire after 15mins
In order for IAM Authentication to work, an IAM policy allowing the rds-db:connect action needs to be attached to the user, and a matching user needs to be created in the DB.
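A hedged sketch of connecting to RDS MySQL with an IAM authentication token instead of a password; the host, user, region and CA bundle path are placeholders, and the DB user is assumed to have been created for IAM authentication.

```python
# Sketch: connect to an RDS MySQL instance using IAM database authentication.
import boto3
import pymysql

HOST = "mydb.xxxxxxxx.us-east-1.rds.amazonaws.com"   # placeholder endpoint
USER = "iam_app_user"                                # placeholder DB user created for IAM auth

rds = boto3.client("rds", region_name="us-east-1")

# Generate a short-lived (15 min) authentication token to use as the password.
token = rds.generate_db_auth_token(DBHostname=HOST, Port=3306, DBUsername=USER)

conn = pymysql.connect(
    host=HOST,
    user=USER,
    password=token,
    ssl={"ca": "/path/to/rds-ca-bundle.pem"},        # SSL is required for IAM auth; placeholder path
)
```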
KMS
You can't enable or disable encryption for a preexisting DB instance. Encryption must be enabled at the time of creation.
Can't create an encrypted read replica if the source is not encrypted. Both the primary and all replicas must either be encrypted or unencrypted.
You can't restore an encrypted snapshot to the unencrypted DB instance.
Keys are region-specific. If you want to restore an encrypted snapshot from one region to another, you must have a valid KMS key in that other region that can be used.
SSL/TLS Support
Set rds.force_ssl parameter to 1 to force all users to use SSL in order to connect to the DB.
To get a root certificate for all AWS Regions, download it from
https://s3.amazonaws.com/rds-downloads/rds-ca-2019-root.pem
RDS Subnet Groups
Grouping of subnets from multiple AZs. Logical grouping that we create in our VPC.
Subnet group must contain at least 2 AZs
Cannot be a mix of public and private. They must all be public or all private.
You can modify the VPC of the database instance by modifying the subnet group.
Security Groups - 3 types of security groups are used in RDS: VPC Security Groups, DB Security Groups and EC2-Classic security groups. Only the first one is relevant these days as the latter 2 only apply to EC2-Classic
Managing RDS resources
Modifying storage
When storage optimization begins, the DB instance is placed in storage-optimization status. After it completes, you cannot do any other storage modifications for 6 hours.
When you are adding storage, you have to add at least 10% of the existing storage
You can't reduce the amount of storage using this method because RDS service doesn't know how the data is spread across the disk. So, to reduce the volume, you'd have to migrate to a DB instance with smaller volumes.
Storage mods that result in outage:
Any change of volume types from GP2 and IO1 to Magnetic or from Magnetic to GP2 and IO1
Change from IO1 to GP2 or GP2 to IO1 in a single-AZ DB instance, but only if custom parameter group was used.
No outage happens when you have Multi-AZ
You can use RDS autoscaling to automatically scale your storage up. Note that you cannot scale down. Also, you should configure the maximum storage to which your db will scale to avoid having it scale to astronomical sizes.
Instance Modifications
Changes to instance class using "Apply Immediately" option result in outage
Without Apply Immediately option, the changes are applied during the next maintenance window
In Multi-AZ, the instance change is first performed on the standby instance, drastically reducing downtime. After the standby mod is complete, RDS performs a failover to promote the standby to Primary. Then it applies the mods to the original Primary instance.
Changing Public setting is done immediately and does not result in any downtime
Parameter Groups
Controls engine config values
Static and dynamic parameters
Cannot modify default parameter group. You have to create a custom parameter group and associate it with the instance.
When you modify your instance to apply a new Parameter Group, the Parameter Group status will change to "pending-reboot" status. Know that this reboot will not automatically happen during the next scheduled maintenance window. You have to do the reboot manually yourself.
Option Groups
Used to enable additional features, such as Maria DB audit plugin for MySQL
Persistent options can't be removed from an option group while DB instances are associated with the option group.
Links to VPC rather than DB instance. Cannot be migrated between VPCs.
Note that some extra features are not available in Options Groups. For instance, Option Groups do not support SQL Server Reporting Services feature. If you want to use this feature, you have to host SQL Server on an EC2 instance.
Permanent options, such as the TDE option for Oracle Advanced Security TDE, can never be removed from an option group.
Maintenance
Status Types
Required - must be applied. You can defer it, but not indefinitely since the RDS determined that the update is necessary.
Available - update available, but not required to be applied immediately. You could also do nothing.
Next Window - Upcoming maintenance will take place in the next window.
In Progress - maintenance tasks are already being applied to the resource
Maintenance Windows
A 30min window in an 8-hour time block.
If the maintenance tasks take longer than 30mins, they will still run until they finish (there is no hard cutoff).
The 8hr block varies by region. If you don't specify a 30min window, it is selected at random.
DB Engine Maintenance will result in downtime, even if you have Multi-AZ, since the change has to be applied to both instances at the same time. However, with Hardware and OS maintenance, Multi-AZ will help since those changes can be applied to standby instance first.
Minor Version Engine Upgrades - backward compatible changes that can be automatically applied if you enable automatic version upgrades feature.
Major Version Engine Upgrades - typically not backward compatible. Have to apply it via instance modification process. Per best practice, always take a backup before applying any major version upgrades.
Troubleshooting
Parameter Changes Not Taking Effect - verify that the status of the instance is "pending-reboot". Perform the reboot of the resource via CLI and verify that the parameter group change has taken effect.
Instance in Storage-Full State - Anticipate the increased requirements of storage (e.g. upcoming migration) and perform a database instance modification to increase storage allocation ahead of time. You can also configure a CW Alarm to monitor the FreeStorageSpace metric.
Replication is Stopped - It can happen if the master and the replicas are using different parameter and option groups (e.g. you could see max_allowed_packet error). To fix this, make sure you set the same values in both master and replica. If the replication has lagged a lot, just create a new read replica and take down the first one.
Insufficient Capacity Error - select a different DB instance class or try to create the resource in another AZ.
Too Many Connections - Increase max_connections parameter value
Pricing Models in RDS
DB Instance Pricing (instance type and size)
Storage Type
General purpose (gp2) - max IOPS is 16,000
Provisioned-IOPS (io1) - max IOPS is 64,000
Magnetic - super cheap AND super slow
Usage Type
On-Demand
Reserved
DB Instance Storage
I/O (Magnetic only)
Backup Storage - Billed on automated and manual backup snapshots. Billed per GB.
Data Transfer
Tiered based on the amount transferred
If you copy DB snapshots across regions, it will cost you.
Traffic within the AZ is free
Traffic between RDS and EC2 in different AZ accrues EC2 regional data transfer fees.
Traffic replicated across AZs for Multi-AZ is free
Licensing - customers already using Oracle, can use their license on RDS
Reserved Instance pricing - license included or BYOL
On-demand pricing - only BYOL supported
You can stop your RDS DB instance for up to 7 days to save on cost. After several days, the DB instance is automatically started to perform maintenance. You can stop your RDS instance in either single AZ or Multi-AZ. One limitation is SQL Server, which can be stopped only when in Multi-AZ.
Protection against the man in the middle attack
Aurora MySQL: ssl-mode=require only enforces encryption. Set to "verify-full", to force encryption and verify the certificate
Aurora PostgreSQL: Set ssl-mode to "verify-full", which forces encryption and verifies the certificate. Note that ssl-mode=require would only enforce encryption
SQLServer: encrypt=true; trustServerCertificate=false
Oracle: Set ssl_server_dn_match to true
RDS Proxy - allow your applications to pool and share database connections to improve their ability to scale. RDS Proxy makes applications more resilient to database failures by automatically connecting to a standby DB instance while preserving application connections. RDS Proxy also enables you to enforce AWS Identity and Access Management (IAM) authentication for databases, and securely store credentials in AWS Secrets Manager. Fully compatible with only MySQL and PostgreSQL.
You can load XML data from S3 to your table in RDS by running LOAD XML FROM S3 SQL statement
Redshift
Columnar type
Leader Node vs. Worker Node
Data is stored in slices
Dense Compute vs. Dense Storage vs. RA3
EVEN vs. KEY vs. ALL vs. AUTO distribution style
Audit logging is not turned on by default in Redshift. When you turn it on, the logs are stored in S3.
Key-Value
Cons
You have to know the access pattern from the beginning as it will have a huge impact on your design and experience
It is not great for analytics
Pros
Low latency
Extremely scalable
High throughput
Flexible schema
Great for storing sessions of web applications, handling surge in orders with great scale and very fast response times
DynamoDB
DynamoDB Architecture
Tables are of unlimited size. Table names must be unique per region. Case sensitive, can include _, - and . symbols
Partition Key - Primary Key. Select a column that has many different values, ideally evenly distributed. Also called hash key.
Composite Primary Key: a combination of partition key and sort key. The sort key is also called the range key
Each item can be up to 400 KB in size. Nested values in an item can be 32 levels deep.
Data Types
Scalar - number, string, binary, boolean and null. Apps must encode binary values in base64-encoded format before sending them to DynamoDB
Document - complex structure with nested attributes (e.g. json)
List - ordered collection of values (don't have to be of the same data type)
Map - unordered collection of name-value pairs (similar to json)
Set - multiple scalar values of the same type - string set, number set, binary set
Item Collection - any group of items that have the same partition key value in a table and all of its local secondary indexes. The maximum size of any item collection is 10 GB.
Performance
On-Demand Capacity
Good for
New tables with unknown workloads
Applications with unpredictable traffic
Prefer to pay as you go
Characteristics - scales with demand
Provisioned Capacity
Good for
Applications with predictable traffic
Applications whose traffic is consistent or ramps gradually
Capacity requirements can be forecasted, helping to control costs
Characteristics
consistent and predictable performance
Specify RCUs and WCUs
Cheaper per request than On-Demand mode
If you exceed the provisioned throughput, your table starts to get throttled (apps will start to get exceptions)
Limit for both capacity modes is 40k RCUs and 40k WCUs. You can switch between modes only once per 24 hours.
You can use Parallel Scans, where a multithreaded application sends parallel scan requests, with each thread specifying the Segment and TotalSegments arguments
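A short sketch of such a parallel scan using threads; the table name and segment count are placeholders.

```python
# Sketch: scan a DynamoDB table in parallel using Segment / TotalSegments.
import threading
import boto3

TABLE = "Orders"        # placeholder table
TOTAL_SEGMENTS = 4      # one thread per segment

def scan_segment(segment: int) -> None:
    table = boto3.resource("dynamodb").Table(TABLE)
    kwargs = {"Segment": segment, "TotalSegments": TOTAL_SEGMENTS}
    while True:
        page = table.scan(**kwargs)
        print(f"segment {segment}: {page['Count']} items")
        if "LastEvaluatedKey" not in page:                          # finished this segment
            break
        kwargs["ExclusiveStartKey"] = page["LastEvaluatedKey"]      # paginate past the 1 MB limit

threads = [threading.Thread(target=scan_segment, args=(s,)) for s in range(TOTAL_SEGMENTS)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```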
Note that GetItem will always be faster than Query since it takes us straight to the physical partition that has the PK that we want to pull. The most performant method when we want to pull a specific PK.
Overview
DynamoDB Streams for Real-time data processing
Storage autoscaling - set min and max provisioned capacity
DAX - in-memory caching feature
Supports Multi-region Multi-master mode (Global tables)
Metrics
ConsumedReadCapacityUnits / ConsumedWriteCapacityUnits - number of units consumed over a given time period. Use these to see how much of your provisioned capacity has been used.
ProvisionedReadCapacityUnits / ProvisionedWriteCapacityUnits - The total number of provisioned units for a table or an index.
ReadThrottleEvents - read requests to the DynamoDB table that exceed the provisioned read capacity units
SuccessfulRequestLatency - the lapse time for a successful request and the number of successful requests.
SystemErrors - requests to DDB that generate the HTTP 500 error code
ThrottledRequests - any event within the request exceeds the provisioned throughput limit
UserErrors - requests to DDB that result in HTTP 400 error
WriteThrottleEvents - when write throughput capacity exceeded for a table or an index
Alarms
States
INSUFFICIENT_DATA
ALARM
OK
Key Components
Metric
Threshold
Period
Action
Errors
Components
HTTP Status code
Exception name
Error Message
Exponential Backoff - built into the sdk and automatically applied
Batch operations - tolerate single record failures within the batch. The most common error is due to throttling. Any throttled records within the batch are retried using exponential backoff
When provision throughput is exceeded and the requests are throttled, the client will receive a 400-level HTTP code
Partition Keys
Used as input to DDB's internal hash function that selects a partition
Each partition holds up to 10GB of data and gets 3,000 RCU or 1,000 WCU
Consistent Hashing Algorithm is used to determine which partition should go to which node
Sort key cannot be used on its own. We always have to use it in conjunction with the partition key.
RCUs & WCUs
One RCU represents one strongly consistent read request per second, or 2 eventually consistent read requests, for an item up to 4KB in size. Transactional read requests require 2 RCUs for items up to 4KB in size. Eventually consistent model is the implied default method if "strong" is not specifically mentioned.
Even if you read a very small item, you are still consuming at least 1 RCU.
One WCU represents one write per second for an item up to 1 KB in size. Transactional write requests require 2 WCUs for items up to 1KB in size.
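A small worked example of the capacity math described above, using arbitrary item sizes.

```python
# Worked example: capacity units needed per request for a given item size.
import math

def rcu_per_read(item_size_kb: float, consistency: str = "eventual") -> float:
    units = math.ceil(item_size_kb / 4)              # reads are billed in 4 KB chunks
    if consistency == "eventual":
        return units * 0.5                           # 2 eventually consistent reads per RCU
    if consistency == "strong":
        return units * 1.0
    if consistency == "transactional":
        return units * 2.0
    raise ValueError(consistency)

def wcu_per_write(item_size_kb: float, transactional: bool = False) -> float:
    units = math.ceil(item_size_kb / 1)              # writes are billed in 1 KB chunks
    return units * (2.0 if transactional else 1.0)

print(rcu_per_read(7.5, "strong"))        # 2 RCUs  (7.5 KB rounds up to two 4 KB chunks)
print(rcu_per_read(7.5, "eventual"))      # 1 RCU
print(wcu_per_write(2.5))                 # 3 WCUs  (2.5 KB rounds up to three 1 KB chunks)
```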
Scan vs. Query
Scan
Returns all items and attributes in a given table
Filtering results does not reduce the RCU consumption, it simply discards non-matching items after the whole table has already been scanned
Eventually consistent by default, but you can request strongly consistent scans with the ConsistentRead parameter
Use "limit" parameter to limit the number of items scanned and reduce the capacity required
A single request returns results that fit within 1MB, but we can use pagination to retrieve more than 1MB
Query
Finds items based on a PK value (required attribute) and returns all items with that PK value. An SK condition is optional and can be added to narrow the results.
Query limited to PK, PK + SK, or secondary indexes
Filtering results does not reduce RCU consumption (same concept as in filters applied on scans)
Eventually consistent by default
Querying a partition only scans that one partition
Limit and pagination are used in the same way as with scans
BatchOperations - improve the performance and cost of the read and write requests
BatchGetItem
Returns attributes for multiple items from multiple tables
Request using PK
Returns up to 16MB of data, up to 100 items
Get unprocessed items exceeding limits via UnprocessedKeys
Retrieves items in parallel to minimize latency
BatchWriteItems
Writes up to 16MB of data, up to 25 put or delete requests
Get unprocessed items exceeding limits via UnprocessedItems
Conditions are not supported for performance reasons
Threading may be used to write items in parallel
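A boto3 sketch of both batch operations; the table and key names are placeholders, and unprocessed keys should be retried with backoff as noted above.

```python
# Sketch: BatchWriteItem (via batch_writer) and BatchGetItem with boto3.
import boto3

dynamodb = boto3.resource("dynamodb")
table = dynamodb.Table("Orders")                      # placeholder table with "order_id" as PK

# batch_writer buffers puts/deletes into BatchWriteItem calls (max 25 per request)
# and automatically retries unprocessed items for us.
with table.batch_writer() as batch:
    for i in range(100):
        batch.put_item(Item={"order_id": f"order-{i}", "status": "NEW"})

# BatchGetItem: up to 100 keys / 16 MB per request; items are fetched in parallel.
resp = dynamodb.batch_get_item(
    RequestItems={"Orders": {"Keys": [{"order_id": f"order-{i}"} for i in range(10)]}}
)
items = resp["Responses"]["Orders"]

# Anything the service could not process must be retried (ideally with backoff).
unprocessed = resp.get("UnprocessedKeys", {})
```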
Provisioned vs. On-demand Capacity
Provisioned
Need to specify minimum and maximum capacity
Subject to throttling
Autoscaling available
Lower cost per API call
The first 25GB of storage is free
On-demand
Idle tables not charged for read/write, but only for storage and backups
No need to plan and specify capacity
use this mode for new product launches and then convert to provisioned capacity once your application reaches predictable steady state.
Even though you should not see throttling in on-demand mode, if you double your previous peak of traffic within 30mins, you may still see throttling even in on-demand mode, as dynamodb attempts to catch up with the requests.
You can switch between Provisioned and On-demand mode only once every 24 hours
Indexes - a subset of attributes from a table, along with alternate key to be used in queries. A table can have multiple secondary indexes.
Global Secondary Index (GSI) - An index with a partition key and a sort key that can be different from those on the base table. The primary key of a GSI can be either simple (partition key only) or composite (PK + SK). You can create GSIs any time you'd like.
Local Secondary Index (LSI) - An index that has the same partition key as the base table, but a different sort key. The primary key of a LSI must be composite (PK and SK). Must be created at the table creation. A table with local secondary indexes can store any number of items, as long as the total size for any one partition key value does not exceed 10 GB.
Index Projections - a set of attributes copied from a table into a secondary index. The PK and SK are always projected into the index.
ALL - All the table attributes are projected into the index
KEYS_ONLY - Only the index and primary keys are projected onto the index
INCLUDE - Only the specified table attributes are projected into the index
Sparse Indexes - If the SK doesn't appear in every item in a table, the index is considered to be sparse since DDB writes to the index only items that have the SK present. GSIs are sparse by default since SK is optional to be specified when GSI is created. By creating sparse indexes, you can provision GSIs with lower write throughput than that of the parent table, which can save a lot of money that would otherwise be spent on storage (for no additional benefits).
In general, you should use GSIs rather than LSIs. The exception is when you need strong consistency, which only a LSI can provide (GSI queries only support eventual consistency)
Backups
Point In Time Recovery (PITR)
Continuous backup that protects you from the accidental updates or deletes
You can return to any point in time in the past 35 days.
DDB maintains incremental backups of your table
The latest restorable timestamp is typically 5mins in the past
Not enabled by default
After restoring a table, you manually have to set up auto scaling policies, IAM policies, Cloudwatch metrics and alarms, Tags, Streams settings, TTL settings and PITR settings. None of this is automatically set up by DDB for you, so you have to do it.
On Demand
Takes a full snapshot of the complete table
Can be restored at any time without any impact to the table performance
Consistent within seconds across thousands of partitions and retained until manually deleted.
A good fit for archival and compliance purposes
Operate within the same region as the source table
Transactions
perform atomic writes and isolated reads across multiple items and tables
TransactWriteItems
transact up to 25 items or 4 MB
Evaluate conditions and if all conditions are simultaneously true, perform write operations
TransactGetItems
Transact up to 25 items or 4MB
return a consistent, isolated snapshot of all items
In terms of capacity throughput, DDB performs 2 underlying reads and writes of every item in the transaction - one to prepare the transaction and one to commit it.
Apply transactions to items in
Same PK or across PKs
Same table or across tables
Same region only
Same account only
DDB tables only (e.g. can't transact with DDB and RDS SQL DB)
Scale like the rest of DDB
Reasons for transactions failure: Precondition failure, insufficient capacity, transactional conflicts, transaction still in progress, service error, malformed request, permissions.
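A sketch of TransactWriteItems performing a conditional transfer between two items; the table, keys and amounts are placeholders.

```python
# Sketch: atomic, conditional transfer across two items with TransactWriteItems.
import boto3
from botocore.exceptions import ClientError

ddb = boto3.client("dynamodb")

try:
    ddb.transact_write_items(TransactItems=[
        {   # debit account A only if it has enough balance
            "Update": {
                "TableName": "Accounts",                       # placeholder table
                "Key": {"account_id": {"S": "acct-A"}},
                "UpdateExpression": "SET balance = balance - :amt",
                "ConditionExpression": "balance >= :amt",
                "ExpressionAttributeValues": {":amt": {"N": "100"}},
            }
        },
        {   # credit account B in the same transaction
            "Update": {
                "TableName": "Accounts",
                "Key": {"account_id": {"S": "acct-B"}},
                "UpdateExpression": "SET balance = balance + :amt",
                "ExpressionAttributeValues": {":amt": {"N": "100"}},
            }
        },
    ])
except ClientError as e:
    # A TransactionCanceledException carries per-item cancellation reasons
    # (condition failure, throttling, conflict, etc.).
    print(e.response["Error"]["Code"])
```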
TTL
No cost of using TTL
Does not consume provisioned throughput
Use cases: session data, Events logs, etc.
Expired items are typically deleted within 48hrs of expiration
Items are removed from LSI and GSI automatically using an eventually consistent delete operation
Due to the difference in time between expiration and deletion, you may sometimes get expired items. Use a filter expression to return only items where TTL expiration > current_time
The TTL value must be specified as a number and in unix epoch format
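A sketch that enables TTL on a table and filters out items that have expired but not yet been deleted; the table and attribute names are placeholders.

```python
# Sketch: enable TTL and filter out already-expired (but not yet deleted) items.
import time
import boto3
from boto3.dynamodb.conditions import Key, Attr

client = boto3.client("dynamodb")
client.update_time_to_live(
    TableName="Sessions",                                            # placeholder table
    TimeToLiveSpecification={"Enabled": True, "AttributeName": "expires_at"},
)

table = boto3.resource("dynamodb").Table("Sessions")
now = int(time.time())

# Write an item that expires in one hour (TTL values are epoch seconds).
table.put_item(Item={"session_id": "abc123", "user": "alice", "expires_at": now + 3600})

# Expired items can linger for up to ~48h, so exclude them explicitly on read.
resp = table.query(
    KeyConditionExpression=Key("session_id").eq("abc123"),
    FilterExpression=Attr("expires_at").gt(now),
)
```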
DAX
Fully managed, highly available, in-memory cache for DDB that delivers up to 10x improvement
Response reduced from ms to microseconds
Runs within VPC with no exposure to the Internet
DAX Cluster is provisioned in multiple AZs for high availability. When a primary node fails, DAX fails over to a read replica and promotes it to a new primary.
API-compatible with DDB. Exposes the same data-plane APIs, such as GetItem, BatchGetItem, etc. Control plane APIs are not exposed (e.g. CreateTable, DeleteTable, etc.)
Two types of cache available - an item cache and a query cache
Global Tables
You can convert your existing DynamoDB tables to global tables or specify your table to be a global table when you are creating it.
Replication latency under 1s
Encryption
In transit - HTTPS
At rest - KMS used to encrypt all table data including PK, LSIs and GSIs, Streams, Global Tables, Backups and DAX Clusters
When creating a new table, you can either choose an AWS owned CMK (a key owned by DDB, which is the default option and has no charges), or an AWS managed CMK (a key stored in your AWS account and managed by KMS)
DDB Encryption Client - Another encryption option where you can use Java and Python libraries that let you encrypt your data prior to sending it to DDB.
DynamoDB does not have an operation equivalent to TRUNCATE TABLE in SQL. In order to truncate a DynamoDB table, you have to perform a scan to read all the PKs and then issue delete commands. Use BatchWriteItem with delete requests to optimize the deletes.
Auditing DynamoDB table
CloudTrail - captures only control-plane API requests (used to manage the tables)
For data-plane operations (e.g. GetItem, PutItem, etc.), use DynamoDB Streams as a trigger for Lambda to have it synchronously apply your logic to the changed records
Document
Pros
Great for json-like documents
Flexible schema, where document models can change frequently to meet the application's needs over time
Great for blog platforms, storing catalog info, etc.
Cons
not a good fit for highly relational data
Services
DocumentDB
MongoDB compatible
Query json documents and aggregate data across many json documents
Leverages cluster-style architecture, similar to Neptune
Fully managed and supports PITR
Storage autoscales in increments of 10GB
Common Commands
Create a user: db.createUser({...})
Assign a role to the user: db.grantRolesToUser({...})
Revoke a role: db.revokeRolesFromUser({...})
If the primary instance exhibits high CPU utilization, but the replica instances don't, distribute read traffic across replicas via client read preference settings (for example, secondaryPreferred)
Use DocumentDB Profiler to examine execution times of operations performed on the cluster
currentOp MongoDB command can be used to list all queries that are currently executing or are blocked
In-Memory
Pros
Extremely fast, with microsecond response time. Data not stored on disk, but fully managed in memory.
Great as caching mechanisms
Cons
Not a good fit for use cases where the data needs to be persisted over a longer period of time.
Services
Elasticache
A node is a fixed-size chunk of network-attached RAM and it's the smallest building block of ElastiCache.
Each node runs Memcached or Redis and has its own DNS name and port. You can select the node type with varying memory.
Redis shard is a subset of the cluster's keyspace, that can include a primary node and zero or more read replicas (similar concept to RDS read replicas)
The shards add up to form a cluster
Memcached
Designed for basic use cases where cache is required (it's simpler than Redis)
Automatic detection and recovery from cache node failures
Automatic discovery of nodes within a cluster
Flexible AZ placement of nodes and clusters
Integration with other AWS services
Redis
Automatic recovery from cache node failures
Multi-AZ with automatic failover from a failed primary to a read replica
Supports partitioning your data across up to 90 shards
More complex and feature-packed than Memcached
Global Datastore for Redis feature enables read replicas to be created across multiple regions
Redis Sorted Sets feature moves the computational complexity of leaderboards from your app to your Redis cluster. Each time a new element is added to the sorted set, it's reranked in real time.
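A small redis-py sketch of a leaderboard backed by a sorted set; the cluster endpoint is a placeholder.

```python
# Sketch: leaderboard with a Redis sorted set (ElastiCache for Redis).
import redis

r = redis.Redis(
    host="my-redis.xxxxxx.ng.0001.use1.cache.amazonaws.com",  # placeholder primary endpoint
    port=6379,
)

# Scores are re-ranked by Redis on every update.
r.zadd("leaderboard", {"alice": 1200, "bob": 950, "carol": 1750})
r.zincrby("leaderboard", 300, "bob")          # bob scores 300 more points

# Top 3 players, highest score first.
print(r.zrevrange("leaderboard", 0, 2, withscores=True))
```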
Replication Mode
Cluster Mode Disabled - always has a single shard with up to 5 read replica nodes. Data not partitioned. Scaling approach. Needs large instances as nodes. A good solution when you have a read-heavy app performing frequent reads and having an unpredictable load.
Cluster Mode Enabled - it can have up to 250 shards with 1 to 5 read replica nodes in each shard. Data is partitioned. Partitioning approach. A good solution when you have a write-heavy application performing frequent writes
You can create a cluster with higher number of shards and lower number of replicas totaling up to 90 nodes per cluster. This cluster configuration can range from 90 shards and 0 replicas to 15 shards and 5 replicas, which is the maximum number of replicas allowed.
The replication structure is contained within a shard (called node group in the API/CLI) which is contained within a Redis cluster.
Custom Parameter Group can be created and assigned to the cluster. For instance, if you want to increase the amount of reserved memory on your cluster, create a custom parameter group with the reserved-memory-percent parameter set to 50 and apply it to the cluster.
To alleviate the memory load, you can create backups from a read replica
Caching strategies
Lazy loading - loads data into the cache only when necessary (only on cache miss)
Write-through - adds data or updates data in the cache whenever data is written to the database.
Adding TTL - an integer value that specifies the number of seconds until the key expires.
To manually promote a read-replica to a primary node in cluster-mode disabled ElastiCache cluster, Multi-AZ with Automatic Failover must be disabled first
Ledger
Pros
Highly scalable
Data stored in immutable form (once written, no changes are allowed)
Transparent in showing the states of data
Verifiable source of data, where all the aspects of data can easily be validated (e.g. insurance claims, finance applications, etc.)
Cons
Services
QLDB
Immutable and verifiable history (can't be deleted or modified)
Serverless, auto-scales
Amazon Ion document model is supported and the documents stored in Amazon Ion format can be queried using a SQL-like language called PartiQL.
Just like in Aurora, data is replicated 6 times, across 3 different AZs
Time Series
Pros
Great for time-ordered data
Great for monitoring, storing large amounts of sensor data, etc.
Cons
Supports only inserts since the data is expected to arrive ordered by time
Not a good fit for storing data that is not ordered by time