Database on AWS, DynamoDB, Redshift, Aurora, Database Migration Service…
DynamoDB
What is DynamoDB?
Amazon DynamoDB is a fast and flexible NoSQL database service for all applications that need consistent, single-digit millisecond latency at any scale.
Its flexible data model and reliable performance make it a great fit for mobile, web, gaming, ad-tech, IoT, and many other applications
:zap:
Streams
Captures a time-ordered sequence of item-level changes (inserts, updates, and deletes) made to a table
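A common way to consume a stream is to attach it to an AWS Lambda function, which receives batches of stream records. The sketch below assumes that setup and the Lambda event shape for DynamoDB Streams: each record carries an `eventName` of `INSERT`, `MODIFY`, or `REMOVE`, plus a `dynamodb` section with `Keys` and, depending on the stream view type, `NewImage` and/or `OldImage`. The function name `handler` and the summary it returns are illustrative only.

```python
from collections import Counter
from typing import Any


def handler(event: dict, context: Any = None) -> dict:
    """Tally item-level changes in one batch of DynamoDB stream records.

    Assumes the Lambda event shape for DynamoDB Streams; the returned summary
    is illustrative, not something Lambda requires.
    """
    counts: Counter = Counter()
    for record in event.get("Records", []):
        action = record["eventName"]          # "INSERT", "MODIFY" or "REMOVE"
        change = record.get("dynamodb", {})
        keys = change.get("Keys", {})
        if action == "INSERT":
            # Only a new image exists (present if the view type includes new images).
            print("inserted", keys, change.get("NewImage"))
        elif action == "MODIFY":
            # Old and new images may both be present, depending on the view type.
            print("updated", keys, change.get("OldImage"), "->", change.get("NewImage"))
        elif action == "REMOVE":
            # Only an old image exists (present if the view type includes old images).
            print("deleted", keys, change.get("OldImage"))
        counts[action] += 1
    return dict(counts)


if __name__ == "__main__":
    sample_event = {
        "Records": [
            {"eventName": "INSERT",
             "dynamodb": {"Keys": {"id": {"S": "1"}}, "NewImage": {"id": {"S": "1"}}}},
            {"eventName": "REMOVE", "dynamodb": {"Keys": {"id": {"S": "2"}}}},
        ]
    }
    print(handler(sample_event))  # {'INSERT': 1, 'REMOVE': 1}
```

Attribute values such as `{"S": "1"}` use DynamoDB's typed JSON format, so a real handler would usually convert them before use.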
:zap:
Global Tables
Managed Multi-Master, Multi-Region Replication
Redshift
What is Redshift?
Amazon Redshift is a fast and powerful, fully managed, petabyte-scale data warehouse service in the cloud
Customers can start small for just $0.25 per hour with no commitments or upfront costs and scale to a petabyte or more for $1,000 per terabyte per year, less than a tenth of the cost of most other data warehousing solutions
OLTP vs OLAP
Data warehousing databases use a different type of architecture, both from a database perspective and at the infrastructure layer
:zap:
Advanced Compression
Columnar data stores can be compressed much more than row-based data stores because similar data is stored sequentially on disk.
Amazon Redshift employs multiple compression techniques and can often achieve significant compression relative to traditional relational data stores.
In addition, Amazon Redshift doesn't require indexes or materialized views, and so uses less space than traditional relational database systems.
When loading data into an empty table, Amazon Redshift automatically samples your data and selects the most appropriate compression scheme
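The effect of storing similar values together can be seen with a toy run-length encoding (RLE), which replaces each run of repeated values with a single (value, count) pair. Redshift does offer a run-length encoding among its column compression encodings, but the code below only illustrates the idea and is not how Redshift implements it; the sample rows are made up.

```python
from itertools import groupby


def rle(values):
    """Run-length encode a sequence into a list of (value, run_length) pairs."""
    return [(value, len(list(run))) for value, run in groupby(values)]


# Hypothetical table of (region, status) rows, already sorted by region.
rows = [("EMEA", "open")] * 4 + [("EMEA", "closed")] * 2 + [("APAC", "open")] * 4

# Row-based layout: values from different columns are interleaved on disk.
row_layout = [value for row in rows for value in row]

# Columnar layout: each column's values are stored next to each other.
region_column = [region for region, _ in rows]
status_column = [status for _, status in rows]

row_runs = rle(row_layout)
column_runs = rle(region_column) + rle(status_column)

print(len(row_layout), "values stored")    # 20 values stored
print("runs, row layout:", len(row_runs))  # runs, row layout: 20
print("runs, columnar:", len(column_runs)) # runs, columnar: 5
print(rle(region_column))                  # [('EMEA', 6), ('APAC', 4)]
```

The row layout gains nothing because neighbouring values always come from different columns, while each column collapses into a handful of runs.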
:zap:
Backups
Redshift always attempts to maintain at least three copies of your data (the original and replica on the compute nodes and a backup in Amazon S3)
Redshift can also asynchronously replicate your snapshots to S3 in another region for disaster recovery
:zap:
Redshift Pricing
Redshift is priced as follows:
Compute node hours: the total number of hours you run across all your compute nodes for the billing period. You are billed for 1 unit per node per hour, so a 3-node data warehouse cluster running persistently for an entire month would incur 2,160 instance hours. You are not charged for leader node hours; only compute nodes incur charges
Data transfer (only within a VPC, not outside it)
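The compute node hours arithmetic can be written out as a small helper. It assumes a 30-day month (which is what gives 2,160 hours for 3 nodes) and leaves the leader node out entirely, since it is not billed. The hourly rate is a hypothetical parameter: the $0.25 starting price is used only as an example, and real rates depend on node type and region.

```python
HOURS_PER_DAY = 24


def compute_node_hours(compute_nodes: int, days: int) -> int:
    """Billable node hours: 1 unit per compute node per hour.

    The leader node is not billed, so it is deliberately not an input.
    """
    return compute_nodes * days * HOURS_PER_DAY


def estimated_compute_cost(compute_nodes: int, days: int, rate_per_node_hour: float) -> float:
    """Hypothetical estimate: node hours multiplied by an assumed hourly rate."""
    return compute_node_hours(compute_nodes, days) * rate_per_node_hour


# A 3-node cluster running for a 30-day month.
print(compute_node_hours(3, 30))            # 2160
print(estimated_compute_cost(3, 30, 0.25))  # 540.0
```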
Aurora
What is Aurora?
Amazon Aurora is a MySQL- and PostgreSQL-compatible relational database engine that combines the speed and availability of high-end commercial databases with the simplicity and cost-effectiveness of open-source databases
Aurora provides up to five times better performance than MySQL and up to three times better performance than PostgreSQL, at a much lower price point than commercial databases, while delivering similar performance and availability
:zap:
Scaling Aurora
Aurora is designed to transparently handle the loss of up to two copies of data without affecting database write availability and up to three copies without affecting read availability
Aurora storage is also self-healing. Data blocks and disks are continuously scanned for errors and repaired automatically
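These numbers follow from Aurora's quorum-based storage. The sketch assumes the published layout of six copies of the data spread across three Availability Zones, with a write needing 4 of the 6 copies and a read needing 3 of the 6; it only does the arithmetic and is not an Aurora API.

```python
from typing import Tuple

TOTAL_COPIES = 6   # six copies across three AZs (two per AZ)
WRITE_QUORUM = 4   # copies that must acknowledge a write
READ_QUORUM = 3    # copies that must respond to a read


def availability(copies_lost: int) -> Tuple[bool, bool]:
    """Return (can_write, can_read) after losing the given number of copies."""
    remaining = TOTAL_COPIES - copies_lost
    return remaining >= WRITE_QUORUM, remaining >= READ_QUORUM


for lost in range(5):
    can_write, can_read = availability(lost)
    print(f"lost={lost} write={can_write} read={can_read}")
```

Because 4 + 3 > 6, every read quorum overlaps every write quorum, so a read always reaches at least one copy of the latest write. Losing a whole AZ (two copies) still leaves writes available.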
:zap:
Backups with Aurora
Automated backups are always enabled on Amazon Aurora DB Instances. Backups do not impact database performance
:zap:
Amazon Aurora Serverless
Aurora Serverless is an on-demand, auto-scaling configuration for the MySQL-compatible and PostgreSQL-compatible editions of Amazon Aurora.
An Aurora Serverless DB cluster automatically starts up, shuts down, and scales capacity up or down based on your application's needs
Provides a relatively simple, cost-effective option for infrequent, intermittent, or unpredictable workloads
Databases 101
Relational databases
Relational databases are what most of us are used to. They have been around since the 1970s. Think of a traditional spreadsheet
Data Warehousing
What is data warehousing?
Used for business intelligence, with tools such as Cognos, Jaspersoft, SQL Server Reporting Services, Oracle Hyperion, and SAP Business Warehouse
Used to pull in very large and complex data sets, usually so management can run queries on the data (such as current performance vs. targets)
OLTP vs OLAP
Online Transaction Processing (OLTP) differs from Online Analytical Processing (OLAP) in the types of queries you run
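The difference in query shape can be shown with Python's built-in `sqlite3` module. SQLite is only a stand-in here (a real warehouse such as Redshift is columnar and distributed), and the `orders` table and its rows are invented for the example: the OLTP query fetches one specific row, while the OLAP query scans and aggregates many rows to answer a business question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, customer TEXT, "
    "region TEXT, product TEXT, amount REAL)"
)
conn.executemany(
    "INSERT INTO orders (id, customer, region, product, amount) VALUES (?, ?, ?, ?, ?)",
    [
        (1, "Alice", "EMEA", "radio", 120.0),
        (2, "Bob", "APAC", "radio", 80.0),
        (3, "Carol", "EMEA", "speaker", 200.0),
        (4, "Dave", "EMEA", "radio", 60.0),
    ],
)

# OLTP: a small, targeted query that touches a single row (look up one order).
oltp = conn.execute(
    "SELECT customer, product, amount FROM orders WHERE id = ?", (3,)
).fetchone()
print(oltp)  # ('Carol', 'speaker', 200.0)

# OLAP: an analytical query that scans and aggregates many rows
# (total radio sales per region).
olap = conn.execute(
    "SELECT region, SUM(amount) FROM orders WHERE product = 'radio' "
    "GROUP BY region ORDER BY region"
).fetchall()
print(olap)  # [('APAC', 80.0), ('EMEA', 180.0)]
```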
ElastiCache
Amazon ElastiCache is a web service that makes it easy to deploy, operate, and scale an in-memory cache in the cloud. The service improves the performance of web applications by letting you retrieve information from fast, managed, in-memory caches instead of relying entirely on slower disk-based databases
RDS
A Read Replica
What is it?
A read-only copy of your primary (production) database
It is kept up to date using asynchronous replication from the primary RDS instance to the read replica
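Asynchronous replication means the primary acknowledges a write before the replica has applied it, so a read replica can briefly lag behind. The toy model below (plain Python, no AWS calls; the class names are made up) makes that lag visible: the replica only sees a write after its `replicate` step runs.

```python
from collections import deque
from typing import Optional


class Primary:
    """Toy primary instance: applies writes locally and queues them for a replica."""

    def __init__(self) -> None:
        self.data: dict = {}
        self.replication_log: deque = deque()

    def write(self, key: str, value: str) -> None:
        self.data[key] = value
        # Asynchronous: the write is acknowledged now; the replica catches up later.
        self.replication_log.append((key, value))


class ReadReplica:
    """Toy read-only copy: it has no write method, only reads and catch-up."""

    def __init__(self) -> None:
        self.data: dict = {}

    def read(self, key: str) -> Optional[str]:
        return self.data.get(key)

    def replicate(self, primary: Primary) -> None:
        # Apply queued changes in write order (this toy supports a single replica).
        while primary.replication_log:
            key, value = primary.replication_log.popleft()
            self.data[key] = value


primary, replica = Primary(), ReadReplica()
primary.write("stock", "10")
print(replica.read("stock"))  # None (replica lag: change not applied yet)
replica.replicate(primary)
print(replica.read("stock"))  # 10
```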
Backups
Two types: automated backups and database snapshots
Automated Backups
:zap:
What are they?
The retention period can be between 1 and 35 days. Automated backups take a full daily snapshot and also store transaction logs throughout the day
When you do a recovery, AWS first chooses the most recent daily backup and then applies the transaction logs relevant to that day.
This allows you to do a point-in-time recovery down to the second, within the retention period
:biohazard_sign:
Automated backups are enabled by default. The backup data is stored in S3, and you get free storage space equal to the size of your database: if you have a 10 GB RDS instance, you get 10 GB of free backup storage
:zap:
Backups are taken within a defined window. During the backup window, storage I/O may be suspended while your data is being backed up and you may experience elevated latency
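The recovery steps described for automated backups (start from the most recent daily backup, then replay transaction logs) can be sketched as a planning function. This is a conceptual model with made-up timestamps, not the RDS implementation or API; it assumes snapshots and log entries are sorted by time.

```python
from bisect import bisect_right
from datetime import datetime


def plan_point_in_time_restore(snapshots, log_entries, target):
    """Return (base_snapshot, changes_to_replay) for restoring to `target`.

    snapshots:   sorted datetimes at which daily snapshots were taken
    log_entries: sorted (timestamp, change) transaction-log entries
    target:      the point in time to restore to
    """
    i = bisect_right(snapshots, target)
    if i == 0:
        raise ValueError("no snapshot at or before the target time")
    base = snapshots[i - 1]  # most recent snapshot taken at or before the target
    # Replay changes made after that snapshot, up to and including the target second.
    replay = [change for ts, change in log_entries if base < ts <= target]
    return base, replay


snapshots = [datetime(2024, 1, 1, 3, 0), datetime(2024, 1, 2, 3, 0)]
log_entries = [
    (datetime(2024, 1, 1, 9, 0), "A"),
    (datetime(2024, 1, 2, 8, 0), "B"),
    (datetime(2024, 1, 2, 10, 30), "C"),
    (datetime(2024, 1, 2, 12, 0), "D"),
]
base, replay = plan_point_in_time_restore(snapshots, log_entries, datetime(2024, 1, 2, 11, 0))
print(base, replay)  # 2024-01-02 03:00:00 ['B', 'C']
```

A fuller model would also reject targets outside the retention period.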
Database Snapshots
:zap:
What are they?
Database snapshots are taken manually (i.e., they are user-initiated). Unlike automated backups, they are kept even after you delete the original RDS instance
Whenever you restore either an Automatic Backup or a manual Snapshot, the restored version of the database will be a new RDS instance with a new DNS endpoint
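With the AWS SDK for Python (boto3), restoring a manual snapshot is a call to the RDS client's `restore_db_instance_from_db_snapshot`, and the identifier you pass becomes a brand-new instance. The sketch takes the client as a parameter (normally `boto3.client("rds")`) so it can be exercised without AWS access; the function name is hypothetical and error handling is omitted.

```python
def restore_snapshot_to_new_instance(rds_client, snapshot_id: str, new_instance_id: str) -> str:
    """Restore a manual DB snapshot into a new RDS instance and return its endpoint address.

    `rds_client` is expected to be boto3.client("rds"). The restore does not
    overwrite the original instance: it creates `new_instance_id`, which gets
    its own DNS endpoint once it becomes available.
    """
    rds_client.restore_db_instance_from_db_snapshot(
        DBInstanceIdentifier=new_instance_id,
        DBSnapshotIdentifier=snapshot_id,
    )
    # Block until the new instance is available, then look up its endpoint.
    rds_client.get_waiter("db_instance_available").wait(DBInstanceIdentifier=new_instance_id)
    described = rds_client.describe_db_instances(DBInstanceIdentifier=new_instance_id)
    return described["DBInstances"][0]["Endpoint"]["Address"]
```

For example, calling it with a hypothetical snapshot `"mydb-manual-snap"` and new identifier `"mydb-restored"` returns the new instance's address; applications then need to be pointed at that new endpoint.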
:zap:
Encryption At Rest
What is it?
Encryption at rest is supported for MySQL, Oracle, SQL Server, PostgreSQL, MariaDB, and Aurora.
Once your RDS instance is encrypted, the data stored at rest in the underlying storage is encrypted, as are its automated backups, read replicas, and snapshots
Multi-AZ
What is Multi-AZ?
Multi-AZ keeps a standby copy of your production database in another Availability Zone
AWS handles the replication for you: when your production database is written to, the write is automatically synchronized to the standby database
In the event of planned database maintenance, DB Instance failure, or an Availability Zone failure, Amazon RDS will automatically failover to the standby so that database operations can resume quickly without administrative intervention.
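The key differences from a read replica are that replication is synchronous and that the standby takes over the same endpoint on failover. The toy model below (hypothetical class and host names, no AWS calls) shows both: a write is applied to the standby before it is acknowledged, and failover swaps roles so the endpoint name stays the same while it resolves to the former standby.

```python
class DBInstance:
    """Toy database instance living in one Availability Zone."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.data: dict = {}


class MultiAZDeployment:
    """Toy Multi-AZ pair: synchronous replication plus automatic failover."""

    endpoint = "mydb.example.internal"  # hypothetical DNS name clients connect to

    def __init__(self) -> None:
        self.primary = DBInstance("instance-az-a")
        self.standby = DBInstance("instance-az-b")

    def resolve(self) -> str:
        """The endpoint always resolves to whichever instance is currently primary."""
        return self.primary.name

    def write(self, key: str, value: str) -> None:
        self.primary.data[key] = value
        # Synchronous: the write is only acknowledged once the standby has it too.
        self.standby.data[key] = value

    def failover(self) -> None:
        # Promote the standby; no acknowledged writes are lost because replication
        # was synchronous. Here the old primary simply becomes the standby.
        self.primary, self.standby = self.standby, self.primary


db = MultiAZDeployment()
db.write("order:1", "paid")
db.failover()                           # e.g. AZ outage or planned maintenance
print(db.endpoint, "->", db.resolve())  # mydb.example.internal -> instance-az-b
print(db.primary.data["order:1"])       # paid
```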
EMR Overview
What is it?
Amazon EMR is the industry-leading cloud big data platform for processing vast amounts of data using open-source tools such as Apache Spark, Apache Hive, Apache HBase, Apache Flink, Apache Hudi, and Presto.
With EMR, you can run petabyte-scale analysis at less than half the cost of traditional on-premises solutions and over three times faster than standard Apache Spark
:zap:
The central component of Amazon EMR is the cluster. A cluster is a collection of Amazon Elastic Compute Cloud (Amazon EC2) instances.
Each instance in the cluster is called a node. Each node has a role within the cluster, referred to as the node type
:zap:
Amazon EMR also installs different software components on each node type, giving each node a role in a distributed application such as Apache Hadoop
Exam Tips
An EMR cluster consists of a master node, core nodes, and (optionally) task nodes
By default, log data is stored on the master node
You can configure replication to S3 on five-minute intervals for all log data from the master node; however, this can only be configured when creating the cluster for the first time
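The node types and the S3 log setting both show up in the request used to create a cluster. The sketch builds keyword arguments for boto3's EMR client (to be passed as `boto3.client("emr").run_job_flow(**params)`), with one instance group per node type and `LogUri` set so master node logs are copied to S3; because that must be decided at creation time, it lives in the creation request. The helper name, release label, instance types, and bucket path are placeholders, and the role names are the EMR defaults, which your account may not use.

```python
def emr_cluster_params(name: str, log_uri: str, core_nodes: int = 2, task_nodes: int = 0) -> dict:
    """Build run_job_flow kwargs for a cluster with master, core and optional task nodes."""

    def group(role: str, count: int) -> dict:
        return {
            "Name": role.capitalize(),
            "InstanceRole": role,         # MASTER, CORE or TASK
            "Market": "ON_DEMAND",
            "InstanceType": "m5.xlarge",  # placeholder instance type
            "InstanceCount": count,
        }

    groups = [group("MASTER", 1), group("CORE", core_nodes)]
    if task_nodes > 0:
        # Task nodes are optional: they run tasks but do not store HDFS data.
        groups.append(group("TASK", task_nodes))

    return {
        "Name": name,
        "ReleaseLabel": "emr-6.15.0",     # placeholder; pick a current release
        "LogUri": log_uri,                # S3 log archiving must be set at creation time
        "Applications": [{"Name": "Spark"}],
        "Instances": {
            "InstanceGroups": groups,
            "KeepJobFlowAliveWhenNoSteps": True,
        },
        "JobFlowRole": "EMR_EC2_DefaultRole",  # default EC2 instance profile name
        "ServiceRole": "EMR_DefaultRole",      # default EMR service role name
    }


params = emr_cluster_params("demo-cluster", "s3://my-log-bucket/emr-logs/", task_nodes=2)
print([g["InstanceRole"] for g in params["Instances"]["InstanceGroups"]])  # ['MASTER', 'CORE', 'TASK']
```

Actually calling `run_job_flow` starts billable EC2 instances, so the sketch stops at building the request.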