AWS SAA

Migration

Snowball

You ship physical disks to Amazon and they load the data into S3 or similar (or the reverse: they write something from AWS onto the disk and ship it back)

DMS = Database Migration Service

migrates your production DB without downtime

SMS = Server Migration Service

replicates production VMs to AWS

Analytics

Athena

Allows you to run SQL queries against data stored in S3

EMR=Elastic MapReduce

CloudSearch

Managed search service (Amazon also offers a separate Elasticsearch Service)

Kinesis

for analyzing streaming big data in real time

Data Pipeline

Move data from one place to another

QuickSight

for business analytics and visualizing analysis results

Storage

Storage Gateway

A virtual machine installed in your data center that connects your data center with S3

Glacier

Archive tier for S3 - you cannot get the objects back immediately

S3 = Simple Storage Service

EFS = Elastic File System

It can be shared between many VMs

Databases

RDS = Relational Database Service

DynamoDB

NoSQL database

Redshift

Data warehouse solution

Elasticache

caches frequently accessed data in memory to improve performance and take load off your database

Compute

Lambda

it's actually 'serverless'

Elastic Beanstalk

you upload your code and Beanstalk automatically creates an environment for it

EC2 = Elastic Compute Cloud

Lightsail

Networking and content delivery

Route53

DNS in AWS

CloudFront

Distributes content via Amazon's CDN (low latency and high data transfer rates) from edge locations

VPC = Virtual Private Cloud

Virtual data center

Direct connect

connects your physical data center to AWS over a dedicated line. Useful e.g. for security reasons, higher reliability, lower costs if you're sending large volumes of data, and higher network bandwidth (1 or 10 Gbps)

Security & Identity

IAM = Identity and Access Management

Inspector

You install the Inspector agent on your virtual machines and it reports audit results to you

Certificate manager

free SSL/TLS certificates for your domains

Directory service

WAF= Web Application Firewall

Artifact

This is where you get compliance documentation (e.g. ISO certificates) in the AWS console

Management tools

CloudWatch

monitors your AWS environment (performance)

CloudFormation

Turns your infrastructure into code. In a traditional architecture you have switches, firewalls, servers and so on; in the cloud you have a document describing all those components, and CloudFormation is responsible for it. You can deploy a whole production environment from CloudFormation templates.
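
For illustration, a minimal sketch of deploying a template with boto3 (the stack and resource names are made up; assumes AWS credentials are configured):

    import json
    import boto3

    # A tiny template describing one resource: an S3 bucket.
    template = {
        "AWSTemplateFormatVersion": "2010-09-09",
        "Resources": {"NotesBucket": {"Type": "AWS::S3::Bucket"}},
    }

    cf = boto3.client("cloudformation")

    # CloudFormation provisions everything the template describes.
    cf.create_stack(StackName="demo-stack", TemplateBody=json.dumps(template))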

CloudTrail

Audits your AWS activity. If something changes (e.g. a service is added/removed), CloudTrail logs this information. The logs can be stored in an S3 bucket

Config

Monitors your configuration and warns you if a configuration change could break the environment rules you set

Service catalog

You can specify which services are authorised within your enterprise and which are not

Trusted Advisor

Gives you recommendations, e.g. how to do cost optimization, performance optimization or security fixes in your environment.

Application Services

Step Functions

SWF = Simple WorkFlow Service

Coordinates work across an application's components and allows you to automate actions in your environment.

API Gateway

The door to your backend, e.g. you send a request through the API to a Lambda function and get a response.

AppStream

For streaming desktop applications

Elastic Transcoder

converts videos into different formats

Developer tools

CodeCommit

GitHub in AWS

CodeBuild

For compiling your code

CodeDeploy

For deploying your code into EC2 instances

CodePipeline

For automating your build, test and release pipeline (continuous delivery)

Mobile Services

Mobile Hub

Lets you design mobile app features, e.g. data storage and authentication methods. It has its own mobile console

Cognito

Simplifies sign-in and sign-up - allows doing it through third parties (social identity providers), e.g. Google. The credentials are stored in Cognito, and you can then log in through Cognito to other services that support it

Device Farm

Allows testing your app on hundreds of real devices.

Mobile Analytics

Analyzes usage of your mobile apps

Pinpoint

it's like Google Analytics for mobile apps; helps you understand users' behaviour.

Business Productivity

WorkDocs

For storing work documents in a secure way. Similar to S3 with some additional security features.

Workmail

It's like Microsoft Exchange in AWS.

Internet of Things

IoT

keeping track of all your IoT devices

Desktop and App Streaming

Workspaces

basically it's a VDI - like having a desktop in the cloud.

AppStream 2.0

Way of streaming desktop applications

Artificial Intelligence

Polly

converts text to speech (e.g. notes to an MP3 file)

Machine learning

You provide data sets, and the trained model can then be used for future predictions, e.g. for profiling people.

Rekognition

You give it a picture as input and it recognizes the objects in it

Messaging

SNS = Simple Notification Service

Sends you an email or text message when a specified event occurs

SQS = Simple Queue Service

Creates job queues. E.g. making a meme can be a job: once you upload a photo, an EC2 instance adds some funny text to it. Even if the EC2 instance dies, the job is still kept in the queue, so when a new instance appears it can take the job and create the meme.

S3 - object-based storage

each object (flat file) is limited to 5TB

Storage space is unlimited

data is stored in buckets

each bucket has unique name

read after write consistency for PUTS of new objects (objects are accessible immediately after uploading)

Eventual consistency for overwrite PUTS and DELETES (after modifying or deleting file the changes can take some time to propagate)

each object is built from: key (name), value (data), version ID (important for versioning), metadata (e.g. upload date) and subresources

subresources include: ACLs and torrent (S3 supports the BitTorrent protocol)
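
A minimal sketch of these pieces in practice with boto3 (bucket and key names are made up; assumes configured credentials):

    import boto3

    s3 = boto3.client("s3")

    # Write an object: key (name) + value (data) + optional metadata.
    s3.put_object(
        Bucket="my-example-bucket",
        Key="notes/hello.txt",
        Body=b"hello s3",
        Metadata={"uploaded-by": "study-notes"},
    )

    # New objects are readable immediately (read-after-write for new PUTs).
    obj = s3.get_object(Bucket="my-example-bucket", Key="notes/hello.txt")
    print(obj["Body"].read())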

Storage Tiers

S3 Standard: availability = 99.99%, durability = 99.999999999% (eleven nines)

data survives even if 2 facilities go down concurrently

S3 IA (Infrequently Accessed)

lower fee than S3

Reduced Redundancy Storage

availability = 99.99%, durability = 99.99%

cheaper than S3

Glacier

cheap but takes 3-5 hours to retrieve data

availability = 99.9%, durability = 99.999999999%

Versioning - once you enable it you cannot disable it (you can only suspend it)

Be careful with big files: all versions are stored, which can consume a lot of your storage space

It can be used as a backup tool - once you delete an object, you can still restore it by deleting the "delete marker"

Cross-Region Replication replicates only NEW objects

objects are replicated with their permissions

If you delete an object, the "delete marker" is replicated; however, if you delete the "delete marker" in your source bucket (restoring the object), that deletion is NOT replicated (in your destination bucket the object remains deleted)

When you restore a previous version (by deleting the latest version) in your source bucket, this is NOT replicated (the destination bucket still has the latest version!)

versioning has to be enabled on both source and destination buckets

you cannot replicate to multiple buckets

Lifecycle management - lets you set up rules that move objects (or old versions of them) to IA or Glacier storage, or expire them, after a defined period

actions on current versions

actions on previous versions

You pay for storing objects in Glacier for at least 90 days

when an object expires, a "delete marker" is placed on it (you're still able to restore the object)

not accessible from Singapore and South America regions

minimum object size is 128KB

which means that smaller objects are charged as if they were 128KB

CloudFront

edge location - the place from which content is served

origin - source of the content which will be cached

Distribution - the name given to the CDN, which consists of a collection of edge locations

the origin may actually be a non-AWS server - CloudFront will still work

it isn't only for reading content. You can write to it also.

Objects are cached for TTL

you can cache web content or RTMP streaming content

when you PUT a new object to an edge location, the edge location will update your bucket

TTL is always in seconds

you set it up using "Default TTL"

using "Path Pattern" you can set the regular expression what should be cached, e.g. *.pdf

You can set up multiple restrictions

You delete a distribution by first disabling it and then deleting it

the distribution name is: [random string].cloudfront.net

WAF

you can whitelist/blacklist Geo-Restrictions

Pre-signed URLs and Pre-signed cookies

encryption

In-transit - SSL/TLS

at rest

Server Side Encryption

S3 managed keys = SSE-S3

AWS KMS - Key Management Service

logs who is encrypting/decrypting what

uses another key to encrypt your encryption key

Server side encryption with customer provided keys - SSE-C

Client side encryption

4 types of Storage Gateway:

1) File Gateway (NFS) - for storing flat files in S3

Volume Gateway (iSCSI) - block-based storage (like a virtual hard drive - you can install applications on it). Data is stored on volumes as Amazon EBS snapshots (limited in size from 1 GB to 16 TB). It comes in two modes:

2) Stored Volumes - you keep a full copy of your data locally and it is asynchronously backed up (as EBS snapshots stored in S3)

3) Cached Volumes - nothing is stored locally in full; the entire data set is stored in S3 and only the most frequently accessed data is cached locally

4) Gateway Virtual Tape Library (VTL) - allows you to store your tape data on virtual cartridges

3 main types:

Snowball - 80 TB disk space

Snowball Edge - 100 TB disk space; looks the same as a normal Snowball but also gives you compute capabilities. For example, an airline engineer takes a Snowball Edge box on board, where it is mounted as a disk; during the flight, data about the engines is collected, and the box is then sent to an Amazon data centre. As a result you have in your cloud not only the data but also the Lambda functions.

Snowmobile - up to 100 PB

before Snowball there was an Import/Export service where users sent in their own disks, but it was hard for AWS to manage

requires installing client software on your machine

you have to download the manifest file and provide the access code from your AWS console to run the Snowball

To speed up uploads you can use the Transfer Acceleration service

it uses the CloudFront Edge Network

to upload you're using dedicated URL, e.g. rzepsky.s3-accelerate.amazonaws.com

you may use pre-signed URLs in the following scenario: you have a website with photos (stored on S3) and you don't want to share the photos directly; rather, you want people to visit your website and see the pictures there (e.g. because you display ads). With normal S3 storage, other websites can link directly to the photos without reaching your website. If you use pre-signed URLs, direct linking to a photo is impossible; people can only see it on your website.
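
A minimal sketch of generating such a link with boto3 (bucket and key names are made up):

    import boto3

    s3 = boto3.client("s3")

    # A short-lived link to one private object; hotlinking breaks once it expires.
    url = s3.generate_presigned_url(
        "get_object",
        Params={"Bucket": "my-photo-bucket", "Key": "photos/cat.jpg"},
        ExpiresIn=300,  # seconds
    )
    print(url)  # embed this in your page when rendering it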

EC2 price options

on Demand

pay a fixed rate by the hour (or by the second for Linux)

Reserved

capacity reservation; requires a 1- or 3-year contract

Spot

you set a bid price (the maximum you are willing to pay for an hour/second of EC2 usage). When demand for EC2 is high (a lot of people are buying at the moment), its price goes up; if this dynamic price rises above your bid, your instances are stopped or terminated.

Dedicated hosts

Helpful when a licensing agreement requires it (e.g. an Oracle DB) or for government workloads

can be bought as on Demand or as Reserved

EBS - a virtual disk which you attach to your EC2 instance. It is block storage, so you can install software on it (just like on your PC's HDD)

EBS types

GP2 - General Purpose SSD. Balanced price and performance; 3 IOPS per GB, up to 10,000 IOPS.

IO1 - Provisioned IOPS SSD. For workloads needing more than 10,000 IOPS.

ST1 - Throughput Optimized HDD. For large amounts of sequential data (data warehousing, log processing, big data).

SC1 - Cold HDD. Lowest-cost storage for infrequently accessed workloads (e.g. a file server).

Magnetic (standard) - lowest price per GB.

(Spot) If you terminate the instance yourself, you pay for that hour; if AWS terminates it, the hour is free

You cannot mount one EBS volume to two EC2 instances - use EFS instead

Configuring

For each availability zone there is a separated subnet (1 subnet = 1 availability zone).

Only SSD and Magnetic volumes can be bootable; the HDD types CANNOT be root volumes (but can be attached additionally)

by default an EC2 instance's disks are deleted when it is terminated (but you can change this behaviour)

remember about tagging - tags let you track which services generate which costs

a virtual machine in the cloud

RAID = Redundant Array of Independent Disks

RAID 0 - striped, no redundancy

RAID 1 - Mirrored

RAID 5 - at least 3 disks, good for reads and bad for writes; not recommended by AWS

RAID 10 - combination of RAID 0 and 1; good performance and redundancy

from multiple EBS volumes you can create a RAID array (the volumes can be of different types), e.g. from 4 different volumes create one striped partition D:

Taking a snapshot of RAID

normally taking a snapshot excludes application and OS cached data; in a RAID array, however, this is a problem due to interdependencies of the array

To take a snapshot of RAID array you have to stop app from writing and flush all caches to the disk

You can do this using one of the following methods: freeze the filesystem, unmount the RAID array, or shut down the associated EC2 instance

Creating an encrypted snapshot

1) Stop the instance and create a snapshot
2) Copy the snapshot to a different region and enable encryption during the copy
3) Create an image (AMI) from the encrypted snapshot

Snapshots of encrypted volumes are encrypted automatically

You can share snapshots only when they are NOT encrypted

AMI - Amazon Machine Image

The Storage for the Root Device (Root Device Volume) can use either EBS (most common) or instance store (you CANNOT stop it - only reboot or terminate)

Rebooting an EBS-backed or instance store-backed instance will NOT lose your data

Terminating deletes both EBS and instance store volumes; HOWEVER, with EBS volumes you can tell AWS to keep the root device volume

Instance store - the root device for an instance launched from the AMI is an instance store volume created from a template stored in Amazon S3 (slower and uses ephemeral storage)

Amazon EBS - the root device for an instance launched from the AMI is an Amazon EBS volume created from an Amazon EBS snapshot (they are faster and uses persistent storage)

Load Balancers

3 types

Classic Load Balancer (works in layer 4)

Application load balancer (works in layer 7)

Network Load Balancer

You must configure health checks - the LB passes traffic only to instances that pass the health check

Default EC2 metrics

CPU related,
Disk related,
Network related,
status related

CloudWatch Events, e.g. a rule to update DNS when event is triggered

Logs - require installing the Agent

Standard monitoring = 5 minutes

Detailed monitoring = 1 minute

CloudWatch is for monitoring (performance), while CloudTrail is for auditing (what people are doing on your resources)

CLI

for some regions the '--region' parameter has to be specified explicitly and for some not; it is better to always use this parameter so commands work everywhere

Metadata

accessible under the address: http://169.254.169.254/latest/meta-data/
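
A minimal sketch of querying it from inside an instance (stdlib only; note that newer setups may enforce IMDSv2, which additionally requires a session token):

    import urllib.request

    # Only reachable from inside an EC2 instance.
    BASE = "http://169.254.169.254/latest/meta-data/"

    with urllib.request.urlopen(BASE + "instance-id", timeout=2) as resp:
        print(resp.read().decode())  # e.g. an instance ID like i-0abc...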

Launch Configuration

in the auto scaling group you define the number of instances and the subnets (each subnet is a separate availability zone) - the more subnets, the more redundancy you have.

In advanced settings you can specify the load balancer which does a health check

The grace period is the length of time before Auto Scaling starts doing health checks

allows you to specify policies that increase and decrease the group size - e.g. when CPU > 90% add an instance and when CPU < 40% remove one instance (see the sketch below)
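
A minimal sketch of such a policy with boto3 (group and policy names are made up; a CloudWatch alarm on CPU would be wired to trigger it):

    import boto3

    autoscaling = boto3.client("autoscaling")

    # Simple scaling: add one instance whenever the linked alarm fires.
    autoscaling.put_scaling_policy(
        AutoScalingGroupName="web-asg",
        PolicyName="scale-out-on-high-cpu",
        AdjustmentType="ChangeInCapacity",
        ScalingAdjustment=1,
        Cooldown=300,  # wait 5 minutes between scaling actions
    )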

Auto Scaling automatically re-provisions instances, e.g. when you terminate one of your instances it will be brought back automatically (after a while)

removing the Auto Scaling Group also removes the instances launched from its Launch Configuration

EC2 Placement Group

It's a logical grouping of instances within a single Availability Zone. It is recommended for applications that need very low latency and very high network throughput (10Gb/s), e.g. clusters.

Placement Group cannot span across multiple Availability Zones.

only certain instances can be launched in a Placement Group.

You cannot merge Placement Groups

You cannot move existing instance into Placement Group

However you can create an AMI from existing instance and then launch a new instance from AMI into a Placement Group

The name of Placement Group must be unique

EFS

It grows elastically (can scale up to petabytes) and shrinks as you add/remove files

it supports NFSv4

you pay only for the storage you use

it's file-based storage accessed over NFS (in contrast to object-based S3)

read after write consistency

EC2 instances have to be in the same security groups as the elastic file system

To mount it simply run the proper command on each instance.

Lambda

it's a compute service: you upload code and Lambda takes care of provisioning and managing the servers that run it.

The code runs when an event is triggered

the event can be, for example, a new file in S3 or an HTTP request (sent to API Gateway, which then triggers the Lambda function); see the sketch below

each request spins up a separate instance of your Lambda function

You have to grant permissions to the role assigned to your function, e.g. Simple Microservice permissions (without them it will not work)
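
A minimal sketch of a handler for an S3 "new object" event (the field paths follow the standard S3 event structure):

    # Runs once per triggering event; no servers to manage.
    def lambda_handler(event, context):
        for record in event["Records"]:
            bucket = record["s3"]["bucket"]["name"]
            key = record["s3"]["object"]["key"]
            print(f"New object uploaded: s3://{bucket}/{key}")
        return {"status": "ok"}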

You cannot remove a snapshot of an EBS volume that is used as the root device of a registered AMI - you first have to deregister the AMI, and then you can remove the snapshot

Route53 record types

An A record maps a domain name to an IP address

CNAME records map one domain name to another, e.g. http://m.example.com to http://mobile.example.com

An Alias is similar to a CNAME, but it can map naked (apex) domains, e.g. http://example.com, which a CNAME cannot. An ELB never resolves to an IP address (always to a domain name), so Aliases are very helpful here

Amazon charges you for queries against CNAME records, but not for Alias queries
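
A minimal sketch of creating an A record with boto3 (the hosted zone ID, domain and IP are made up):

    import boto3

    route53 = boto3.client("route53")

    # UPSERT creates the record or updates it if it already exists.
    route53.change_resource_record_sets(
        HostedZoneId="Z1EXAMPLE",
        ChangeBatch={
            "Changes": [{
                "Action": "UPSERT",
                "ResourceRecordSet": {
                    "Name": "www.example.com",
                    "Type": "A",
                    "TTL": 300,
                    "ResourceRecords": [{"Value": "203.0.113.10"}],
                },
            }]
        },
    )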

Routing Policies

Simple - the default; good if you have a single resource, e.g. one web server

Weighted - e.g. 20% of traffic goes to server 1 and 80% to server 2

Latency - routes traffic based on the lowest network latency

Failover - detects a failure and redirects traffic to the secondary server; requires creating a health check

Geolocation - routes based on the user's geographic location (continents, countries or US states); helpful when you want, for example, to present one version of a website to all Europeans and another version to all Africans

By default you can register 50 domains; if you need more, contact Amazon support

NoSQL terminology (vs relational):
collection = table
document = row
key-value pairs = fields

When to use what?

OLTP (Online Transaction Processing) - pulls up individual records, e.g. pull up the row where order_id = 159

OLAP (Online Analytics Processing) - pulls large numbers of records; used for data warehousing. E.g. to count the net income of the EMEA and US regions you need the sum of products sold in EMEA and US, the unit cost of each product, the sales price of each product, etc.

ElastiCache

uses 2 open-source caching engines:

Memcached

Redis

DMS - Database Migration Service

e.g. Oracle licensing is very expensive, so you can convert your Oracle DB to free MySQL

RDS is for OLTP

supports: Microsoft SQL Server, MySQL, Oracle, PostgreSQL, Aurora and MariaDB


Redshift is for OLAP

Backups

Automated backups

Database snapshot

A daily DB snapshot is taken and transaction logs are stored throughout the day. This allows you to recover the DB to any point in time within the retention period (1 to 35 days)

Enabled by default

backups are taken in a defined time window and during a backup you may experience latency

The backup is stored in an automatically created S3 bucket. If your RDS database is 10 GB, you get a 10 GB S3 bucket for free

Unlike automated backups, snapshots are kept EVEN AFTER you delete the RDS instance

A restored instance is always a NEW RDS instance with a new endpoint

Multi-AZ

Lets you have an exact copy of your production DB in a different Availability Zone. Both DBs are synchronised automatically, so if one fails the other comes live.

if there's a planned maintenance window, RDS will switch over automatically and come back live without administrator action

Multi-AZ is only for Disaster Recovery not for improving performance (for this you need Read Replicas)

Read Replica

you have multiple DB copies kept in sync, which improves read performance (e.g. each web server instance connects to its own RDS replica)

The other possible purpose is to take a replica of your production for dev purposes

Supported only by MySQL, PostgreSQL and MariaDB

Requires automated backups to be enabled

You can have up to 5 replicas of db

Read replicas of read replicas are possible, but may cause latency problems because synchronization is asynchronous

Read Replicas CANNOT have Multi-AZ, HOWEVER you can have read replicas of Multi-AZ source dbs

Each replica has its own DNS endpoint

Read replicas can be promoted to normal DBs (with write access)

You can have read replica in a 2nd region however ONLY for MySQL and MariaDB

to scale RDS (increase its size) you have to manually restore a snapshot to a bigger size or add a read replica; DynamoDB has a scaling "push button" and the process is automatic

DynamoDB

Allows automatic scaling up without any downtime (in contrast to RDS)

good for gaming, IoT, etc.

Stored on SSD

Data is automatically spread across 3 geographically separate data centres (you cannot specify which AZs are used)

by default it uses eventually consistent reads (consistency across all copies is reached within 1 second); strongly consistent reads are also available
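
A minimal sketch of both read modes with boto3 (table and key are made up):

    import boto3

    dynamodb = boto3.client("dynamodb")

    # Default: eventually consistent read (cheaper, may be ~1 second stale).
    dynamodb.get_item(
        TableName="game-scores",
        Key={"player_id": {"S": "player-42"}},
    )

    # Strongly consistent read: always reflects the latest committed write.
    dynamodb.get_item(
        TableName="game-scores",
        Key={"player_id": {"S": "player-42"}},
        ConsistentRead=True,
    )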

Pricing

Write throughput (per 10 units) - writes are expensive

Read throughput (per 50 units) - reads are cheap

Storage costs (per GB)


Redshift configuration

single node = 160 GB

you can use multi-node

Leader node (receives queries)

compute nodes (store data and execute queries)

can be up to 128 compute nodes

is fast

uses Columnar Data Storage - processes only the columns involved in a query

good for aggregated queries

Advanced Compression - columnar data can be compressed more effectively than row-based data (a column stores data of the same type)

Massively Parallel Processing - automatically distributes data across all nodes

Pricing

is based on Compute Node Hours = 1 unit per node per hour

charged only for compute nodes

you are charged for backup

you are charged for data transfer within VPC

Security

in transit SSL

at rest AES 256

by default Redshift takes care of key management (but you can change it to AWS KMS or manage keys through HSM)

Available only in 1 AZ

Aurora - MySQL compatible (but provides up to 5 times better performance)

2 copies of your data in each AZ, with minimum of 3 AZs = 6 copies of your data

self-healing - automatically scans disks for errors and repairs them

Replicas

Aurora replicas (up to 15) - provide automatic failover

MySQL replicas (up to 5) - do NOT provide automatic failover

the maximum provisioned IOPS capacity on an Oracle and MySQL RDS instance (using provisioned IOPS) is 30,000 IOPS

the maximum size RDS volume you can have by default using Amazon RDS Provisioned IOPS storage with MySQL and Oracle database engines is 6 TB

you can have only one virtual gateway per VPC

VPC peering = connecting 2 VPCs via a direct link using private IP addresses (something like VLANs - the VPCs behave as if they were in the same network)

There's NO transitive peering - VPCs can communicate only with directly connected VPCs

Configuration

Security Groups are stateful, but ACLs are stateless

Subnets

1 subnet = 1 availability zone

the first 4 IPs and the last IP of each subnet (5 in total) cannot be used

by default each newly created subnet is associated with the main route table

by default newly created subnets have auto-assign public IP turned off - you have to enable it manually if you want resources in the subnet to be publicly accessible

Security groups don't span VPCs (if you create a security group in VPC1, it will not be visible in VPC2)

Source/destination checks - each instance must be either the source or the destination of any traffic it handles. Enabled by default; you have to disable it if you want to use the instance for NAT

NAT

NAT Gateway - for IPv4

Egress Only Internet Access - for IPv6

You should have 1 NAT gateway per availability zone

You can use EC2 instance as a NAT GW too.

If you are bottlenecking, increase the instance size

Network ACL

a default one is created automatically (allows all inbound and outbound traffic); if you create your own, it starts with a DENY-everything rule

you can associate a subnet to only ONE Network ACL

Remember to specify a 'Custom TCP' outbound rule with a port range for ephemeral ports - the ports opened on the client side (e.g. the client uses 1050 as its ephemeral port and 80 as the destination port)

you can block addresses using ACLs (NOT via Security Groups)

Classic Load Balancer - previous generation; forget about it

Network Load Balancer - a rare choice; only if you need a static IP and ultra-high performance

Application Load Balancer - the successor of the classic load balancer; it lives in a VPC and requires at least 2 public subnets

VPC Flow Logs

capture logs about the traffic

those logs can be viewed in CloudWatch Logs

Can be created on 3 levels

VPC

Subnet

Network Interface Level

You can stream captured logs e.g. to Lambda and react for the traffic

Flow logs for peered VPCs are ONLY possible if both VPCs are under ONE account

you cannot tag a flow log

After creating a flow log you cannot change its configuration, e.g. you cannot assign a different IAM role

Not all traffic is monitored. Exceptions: traffic to the Amazon DNS server (traffic to your own DNS server IS logged), traffic for Windows license activation, traffic to and from 169.254.169.254 (instance metadata), DHCP traffic, and traffic to the reserved IP of the default VPC router

VPC endpoints

2 types:

Interface endpoints - an Elastic Network Interface with a private IP

Gateway endpoints - work like a gateway

accessing any AWS Service through VPC endpoint within one region is done using private IPs

they're used for accessing other AWS services behind the gateway (like reaching a different department within the same enterprise network)

A NAT gateway is not associated with security groups (while NAT instances are)

By default you can have 5 VPCs in one region and 200 subnets per VPC

SES = Simple Email Service

Sends and receives emails.

You can attach auto-scaling to your SQS queue: once X jobs are sitting in the queue, a new EC2 instance is launched automatically to handle them

SQS is a pull-based system

Keeping jobs in a queue (independently from other system components) resolves the issue that arises if the producer is producing work faster than the consumer can process it.

2 types of queues

standard (default)

FIFO queues

generally works as FIFO, but occasionally messages can be delivered out of order (delivery itself is guaranteed)

guarantees both ordering and delivery

limited to 300 transactions per second

message size = 256 KB

messages can be kept in a queue from 1 minute to 14 days (default is 4 days)

Visibility Timeout - the amount of time a message is invisible in the queue after a consumer takes it (if processed, it is deleted; if not, it becomes visible again). Maximum VTO is 12 hours.

long polling doesn't return a response until a message appears in the queue (in contrast to short polling); see the sketch below
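
A minimal sketch of long polling plus the visibility timeout with boto3 (the queue URL is made up):

    import boto3

    sqs = boto3.client("sqs")
    queue_url = "https://sqs.eu-west-1.amazonaws.com/123456789012/meme-jobs"

    resp = sqs.receive_message(
        QueueUrl=queue_url,
        MaxNumberOfMessages=1,
        WaitTimeSeconds=20,    # long polling: wait up to 20 s for a message
        VisibilityTimeout=60,  # message stays invisible for 60 s while we work
    )

    for msg in resp.get("Messages", []):
        print("processing:", msg["Body"])
        # delete only after successful processing, otherwise it reappears
        sqs.delete_message(QueueUrl=queue_url, ReceiptHandle=msg["ReceiptHandle"])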

Differences between SWF and SQS

Retention: SQS keeps messages up to 14 days, while SWF workflows can last up to 1 year

Orientation: SQS has a message-oriented API while SWF has task-oriented API (e.g. human involving tasks)

Duplicates: SWF ensures that task is NEVER duplicated, while duplicates in SQS are possible

tracking: SWF tracks all tasks and events in application, while in SQS you have to implement your own application-level tracking

SWF Actors

Workflow starters - an app that can start a workflow

Deciders - control the flow (when a task finishes or fails, they decide what to do next)

Activity workers - carry out the activity tasks

allows you to group multiple recipients using topics, e.g. you can group iOS and Android recipients; when you publish once to the topic, SNS delivers appropriately formatted copies to each subscriber

Messages in SNS are stored across multiple availability zones

SNS is push-based
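
A minimal sketch of the topic workflow with boto3 (topic name and email address are made up; the email subscriber must confirm the subscription):

    import boto3

    sns = boto3.client("sns")

    topic = sns.create_topic(Name="deploy-events")
    sns.subscribe(
        TopicArn=topic["TopicArn"],
        Protocol="email",
        Endpoint="ops@example.com",
    )

    # One publish; SNS pushes a copy to every confirmed subscriber.
    sns.publish(
        TopicArn=topic["TopicArn"],
        Subject="Deployment finished",
        Message="Build 1234 is live.",
    )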

API caching - responses are cached for a specified time-to-live (TTL)

If you face problems with origin policy -> enable CORS (Cross-Origin Resource Sharing) on API Gateway

streaming data - data sent continuously from many data sources (e.g. geospatial data as in the Uber app, or gaming data)

Kinesis Services

Kinesis Streams

Kinesis Firehose

Kinesis Analytics

Kinesis Streams: data is stored in shards. Producers put data into shards, then it is taken by consumers. Retention: data is stored from 24 hours (default) up to 7 days.

Kinesis Firehose: data is not stored but processed automatically (there are no consumers), e.g. by Lambda.
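
A minimal sketch of a Streams producer with boto3 (stream name and payload are made up; records sharing a partition key land in the same shard):

    import boto3

    kinesis = boto3.client("kinesis")

    kinesis.put_record(
        StreamName="ride-telemetry",
        Data=b'{"driver_id": "d-17", "lat": 52.23, "lon": 21.01}',
        PartitionKey="d-17",  # same key -> same shard, preserving order
    )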


Domain in SWF is a collection of related workflows

if you want to serve images from CloudFront rather than from S3 via Apache on EC2, you have to change a setting in httpd.conf to AllowOverride All

you can copy a whole folder either with 'aws s3 cp --recursive' or 'aws s3 sync'. Sync copies ONLY new/changed files; if you also want files removed from the destination, use 'aws s3 sync --delete'

Additional exam tips:

3 ways to implement elasticity

Proactive Cyclic Scaling - periodic scaling after fixed time interval (e.g. daily)

Proactive Event-based Scaling - scale when an event occurs (release of your product)

Auto-scaling

Amazon currently doesn't support increasing storage on a SQL Server DB instance

Key words

streaming large data -> Kinesis

Business Intelligence -> Redshift

Big Data processing -> Elastic Map Reduce

OpsWorks

Orchestration Service that uses Chef

Resource groups - allow you to group resources using the tags assigned to them

"chef", "recipes", "cookbook" -> OpsWork

AWS Organizations - an account management service that lets you consolidate multiple AWS accounts (by default up to 20; for more you have to contact Amazon) into an organization

available in 2 feature sets

Enable only Consolidated Billing

Enable All Features

CloudTrail works per AWS account and is enabled per region; however, you can consolidate logs in a single S3 bucket (this requires a cross-account access policy and enabling CloudTrail on each account)

gives you volume discounts (e.g. you're billed for 600 GB of combined usage instead of 2 accounts with 300 GB each) and better management

you can manage users by consolidating them into groups or individually assign policies.

you can connect VPCs owned by different accounts, but only within the same region

Direct Connect uses Ethernet VLAN trunking (802.1Q)

STS = Security Token Service

Using STS you can log in to AWS with your AD account/OpenID providers without IAM credentials.

terms

Federation - combining a list of users in one domain (e.g. IAM users) with a list of users from another domain (e.g. AD)

Identity Broker - a service that takes an identity from point A and joins it (federates it) to point B

Identity store - service like AD

Identities - users of a service, like AD account

ECS = Elastic Container Service

Manages Docker containers on a cluster of EC2 instances

can be used to deploy sophisticated applications on a microservices model

ECR = EC2 Container Registry

it's a managed AWS Docker registry service. It supports private Docker repositories

Docker containers are run based on JSON task definitions

Allows you root access

AMI = Amazon Machine Image

2 types of virtualization

PV - Para Virtualization

HVM - Hardware Virtual Machine

to maximize IOPS performance, the best strategy is to add multiple volumes with provisioned IOPS and create a RAID 0 stripe across them

it is possible to transfer a reserved instance from one AZ to another

Redshift uses a 1024 KB (1 MB) block size for its columnar storage

EMR is used for analysing big data

You can shorten the time for processing an EMR job by reducing the input split size in the MapReduce job configuration and then adjusting the number of simultaneous mapper tasks so that more tasks can be processed at once

a public subnet has at least one route that uses an internet gateway in its route table

There is a soft limit of 20 EC2 instances per region

an EC2 instance in a public subnet is only publicly accessible if it has a public IP address or is behind an elastic load balancer

when using multiple availability zones (Multi-AZ), you cannot use the secondary database as an independent read node

Once a VPC is set to Dedicated hosting, it is not possible to change the VPC or the instances to Default hosting. You must re-create the VPC.

VPC peering does not support edge-to-edge routing.

DynamoDB allows for the storage of large text and binary objects, but there is a limit of 400 KB.