AWS SAA
Migration
Snowball
You send a storage appliance with your data to Amazon and they import it into S3 or similar (or the reverse - they write data from AWS onto the appliance for you)
DMS = Database Migration Service
migrates your production database to AWS with minimal downtime
SMS=Server Migration Service
replicates production VMs to AWS
Analytics
Athena
Allows you to run SQL queries directly against data stored in S3
EMR=Elastic MapReduce
CloudSearch
Elasticsearch Service
Kinesis
for analyzing big data streams in real time
Data Pipeline
Move data from one place to another
QuickSight
for business analytics, for visualizing analysis results
Storage
Storage Gateway
A virtual appliance installed in your data center, which connects your on-premises storage with S3
Glacier
Archive tier for S3 - you cannot retrieve the objects immediately
S3 = Simple Storage Service
EFS = Elastic File System
A file system that can be shared between many VMs
Databases
RDS = Relational Database Service
DynamoDB
NoSQL db
Redshift
Data warehouse solution
Elasticache
caches frequently accessed data in memory to improve performance
Compute
Lambda
it's actually 'serverless'
Elastic Beanstalk
you upload your code and Beanstalk automatically provisions an environment for it
EC2 = Elastic Compute Cloud
Lightsail - a simplified VPS offering with fixed, bundled pricing
Networking and content delivery
Route53
DNS in AWS
CloudFront
Distributes content via Amazon's CDN (low latency and high data transfer rates) from edge locations
VPC=Virtual Private Cloud
Virtual data center
Direct Connect
connects your physical data center to AWS over a dedicated private line (useful e.g. for security reasons, higher reliability, lower costs if you're transferring large volumes of data; gives you higher network bandwidth (1 or 10 Gbps))
Security & Identity
IAM = Identity and Access Management
Inspector
You install the Inspector agent on your virtual machines and it reports security audit findings to you
Certificate manager
free SSL/TLS certificates for your domains
Directory service
WAF= Web Application Firewall
Artifact
This is where you get your compliance documentation (e.g. ISO certificates) in the AWS console
Management tools
CloudWatch
monitor your AWS environment
CloudFormation
Turns your infrastructure into code. In a traditional architecture you have switches, firewalls, servers and so on; in the cloud you have a document (a template) describing all those components - CloudFormation is responsible for this. You can deploy a whole production environment from a CloudFormation template.
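A minimal sketch of the idea, assuming boto3 is installed and credentials are configured (stack name and template are illustrative): the template describes one S3 bucket and create_stack deploys it.

    import boto3, json

    # the "document describing your infrastructure": here, just one S3 bucket
    template = {
        "AWSTemplateFormatVersion": "2010-09-09",
        "Resources": {"DemoBucket": {"Type": "AWS::S3::Bucket"}},
    }

    cf = boto3.client("cloudformation")
    cf.create_stack(StackName="demo-stack", TemplateBody=json.dumps(template))
    # block until everything described in the template exists
    cf.get_waiter("stack_create_complete").wait(StackName="demo-stack")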
CloudTrail
Audits your AWS activity. If something changes (e.g. a service is added/removed), CloudTrail logs this information. Logs can be stored in an S3 bucket
Config
Monitors your configuration and warns you when a change violates the rules you have set for your environment
Service catalog
You can specify which services are authorized in your enterprise and which are not
Trusted Advisor
Gives you recommendations, e.g. how to do cost optimization, performance optimization or security fixes in your environment.
Application Services
Step Functions
SWF = Simple WorkFlow Service
Coordinates work across application components. Allows you to automate actions in your environment.
API Gateway
The door to your backend. E.g. you send a request through the API Gateway to a Lambda function and get a response.
AppStream
For streaming desktop applications
Elastic Transcoder
converts media files between formats
Developer tools
CodeCommit
GitHub in AWS
CodeBuild
For compiling your code
CodeDeploy
For deploying your code into EC2 instances
CodePipeline
For orchestrating the stages of your release process (continuous delivery)
Mobile Services
Mobile Hub
Lets you design mobile features, e.g. data storage and authentication methods. It has its own mobile console
Cognito
Simplifies sign-in and sign-up - allows users to do it via third parties (social identity providers), e.g. Google. You give your Google credentials, which are brokered by Cognito, and then you can log in via Cognito to other services (which support it)
Device Farm
Allows testing your app on hundreds of real devices.
Mobile Analytics
Analyzes the usage of your mobile apps
Pinpoint
it's like Google Analytics for mobile apps. Helps you understand users' behaviour.
Business Productivity
WorkDocs
For storing work documents in a secure way. Similar to S3 with some additional security features.
WorkMail
It's like Microsoft Exchange in AWS.
Internet of Things
IoT
keeping track of all your IoT devices
Desktop and App Streaming
Workspaces
basically it's VDI - like having a desktop in the cloud.
AppStream 2.0
Way of streaming desktop applications
Artificial Intelligence
Polly
converts text to speech (e.g. into an MP3 file)
Machine learning
you provide training data sets and the resulting model can then be used for predictions, e.g. for profiling people.
Rekognition
you give it a picture as input and it recognizes the objects in it
Messaging
SNS = Simple Notification Service
Sends an email or text message to you when a specified event occurs
SQS = Simple Queue Service
Creates job queues. E.g. generating a meme can be a job: once you upload a photo, an EC2 instance adds a funny caption to it. Even if that EC2 instance dies, the job is still kept in the queue, so when a new instance appears it can take the job and create the meme.
S3
Object based storage
each object (flat file) is limited to 5TB
Storage space is unlimited
data is stored in buckets
each bucket has unique name
each bucket can be reached using link: https://s3-eu-west-1.amazonaws.com/[bucketname] or: https://[bucketname].s3.amazonaws.com
read after write consistency for PUTs of new objects (objects are accessible immediately after uploading)
Eventual consistency for overwrite PUTs and DELETEs (after modifying or deleting a file, the change can take some time to propagate)
each object is built from: key (name), value (data), version ID (important for versioning), metadata (e.g. date of uploading) and subresources
subresources contain: ACLs and torrent (it supports BitTorrent protocol)
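A quick boto3 illustration of the key/value/metadata anatomy (bucket name is hypothetical; assumes boto3 and credentials are set up):

    import boto3

    s3 = boto3.client("s3")
    # Key = name, Body = value (data), Metadata = user-defined metadata
    s3.put_object(Bucket="my-example-bucket", Key="notes/hello.txt",
                  Body=b"hello s3", Metadata={"uploaded-by": "rzepsky"})
    obj = s3.get_object(Bucket="my-example-bucket", Key="notes/hello.txt")
    print(obj["Body"].read(), obj["Metadata"])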
Storage Tiers
S3 Standard: availability = 99.99%, durability = 99.999999999%
data survives even if 2 facilities go down concurrently
S3 IA (Infrequently Accessed)
lower fee than S3 Standard, but you are charged for retrieval
Reduced Redundancy Storage
availability = 99.99%, durability = 99.99%
cheaper than S3
Glacier
cheap but takes 3-5 hours to retrieve data
availability = 99.9%, durability = 99.999999999%
Versioning - once you enable it you cannot disable it (you can only suspend it)
Be careful with big files, because all versions are stored, which can consume a lot of your storage space
It may be used as a backup tool - once you delete an object, you can still restore it by deleting the "delete marker"
Cross-Region Replication replicates only NEW objects
objects are replicated with their permissions
If you delete an object, the "delete marker" is replicated; however, if you delete the "delete marker" in your source bucket (restore the object), that deletion is NOT replicated (in your destination bucket the object is still deleted)
When you restore a previous version (delete the latest version) in your source bucket, it is NOT replicated (in the destination bucket it is still the latest version!)
versioning has to be enabled on both source and destination buckets
you cannot replicate to multiple buckets
Lifecycle management - allows you to set up rules to transition objects (or old versions of them) to IA storage or Glacier, or to expire them, after some period (see the boto3 sketch after this list)
actions on current versions
actions on previous versions
objects moved to Glacier are billed for a minimum of 90 days of storage
when an object expires, a delete marker is placed on it (so you're still able to restore the object)
not available in the Singapore and South America regions
the minimum billable object size in IA is 128 KB
which means that smaller objects will be charged as if they were 128 KB
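A sketch of such a lifecycle rule in boto3 (bucket name, prefix and day counts are illustrative):

    import boto3

    s3 = boto3.client("s3")
    s3.put_bucket_lifecycle_configuration(
        Bucket="my-example-bucket",
        LifecycleConfiguration={"Rules": [{
            "ID": "archive-then-expire",
            "Filter": {"Prefix": "logs/"},
            "Status": "Enabled",
            "Transitions": [
                {"Days": 30, "StorageClass": "STANDARD_IA"},  # to IA after 30 days
                {"Days": 60, "StorageClass": "GLACIER"},      # to Glacier after 60 days
            ],
            "Expiration": {"Days": 365},                      # expire after a year
        }]},
    )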
CloudFront
edge location - the place from which the content is served
origin - source of the content which will be cached
Distribution - the name given to the CDN, which consists of a collection of edge locations
the origin may actually be a non-AWS server - CloudFront will still work
it isn't only for reading content - you can also write (PUT) through it
Objects are cached for the TTL
you can cache a web content or streaming RTMP
when you PUT a new object to an edge location, the edge location forwards the write to your origin bucket
TTL is always in seconds
you set it up using "Default TTL"
using "Path Pattern" you can set the regular expression what should be cached, e.g. *.pdf
You can set up multiple restrictions
You can delete the distribution by firstly disabling it and then deleting
the distribution name is: [random string].cloudfront.net
WAF
you can whitelist/blacklist countries (Geo-Restriction)
Pre-signed URLs and Pre-signed cookies
encryption
In-transit - SSL/TLS
at rest
Server Side Encryption
S3 managed keys = SSE-S3
AWS KMS - Key Management Service
logs who used which key to encrypt/decrypt what
uses another key to encrypt your encryption key
Server side encryption with customer provided keys - SSE-C
Client side encryption
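A minimal boto3 sketch of the two most common server-side options (bucket and key names are illustrative):

    import boto3

    s3 = boto3.client("s3")
    # SSE-S3: S3 manages the keys
    s3.put_object(Bucket="my-example-bucket", Key="a.txt", Body=b"secret",
                  ServerSideEncryption="AES256")
    # SSE-KMS: keys managed (and their use logged) by KMS
    s3.put_object(Bucket="my-example-bucket", Key="b.txt", Body=b"secret",
                  ServerSideEncryption="aws:kms")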
4 types of Storage Gateway
1) File Gateway (NFS)
Volume Gateway (iSCSI) - block based storage (it's like a virtual hard drive - you can install applications on it). Data is stored on volumes as Amazon EBS snapshots (limited in size to 1 GB-16 TB). It comes in two flavours, types 2) and 3):
2) Stored Volumes - you keep a full copy of your data locally and it is asynchronously backed up to AWS (an EBS snapshot is taken and stored in S3)
3) Cached Volumes - nothing is stored locally in full; the entire data set is stored in S3 and the most frequently accessed data is cached locally
4) Gateway Virtual Tape Library (VTL) - allows you to store your tape backup data on virtual cartridges
3 main types
1) (normal) Snowball - 80 TB disk space
2) Snowball Edge - 100 TB disk space; looks the same as a normal Snowball
it also gives you compute capabilities. For example, an airplane engineer takes the Snowball Edge box on board and it's mounted as a disk. During the flight, data about the engines is collected and later sent to an Amazon data centre. As a result, you have in your cloud not only the data, but also the Lambda functions.
3) Snowmobile - up to 100 PB
before Snowball there was an Import/Export service where users sent in their own disks, but it was hard for AWS to manage
requires installing client software on your machine
you have to download the manifest file and provide the unlock code from your AWS console to run the Snowball
To speed up uploads you may use the Transfer Acceleration service
it uses the CloudFront Edge Network
to upload you use a dedicated URL, e.g. rzepsky.s3-accelerate.amazonaws.com
you may use pre-signed URLs in the following scenario: you have a website with photos (stored on S3) and you don't want to share the photos directly; rather, you want people to visit your website and see the pictures there (e.g. because you display ads). With normal S3 storage, other websites can link directly to the photos without reaching your website. If you use pre-signed URLs, direct linking is impossible - people can only see the photos on your website (a boto3 sketch follows).
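A sketch of generating such a short-lived link with boto3 (names are illustrative); your website embeds the URL, which stops working after ExpiresIn seconds:

    import boto3

    s3 = boto3.client("s3")
    url = s3.generate_presigned_url(
        "get_object",
        Params={"Bucket": "my-example-bucket", "Key": "photos/cat.jpg"},
        ExpiresIn=3600,  # the link dies after an hour, so hot-linking is useless
    )
    print(url)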
EC2 price options
on Demand
pay a fixed rate by the hour (or by the second for Linux)
Reserved
capacity reservation. Requires signing a contract for 1 or 3 years
Spot
you set a bid price (the maximum you're willing to pay per hour/second of EC2 use). If demand for EC2 is high (a lot of people are buying at the moment), the spot price goes up; if it rises above your bid, your instances are stopped or terminated.
Dedicated hosts
helpful when a licensing agreement requires it (e.g. an Oracle db) or for government workloads
can be bought as on Demand or as Reserved
EBS - a virtual disk which you attach to your EC2 instance. It is block storage, so you can install software on it (just like on your PC's HDD)
EBS types
GP2 - General Purpose SSD. Balanced price and performance; 3 IOPS per GB, up to 10,000 IOPS
IO1 - Provisioned IOPS SSD; for when you need more than 10,000 IOPS
ST1 - Throughput Optimized HDD
for large amounts of sequential data (data warehouse / log processing / Big Data)
SC1 - Cold HDD
Lowest cost storage for infrequently accessed workloads (e.g. a file server)
Magnetic standard
Lowest price per GB
(spot instances) if you terminate the instance, you pay for the full hour; if AWS terminates your instance, that hour is free
You cannot mount 1 EBS to 2 EC2. Use EFS instead
Configuring
For each availability zone there is a separate subnet (1 subnet = 1 availability zone).
Only SSD and Magnetic disks can be bootable. The HDD ones CANNOT be root disks (but can be mounted additionally)
by default an EC2 instance and its disks are deleted on termination (but you can change this behaviour)
remember about tagging - tags let you track which services generate the costs
a virtual machine in the cloud
RAID = Redundant Array of Independent Disks
RAID 0 - striped, no redundancy
RAID 1 - Mirrored
RAID 5 - at least 3 disks, good for reads and bad for writes; not recommended by AWS
RAID 10 - combination of RAID 0 and 1; good performance and redundancy
from multiple EBS volumes you can create a RAID (the volumes can be of different types), e.g. from 4 different volumes create one striped D: partition
Taking a snapshot of RAID
normally taking a snapshot excludes application and OS cached data; in a RAID this is a problem due to interdependencies within the array
To take a snapshot of a RAID array you have to stop the application from writing and flush all caches to disk
You can do this using one of the following methods: freeze the filesystem, unmount the RAID array, or shut down the associated EC2 instance
Creating an encrypted snapshot
1) Stop the instance and create a snapshot
2) Copy the snapshot to a different region, enabling encryption
3) Create an image (AMI) from the encrypted snapshot
Snapshots of encrypted volumes are encrypted automatically
You can share snapshots only when they are NOT encrypted
AMI - Amazon Machine Image
The Storage for the Root Device (Root Device Volume) can use either EBS (most common) or instance store (you CANNOT stop it - only reboot or terminate)
Rebooting an EBS-backed or instance store backed AMI will NOT lose your data
Terminating deletes EBS or instance store volumes HOWEVER with EBS volumes you can tell AWS to keep the root device volume
Instance store - the root device for an instance launched from the AMI is an instance store volume created from a template stored in Amazon S3 (slower and uses ephemeral storage)
Amazon EBS - the root device for an instance launched from the AMI is an Amazon EBS volume created from an Amazon EBS snapshot (faster, and uses persistent storage)
Load Balancers
3 types
Classic Load Balancer (works at layer 4)
Application Load Balancer (works at layer 7)
Network Load Balancer (works at layer 4)
Requires configuring health checks - the LB passes traffic only to instances which pass the health check
Default EC2 metrics
CPU related,
Disk related,
Network related,
status related
CloudWatch Events, e.g. a rule to update DNS when event is triggered
Logs - require installing the Agent
Standard monitoring = 5 minutes
Detailed monitoring = 1 minute
CloudWatch is for monitoring (performance), while CloudTrail is for auditing (what people are doing on your resources)
CLI
for some regions the '--region' parameter has to be explicitly specified and for some not. It is better to always use this parameter, so commands always work
Metadata
accessible under the address: http://169.254.169.254/latest/meta-data/
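For example, from Python on the instance itself (this only works on EC2; the paths are standard metadata categories):

    import urllib.request

    BASE = "http://169.254.169.254/latest/meta-data/"
    for path in ("instance-id", "public-ipv4", "placement/availability-zone"):
        with urllib.request.urlopen(BASE + path) as resp:
            print(path, "=", resp.read().decode())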
Launch Configuration
in the auto scaling group you define the number of instances and the subnets (each subnet is a separate availability zone) - the more subnets, the more redundancy you have.
In advanced settings you can specify the load balancer which does a health check
Grace period is the length of time before Auto Scaling starts doing health checks
allows you to specify scale-up and scale-down policies - e.g. when CPU > 90% add an instance, and when CPU < 40% remove one instance (see the boto3 sketch after this list)
Auto Scaling automatically re-provisions instances, e.g. when you terminate one of your instances it will (after a while) automatically be brought back
removing an Auto Scaling Group also removes the instances it launched from the Launch Configuration
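A sketch of one such scale-up policy in boto3 (group and policy names are hypothetical); in practice you attach the returned policy ARN to a CloudWatch alarm such as "CPU > 90%":

    import boto3

    asg = boto3.client("autoscaling")
    resp = asg.put_scaling_policy(
        AutoScalingGroupName="web-asg",
        PolicyName="scale-out-on-high-cpu",
        AdjustmentType="ChangeInCapacity",
        ScalingAdjustment=1,    # add one instance when triggered
        Cooldown=300,           # wait 5 minutes before scaling again
    )
    print(resp["PolicyARN"])    # reference this ARN from a CloudWatch alarm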
EC2 Placement Group
It's a logical grouping of instances within a single Availability Zone. It is recommended for applications that need very low latency and very high network throughput (10Gb/s), e.g. clusters.
Placement Group cannot span across multiple Availability Zones.
only certain instances can be launched in a Placement Group.
You cannot merge Placement Groups
You cannot move an existing instance into a Placement Group
However, you can create an AMI from an existing instance and then launch a new instance from that AMI into a Placement Group
The name of Placement Group must be unique
EFS
It elastically grows (can scale up to petabytes) and shrinks as you add/remove files
it supports NFSv4
you pay only for the storage you use
it's a file-based storage system (unlike EBS, which is block based)
read after write consistency
EC2 instances have to be in the same security group as the elastic file system
To mount it, run the NFS mount command on each instance, e.g. (file system DNS name hypothetical): sudo mount -t nfs4 fs-12345678.efs.eu-west-1.amazonaws.com:/ /mnt/efs
Lambda
it's a compute service. You upload code and Lambda takes care of provisioning and managing the servers needed to run it.
The code runs when an event is triggered
the event can be, for example, a new file in S3 or an HTTP request (sent to API Gateway, which then triggers the Lambda function) - see the handler sketch below
each request spawns a new instance of your Lambda function
You have to grant permissions to the role assigned to your function, e.g. Simple Microservice permissions (without permissions it will not work)
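A minimal handler sketch for the "new file in S3" trigger mentioned above (the function, its role and the S3 event notification are assumed to be configured separately):

    # entry point invoked by Lambda for each S3 event
    def lambda_handler(event, context):
        for record in event["Records"]:
            bucket = record["s3"]["bucket"]["name"]
            key = record["s3"]["object"]["key"]
            print(f"new object: s3://{bucket}/{key}")
        return {"status": "ok"}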
You cannot remove a snapshot of an EBS volume that is used as the root device of a registered AMI - you first have to deregister the AMI, then you can remove the snapshot
Route53
A records map a domain name to an IP address
CNAME records map one domain name to another, e.g. http://m.example.com to http://mobile.example.com
An Alias is similar to a CNAME, but it can also map naked (apex) domains, e.g. http://example.com, which a CNAME cannot. An ELB never resolves to an IP address (always to a domain name), so Aliases are very helpful here
Amazon charges you for CNAME queries, but not for Alias queries
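An A-record upsert sketched in boto3 (zone ID, domain and IP are hypothetical):

    import boto3

    r53 = boto3.client("route53")
    r53.change_resource_record_sets(
        HostedZoneId="Z1EXAMPLE",
        ChangeBatch={"Changes": [{
            "Action": "UPSERT",           # create or update the record
            "ResourceRecordSet": {
                "Name": "example.com.",
                "Type": "A",
                "TTL": 300,
                "ResourceRecords": [{"Value": "203.0.113.10"}],
            },
        }]},
    )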
Routing Policy
Simple - the default; good if you have a single resource, e.g. 1 web server
Weighted - e.g. 20% of traffic goes to server 1 and 80% to server 2
Latency - routes traffic based on the lowest network latency
Failover - detects a failure and redirects traffic to the secondary server; requires creating a health check
Geolocation - routes based on the user's geographic location (continent, country, or US state); helpful when you want, for example, to present one version of a website to all Europeans and another version to all Africans
By default you can have 50 registered domains; if you need more, you can get a higher limit by contacting Amazon support
in a document database: collection = table, document = row, key-value pairs = fields
When to use what?
OLTP (Online Transaction Processing) - pulls up single records, e.g. pull up the row where order_id = 159
OLAP (Online Analytics Processing) - pulls in large numbers of records; used for data warehousing. E.g. to count the net income of the EMEA and US regions, you need the sum of products sold in EMEA and US, the unit cost of each product, the sales price of each product, etc.
uses 2 open source caching engines
Memcached
Redis
DMS - Database Migration Service
e.g. Oracle licensing is very expensive so you can convert your Oracle db to free MySQL
RDS
for OLTP
supports:
Microsoft SQL Server,
MySQL,
Oracle,
PostgreSQL,
Aurora,
MariaDB
(Redshift, in contrast, is for OLAP)
Backups
Automated backups
Database snapshot
RDS takes a daily db snapshot and stores transaction logs throughout the day. This allows you to recover the db to any point in time within the retention period (which can be from 1 to 35 days)
Enabled by default
backups are taken in a defined time window and during a backup you may experience latency
The backup is stored in an automatically created S3 bucket. You get free backup storage equal to the size of your db (e.g. a 10 GB RDS instance gets 10 GB of free S3 space)
Unlike automated backups, they are kept EVEN AFTER deleting the RDS instance
Restored instance is always a NEW RDS instance with a new endpoint
Multi - AZ
It allows you to have an exact copy of your production db in a different Availability Zone. Both dbs are automatically synchronised, so if one fails the other comes live.
if there's a planned maintenance window, RDS will automatically switch over and then come live again without administrator action
Multi-AZ is only for Disaster Recovery not for improving performance (for this you need Read Replicas)
Read Replica
you have multiple dbs which are synchronised with each other to improve performance (e.g. each instance of a web server is connected to its own RDS read replica)
The other possible purpose is to take a replica of your production for dev purposes
Supported only by MySQL, PostgreSQL and MariaDB
It requires automated backups to be enabled
You can have up to 5 read replicas of a db
Read replicas of read replicas are possible, but may cause latency problems because of the asynchronous replication
Read Replicas CANNOT have Multi-AZ, HOWEVER you can have read replicas of Multi-AZ source dbs
Each replica has its own DNS endpoint
Read replicas can be promoted to normal dbs (with write permission)
You can have a read replica in a second region, however ONLY for MySQL and MariaDB
to scale RDS (increase its size) you have to manually restore a snapshot to a bigger size or add a read replica. DynamoDB has a "push button" scaling process that is automatic
DynamoDB
Allows automatic scaling up without any downtime (in contrast to RDS)
good for gaming, IoT, etc.
Stored on SSD
Automatically spreads across 3 geographically different data centres (user cannot specify which specific AZs should be used)
by default it uses Eventually Consistent Reads (consistency across all copies is reached within 1 second)
Strongly Consistent Reads can be requested instead (they always return the latest data; see the sketch after this list)
Pricing
Write throughput (per 10 units) - expensive for writes
Read throughput (per 50 units) - cheap for reads
Storage costs (per GB)
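The two read modes side by side in boto3 (table name and key are hypothetical):

    import boto3

    table = boto3.resource("dynamodb").Table("Games")
    table.put_item(Item={"player_id": "p1", "score": 42})
    # default: eventually consistent (cheap, may lag up to ~1 second)
    table.get_item(Key={"player_id": "p1"})
    # strongly consistent: always the latest data, costs more
    table.get_item(Key={"player_id": "p1"}, ConsistentRead=True)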
Redshift
Configuration
single node = 160 GB
you can use multi-node
Leader node (receives queries)
compute nodes (store data and execute queries)
can be up to 128 compute nodes
is fast
uses Columnar Data Storage - it processes only the columns involved in a query
good for aggregated queries
Advanced Compression - columnar data can be compressed more effectively than row-based data (each column stores the same type of data)
Massively Parallel Processing - automatically distributes data across all nodes
Pricing
is based on Compute Node Hours = 1 unit per node per hour
charged only for compute nodes
you are charged for backup
you are charged for data transfer within VPC
Security
in transit SSL
at rest AES 256
by default Redshift takes care of key management (but you can change it to AWS KMS or manage keys through HSM)
Available only in 1 AZ
Aurora - MySQL compatible (but provides up to 5 times better performance)
2 copies of your data in each AZ, with minimum of 3 AZs = 6 copies of your data
self-healing - automatically scans disks for errors and repairs them
Replicas
Aurora replicas (up to 15) - provide automatic failover
MySQL replicas (up to 5) - do NOT provide automatic failover
the maximum provisioned IOPS capacity on an Oracle and MySQL RDS instance (using provisioned IOPS) is 30,000 IOPS
the maximum size RDS volume you can have by default using Amazon RDS Provisioned IOPS storage with MySQL and Oracle database engines is 6 TB
VPC
you can have only one virtual private gateway per VPC
VPC peering = connecting 2 VPCs via a direct link using private IP addresses (a bit like VLANs - the VPCs behave as if they are in the same network)
There's NO transitive peering - VPCs can communicate only with directly connected VPCs
Configuration
Security Groups are stateful, but ACLs are stateless
Subnets
1 subnet = 1 availability zone
the first 4 IPs and the last IP of each subnet (5 in all) cannot be used
by default each newly created subnet is associated with the main route table
by default newly created subnets have auto-assign public IP turned off - you have to turn it on manually if you want resources in this subnet to be publicly reachable
Security groups don't span VPCs (if you created a security group in VPC1, it will not be visible in VPC2)
Source/destination checks - each instance must be either the source or the destination of any traffic it handles. This is enabled by default; you have to disable it if you want to use an EC2 instance as a NAT
NAT
NAT Gateway - for IPv4
Egress-Only Internet Gateway - for IPv6
You should have 1 gateway per availability zone
You can use EC2 instance as a NAT GW too.
If you are bottlenecking, increase the instance size
Network ACL
a default Network ACL is created automatically (allows all inbound and outbound traffic), but a custom one you create DENIES everything by default.
you can associate a subnet to only ONE Network ACL
Remember to specify a 'Custom TCP' outbound rule with a port range (required for ephemeral ports - the ports opened on the client side; e.g. the client uses 1050 as the ephemeral port and 80 as the destination port)
you can block addresses using ACLs (NOT via Security Groups)
Classic Load Balancer - previous generation, forget about it
Network Load Balancer - a rare choice; only if you need a static IP and ultra-high performance
Application Load Balancer - the successor of the Classic Load Balancer
a load balancer in a VPC requires at least 2 public subnets
VPC Flow Logs
capture logs about the traffic
those logs can be viewed in CloudWatch Logs
Can be created on 3 levels
VPC
Subnet
Network Interface Level
You can stream captured logs e.g. to Lambda and react for the traffic
Flow logs for peered VPCs are ONLY possible if both VPCs are under ONE account
you cannot tag a flow log
After creating a flow log you cannot change its configuration, e.g. you cannot assign a new IAM role
Not all traffic is monitored. Exceptions are: Amazon DNS traffic (traffic to your own DNS server is logged, though), traffic generated by a Windows instance for Amazon Windows license activation, traffic to and from 169.254.169.254, DHCP traffic, and traffic to the reserved IP of the default VPC router
VPC endpoints
2 types
Elastic Network Interface (interface endpoint)
Gateway endpoint - works like a NAT gateway
accessing any AWS service through a VPC endpoint within one region is done using private IPs
gateway endpoints are used for accessing other AWS services behind the gateway (like reaching a different department in the same enterprise network)
NAT gateway is not associated with security groups (while NAT instances are)
By default you can have 5 VPCs in one region and 200 subnets per VPC
SES = Simple Email Service
Sends and receives emails.
SQS
You can attach auto-scaling to your SQS: once X jobs are sitting in the queue, a new EC2 instance will be launched automatically to handle the new jobs
SQS is a pull-based system
Keeping jobs in a queue (independently from other system components) resolves the issue that arises if the producer is producing work faster than the consumer can process it.
2 types of queues
standard (default)
FIFO queues
generally works as FIFO, but occasionally some messages can be delivered out of order (delivery is guaranteed)
guarantees both order and delivery
limited to 300 transactions per second
message size = 256 KB
messages can be kept in a queue from 1 minute to 14 days (default is 4 days)
Visibility Timeout - the amount of time a message is invisible in the queue (a message taken from the queue becomes invisible; if processed it is deleted, if not it becomes visible again). Maximum is 12 hours.
long polling doesn't return a response until a new message appears in the queue (in contrast to short polling); see the boto3 sketch below
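The meme-queue idea sketched in boto3 (queue name is hypothetical), showing the visibility timeout and long polling described above:

    import boto3

    sqs = boto3.client("sqs")
    q = sqs.create_queue(QueueName="meme-jobs")["QueueUrl"]
    sqs.send_message(QueueUrl=q, MessageBody="caption photo 123")
    # long polling: wait up to 10s for a message instead of returning at once
    msgs = sqs.receive_message(QueueUrl=q, WaitTimeSeconds=10,
                               VisibilityTimeout=60)  # invisible while processed
    for m in msgs.get("Messages", []):
        print("processing:", m["Body"])
        sqs.delete_message(QueueUrl=q, ReceiptHandle=m["ReceiptHandle"])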
Differences between SWF and SQS
Retention: SQS has a retention of up to 14 days, SWF up to 1 year
Orientation: SQS has a message-oriented API while SWF has task-oriented API (e.g. human involving tasks)
Duplicates: SWF ensures that task is NEVER duplicated, while duplicates in SQS are possible
tracking: SWF tracks all tasks and events in application, while in SQS you have to implement your own application-level tracking
SWF Actors
Workflow starters - an app that can start a workflow
Deciders - control the flow (when a task finishes or fails, they decide what to do next)
Activity workers - carry out the activity tasks
allows you to group multiple recipients using topics, e.g. you can group iOS and Android recipients, and when you publish once to the topic, SNS delivers appropriately formatted copies to each subscriber (see the sketch below)
Messages in SNS are stored across multiple availability zones
SNS is push-based
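A sketch of the topic/subscriber flow in boto3 (topic name and email address are hypothetical):

    import boto3

    sns = boto3.client("sns")
    topic_arn = sns.create_topic(Name="app-alerts")["TopicArn"]
    sns.subscribe(TopicArn=topic_arn, Protocol="email",
                  Endpoint="ops@example.com")  # subscriber confirms by email
    # one publish; SNS pushes a copy to every subscriber on the topic
    sns.publish(TopicArn=topic_arn, Subject="Alert", Message="instance terminated")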
API Gateway
API caching (caches responses for a specified time-to-live (TTL))
If you face problems with origin policy -> enable CORS (Cross-Origin Resource Sharing) on API Gateway
streaming data - data sent continuously from many data sources (e.g. geospatial data like in the Uber app, or gaming)
Kinesis Services
Kinesis Streams
Kinesis Firehose
Kinesis Analytics
Data is stored in shards
Producers put data into shards, then consumers take it and process it
Retention: data is stored from 24 hours (default) up to 7 days
data is not stored but processed automatically (there are no consumers), e.g. by Lambda
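A producer-side sketch in boto3 (stream name and payload are hypothetical):

    import boto3

    kinesis = boto3.client("kinesis")
    kinesis.put_record(
        StreamName="ride-positions",
        Data=b'{"lat": 52.23, "lon": 21.01}',
        PartitionKey="driver-42",  # determines which shard receives the record
    )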
Domain in SWF is a collection of related workflows
if you want to serve images from CloudFront rather than from S3 via Apache on EC2, you have to change a setting in httpd.conf: AllowOverride All
you can copy a whole folder using either 'aws s3 cp --recursive' or 'aws s3 sync'. Sync copies ONLY new files. If you also want files removed from the destination, use 'sync --delete' (examples below)
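For example (bucket names are hypothetical):

    aws s3 cp s3://source-bucket s3://destination-bucket --recursive
    aws s3 sync s3://source-bucket s3://destination-bucket --delete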
Additional exam tips:
3 ways to implement elasticity
Proactive Cyclic Scaling - periodic scaling after fixed time interval (e.g. daily)
Proactive Event-based Scaling - scale when an event occurs (e.g. the release of your product)
Auto-scaling
Amazon currently doesn't support increasing storage on SQL Server Db instance
Key words
streaming large data -> Kinesis
Business Intelligence -> Redshift
Big Data processing -> Elastic Map Reduce
OpsWorks
Orchestration Service that uses Chef
Resource Groups - allow you to group resources using the tags assigned to them
"chef", "recipes", "cookbooks" -> OpsWorks
AWS Organizations - account management service which allows you to consolidate multiple AWS accounts (by default up to 20; for more you have to contact Amazon) into an organization
available in 2 feature sets
Enable only Consolidated Billing
Enable All Features
CloudTrail works per AWS account and is enabled per region. However, you can consolidate logs into a single S3 bucket (this requires a cross-account access policy and enabling CloudTrail in each account)
allows you to get volume discounts (e.g. you are billed for 600 GB as one account, instead of 2 accounts with 300 GB each) and better management
you can manage users by consolidating them into groups or individually assign policies.
you can connect VPCs owned by different accounts however only in the same region
Direct Connect uses Ethernet VLAN trunking (802.1Q)
STS = Security Token Service
Using STS you can log in to AWS with your AD account or via OpenID providers, without IAM credentials.
terms
Federation - joining a list of users from one domain (e.g. IAM users) with a list of users from another domain (e.g. AD)
Identity Broker - a service that takes an identity from point A and joins it (federates it) to point B
Identity store - service like AD
Identities - users of a service, like AD account
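A sketch of the broker's last step in boto3 - exchanging a role for temporary credentials (the role ARN is hypothetical):

    import boto3

    sts = boto3.client("sts")
    creds = sts.assume_role(
        RoleArn="arn:aws:iam::123456789012:role/ADFederatedRole",
        RoleSessionName="ad-user-session",
    )["Credentials"]
    # temporary credentials: no IAM user needed
    s3 = boto3.client("s3",
                      aws_access_key_id=creds["AccessKeyId"],
                      aws_secret_access_key=creds["SecretAccessKey"],
                      aws_session_token=creds["SessionToken"])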
ECS = Elastic Container Service
Manages Docker containers on a cluster of EC2 instances
can be used to deploy sophisticated applications on a microservices model
ECR = EC2 Container Registry
it's a managed AWS Docker registry service. It supports private Docker repositories
Docker containers are run based on JSON task definitions (see the sketch below)
Gives you root access to the underlying EC2 container instances
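A minimal task definition registered via boto3 (family, image and sizes are illustrative):

    import boto3

    ecs = boto3.client("ecs")
    ecs.register_task_definition(
        family="web",
        containerDefinitions=[{       # the JSON task definition
            "name": "nginx",
            "image": "nginx:latest",
            "memory": 256,
            "portMappings": [{"containerPort": 80}],
        }],
    )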
AMI = Amazon Machine Image
2 types of virtualization
PV - Para Virtualization
HVM - Hardware Virtual Machine
to maximize IOPS performance, the best strategy is to add multiple additional volumes with provisioned IOPS and then create a RAID 0 stripe across those volumes
it is possible to transfer a reserved instance from one AZ to another
Redshift uses a 1024 KB (1 MB) block size for its columnar storage
EMR is used for analysing big data
You can shorten the time to process an EMR job by reducing the input split size in the MapReduce job configuration and then adjusting the number of simultaneous mapper tasks, so that more tasks can be processed at once
a public subnet has at least one route that uses an internet gateway in its route table.
There is a soft limit of 20 EC2 instances per region
An EC2 instance in a public subnet is only publicly accessible if it has a public IP address or is behind an elastic load balancer.
when using multiple availability zones, you cannot use the secondary database as an independent read node
Once a VPC is set to Dedicated hosting, it is not possible to change the VPC or the instances to Default hosting. You must re-create the VPC.
VPC peering does not support edge-to-edge routing.
DynamoDB allows for the storage of large text and binary objects, but there is a limit of 400 KB.