Redshift
Features
Standard SQL compatible
BI
ODBC connectors supported
JDBC connectors supported
PB scale independent of compute
Massively parallel Scale
Redshift Spectrum
Run SQL queries against PB scale S3 data lakes
Open data format
Avro
Ion
JSON
ORC
Parquet
Scales compute based on the data being retrieved
No loading or ETL (data must already be in S3)
Will minimise/optimise the S3 query
Metadata of S3 data held in the Redshift cluster
Multiple clusters can access the same S3 data
Supports Gzip and Snappy compression
Manually scalable to 100s of instances, independent of storage
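
A minimal sketch of how a Spectrum external schema and table over S3 data might be declared, assuming the statements are built as strings in Python and run through whatever SQL client is in use. The schema, table, bucket and role names are hypothetical. Only formats that use a plain STORED AS clause are handled here; JSON and Ion need a SerDe clause that this sketch leaves out.

```python
# Builds Redshift Spectrum DDL as plain strings. All identifiers passed in
# (schema, table, bucket, IAM role) are illustrative assumptions.

SIMPLE_FORMATS = {"PARQUET", "ORC", "AVRO"}


def external_schema_sql(schema: str, catalog_db: str, iam_role_arn: str) -> str:
    """CREATE EXTERNAL SCHEMA backed by the data catalog (metadata only)."""
    return (
        f"CREATE EXTERNAL SCHEMA IF NOT EXISTS {schema}\n"
        f"FROM DATA CATALOG DATABASE '{catalog_db}'\n"
        f"IAM_ROLE '{iam_role_arn}'\n"
        "CREATE EXTERNAL DATABASE IF NOT EXISTS;"
    )


def external_table_sql(schema, table, columns, s3_location, file_format="PARQUET"):
    """CREATE EXTERNAL TABLE pointing at files that already sit in S3."""
    fmt = file_format.upper()
    if fmt not in SIMPLE_FORMATS:
        raise ValueError(f"format {file_format!r} is not covered by this sketch")
    cols = ",\n".join(f"    {name} {sql_type}" for name, sql_type in columns)
    return (
        f"CREATE EXTERNAL TABLE {schema}.{table} (\n{cols}\n)\n"
        f"STORED AS {fmt}\n"
        f"LOCATION '{s3_location}';"
    )


if __name__ == "__main__":
    print(external_schema_sql("spectrum", "lake_db", "arn:aws:iam::123456789012:role/ExampleRole"))
    print(external_table_sql(
        "spectrum", "sales",
        [("order_id", "INT"), ("amount", "DECIMAL(10,2)")],
        "s3://example-bucket/sales/",
    ))
```

No data is loaded by either statement; queries against spectrum.sales would scan the S3 files in place.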
Release tracks
Current
Trailing
Preview
Selecting PREVIEW_FEATURES will preview new features
AQUA is a hardware-accelerated cache for Redshift
Self management
Backups
Patching
Node health
Compute
Load balancing
Up to 10x performance increase
Concurrency Scaling will transparently scale out in response to queries
Backup
Continuous backup to S3
3 copies of data
Cluster x 2
1 x copy on S3
Attempts to
Can asynchronously replicate snapshots to another region
Default 1 day of automated backups
Can configure up to 35 days of backups
Recovery of snapshot will be to a new cluster
Managed via console or ModifyCluster API
Turn off by setting retention to 0.
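
A small sketch of the retention rules above (0 turns automated backups off, 35 days is the maximum), assuming the change goes through the ModifyCluster API. The helper name and cluster identifier are hypothetical. The function only builds and validates the request parameters; nothing is sent to AWS.

```python
# Validates an automated-backup retention period and builds the keyword
# arguments for a ModifyCluster call. The cluster identifier is illustrative.

MAX_RETENTION_DAYS = 35


def retention_kwargs(cluster_id: str, days: int) -> dict:
    """Return ModifyCluster parameters for the requested retention period."""
    if not isinstance(days, int) or not 0 <= days <= MAX_RETENTION_DAYS:
        raise ValueError(f"retention must be an integer from 0 to {MAX_RETENTION_DAYS}")
    return {
        "ClusterIdentifier": cluster_id,
        # 0 disables automated backups, as noted above.
        "AutomatedSnapshotRetentionPeriod": days,
    }


if __name__ == "__main__":
    print(retention_kwargs("example-cluster", 7))
    # With boto3 installed and credentials configured, the call would look like:
    #   boto3.client("redshift").modify_cluster(**retention_kwargs("example-cluster", 7))
```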
Performance
Columnar system: only the columns needed are read by queries
Columnar data has higher compression rates
MPP: data and query load spread across nodes
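
A toy illustration of the two columnar points above, not Redshift's actual storage engine: rows are pivoted into per-column lists so a query touches only the column it needs, and a column of repeated values compresses well under simple run-length encoding. The sample data and function names are made up for the example.

```python
# Toy column store: shows why reading one column is cheap and why
# same-typed, repetitive column data compresses well.

rows = [
    {"order_id": 1, "region": "EU", "amount": 10.0},
    {"order_id": 2, "region": "EU", "amount": 5.0},
    {"order_id": 3, "region": "EU", "amount": 2.5},
    {"order_id": 4, "region": "US", "amount": 7.5},
]


def to_columns(records):
    """Pivot a list of row dicts into a dict of column lists."""
    columns = {}
    for record in records:
        for name, value in record.items():
            columns.setdefault(name, []).append(value)
    return columns


def run_length_encode(values):
    """Collapse consecutive repeats into (value, count) pairs."""
    encoded = []
    for value in values:
        if encoded and encoded[-1][0] == value:
            encoded[-1][1] += 1
        else:
            encoded.append([value, 1])
    return [tuple(pair) for pair in encoded]


if __name__ == "__main__":
    columns = to_columns(rows)
    # SELECT SUM(amount): only the "amount" column is read.
    print(sum(columns["amount"]))
    # The "region" column shrinks from 4 values to 2 pairs.
    print(run_length_encode(columns["region"]))
```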
Resizing
Elastic Cluster Resize
unavailable for 4-8 minutes after
Suitable for manual scale out/in
Concurrency scaling: cluster fully available
Applied immediately
Supports virtually unlimited concurrency and read queries
Load method
Insert into
S3
COPY
Copy Command
EMR
DynamoDB
SSH enabled host
RDS
Full ETL
AWS Glue
Datapipeline
High perf
Reliable
Resilient
JDBC/ODBC insert
Slower than an S3 load
Client
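
A hedged sketch of the S3 load path via COPY, assuming the statement is assembled in Python and executed through a JDBC/ODBC or other SQL client. The table, bucket and IAM role are hypothetical, and only the columnar formats that need no extra format options are covered.

```python
# Builds a COPY statement for a bulk load from S3. All identifiers are
# illustrative; the statement is returned as text, not executed.

COPY_FORMATS = {"PARQUET", "ORC"}


def copy_from_s3_sql(table: str, s3_path: str, iam_role_arn: str,
                     file_format: str = "PARQUET") -> str:
    """Return a COPY statement that loads files under s3_path into table."""
    fmt = file_format.upper()
    if fmt not in COPY_FORMATS:
        raise ValueError(f"format {file_format!r} is not covered by this sketch")
    return (
        f"COPY {table}\n"
        f"FROM '{s3_path}'\n"
        f"IAM_ROLE '{iam_role_arn}'\n"
        f"FORMAT AS {fmt};"
    )


if __name__ == "__main__":
    print(copy_from_s3_sql(
        "sales",
        "s3://example-bucket/sales/",
        "arn:aws:iam::123456789012:role/ExampleRole",
    ))
```

Row-by-row INSERT over JDBC/ODBC would reach the same table, but as noted above it is slower than a load from S3.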
Security
Data in transit
SSL security
Data at rest
AES-256
Encrypts intermediate query results
Includes backups
Key management
Self-managed via HSM
Redshift managed
AWS KMS
Spectrum supports SSE
Can't access data warehouse cluster network
Availability
AZ Failure
Will be offline until AZ is recovered
Multi region is not supported
Will reload data from S3, most frequently queried data first
Can run two DWHs and sync manually
Drive failure
Will result in a slight drop in query performance
Will transparently use replica
Will move data to a new node
Will replace node if required
Node failure
Single node
Replication not supported on single node clusters
Recover from snapshot
Multi node
DWH will be unavailable for queries and updates until new node is provisioned
Redshift will replace the node
Loads most frequently queried data first
Maintenance
Scheduled period when the cluster is not available
Performs maintenance/patching
Change window via Redshift console or API
Configuration
Multi node
Cluster
No control over leader
2 x Nodes
Store data
Perform Queries
1 x Leader
RX connections
Manages client connections
Config info
AZ
Nodes
RA3
Pay for compute and storage
Up to RA3.16XL
Up to 8 TB
Min of two nodes
DS
XL - 3 x HDD, 2 TB magnetic storage + 4 vCPU + 31 GiB
8XL - 24 x HDD, 16 TB magnetic storage + 36 vCPU + 244 GiB
DC
L - 2 vCPU + 15 GiB mem + 160 GiB storage
8XL - 2.56 TB SSD storage + 32 vCPU + 244 GiB mem
Master name and password
Security groups
Backup Retention
System settings
Constraints
1-128 nodes
Billing
Charging
Compute node hours
Partial hours rounded up
Data transfer
No charge for intra-region transfers
Inter-region transfers charged under VPC
Backup storage
Automated snapshots
Backup snapshots
Not charged for backup storage up to size of DWH
Data scanned
(Redshift Spectrum)
Charged for the amount of data scanned if the data is on S3
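
The billing rules above can be sketched as simple arithmetic, following these notes rather than any current price list: node hours round partial hours up, backup storage is only billable beyond the size of the DWH, and Spectrum is charged by the amount of data scanned. The function names are hypothetical, and the per-TB price is a caller-supplied assumption, not a quoted rate.

```python
import math

# Billing arithmetic as described in the notes. No real prices are embedded;
# any rate passed in is an assumption made by the caller.


def billable_node_hours(node_count: int, hours_running: float) -> int:
    """Partial hours are rounded up, then multiplied by the node count."""
    return node_count * math.ceil(hours_running)


def billable_backup_gb(backup_gb: float, dwh_gb: float) -> float:
    """Backup storage is free up to the size of the DWH."""
    return max(0.0, backup_gb - dwh_gb)


def spectrum_scan_cost(tb_scanned: float, price_per_tb: float) -> float:
    """Spectrum charge is proportional to the data scanned in S3."""
    return tb_scanned * price_per_tb


if __name__ == "__main__":
    print(billable_node_hours(2, 1.25))      # 2 nodes, 1h15m each
    print(billable_backup_gb(500.0, 400.0))  # 100 GB over the free allowance
    print(spectrum_scan_cost(2.0, 5.0))      # 5.0 per TB is a made-up rate
```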