Redshift
Table design
Data types
- typical types from sql db
- new columns can be added
- existing columns generally CANNOT be modified in place (narrow exceptions exist, such as widening a VARCHAR)
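Since most column changes are not possible in place, one common workaround (not described in these notes) is to add a new column, copy the data over, drop the old column, and rename. A minimal sketch that only builds the SQL text; the table and column names are hypothetical:

```python
from typing import List


def retype_column_sql(table: str, column: str, new_type: str) -> List[str]:
    """Build the statements for an add / copy / drop / rename column swap.

    The temporary column name is an illustrative choice, not a Redshift rule.
    """
    tmp = f"{column}_new"
    return [
        f"ALTER TABLE {table} ADD COLUMN {tmp} {new_type};",
        f"UPDATE {table} SET {tmp} = CAST({column} AS {new_type});",
        f"ALTER TABLE {table} DROP COLUMN {column};",
        f"ALTER TABLE {table} RENAME COLUMN {tmp} TO {column};",
    ]


# Hypothetical table "events" with a column "amount" to be retyped.
for statement in retype_column_sql("events", "amount", "DECIMAL(12,2)"):
    print(statement)
```

The swap moves the column to the end of the table and rewrites every row, so it is best treated as a maintenance operation.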
Distribution strategy
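- EVEN: rows distributed round-robin across slices, regardless of column values (the long-standing default; newer clusters default to AUTO)
- KEY: rows grouped by the values in one column, so matching rows land on the same slice, which improves JOIN performance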
- ALL: full copy of entire table distributed to all nodes
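The three styles can be illustrated with a toy model. This is a sketch only: the buckets stand in for slices (or nodes, in the ALL case), and `crc32` is an arbitrary stand-in for Redshift's internal hash, which these notes do not describe.

```python
from collections import defaultdict
from zlib import crc32


def distribute(rows, num_buckets, style, key=None):
    """Toy model: return {bucket_index: [rows]} for EVEN, KEY, or ALL."""
    buckets = defaultdict(list)
    for i, row in enumerate(rows):
        if style == "EVEN":
            # Round-robin: row position decides the bucket, not row content.
            buckets[i % num_buckets].append(row)
        elif style == "KEY":
            # Same key value -> same hash -> same bucket (stand-in hash).
            h = crc32(str(row[key]).encode("utf-8"))
            buckets[h % num_buckets].append(row)
        elif style == "ALL":
            # Every bucket receives a full copy of the table.
            for b in range(num_buckets):
                buckets[b].append(row)
        else:
            raise ValueError(f"unknown distribution style: {style}")
    return dict(buckets)


# Hypothetical rows; "customer" plays the role of the distribution key.
rows = [{"id": i, "customer": c} for i, c in enumerate("abacba")]
for style in ("EVEN", "KEY", "ALL"):
    placed = distribute(rows, 3, style, key="customer")
    print(style, {b: len(r) for b, r in sorted(placed.items())})
```

With KEY, all rows for one customer share a bucket, so a join on that column needs no data movement between buckets; the cost is possible skew when some key values are much more frequent than others.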
Sort Keys
- compound: good for queries using subset of key columns in order
- interleaved: queries can use any subset of key columns in any order
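Distribution style and sort keys are both declared when the table is created. A minimal sketch that assembles the DDL as text; the table `sales` and its columns are hypothetical, and the helper is an illustration, not part of any Redshift tooling:

```python
def create_table_sql(table, columns, sort_style, sort_cols, dist_key=None):
    """Build a CREATE TABLE statement with distribution and sort key clauses.

    columns   -- list of (name, type) pairs
    sort_style -- "COMPOUND" or "INTERLEAVED"
    dist_key  -- column for KEY distribution; None falls back to EVEN
    """
    if sort_style not in ("COMPOUND", "INTERLEAVED"):
        raise ValueError("sort_style must be COMPOUND or INTERLEAVED")
    col_lines = ",\n".join(f"    {name} {ctype}" for name, ctype in columns)
    if dist_key is not None:
        dist = f"DISTSTYLE KEY\nDISTKEY ({dist_key})"
    else:
        dist = "DISTSTYLE EVEN"
    sort = f"{sort_style} SORTKEY ({', '.join(sort_cols)})"
    return f"CREATE TABLE {table} (\n{col_lines}\n)\n{dist}\n{sort};"


ddl = create_table_sql(
    "sales",
    [("sale_id", "BIGINT"), ("customer_id", "INTEGER"), ("sale_date", "DATE")],
    sort_style="COMPOUND",
    sort_cols=["sale_date", "customer_id"],
    dist_key="customer_id",
)
print(ddl)
```

A compound key on (sale_date, customer_id) helps queries that filter on sale_date, or on sale_date and customer_id together; a query filtering on customer_id alone benefits far less, which is the case interleaved keys are meant for.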
Architecture
Cluster
Specification:
- dense compute: up to 326 TB fast SSD per cluster
- dense storage: up to 2 PB of magnetic disks per cluster
Nodes
Leader Node
- one in a cluster
- client app interacts only with leader node
Compute node
- one or more in a cluster
- transparent for client applications
- user data distributed across those nodes
Slices
- between 2 and 16 per node
- each slice is allocated a portion of the node's disk storage
- data distributed as evenly as possible
Data Operations
Loading
COPY
- for bulk data load
- supports many source files from S3
- may require running VACUUM afterwards to re-sort rows and reclaim space, particularly when the loaded data was not already in sort key order
- recommended to run ANALYZE afterwards to update table statistics
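The load sequence above can be sketched as three statements run in order. The helper only builds SQL text; the bucket prefix, IAM role ARN, and table name are hypothetical placeholders, and CSV is an assumed file format:

```python
from typing import List


def load_statements(table: str, s3_prefix: str, iam_role: str) -> List[str]:
    """Build COPY, VACUUM, and ANALYZE statements for one bulk load."""
    return [
        # COPY loads every object under the prefix, in parallel across slices.
        f"COPY {table} FROM '{s3_prefix}' IAM_ROLE '{iam_role}' FORMAT AS CSV;",
        # VACUUM re-sorts rows; it cannot run inside a transaction block.
        f"VACUUM {table};",
        # ANALYZE refreshes the statistics used by the query planner.
        f"ANALYZE {table};",
    ]


for statement in load_statements(
    "events",
    "s3://example-bucket/events/",  # hypothetical bucket and prefix
    "arn:aws:iam::123456789012:role/ExampleCopyRole",  # hypothetical role
):
    print(statement)
```

Splitting the input into several files under one prefix lets COPY spread the work across slices, which is the usual reason to prefer it over row-by-row INSERT statements.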
Querying
- SQL SELECT supported
- for large clusters, Workload Management (WLM) can be used to prioritize queries
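One way a session can route a query to a particular WLM queue is by setting a query group. A minimal sketch, assuming a queue has already been configured to match a query group named `reports` (a hypothetical name); the helper only wraps a query in the relevant statements:

```python
from typing import List


def with_query_group(group: str, query: str) -> List[str]:
    """Wrap a query so it runs under the given WLM query group."""
    return [
        f"SET query_group TO '{group}';",  # route following queries to the matching queue
        query,
        "RESET query_group;",  # return to the default routing
    ]


# Hypothetical query against a hypothetical table.
for statement in with_query_group("reports", "SELECT COUNT(*) FROM sales;"):
    print(statement)
```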
Snapshots
Automated
- kept for a retention period
- taken periodically
Manual
- can be shared across regions / accounts
- require manual delete
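The manual snapshot lifecycle can be sketched with the AWS SDK for Python. The functions below expect a `client` that behaves like `boto3.client("redshift")`; the cluster, snapshot, and account identifiers are hypothetical. A small recording stub stands in for the real client so the sketch runs without AWS access:

```python
class RecordingClient:
    """Stand-in for boto3.client("redshift"): records calls instead of making them."""

    def __init__(self):
        self.calls = []

    def __getattr__(self, name):
        def method(**kwargs):
            self.calls.append((name, kwargs))
            return {}
        return method


def snapshot_and_share(client, cluster_id, snapshot_id, account_id):
    """Take a manual snapshot and allow another account to restore from it."""
    client.create_cluster_snapshot(
        SnapshotIdentifier=snapshot_id,
        ClusterIdentifier=cluster_id,
    )
    # A real script would likely wait for the snapshot to finish before sharing.
    client.authorize_snapshot_access(
        SnapshotIdentifier=snapshot_id,
        AccountWithRestoreAccess=account_id,
    )


def delete_manual_snapshot(client, snapshot_id):
    """Manual snapshots are not expired automatically, so delete explicitly."""
    client.delete_cluster_snapshot(SnapshotIdentifier=snapshot_id)


client = RecordingClient()  # swap in boto3.client("redshift") for real use
snapshot_and_share(client, "example-cluster", "example-snap-001", "111122223333")
delete_manual_snapshot(client, "example-snap-001")
for name, kwargs in client.calls:
    print(name, kwargs)
```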
Core features
- petabyte-scale
- SQL supported
- ODBC/JDBC Supported