AWS Certified Data Analytics - Specialty (DAS-C01)
Collection
Real Time
- Immediate actions
Kinesis Data Streams (KDS)
Stream big data in your systems
Architecture notes
Capacity Mode notes
Provisioned
On-demand
Security notes
Kinesis Producers
Kinesis SDK
PutRecords notes
Kinesis Producer Library (KPL)
Synchronous or Asynchronous
No compression by default; the end user needs to implement it
Kinesis Agent
Kinesis Consumers
SDK - GetRecords
each shard has 2 MB/s total aggregate read throughput
GetRecords returns up to 10 MB of data or up to 10,000 records per call
if 5 consumer applications consume from the same shard, each consumer can poll once per second and receive less than 400 KB/s
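A minimal boto3 polling sketch of these limits (stream name and shard ID are placeholders):

```python
import boto3

kinesis = boto3.client("kinesis")

# Get an iterator for one shard (names here are placeholders).
iterator = kinesis.get_shard_iterator(
    StreamName="my-stream",
    ShardId="shardId-000000000000",
    ShardIteratorType="TRIM_HORIZON",
)["ShardIterator"]

# Each call returns up to 10 MB or 10,000 records; the 2 MB/s per-shard
# read limit is shared by all consumers polling this shard.
resp = kinesis.get_records(ShardIterator=iterator, Limit=10000)
for record in resp["Records"]:
    print(record["SequenceNumber"], record["Data"])
iterator = resp["NextShardIterator"]  # pass this to the next get_records call
```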
Kinesis Client Library (KCL)
Java, but other languages exist too
Leverages DynamoDB for coordination and checkpointing
if you get ExpiredIteratorException => increase the WCU of the DynamoDB checkpoint table
Kinesis Connector Library
connect to
S3
DynamoDB
Redshift
OpenSearch
Lambda
can send the data to S3, DynamoDB, Redshift, etc.
can be used to trigger notifications or send emails
Shard
1 MB/s in, per shard, for producers
2 MB/s out, per shard, for consumers
Operations
adding shards (shard splitting)
the old shard is closed and will be deleted once its data expires
Merging shards - decreasing shards
decreases the stream capacity
old shards are closed and deleted once their data expires
Auto scaling
it takes time and has limitations
shard ordering for the exam :check:
after a reshard, the parent shard may still contain records; the consumer could otherwise read the child shards out of order, so the consumer has logic to not read from child shards while the parent still has unread records — this preserves the record ordering
Handling duplicates for producers
Due to
network timeouts
the producer doesn't realize the record arrived (no error is returned), so it retries on the missing response, and the record ends up duplicated in Kinesis
for consumers
fixes
make your consumer app idempotent (see the sketch below)
if the final destination can handle duplicates, it's recommended to handle them there
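A minimal sketch of an idempotent consumer, assuming producers embed a unique id field in each record (the id field and handle() are hypothetical):

```python
import json

seen_ids = set()  # in production this would be a persistent store (e.g., DynamoDB)

def process(raw_record: bytes) -> None:
    """Skip records already processed, so producer retries cause no double work."""
    record = json.loads(raw_record)
    if record["id"] in seen_ids:  # duplicate caused by a producer retry
        return
    seen_ids.add(record["id"])
    handle(record)

def handle(record: dict) -> None:
    # hypothetical business logic
    print("processing", record["id"])
```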
Simple Queue Service (SQS)
IoT
Kinesis Data Firehose
near real time (60 seconds latency)
load into
redshift/s3/opensearch/splunk
data transformations through Lambda
pay for the amount of data going through firehose
Firehose buffer sizing
Time: e.g., with a 2-minute interval, the buffer is flushed every 2 minutes
Size: e.g., 32 MB; if that buffer size is reached, it's flushed
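A boto3 sketch of setting those buffering hints when creating a delivery stream (stream name, role, and bucket ARNs are placeholders):

```python
import boto3

firehose = boto3.client("firehose")

# The buffer flushes on whichever threshold is hit first: 32 MB or 120 seconds.
firehose.create_delivery_stream(
    DeliveryStreamName="my-delivery-stream",
    DeliveryStreamType="DirectPut",
    ExtendedS3DestinationConfiguration={
        "RoleARN": "arn:aws:iam::123456789012:role/firehose-role",  # placeholder
        "BucketARN": "arn:aws:s3:::my-destination-bucket",          # placeholder
        "BufferingHints": {"SizeInMBs": 32, "IntervalInSeconds": 120},
    },
)
```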
Kinesis Data Streams vs Firehose
Streams
Going to write custom code (producer/consumer)
real time (~200 ms latency, or ~70 ms with enhanced fan-out)
Must manage scaling (shard splitting/merging)
Data storage for 1 to 365 days, replay capability, multi consumers
Use with Lambda to insert data in real-time to OpenSearch (for example)
Firehose
Fully managed, send to S3, Splunk, Redshift, OpenSearch
Serverless data transformations with Lambda
Near real time (lowest buffer time is 1 minute)
Automated Scaling
No data storage
Near Real Time
- Reactive Actions
Kinesis Data Firehose (KDF)
Database Migration Service (DMS)
Batch
Historical
Snowball
Data Pipeline
DMS - Database Migration Service
sources
on-premises and EC2 instance databases: Oracle, PostgreSQL, etc.
Azure
RDS, Aurora
S3
DocumentDB
targets
on-premises databases
OpenSearch
Redshift
any database on AWS
AWS Schema conversion tool (SCT)
SCT is for heterogeneous schemas (different engines); that is, for PostgreSQL to PostgreSQL it isn't needed
Continuous Replication
Direct Connect (DX)
USE CASES
increased bandwidth
more consistent network experience
hybrid environments (on-prem + cloud)
Time: if the question asks for the fastest option to get started and Direct Connect isn't already configured, it can't be the answer, because it takes around a month to be ready
Resiliency
you can have high resiliency; good for critical workloads
AWS Snow Family
Highly-secure, portable devices to collect and process data at the edge, and migrate data into and out of AWS
Snowcone
Up to 24 TB, online and offline
Snowball Edge
Up to petabytes, offline
Snowmobile
Up to exabytes, offline
Amazon MSK - Kafka
The main difference from Kinesis Data Streams is that MSK can be configured to transmit messages larger than 1 MB; Kinesis Data Streams only allows 1 MB messages
Can create custom configurations for your clusters
Default message size of 1MB
Possibilities of sending large messages (ex: 10MB) into Kafka after custom configuration
Security
TLS between brokers and between clients and brokers; you can disable this setting if you want, e.g., for performance purposes
Has more than one authentication method besides IAM
Has a Serverless version
Storage
S3
Buckets
Objects have a key
S3 looks like a global service but buckets are created in a region
Buckets are defined at the region level
Buckets must have a globally unique name (across all regions all accounts)
Amazon S3 allows people to store objects (files) in “buckets” (directories)
Security
JSON policies
Replication
CRR - Cross region replication
SRR - Same Region Replication
Important
After activating replication, only new objects are replicated. For existing objects, use Batch Replication
Storage Classes
Standard
General Purpose
used for frequently accessed data
big data analytics
mobile and gaming apps
content distribution
Infrequent Access (IA)
disaster recovery
backups
One Zone-Infrequent Access
storing secondary backup copies of on-premises data, or data you can recreate
Glacier
Instant Retrieval
milliseconds retrieval
Flexible Retrieval
Expedited - 1 to 5 minutes
Standard - 3 to 5 hours
Bulk - 5 to 12 hours
Deep Archive
Standard - 12 hours
Bulk - 48 hours
Intelligent Tiering
Moves automatically based on usage
Durability and Availability
Lifecycle configurations
Multipart upload
Transfer acceleration
S3 Select
retrieve a subset of an object's data using SQL (server-side filtering)
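A small boto3 sketch of S3 Select (bucket, key, and columns are hypothetical); only the matching rows are sent back, not the whole object:

```python
import boto3

s3 = boto3.client("s3")

resp = s3.select_object_content(
    Bucket="my-bucket",  # placeholder
    Key="sales.csv",     # placeholder CSV with a header row
    ExpressionType="SQL",
    Expression="SELECT s.product, s.amount FROM S3Object s "
               "WHERE CAST(s.amount AS FLOAT) > 100",
    InputSerialization={"CSV": {"FileHeaderInfo": "USE"}},
    OutputSerialization={"JSON": {}},
)

# The response is an event stream; Records events carry the filtered rows.
for event in resp["Payload"]:
    if "Records" in event:
        print(event["Records"]["Payload"].decode())
```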
Object Encryption
server-side encryption (SSE-S3)
key is handled, managed, and owned by AWS
server-side encryption (SSE-KMS)
keys handled and managed by AWS KMS
server-side encryption (SSE-C)
keys are fully managed by the customer outside of AWS
HTTPS must be used
client-side encryption
use client libraries such as the Amazon S3 Client-Side Encryption Library
Encryption in transit
SSL/TLS
HTTPS is recommended
Access Points
create access points for prefixes (folders) like "finance" or "sales"
a simple policy can then be used per access point
DynamoDB
Fully managed, highly available with replication across multiple AZs
NoSQL database - not a relational database
Scales to massive workloads, distributed database
Millions of requests per second, trillions of rows, 100s of TB of storage
Fast and consistent in performance (low latency on retrieval)
Integrated with IAM for security, authorization and administration
Enables event driven programming with DynamoDB Streams
Low cost and auto-scaling capabilities
Standard & Infrequent Access (IA) Table Class
Basics
DynamoDB is made of Tables
Each table has a Primary Key (must be decided at creation time)
Each table can have an infinite number of items (= rows)
Each item has attributes (can be added over time – can be null)
Maximum size of an item is 400KB
Data types supported are:
Scalar Types – String, Number, Binary, Boolean, Null
Document Types – List, Map
Set Types – String Set, Number Set, Binary Set
When to use it?
when you have data that needs to be very hot and ingested at scale into a database
Common use cases
Mobile apps
Gaming
Digital ad serving
Live voting
Audience interaction for live events
Sensor networks
Log ingestion
Access control for web-based content
Metadata storage for Amazon S3 objects
E-commerce shopping carts
Anti Pattern
Prewritten application tied to a traditional relational database: use RDS instead
Joins or complex transactions
Binary Large Object (BLOB) data: store data in S3 & metadata in DynamoDB
Large data with low I/O rate: use S3 instead
DynamoDB –Read/Write Capacity Modes
Provisioned Mode (default)
You specify the number of reads/writes per second
You need to plan capacity beforehand
Pay for provisioned
read & write capacity units
Read Capacity Units (RCU)
Examples: see PPT
Strongly Consistent Read vs. Eventually Consistent Read
Eventually Consistent Read (default)
If we read just after a write, it’s possible we’ll get some stale data because of replication
Strongly Consistent Read
If we read just after a write, we will get the correct data
Set “ConsistentRead” parameter to True in API calls (GetItem, BatchGetItem, Query, Scan)
Consumes twice the RCU
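A minimal boto3 sketch of setting that parameter (table and key are placeholders):

```python
import boto3

dynamodb = boto3.client("dynamodb")

# ConsistentRead=True returns the latest committed value but costs 2x the RCU.
resp = dynamodb.get_item(
    TableName="Users",             # placeholder
    Key={"user_id": {"S": "42"}},  # placeholder key schema
    ConsistentRead=True,
)
print(resp.get("Item"))
```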
Formula:
One Read Capacity Unit (RCU) represents one Strongly Consistent Read per second, or two Eventually Consistent Reads per second, for an item up to 4 KB in size
If the items are larger than 4 KB, more RCUs are consumed
If strongly consistent, keep the result as-is
If eventually consistent, divide it by 2
Item sizes in KB must be multiples of 4; if not, round up to the nearest multiple
Formula: RCU = ItemsPerSecond × ceil(ItemSizeInKB / 4). Remember to round the KB up
Write Capacity Units (WCU)
One Write Capacity Unit (WCU) represents one write per second for an item up to 1 KB in size
If the items are larger than 1 KB, more WCUs are consumed
Examples
See PPT
Formula: WCU = ItemsPerSecond × ceil(ItemSizeInKB). Remember to round the KB up (see the sketch below)
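A small sketch of both formulas in Python, applying the rounding rules above:

```python
import math

def rcu(items_per_second: int, item_size_kb: float,
        eventually_consistent: bool = False) -> float:
    """RCU = items/s x ceil(size / 4 KB); halved for eventually consistent reads."""
    units = items_per_second * math.ceil(item_size_kb / 4)
    return units / 2 if eventually_consistent else units

def wcu(items_per_second: int, item_size_kb: float) -> int:
    """WCU = items/s x ceil(size / 1 KB)."""
    return items_per_second * math.ceil(item_size_kb)

print(rcu(10, 4.5))        # 20: 10 strongly consistent reads/s of 4.5 KB (rounds up to 8 KB)
print(rcu(10, 4.5, True))  # 10.0: same reads, eventually consistent (divide by 2)
print(wcu(6, 4.5))         # 30: 6 writes/s of 4.5 KB (rounds up to 5 KB)
```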
On-Demand Mode
Read/writes automatically scale up/down with your workloads
No capacity planning needed
Pay for what you use, more expensive ($$$)
“ProvisionedThroughputExceededException”
Reasons
• Hot Keys – one partition key is being read too many times (e.g., popular item)
• Hot Partitions
• Very large items, remember RCU and WCU depends on size of items
Solutions
Exponential backoff when exception is encountered (already in SDK)
• Distribute partition keys as much as possible
• If RCU issue, we can use DynamoDB Accelerator (DAX)
Index
Global Secondary Index (GSI)
if the GSI is throttled, the main table gets throttled too
can be created after the table already exists
you can define new queries with whatever attributes you need
Local Secondary Index (LSI)
must be created when the table is created and cannot be modified afterwards
it's like adding an extra attribute used only for querying (an alternative sort key)
PartiQL
Allows you to select, insert, update, and delete data in DynamoDB using SQL
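A minimal boto3 sketch (table and attribute names are placeholders):

```python
import boto3

dynamodb = boto3.client("dynamodb")

# PartiQL SELECT; parameters use DynamoDB's typed values.
resp = dynamodb.execute_statement(
    Statement='SELECT * FROM "Orders" WHERE "customer_id" = ?',  # placeholder table
    Parameters=[{"S": "42"}],
)
for item in resp["Items"]:
    print(item)
```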
DAX
Caches the most popular items or queries in DynamoDB
Provides the cache via clusters
Difference with
Amazon ElastiCache
is that DAX is for queries and objects. But if after that you want to do compute, sorting, and so on, you can combine it with ElastiCache; in that case ElastiCache is the better option, because if the query or scan is frequent there is a lot of compute waste. For aggregations, use ElastiCache
DynamoDB Streams
• react to changes in real-time (welcome email to users)
• Analytics
• Insert into derivative tables
• Insert into OpenSearch Service
• Implement cross-region replication
Uses shards like Kinesis Data Streams
but you don't have to provision the shards
TTL (time to live)
Automatically deletes items after an expiry timestamp
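A minimal boto3 sketch of enabling TTL and writing an expiring item (table and attribute names are placeholders):

```python
import time
import boto3

dynamodb = boto3.client("dynamodb")

# Enable TTL once per table; the attribute must hold an epoch-seconds number.
dynamodb.update_time_to_live(
    TableName="Sessions",  # placeholder
    TimeToLiveSpecification={"Enabled": True, "AttributeName": "expires_at"},
)

# This item becomes eligible for automatic deletion one hour from now.
dynamodb.put_item(
    TableName="Sessions",
    Item={
        "session_id": {"S": "abc123"},
        "expires_at": {"N": str(int(time.time()) + 3600)},
    },
)
```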
AWS ElastiCache
It's like RDS but for caching; Redis is a cache engine, and so is Memcached
good performance, for caching use cases
Amazon Neptune
Graph database; e.g., for social-network use cases
Timestream
• Fully managed, fast, scalable, serverless time series database
Processing
Lambda
Use cases
Real-time file processing • Real-time stream processing • ETL • Cron replacement • Process AWS events
Anti - Patterns
Long-running applications
• Dynamic websites
Stateful applications
Is stateless
AWS Glue
Serverless discovery and definition of table definitions and schemas
• S3 “data lakes” • RDS • Redshift • DynamoDB • Most other SQL databases
Custom ETL jobs
Trigger-driven, on a schedule, or on demand
• Fully managed
Glue Crawler
scans data in S3 and creates schemas
Glue and S3 Partitions
• Glue crawler will extract partitions based on how your S3 data is organized
Glue + Hive
Hive lets you run SQL-like queries from EMR
The Glue Data Catalog can serve as a Hive "metastore"
You can also import a Hive metastore into Glue
ETL
Transform data, clean data, enrich data
Automatic code generation
• Scala or Python
• Encryption
• Server-side (at rest)
• SSL (in transit)
• Can be event-driven
• Can provision additional “DPU’s” (data processing units) to increase performance of underlying Spark jobs
• Enabling job metrics can help you understand the maximum capacity in DPUs you need
• Errors reported to CloudWatch
• Could tie into SNS for notification
Glue scheduler
Glue trigger events
Deal with ambiguities: ResolveChoice (see the sketch below)
make_cols
creates a new column for each type: e.g., when a field has the same name but a different data type in each source
cast
forces a cast to a specific type
make_struct
creates a structure holding both data types, something like: "myList": [ { "price": 100.00 }, { "price": "$100.00" } ]
project
projects everything to one type, e.g., force everything to be a string
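A PySpark sketch of these options inside a Glue job (database/table names and the ambiguous price column are placeholders):

```python
# Runs inside a Glue job, where the awsglue libraries are available.
from pyspark.context import SparkContext
from awsglue.context import GlueContext

glue_ctx = GlueContext(SparkContext.getOrCreate())

# Placeholder catalog table whose 'price' column has mixed types.
dyf = glue_ctx.create_dynamic_frame.from_catalog(
    database="sales_db", table_name="orders"
)

resolved = dyf.resolveChoice(specs=[("price", "cast:double")])  # cast: force one type
# Other per-column actions: ("price", "make_cols"),
#                           ("price", "make_struct"),
#                           ("price", "project:string")
```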
Modifying the data catalog
you can run the script again to add new partitions, update table schemas, and create new tables
Restrictions: only for S3; JSON, CSV, Avro, Parquet
Running glue jobs
job bookmarks: run the job only on new rows, without reprocessing old ones
Time-based schedules
CloudWatch Events
Cost Model
Billed by the second for crawler and ETL jobs
First million objects stored and accesses are free for the Glue Data Catalog
Development endpoints for developing ETL code charged by the minute
For visual transformations
there is Glue Studio, the standard option, and Glue DataBrew, which is simpler
AWS Lake Formation
• "Makes it easy to set up a secure data lake in days"
• Loading data & monitoring data flows • Setting up partitions • Encryption & managing keys • Defining transformation jobs & monitoring them
• Access control • Auditing • Built on top of Glue
How it works
It takes data from S3, RDS, NoSQL, etc. and runs the Lake Formation blueprint, which does the transformations and everything else. At the end it sends that data to Athena, Redshift, or EMR
Governed Tables
support ACID transactions across multiple tables
Permissions: you can use granular permissions on tables to grant access rights such as insert, delete, etc.
EMR : Elastic MapReduce
Differences between Hadoop and Spark:
Cluster
Master node: manages the cluster
Core node: hosts HDFS data and runs tasks
Task node: runs tasks, does not host data; no risk of data loss when removing it
good use for spot instances
Ways of deployment
Transient clusters
terminate once all steps are complete
• Loading data, processing, storing – then shut down
• Saves money
Long-Running Clusters
must be manually terminated
• Basically a data warehouse with periodic processing on large datasets
• Can spin up task nodes using Spot instances for temporary capacity
• Can use reserved instances on long-running clusters to save $
• Termination protection on by default, auto-termination off
Storage
HDFS
This is ephemeral; data is lost when the cluster is terminated!
EMRFS: Access S3 as if it were HDFS
Allows persistent storage after cluster termination
Local FIle system
Temporary data (buffers, caches, etc)
EBS for HDFS
Deleted when cluster is terminated
Hadoop
MapReduce
Framework for distributed data processing
Maps data to key/value pairs
Reduces intermediate results to final output
Largely supplanted by Spark these days
Used for big data jobs; runs parallel, distributed tasks to filter and operate on large volumes of data
HDFS
Hadoop Distributed File System
Distributes data blocks across the cluster in a redundant manner
Ephemeral in EMR; data lost on termination
EMR Serveless App Lifecycle
Manual: create app, start app, stop app, and delete app. If you forget to do this, it keeps running and costing money
EMR on EKS
Fully Managed
No need to provision clusters
Apache Pig
Sits on top of MapReduce; lets you script operations at a high level instead of hand-writing lengthy MapReduce code
Hbase
Non-relational database
In-memory
Hive integration
Presto
Connects to many data sources and databases; interactive queries at petabyte scale
query tool
Apache Zeppelin is like iPython notebooks
You can run SQL directly against SparkSQL and visualize results in charts and graphs
it's like a notebook for big data
EMR Notebook
Notebooks saved in S3; similar to Zeppelin but with more AWS integration
More tools
Hue
Hadoop User Experience; a graphical front end
Splunk
an operational tool to view EMR and S3 data using your EMR Hadoop cluster
Flume
Another way to stream data in your cluster
MXnet
Like TensorFlow, a library for building and accelerating neural networks
Included in EMR
S3DistCP
Tool for copying large amounts of data from S3 into HDFS and from HDFS to S3
More tools
• Ganglia (monitoring)
• Mahout (machine learning)
• Accumulo (another NoSQL database)
• Sqoop (relational database connector)
• HCatalog (table and storage management for Hive metastore)
• Kinesis Connector (directly access Kinesis streams in your scripts)
• Tachyon (accelerator for Spark)
• Derby (open-source relational DB in Java)
• Ranger (data security manager for Hadoop)
Apache Spark
Distributed processing framework for big data
• In-memory caching, optimized query execution
• Supports Java, Scala, Python, and R
• Supports code reuse across:
• Batch processing
• Interactive Queries: Spark SQL
• Real-time Analytics
• Machine Learning : MLLib
• Graph Processing
• Spark Streaming : Integrated with Kinesis, Kafka, on EMR
• Spark is NOT meant for OLTP: it's not for transactions; it's for processing large volumes of data in seconds or minutes
How Spark Works
• Spark apps are run as independent processes on a cluster
• The SparkContext (driver program) coordinates them
• SparkContext works through a Cluster Manager
• Executors run computations and store data
• SparkContext sends application code and tasks to executors
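A minimal PySpark sketch of that flow: creating the session starts the driver, which coordinates the executors that run the map/reduce tasks (the input path is a placeholder):

```python
from pyspark.sql import SparkSession

# Creating the session starts the driver (SparkContext), which talks to the
# cluster manager; executors then run the tasks below and store the data.
spark = SparkSession.builder.appName("word-count").getOrCreate()

lines = spark.read.text("s3://my-bucket/input.txt")   # placeholder path
counts = (
    lines.rdd.flatMap(lambda row: row.value.split())  # map: line -> words
    .map(lambda word: (word, 1))                      # map: word -> (word, 1)
    .reduceByKey(lambda a, b: a + b)                  # reduce: sum counts per word
)
print(counts.take(10))
spark.stop()
```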
Components
Spark Core
Memory management, fault recovery, scheduling, distribute & monitor jobs, interact with storage Scala, Python, Java, R
Spark Streaming
Real-time streaming analytics Structured streaming Twitter, Kafka, Flume, HDFS, ZeroMQ
Spark SQL
Up to 100x faster than MapReduce JDBC, ODBC, JSON, HDFS, ORC, Parquet, HiveQL
MLLib
Classification, regression, clustering, collaborative filtering, pattern mining Read from HDFS, HBase
GraphX
Graph Processing ETL, analysis, iterative graph computation No longer widely used
Integrations
Kinesis
Dynamo
Athena
Apache Hive
Uses familiar SQL syntax (HiveQL)
Lets you write SQL on top of MapReduce
Metastore is stored in MySQL on the master node by default
The metastore can also be stored in AWS services: Glue Data Catalog, RDS, etc.
Data Pipeline
• Destinations include S3, RDS,
DynamoDB, Redshift and EMR
• Manages task dependencies • Retries and notifies on failures • Cross-region pipelines • Precondition checks • Data sources may be on-premises • Highly available
Activities
EMR: e.g., how to start the cluster to perform operations
Hive queries
Copy data between data sources
SQL
Scripts
Step Functions
Used to design workflows
Easy visualizations
Max execution time of a State Machine is 1 year
Analysis
Amazon Kinesis Data Analytics for SQL Applications
Apply SQL to the data; you can also attach a Lambda and send the results on to other destinations
Managed service for Apache Flink: it is serverless
Flink is a framework for processing data streams
Amazon Kinesis Data Analytics
common use-cases
Streaming ETL
Continuous metric generation
Responsive Analytics
RANDOM_CUT_FOREST
• SQL function used for anomaly detection on numeric columns in a stream
• They’re especially proud of this because they published a paper on it
• It’s a novel way to identify outliers in a data set so you can handle them however you need to
• Example: detect anomalous subway ridership during the NYC marathon
Amazon Opensearch Service
(formerly Elasticsearch) - Petabyte-scale analysis and reporting -
What Is Opensearch?
• A fork of Elasticsearch and Kibana
• A search engine
• An analysis tool • A visualization tool (Dashboards = Kibana)
• A data pipeline • Kinesis replaces Beats & LogStash
• Horizontally scalable
OpenSearch - use cases
• Full-text search • Log analytics • Application monitoring • Security analytics • Clickstream analytics
Concepts
Documents can also be JSON
types
e.g., a document, a log entry, an article, etc.
Indices
an index is split into shards
documents are hashed to a particular shard
each shard may be on a different node in a cluster
every shard is a self-contained Lucene index of its own
Features
• Fully-managed (but not serverless) - There is a separate serverless option now
• Scale up or down without downtime - But this isn’t automatic
• Pay for what you use - Instance-hours, storage, data transfer
• Network isolation
• AWS integration - S3 buckets (via Lambda to Kinesis) - Kinesis Data Streams - DynamoDB Streams - CloudWatch / CloudTrail - Zone awareness
Anti-patterns
• OLTP • No transactions • RDS or DynamoDB is better: not for transactions
• Ad-hoc data querying • Athena is better for those queries
Remember Opensearch is primarily for search & analytics
Storage
Hot
the standard tier
UltraWarm
• UltraWarm (warm) storage uses S3 + caching
Best for indices with few writes (like log data / immutable data)
• Slower performance but much lower cost
• Must have a dedicated master node
Cold storage
• Also uses S3
• Even cheaper
• For “periodic research or forensic analysis on older data”
• Must have dedicated master and have UltraWarm enabled too.
• Not compatible with T2 or T3 instance types on data nodes
• If using fine-grained access control, must map users to cold_manager role in OpenSearch Dashboards
Supported ways to import data into ES domain
Kinesis, DynamoDB, Logstash / Beats, and Elasticsearch's native API's offer means to import data into Amazon ES.
Amazon Athena
What is Athena?
• Interactive query service for S3 (SQL)
Serverless!
• Supports many data formats • CSV, TSV (human readable) • JSON (human readable) • ORC (columnar, splittable) • Parquet (columnar, splittable) • Avro (splittable) • Snappy, Zlib, LZO, Gzip compression
Unstructured, semi-structured, or structured
Athena + Glue
Integrates with the Glue Data Catalog, schemas, etc.
Athena Workgroups
organize users, teams, apps, workloads
Integrates with IAM, CloudWatch, SNS
Each group can have its own
query history
data limit
IAM policies
Encryption settings
anti-patterns
visualization
quicksight
etl
glue
ACID transactions support
Powered by Apache Iceberg
• Just add ‘table_type’ = ‘ICEBERG’ in your CREATE TABLE command
Remember governed tables in Lake Formation? This is another way of getting ACID features in Athena.
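A boto3 sketch of creating an Iceberg table through Athena (bucket and table names are placeholders):

```python
import boto3

athena = boto3.client("athena")

# 'table_type' = 'ICEBERG' is what enables ACID operations on this table.
athena.start_query_execution(
    QueryString="""
        CREATE TABLE sales_iceberg (id int, amount double)
        LOCATION 's3://my-bucket/iceberg/'
        TBLPROPERTIES ('table_type' = 'ICEBERG')
    """,
    ResultConfiguration={"OutputLocation": "s3://my-bucket/athena-results/"},  # placeholder
)
```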
Redshift
A POWERFUL data warehouse
Designed for OLAP, not OLTP
SUPER FAST, SCALABLE, and SUPER CHEAP
SQL,ODBC,JDBC interfaces
Use cases
• Accelerate analytics workloads
• Unified data warehouse & data lake
• Data warehouse modernization
• Analyze global sales data
• Store historical stock trade data
• Analyze ad impressions & clicks
• Aggregate gaming data
• Analyze social trends
Composed of clusters
leader node
compute nodes
Spectrum
query exabytes of unstructured data in S3 without loading
wide variety of data formats
Perfomance
• Massively Parallel Processing (MPP) • Columnar Data Storage • Column Compression
Durability
• Replication within cluster • Backup to S3
Automated snapshots
Scaling Redshift
Vertical and horizontal scaling
on demand
Distribution Styles
Auto
based on the size of the data, Redshift makes the decision for you
Even
round-robin fashion; good when the table has no relation to other tables or joins
Key
rows are distributed according to the value in one column; useful for queries based on that column, since rows with equal values end up in the same place, making the query faster
All
a copy of the table is placed on every node, to speed up joins
Importing / Exporting data
• COPY command : Parallelized; efficient • From S3, EMR, DynamoDB, remote hosts • S3 requires a manifest file and IAM role
it's the most efficient option
use it for large amounts of data from outside of Redshift
it can decrypt data loaded from S3 (see the sketch after this list)
• Gzip, lzop, and bzip2 compression supported to speed it up further
• Automatic compression option • Analyzes data being loaded and figures out optimal compression
scheme for storing it
• Special case: narrow tables (lots of rows, few columns) • Load with a single COPY transaction if possible • Otherwise hidden metadata columns consume too much space
• UNLOAD command • Unload from a table into files in S3
• Enhanced VPC routing
forces the COPY/UNLOAD traffic to go through your VPC; it must be configured correctly so traffic doesn't go over the public internet
• Auto-copy from Amazon S3
monitors the S3 bucket; when new data arrives, it is loaded automatically
• Amazon Aurora zero-ETL integration • Auto replication from Aurora -> Redshift
• Redshift Streaming Ingestion • From Kinesis Data Streams or MSK
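A sketch of issuing a COPY through the Redshift Data API (cluster, table, manifest, and role ARN are placeholders):

```python
import boto3

redshift_data = boto3.client("redshift-data")

# COPY loads from S3 in parallel; MANIFEST points at an explicit file list,
# and GZIP tells Redshift the files are compressed.
redshift_data.execute_statement(
    ClusterIdentifier="my-cluster",  # placeholder
    Database="dev",
    DbUser="awsuser",
    Sql="""
        COPY sales
        FROM 's3://my-bucket/sales/manifest'
        IAM_ROLE 'arn:aws:iam::123456789012:role/redshift-copy-role'
        MANIFEST GZIP
    """,
)
```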
Redshift copy grants for cross-region
snapshot copies
• Let’s say you have a KMS-encrypted Redshift cluster and a snapshot of it
• You want to copy that snapshot to another region for backup
• In the destination AWS region: 1. Create a KMS key if you don’t have one already 2. Specify a unique name for your snapshot copy grant 3. Specify the KMS key ID for which you’re creating the copy grant
• In the source AWS region: 1. Enable copying of snapshots to the copy grant you just created
DBLINK
Good way to copy and sync data between PostgreSQL and Redshift
Integration
• S3
• DynamoDB
• EMR / EC2
• Data Pipeline
• Database Migration Service
Workload Management (WLM)
Used to prioritize queries: short and fast vs. long and slow. Creates query queues, managed via console, CLI, or API
Concurrency scaling
• Automatically adds cluster capacity to handle increase in concurrent read queries
• Support virtually unlimited concurrent users & queries
• WLM queues manage which queries are sent to the concurrency scaling cluster
Automatic Workload Management
Creates up to 8 queues
Default 5 queues with even memory allocation
• Large queries (i.e., big hash joins) -> concurrency lowered
• Small queries (i.e., inserts, scans, aggregations) -> concurrency raised
• Configuring query queues
• Priority
• Concurrency scaling mode
• User groups
• Query groups
• Query monitoring rules
Short Query Acceleration (SQA)
• Prioritize short-running queries over longer-running ones
• Short queries run in a dedicated space, won’t wait in queue behind long queries
• Can be used in place of WLM queues for short queries
• Works with: • CREATE TABLE AS (CTAS) • Read-only queries (SELECT statements)
• Uses machine learning to predict a query’s execution time
• Can configure how many seconds is “short”
VACUUM command
Recovers space from deleted rows
Anti-patters
• Small data sets • Use RDS instead
• OLTP • Use RDS or DynamoDB instead
• Unstructured data • ETL first with EMR etc.
• BLOB data • Store references to large binary files in S3, not the files themselves.
Resizing
Elastic resize
cluster is down for a few minutes
Quickly add or remove nodes of same type
Classic resize
Change node type and/or number of nodes
Cluster is read-only for hours to days
Snapshot, restore, resize
Used to keep cluster available during a classic resize
Copy cluster, resize new cluster
AQUA
Advanced query accelerator
it sits close to the S3 servers, and is much faster
Materialized view
similar to views, but precomputed: you take a large, expensive database query and put it in this view so it's much faster; also useful for complex queries
Lambda UDF
Use custom functions in AWS Lambda inside SQL queries
Federated Queries
query and analyze across databases, warehouses, and lakes
works with Aurora, RDS, MySQL
incorporates live data in RDS into your Redshift queries
Visualization
QuickSight
Spice
Data sets are imported into SPICE •
Super-fast, Parallel, In-memory Calculation Engine
• Uses columnar storage, in-memory, machine code generation • Accelerates interactive queries on large datasets
Quicksight + Redshift: Security
• By default Quicksight can only access data stored IN THE SAME REGION as the one Quicksight is running within
• So if Quicksight is running in one region, and Redshift in another, that’s a problem
• A VPC configured to work across AWS regions won’t work!
• Solution:
create a new security group with an inbound rule authorizing access from the IP range of QuickSight servers in that region
Another solution: create a private subnet in a VPC and then use VPC peering; this requires the Enterprise edition
Another solution is to use a Transit Gateway to connect subnets, or use AWS PrivateLink, or VPC sharing
QuickSight Q
It's like a ChatGPT-style interface for your data: ML-powered, you can ask questions based on natural language processing
the datasets and their fields must be NLP-friendly
Security
Encryption in flight (TLS / SSL)
• Data is encrypted before sending and decrypted after receiving
• TLS certificates help with encryption (HTTPS)
• Encryption in flight ensures no MITM (man in the middle attack)can happen
Server-side encryption at rest
Data is encrypted after being received by the server
• Data is decrypted before being sent
• It is stored in an encrypted form thanks to a key (usually a data key)
• The encryption / decryption keys must be managed somewhere, and the server must have access to them
Client-side encryption
• Data is encrypted by the client and never decrypted by the server
• Data will be decrypted by a receiving client
• The server should not be able to decrypt the data
• Could leverage Envelope Encryption
S3 Encryption for Objects
• SSE-S3: encrypts S3 objects using keys handled & managed by AWS
• Object is encrypted server side
• AES-256 encryption type
• Must set header: "x-amz-server-side-encryption": "AES256"
• SSE-KMS: leverage AWS Key Management Service to manage encryption keys
• KMS Advantages: user control + audit trail
• Object is encrypted server side
• Must set header: "x-amz-server-side-encryption": "aws:kms"
• SSE-C: when you want to manage your own encryption keys
• SSE-C: server-side encryption using data keys fully managed by the customer outside of AWS
• Amazon S3 does not store the encryption key you provide
• HTTPS must be used
• Encryption key must be provided in HTTP headers, for every HTTP request made
• Client-Side Encryption
• Client library such as the Amazon S3 Encryption Client
• Clients must encrypt data themselves before sending to S3
• Clients must decrypt data themselves when retrieving from S3
• Customer fully manages the keys and encryption cycle (see the sketch below)
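A minimal boto3 sketch of the server-side options above (bucket, key, and KMS key are placeholders):

```python
import boto3

s3 = boto3.client("s3")

# SSE-KMS: boto3 sets the x-amz-server-side-encryption header for you.
s3.put_object(
    Bucket="my-bucket",          # placeholder
    Key="report.csv",
    Body=b"col1,col2\n1,2\n",
    ServerSideEncryption="aws:kms",
    SSEKMSKeyId="alias/my-key",  # placeholder KMS key
)

# SSE-S3 instead: ServerSideEncryption="AES256" (no key ID needed).
```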
Encryption in transit (SSL/TLS)
It is supported, you can use it, and it's recommended; there is an HTTPS endpoint
AWS KMS (Key Management Service)
• Anytime you hear “encryption” for an AWS service, it’s most likely KMS
• AWS manages encryption keys for us
• Fully integrated with IAM for authorization
• Easy way to control access to your data
• Able to audit KMS Key usage using CloudTrail
• Seamlessly integrated into most AWS services (EBS, S3, RDS, SSM…)
• Never ever store your secrets in plaintext, especially in your code!
KMS Keys Types
Symmetric (AES-256 keys)
• Single encryption key that is used to Encrypt and Decrypt
• You never get access to the KMS Key unencrypted (must call KMS API to use)
Asymmetric (RSA & ECC key pairs)
• Public (Encrypt) and Private Key (Decrypt) pair
• Used for Encrypt/Decrypt, or Sign/Verify operations
• The public key is downloadable, but you can't access the Private Key unencrypted
• Use case: encryption outside of AWS by users who can't call the KMS API
Copying snapshots across regions
KMS is not multi-region, so you take a snapshot in region 1, then copy it to region 2, and there the KMS key is a different one that lives in that other region
CloudHSM: for provisioning dedicated encryption hardware
STILL MISSING: a security deep dive for each service covered!!
AWS STS - Security Token Service
these are the temporary tokens used day-to-day in SSO
Heavily used for federation
Also used to grant temporary access to IAM accounts: the user logs in and receives a token valid for 15 minutes to one hour to use that account
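A minimal boto3 sketch of requesting such a temporary token (the role ARN is a placeholder):

```python
import boto3

sts = boto3.client("sts")

# Temporary credentials for 15 minutes (the minimum DurationSeconds).
creds = sts.assume_role(
    RoleArn="arn:aws:iam::123456789012:role/analytics-role",  # placeholder
    RoleSessionName="temp-session",
    DurationSeconds=900,
)["Credentials"]

# Use the temporary credentials with any AWS client:
s3 = boto3.client(
    "s3",
    aws_access_key_id=creds["AccessKeyId"],
    aws_secret_access_key=creds["SecretAccessKey"],
    aws_session_token=creds["SessionToken"],
)
```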
Identity Federation
what is it?
• Federation lets users outside of AWS assume a temporary role for accessing AWS resources.
• These users assume an identity-provider-granted access role.
Kinesis Enhanced Fan Out
Each consumer gets 2 MB/s of provisioned throughput per shard
No more 2 MB/s limit shared across consumers
reduced latency (~70 ms)