2021 Sep-Oct Learning

Cloud Data Warehousing for Dummies by Snowflake

Introduction

challenges

Data loads

Delays in capturing data due to the load

Capacity issues of handling large data volumes

More data, more opportunities and more challenges

data warehouse solution

store and organise data in various formats

provide convenient access to it

Improve the speed of data analysis

  1. Getting up to speed on Cloud data warehouse

Moving data from transactional databases into a separate space for easier analysis

This also improves the stability of the transactional DBs

have to cater for various data sources

web applications

Mobile applications

IoT devices

Challenges to conventional data warehouses

data sources vary from structured to unstructured

traditional architectures suffer from contention between user queries and data integration activities

loading data in batches is a limitation compared to continuous data loading

scaling up a conventional data warehouse to meet increasing data loads is challenging and often painful

How technology and designs have evolved

Cloud technologies

low cost of storage

scalability

management could be outsourced to cloud vendors

MPP - Massively parallel processing
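A minimal sketch of the MPP idea, assuming a simple SUM aggregate: the data is split into partitions, each worker aggregates its own slice independently, and a coordinator combines the partial results. The partitioning scheme and function names are hypothetical, and threads in one process stand in for what real MPP engines spread across separate nodes.

```python
from concurrent.futures import ThreadPoolExecutor


def partition(rows, n_parts):
    """Split rows into n_parts slices (round-robin distribution)."""
    return [rows[i::n_parts] for i in range(n_parts)]


def partial_sum(slice_):
    """Work done independently by each 'node' on its own partition."""
    return sum(slice_)


def mpp_sum(rows, n_parts=4):
    parts = partition(rows, n_parts)
    # Each partition is aggregated in parallel; the coordinator merges the partials.
    with ThreadPoolExecutor(max_workers=n_parts) as pool:
        partials = list(pool.map(partial_sum, parts))
    return sum(partials)


sales = list(range(1, 101))  # hypothetical sample values
print(mpp_sum(sales))  # 5050
```

The final answer matches a single-node sum; only the work is divided.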

Columnar storage

instead of storing data row-wise, data is stored column by column
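To make the row vs column distinction concrete, a small sketch with hypothetical sample data: the same table is held once as a list of row records and once as a dict of column lists. Summing one column in the columnar layout touches only that column's values, which is why analytical queries that read a few columns benefit from columnar storage.

```python
# Hypothetical sample table: (order_id, region, amount)
rows = [
    (1, "EU", 120.0),
    (2, "US", 80.0),
    (3, "EU", 50.0),
]

# Row-wise layout: all fields of a record are stored together.
row_store = [{"order_id": o, "region": r, "amount": a} for o, r, a in rows]

# Columnar layout: all values of a column are stored together.
column_store = {
    "order_id": [o for o, _, _ in rows],
    "region": [r for _, r, _ in rows],
    "amount": [a for _, _, a in rows],
}


def total_amount_row_store(store):
    # Walks every full record even though only one field is needed.
    return sum(record["amount"] for record in store)


def total_amount_column_store(store):
    # Reads only the 'amount' column; the other columns are never touched.
    return sum(store["amount"])


print(total_amount_row_store(row_store))        # 250.0
print(total_amount_column_store(column_store))  # 250.0
```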

Solid state drives (SSDs)

accelerate data storage, retrieval and analysis

  1. Introducing cloud data warehouse

Advantages

No up front costs

Options

Traditional data warehouse hosted on cloud infrastructure

significant operational work is still required, such as backup, performance tuning and configuration

Traditional data warehouse hosted and managed in the cloud by a third party as a managed service

True SaaS data warehouse

Vendors provide a complete solution including hardware, software and managed services

What a cloud data warehouse could improve in operations

Customer experience

real-time monitoring of user behaviour helps enhance products or build tailor-made features to better suit customer needs

QA

use early warning signs to address customer service issues or product shortcomings

operational efficiency

ability to monitor the business by analysing events, to identify opportunities to reduce costs, boost margins and respond rapidly to market forces

Innovation

spot and capitalise on trends

  1. Why the modern data warehouse emerged
  1. changes in data sources, volume and variety
  1. Increased demand for data access and analytics
  1. Technology improvements that significantly increased the efficiency of data storage, access and analytics

The amount of data organisations have to deal with has grown exponentially in the recent past

Data in the cloud - SaaS products and platforms generate large amounts of data, e.g. CRM and ERP systems in the cloud

demand for SaaS products has grown

it is logical to manage this data through cloud-based data warehouses instead of on-prem stores

using machine-generated data

data collected from IoT devices

need to cut down noise caused by poor signal-to-noise ratios

most of these devices sit on the cloud, so it is logical to manage their data through a CDWH

Ability to experiment with data explorations involving large data sets at lower cost in CDWHs - better ROI
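One common, generic way to cut down noise in sensor readings is a moving average (a standard technique, not one the source prescribes). The sketch below smooths a series of made-up readings using a hypothetical window size of 3; a single spike is spread out and damped instead of passing through at full height.

```python
from collections import deque


def moving_average(readings, window=3):
    """Yield the mean of the last `window` readings (fewer at the start)."""
    buf = deque(maxlen=window)
    for value in readings:
        buf.append(value)
        yield sum(buf) / len(buf)


# Hypothetical temperature readings with a noisy spike at index 3
raw = [20.0, 20.0, 20.0, 29.0, 20.0, 20.0]
smoothed = list(moving_average(raw, window=3))
print(smoothed)  # [20.0, 20.0, 20.0, 23.0, 23.0, 23.0]
```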

Data lake introduction

legacy data lakes received massive raw data loads in different formats, which made traditional DWH solutions cost-prohibitive

due to their low cost, cloud DWHs offer a reasonable solution to the above problem

data-driven decision making has become BAU

Elasticity to enable analytics

data explorations have major benefits, but no one knows the required capacity in advance

ad hoc data analysis could require dynamic elasticity

event-driven analysis enables real-time monitoring, and only elasticity can handle the spikes and lows in data volumes
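A minimal sketch of an elasticity rule, assuming a hypothetical policy: compute clusters are added as queued queries pile up and removed as the queue drains, bounded by a minimum and a maximum. The thresholds and the function name are illustrative, not any vendor's actual scaling algorithm.

```python
import math


def clusters_needed(queued_queries, queries_per_cluster=8, min_clusters=1, max_clusters=10):
    """Return how many compute clusters to run for the current queue depth."""
    wanted = math.ceil(queued_queries / queries_per_cluster)
    # Clamp between the configured floor and ceiling.
    return max(min_clusters, min(max_clusters, wanted))


# Hypothetical queue depths over a day: quiet, spike, back to quiet
for load in [0, 5, 40, 200, 3]:
    print(load, "->", clusters_needed(load))
```

Capacity follows the load in both directions, so spikes are absorbed without paying for peak capacity during the lows.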

rapid iteration over exhaustive planning

a growing trend is building analytics into business applications

Advancement in cloud technologies

near-unlimited resources on demand, pay-per-use pricing and scalability

cost

no up front costs

no maintenance costs

can focus on analysing data

natural integration point, as most of the data comes from applications sitting in the cloud

Columnar storage

columnar storage improves data storage, retrieval and analysis

SSDs

unlike HDDs, SSDs store data on flash memory chips, which accelerates data storage, retrieval and analysis

NoSQL

ability to store unstructured and semi-structured data

JSON, Avro and XML
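Semi-structured records such as JSON can be queried without first forcing them into a fixed table schema. The sketch below uses Python's standard `json` module on hypothetical event records and pulls nested fields by a dotted path; a real CDWH would express this in SQL, and the helper `get_path` is made up for illustration.

```python
import json

# Hypothetical raw events as they might land from an application
raw_events = [
    '{"user": {"id": 1, "country": "LK"}, "event": "login"}',
    '{"user": {"id": 2, "country": "AU"}, "event": "purchase", "amount": 30}',
    '{"user": {"id": 1, "country": "LK"}, "event": "purchase", "amount": 12}',
]


def get_path(doc, path, default=None):
    """Follow a dotted path such as 'user.country' into nested dicts."""
    current = doc
    for key in path.split("."):
        if not isinstance(current, dict) or key not in current:
            return default
        current = current[key]
    return current


events = [json.loads(e) for e in raw_events]

# Records keep their native shape; missing fields (e.g. 'amount') are tolerated.
purchases_by_country = {}
for ev in events:
    if get_path(ev, "event") == "purchase":
        country = get_path(ev, "user.country")
        amount = get_path(ev, "amount", 0)
        purchases_by_country[country] = purchases_by_country.get(country, 0) + amount

print(purchases_by_country)  # {'AU': 30, 'LK': 12}
```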

  1. criteria for selecting a modern data warehouse

meets current and future needs

ability to select storage and compute resources independently

adding compute should not require adding storage
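A toy model of separating storage from compute, under the assumption that data lives in one shared storage layer and any number of independent compute clusters read from it. The class names are hypothetical; the point is that adding a compute cluster leaves the stored data untouched.

```python
from dataclasses import dataclass, field


@dataclass
class SharedStorage:
    """Single copy of the data, shared by all compute clusters."""
    tables: dict = field(default_factory=dict)

    def size(self):
        return sum(len(rows) for rows in self.tables.values())


@dataclass
class ComputeCluster:
    name: str
    storage: SharedStorage

    def count_rows(self, table):
        # Compute reads from shared storage; it holds no data of its own.
        return len(self.storage.tables.get(table, []))


storage = SharedStorage(tables={"orders": [1, 2, 3, 4]})
clusters = [ComputeCluster("etl", storage)]

size_before = storage.size()
# Scale compute independently: add a cluster for analysts.
clusters.append(ComputeCluster("analytics", storage))
size_after = storage.size()

print(size_before == size_after)                   # True
print([c.count_rows("orders") for c in clusters])  # [4, 4]
```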

stores all data in one place

should not require building additional traditional structures as a layer in front of non-structured data; such data should be usable in its native form

the above could cause performance degradation

the ability to run performant, optimised queries on both structured and unstructured data is crucial

supports existing skills, tools and expertise

SQL is still the main language relied on

even though a modern data warehouse will be technically advanced enough to handle NoSQL data, it should keep these traditional skill sets and standards (e.g. SQL) in mind as far as supportability is concerned

should reduce costs

a conventional data warehouse could cost a large sum of money in licences, hardware, software and services

a modern data warehouse should meet these requirements at a much lower cost

Provide data resiliency and recovery

should not degrade the system performance when ensuring resiliency, durability and availability of data

secure data at rest and in transit

confidentiality, security and integrity

modern DWH provides role-based access control

supports MFA

encryption of data and key management

ability to perform penetration testing on the cloud

standards

SOC 1/SOC 2 Type II and ISO/IEC 27001.
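Role-based access control can be sketched as two mappings: roles to privileges, and users to roles. A user may act on an object only if one of their roles grants that privilege. The roles, users and privilege names below are hypothetical, not a specific vendor's model.

```python
# Hypothetical role -> set of (privilege, object) grants
ROLE_GRANTS = {
    "analyst": {("SELECT", "sales")},
    "engineer": {("SELECT", "sales"), ("INSERT", "sales")},
}

# Hypothetical user -> roles assignment
USER_ROLES = {
    "alice": {"analyst"},
    "bob": {"engineer"},
}


def is_allowed(user, privilege, obj):
    """True if any role assigned to the user grants the privilege on the object."""
    return any(
        (privilege, obj) in ROLE_GRANTS.get(role, set())
        for role in USER_ROLES.get(user, set())
    )


print(is_allowed("alice", "SELECT", "sales"))  # True
print(is_allowed("alice", "INSERT", "sales"))  # False
print(is_allowed("bob", "INSERT", "sales"))    # True
```

Granting privileges to roles rather than to individual users keeps access changes to a single role update.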

streamlines the data pipeline (ETL)

slow data pipelines force analysts to wait for data to load

modern DWH should move data faster across the pipeline
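A minimal extract-transform-load pipeline, written with Python generators so each record flows through the stages as soon as it is read rather than waiting for a whole batch. The stage names and the CSV sample are hypothetical; streaming records through stages is one way to shorten the time analysts wait for data.

```python
import csv
import io

# Hypothetical source extract
SOURCE = """order_id,amount
1,120
2,abc
3,50
"""


def extract(text):
    """Read records one at a time from the source."""
    yield from csv.DictReader(io.StringIO(text))


def transform(records):
    """Cast types and drop rows that fail validation."""
    for rec in records:
        try:
            yield {"order_id": int(rec["order_id"]), "amount": float(rec["amount"])}
        except ValueError:
            continue  # skip malformed rows such as amount='abc'


def load(records, target):
    """Append clean records to the target table (a list here)."""
    for rec in records:
        target.append(rec)
    return target


warehouse_table = load(transform(extract(SOURCE)), [])
print(warehouse_table)  # [{'order_id': 1, 'amount': 120.0}, {'order_id': 3, 'amount': 50.0}]
```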

optimise your time to value

easy deployment and mostly automated

  1. On-Premises vs Cloud DWH