2021- Sep-Oct Learning - Coggle Diagram
Cloud Data Warehousing For Dummies by Snowflake
Introduction
challenges
Data loads
Delays in capturing data due to heavy loads
Capacity issues in handling large data volumes
More data, more opportunities and more challenges
data warehouse solution
store and organise data in various formats
provide convenient access to it
Improve the speed of data analysis
Getting up to speed on cloud data warehouses
Moving data from transactional databases to a separate store for easier analysis
This also improves the stability of the transactional DBs
has to cater for various data sources
web applications
Mobile applications
IoT devices
Challenges to conventional data warehouses
data sources vary from structured to unstructured
traditional architectures suffer from resource contention between user queries and data integration activities
loading data in batches is a limitation compared to continuous data loading
scaling up a conventional data warehouse to meet increasing data loads is challenging and often painful
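A minimal sketch of why batch loading delays analysis. It assumes loads run on a fixed schedule and finish instantly. The function `visible_at` and the example times are hypothetical. The sketch shows when an event becomes queryable under a nightly batch and under 5-minute micro-batches, which stand in for continuous loading.

```python
from datetime import datetime, timedelta


def visible_at(event_time: datetime, first_load: datetime, load_interval: timedelta) -> datetime:
    """Return the first scheduled load at or after event_time.

    Hypothetical model: loads run at first_load, first_load + interval, ...
    and complete instantly, so an event becomes queryable at the next load.
    """
    if event_time <= first_load:
        return first_load
    elapsed = event_time - first_load
    # Ceiling division: number of whole intervals until a load covers the event.
    intervals = -(-elapsed // load_interval)
    return first_load + intervals * load_interval


schedule_start = datetime(2021, 9, 1, 0, 0)
event = datetime(2021, 9, 1, 9, 2)

nightly = visible_at(event, schedule_start, timedelta(days=1))
micro_batch = visible_at(event, schedule_start, timedelta(minutes=5))

print("nightly batch lag:", nightly - event)             # 14:58:00
print("5-minute micro-batch lag:", micro_batch - event)  # 0:03:00
```

Under these assumptions, the same event waits almost 15 hours for a nightly load but only 3 minutes for a micro-batch.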
How technology and designs have evolved
Cloud technologies
low cost of storage
scalability
management could be outsourced to cloud vendors
MPP - Massively parallel processing
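A minimal sketch of the MPP idea. Data is split across nodes, each node aggregates only its own slice, and the partial results are then combined. The names `partition`, `partial_sum` and `mpp_sum` are hypothetical. A thread pool stands in for separate machines, so the sketch shows the scatter/gather pattern rather than a real parallel speed-up (CPython threads do not execute Python code in parallel).

```python
from concurrent.futures import ThreadPoolExecutor


def partition(rows, n_nodes):
    """Distribute rows round-robin across n_nodes slices (stand-ins for MPP nodes)."""
    return [rows[i::n_nodes] for i in range(n_nodes)]


def partial_sum(rows):
    """Each node aggregates only the slice of data it holds."""
    return sum(amount for _order_id, amount in rows)


def mpp_sum(rows, n_nodes=4):
    """Scatter the aggregation to every node, then gather and combine the partial results."""
    slices = partition(rows, n_nodes)
    with ThreadPoolExecutor(max_workers=n_nodes) as pool:
        partials = list(pool.map(partial_sum, slices))
    return sum(partials)


orders = [(f"order-{i}", i) for i in range(1, 101)]
print(mpp_sum(orders))  # 5050
```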
Columnar storage
stores data column by column instead of row by row
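A minimal sketch contrasting the two layouts on a toy, hypothetical orders table. Python lists do not model disk pages, so this only illustrates the access pattern. In the columnar layout, an aggregate over one column reads a single list instead of every field of every row.

```python
# Row-oriented layout: each record's fields are stored together.
rows = [
    {"order_id": 1, "region": "EU", "amount": 120.0},
    {"order_id": 2, "region": "US", "amount": 75.5},
    {"order_id": 3, "region": "EU", "amount": 30.0},
]


def to_columns(records):
    """Pivot a list of row dicts into one list per column (columnar layout)."""
    columns = {key: [] for key in records[0]}
    for record in records:
        for key, value in record.items():
            columns[key].append(value)
    return columns


columns = to_columns(rows)

# SUM(amount): the row layout visits every record; the columnar layout scans one list.
total_from_rows = sum(record["amount"] for record in rows)
total_from_columns = sum(columns["amount"])
print(total_from_rows, total_from_columns)  # 225.5 225.5
```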
Solid-state drives (SSDs)
accelerate data storage, retrieval and analysis
Introducing cloud data warehouse
Advantages
No up-front costs
Options
Traditional data warehouse
hosted on cloud infrastructure
significant operational work such as backups, performance tuning and configuration is still required
Traditional data warehouse hosted and managed in the cloud by a third party as a managed service
True SaaS data warehouse
Vendors provide a complete solution including hardware, software and managed services
What a cloud data warehouse could improve in operations
Customer experience
real-time monitoring of user behaviour helps enhance products or tailor features to better suit customer needs
QA
use early warning signs to address customer service issues or product shortcomings
operational efficiency
ability to monitor the business by analysing events to identify opportunities to reduce costs, boost margins and respond rapidly to market forces
Innovation
spot and capitalise on trends
Why the modern data warehouse emerged
changes in data sources, volume and variety
The amount of data organisations have to deal with has grown exponentially in the recent past
Data in the cloud: SaaS products and platforms generate large amounts of data, e.g. CRM and ERP systems in the cloud
demand for SaaS products has grown
it is logical to manage this data through cloud-based data warehouses instead of on-premises stores
Using machine-generated data
data collected from IoT devices
noise needs to be filtered out because of poor signal-to-noise ratios
most of these devices send their data to the cloud, so it is logical to manage it through a CDWH (cloud data warehouse)
Ability to run data explorations on large data sets at a lower cost in CDWHs, giving a better ROI
Data lake introduction
legacy data lakes received massive raw data loads in different formats, which made traditional DWH solutions cost-prohibitive
thanks to their low cost, cloud DWHs offer a reasonable solution to this problem
Increased demand for data access and analytics
data-driven decision making has become business as usual (BAU)
Elasticity to enable analytics
data explorations have major benefits, but no one knows the required capacity in advance
ad hoc data analysis could require dynamic elasticity
event-driven analysis enables real-time monitoring, and only elasticity can handle the spikes and lows in data volumes
rapid iteration over exhaustive planning
a growing trend is building analytics into business applications
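A minimal sketch of elasticity. It uses a hypothetical scaling rule (`clusters_needed`), a made-up hourly query load, and an assumed capacity of 10 queries per cluster per hour. The sketch compares the cluster-hours used when capacity follows the load against a fixed deployment sized for the peak hour.

```python
import math


def clusters_needed(queued_queries, per_cluster_capacity, min_clusters=1, max_clusters=10):
    """Hypothetical scaling rule: enough clusters for the current load, within bounds."""
    wanted = math.ceil(queued_queries / per_cluster_capacity)
    return max(min_clusters, min(max_clusters, wanted))


# Made-up hourly query load with a spike in the middle of the day.
hourly_load = [2, 3, 5, 40, 80, 35, 6, 2]
capacity = 10  # assumed queries one cluster can serve per hour

elastic = [clusters_needed(q, capacity) for q in hourly_load]
elastic_cluster_hours = sum(elastic)
# A fixed-size deployment must be sized for the peak hour, all day long.
peak_cluster_hours = max(elastic) * len(hourly_load)

print(elastic)                                     # [1, 1, 1, 4, 8, 4, 1, 1]
print(elastic_cluster_hours, peak_cluster_hours)  # 21 64
```

With these assumed numbers, the elastic setup uses 21 cluster-hours, compared with 64 for peak provisioning.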
Technology improvements that significantly increased the efficiency of data storage, access and analytics
Advancement in cloud technologies
near-unlimited resources on demand, pay-per-use pricing and scalability
cost
no up-front costs
no maintenance costs
can focus on analysing data
natural integration point, as most of the data comes from applications that sit in the cloud
Columnar storage
columnar improves data storage, retrieval and analysis
SSDs
unlike HDDs, SSDs store data on flash memory chips, which accelerates data storage, retrieval and analysis
NoSQL
ability to store semi-structured and unstructured data
JSON, Avro and XML
criteria for selecting a modern data warehouse
meets current and future needs
ability to scale storage and compute resources independently
adding compute should not require adding storage
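A minimal sketch of why independent scaling matters. It uses hypothetical prices and a hypothetical node size of 2 compute units plus 1 TB per node. In a coupled design, capacity comes only in whole nodes. A storage-heavy workload therefore also pays for compute it does not use, and a compute-heavy workload pays for storage it does not need.

```python
import math

# Hypothetical unit prices, chosen only for illustration.
COMPUTE_PRICE_PER_UNIT_HOUR = 2.0
STORAGE_PRICE_PER_TB_HOUR = 0.05


def coupled_cost(compute_units, storage_tb, units_per_node=2, tb_per_node=1):
    """Coupled design: capacity is bought in whole nodes that bundle compute and storage."""
    nodes = max(math.ceil(compute_units / units_per_node), math.ceil(storage_tb / tb_per_node))
    node_price = units_per_node * COMPUTE_PRICE_PER_UNIT_HOUR + tb_per_node * STORAGE_PRICE_PER_TB_HOUR
    return nodes * node_price


def decoupled_cost(compute_units, storage_tb):
    """Decoupled design: compute and storage are sized and billed independently."""
    return compute_units * COMPUTE_PRICE_PER_UNIT_HOUR + storage_tb * STORAGE_PRICE_PER_TB_HOUR


# Storage-heavy workload: 20 TB of data but only 1 compute unit of query load.
print(coupled_cost(1, 20), decoupled_cost(1, 20))  # roughly 81 vs 3 per hour
```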
stores all data in one place
should not require building additional traditional structures as a layer in front of unstructured data; it should work with that data in its native form
such additional layers could cause performance degradation
ability to run optimised queries on both structured and unstructured data is crucial
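A minimal sketch of querying semi-structured data in its native form with plain SQL, using Python's built-in `sqlite3`. Raw JSON is stored as-is, and fields are extracted only at query time (schema-on-read). The `json_get` helper is hypothetical and is registered with `Connection.create_function`. A real warehouse would use its own native JSON support instead.

```python
import json
import sqlite3


def json_get(document, key):
    """Hypothetical helper: extract a top-level field from JSON text at query time."""
    return json.loads(document).get(key)


conn = sqlite3.connect(":memory:")
conn.create_function("json_get", 2, json_get)
conn.execute("CREATE TABLE raw_events (payload TEXT)")

events = [
    {"device": "sensor-1", "temp": 21.5},
    {"device": "sensor-2", "temp": 19.0, "battery": 80},  # extra field, no schema change needed
    {"device": "sensor-1", "temp": 22.5},
]
conn.executemany("INSERT INTO raw_events (payload) VALUES (?)", [(json.dumps(e),) for e in events])

# Ordinary SQL over the raw JSON payloads.
averages = conn.execute(
    """
    SELECT json_get(payload, 'device') AS device,
           AVG(json_get(payload, 'temp')) AS avg_temp
    FROM raw_events
    GROUP BY json_get(payload, 'device')
    ORDER BY device
    """
).fetchall()
print(averages)  # [('sensor-1', 22.0), ('sensor-2', 19.0)]
```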
supports existing skills, tools and expertise
SQL is still the main language people rely on
even though a modern data warehouse will be technically advanced enough to handle NoSQL data, it should keep these traditional skill sets and standards (e.g. SQL) in mind as far as supportability is concerned
should reduce costs
a conventional data warehouse could cost a large sum of money in licences, hardware, software and services
a modern data warehouse should meet these requirements at a much lower cost
Provides data resiliency and recovery
should not degrade system performance while ensuring resiliency, durability and availability of data
secures data at rest and in transit
confidentiality, security and integrity
modern DWH provides role-based access control
has MFA (multi-factor authentication)
encryption of data and key management
ability to perform penetration testing on the cloud
standards
SOC 1/SOC 2 Type II and ISO/IEC 27001.
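A minimal sketch of role-based access control, assuming privileges are granted to roles and roles are granted to users. The class and method names are hypothetical and are not any vendor's API. Warehouses typically express these grants with SQL `GRANT` statements. MFA, encryption and auditing are out of scope here.

```python
from dataclasses import dataclass, field


@dataclass
class AccessControl:
    """Minimal RBAC: privileges are granted to roles, and roles are granted to users."""
    role_privileges: dict = field(default_factory=dict)  # role -> set of (privilege, object)
    user_roles: dict = field(default_factory=dict)       # user -> set of roles

    def grant_privilege(self, role, privilege, obj):
        self.role_privileges.setdefault(role, set()).add((privilege, obj))

    def grant_role(self, user, role):
        self.user_roles.setdefault(user, set()).add(role)

    def is_allowed(self, user, privilege, obj):
        """A user may act on an object only if one of their roles holds that privilege."""
        return any(
            (privilege, obj) in self.role_privileges.get(role, set())
            for role in self.user_roles.get(user, set())
        )


acl = AccessControl()
acl.grant_privilege("analyst", "SELECT", "sales")
acl.grant_privilege("loader", "INSERT", "sales")
acl.grant_role("alice", "analyst")

print(acl.is_allowed("alice", "SELECT", "sales"))  # True
print(acl.is_allowed("alice", "INSERT", "sales"))  # False
```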
streamlines the data pipeline (ETL)
slow data pipelines force analysts to wait for data to load
modern DWH should move data faster across the pipeline
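A minimal sketch of one common way to speed up a pipeline: incremental loading with a high-water mark, so each run moves only new rows instead of reloading everything. This technique is swapped in and is not described in the source. It assumes an ever-increasing `id` column and ignores updates and deletes. The two in-memory SQLite databases and the `incremental_load` function are hypothetical stand-ins for a transactional source and a warehouse.

```python
import sqlite3


def incremental_load(source, target, last_loaded_id):
    """Copy only rows newer than the high-water mark and return the new mark."""
    new_rows = source.execute(
        "SELECT id, amount FROM orders WHERE id > ? ORDER BY id", (last_loaded_id,)
    ).fetchall()
    target.executemany("INSERT INTO orders (id, amount) VALUES (?, ?)", new_rows)
    target.commit()
    # The highest id copied becomes the next starting point.
    return new_rows[-1][0] if new_rows else last_loaded_id


source = sqlite3.connect(":memory:")  # stands in for the transactional database
target = sqlite3.connect(":memory:")  # stands in for the warehouse
for db in (source, target):
    db.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, amount REAL)")

source.executemany("INSERT INTO orders VALUES (?, ?)", [(1, 10.0), (2, 20.0)])
mark = incremental_load(source, target, last_loaded_id=0)  # copies rows 1 and 2

source.execute("INSERT INTO orders VALUES (3, 30.0)")
mark = incremental_load(source, target, mark)  # copies only row 3
```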
optimise your time to value
easy, mostly automated deployment
On-premises vs cloud DWH