Big Data Concept and Tools Week 7 (Critical Success Factors for Big Data…
Big Data Concept and Tools
Week 7
Big Data = Massive volumes of Data
Characteristics of Big Data
Variety (Complexity)
Velocity (Speed)
Volume (Scale)
Other items that define Big Data
Veracity: accuracy, quality, truthfulness, trustworthiness
Variability: data flows can be inconsistent, with periodic peaks
Value: provides business value
Real-Time/Fast Data
Fundamentals of Big Data Analytics
Big Data by itself, regardless of the size, type, or speed, is worthless
Big Data + “big” analytics = value
Challenge of effectively and efficiently capturing, storing, and analyzing Big Data
New technologies needed
Limitations of Data Warehouse/Relational Database
Schema (Fixed)
Scalability
Unable to handle huge amounts of new/contemporary data sources
Speed
Unable to handle speed at which big data is arriving
Others
Unable to handle sophisticated processing such as machine learning
Unable to perform queries on big data efficiently
Challenges of Big Data Analytics
Processing capabilities
The ability to process the data quickly, as it is captured (i.e., stream analytics)
Data Governance
Security, privacy, ownership, quality issues
Data integration
The ability to combine data quickly and at reasonable cost
Skill availability: shortage of data scientists
Data Volume
The ability to capture, store, and process the huge volume of data in a timely manner
Solution cost: return on investment
Critical Success Factors for Big Data Analytics
A fact-based decision-making culture
A strong data infrastructure
Alignment between the business & IT strategy
The right analytics tool
Strong, committed sponsorship
Personnel with advanced analytics skills
A clear business need
Popular Big Data Technologies
Hadoop
MapReduce
NoSQL
HIVE
PIG
Hadoop Cluster
Master Node
Name Node
keeps track of the files and directories
provides information on where in the cluster data is stored and if any of the nodes failed
Job Tracker
initiates and co-ordinates jobs or the processing of data and dispatches compute tasks to the Task Tracker
Slave Node
Data Node
A storage node where data is stored
Task Tracker
Processes the data; these nodes are known as compute nodes
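The division of labour in the cluster above can be sketched in a few lines of Python. This is a toy model, not Hadoop code: the `NameNode` class, its method names, the node names (`dn1`, `dn2`, `dn3`) and the file `sales.csv` are all hypothetical. It only illustrates that the Name Node holds metadata (which blocks live on which Data Nodes) and can report usable locations after a node fails.

```python
from dataclasses import dataclass, field


@dataclass
class NameNode:
    """Toy stand-in for the Name Node: it holds metadata only, never file contents."""
    # file name -> list of (block_id, [names of Data Nodes holding a replica])
    block_map: dict = field(default_factory=dict)
    live_nodes: set = field(default_factory=set)

    def register(self, node):
        """Record a Data Node as available."""
        self.live_nodes.add(node)

    def add_file(self, name, blocks):
        """Record where each block of a file is stored."""
        self.block_map[name] = blocks

    def mark_failed(self, node):
        """Note that a Data Node has failed."""
        self.live_nodes.discard(node)

    def locate(self, name):
        """Return, per block, only the replicas on nodes that are still alive."""
        return [
            (block_id, [n for n in replicas if n in self.live_nodes])
            for block_id, replicas in self.block_map[name]
        ]


name_node = NameNode()
for node in ("dn1", "dn2", "dn3"):
    name_node.register(node)

# Hypothetical file split into two blocks, each replicated on two Data Nodes.
name_node.add_file("sales.csv", [("blk_0", ["dn1", "dn2"]), ("blk_1", ["dn2", "dn3"])])

name_node.mark_failed("dn2")
locations = name_node.locate("sales.csv")
print(locations)
```

Because each block is stored on more than one Data Node, the file stays readable after `dn2` fails; the Job Tracker and Task Trackers are left out of this sketch.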
HADOOP
Hadoop consists of multiple products
Hadoop is an ecosystem, not a single product
HDFS is a file system, not a DBMS
Hadoop and MapReduce are related but not the same
MapReduce provides control for analytics
Hadoop is about data diversity, not just data volume
Hadoop complements a DW; it’s rarely a replacement
Hadoop enables many types of analytics, not just Web analytics
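To make the distinction between Hadoop and MapReduce more concrete, the MapReduce programming model can be sketched in plain Python with no Hadoop involved. The function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) and the input lines are illustrative assumptions; a real job would run the same three steps in parallel across the cluster's compute nodes.

```python
from collections import defaultdict


def map_phase(line):
    """Map step: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Shuffle step: group all emitted values by key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reduce step: collapse the values for one key into a single count."""
    return key, sum(values)


def word_count(lines):
    """Run map, shuffle, and reduce sequentially on a list of lines."""
    pairs = [pair for line in lines for pair in map_phase(line)]
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())


counts = word_count(["big data big analytics", "data value"])
print(counts)
```

The programmer supplies only the map and reduce logic; distributing the work and grouping the keys is what a framework such as Hadoop adds around it.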
Technical Components
Hadoop Distributed File System (HDFS)
Name Node (primary facilitator)
Secondary Node (periodically checkpoints the Name Node's metadata; often described as a backup to the Name Node)
Job Tracker
Slave Nodes (the grunts of any Hadoop cluster)
Made up of a number of complementary sub-projects
NoSQL
Often works in conjunction with Hadoop
Serves discrete data stored among large volumes of multi-structured data to end-users and Big Data applications
Big Data and Data Warehousing
Impact of Big Data on DW
Big Data and RDBMS do not fit well together
Use Cases for Hadoop
Hadoop as the repository and refinery
Hadoop as the active archive
Use Cases for Data Warehousing
Data warehouse performance
Integrating data that provides business value
Interactive BI tools
Coexistence of Hadoop and DW
Use Hadoop for storing and archiving multi-structured data
Use Hadoop for filtering, transforming, and/or consolidating multi-structured data
Use Hadoop to analyze large volumes of multi-structured data and publish the analytical results
Use a relational DBMS that provides MapReduce capabilities as an investigative computing platform
Use a front-end query tool to access and analyze data
Big Data and Stream Analytics
Analytic process of extracting actionable information from continuously flowing data
Store-everything approach infeasible when the number of data sources increases
Need for critical event processing - complex pattern variations that need to be detected and acted on as soon as they happen
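A minimal sketch of the stream analytics idea, assuming a numeric sensor feed: readings are examined as they arrive, only a small rolling window is kept in memory rather than the whole stream, and an alert is raised as soon as the rolling average crosses a limit. The function name `detect_spikes`, the window size, the threshold, and the sample readings are all hypothetical.

```python
from collections import deque


def detect_spikes(readings, window=3, threshold=100.0):
    """Yield (index, average) whenever the rolling average exceeds the threshold."""
    recent = deque(maxlen=window)  # keeps only the last `window` readings
    for i, value in enumerate(readings):
        recent.append(value)
        if len(recent) == window:
            avg = sum(recent) / window
            if avg > threshold:
                yield i, avg


# Hypothetical feed; in practice `readings` could be any iterator over live data.
alerts = list(detect_spikes([90, 95, 100, 120, 130, 80, 70]))
for index, average in alerts:
    print(f"alert at reading {index}: rolling average {average:.1f}")
```

Because `detect_spikes` is a generator, each alert is produced the moment its triggering reading arrives, which mirrors the "detect and act as soon as it happens" requirement without storing everything.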
YEO JUN WEN MAX