Big Data Concepts and Tools/WEBM
BIG DATA
Exponential growth in the availability and use of information, both structured and unstructured
BIG DATA = Transactions + Interactions + Observations
Data volume has increased exponentially
V's of Big Data
Velocity (Speed)
Rapidly generated data calls for faster processing
Late decisions -> missed opportunities
Online data analytics
Veracity
(Accuracy, Quality, Truthfulness/Trustworthiness)
Variability
Data flows can be inconsistent, with periodic peaks
Value
Provides Business Value
Variety (Complexity of Data)
Structured Data - Tables, Transactions, Spreadsheets
Semi-Structured Data - Emails, Logs, Documents
Unstructured Data - GPS, multimedia, etc.
Huge volumes of public-domain data (to extract information, these data sets have to be linked)
Challenges
Effective and efficient recording, storage, and analysis of big data require new technologies to be developed
Limitations of DW/RDB
Schema (Fixed)
Scalability
Unable to handle immense data volumes (TB/PB scale)
Speed
Unable to handle the velocity of data
Others
Unable to handle sophisticated processing (e.g. ML) or to perform queries on big data efficiently
Challenges regarding Big Data Analytics
Processing Capabilities
Ability to swiftly process raw incoming data
Data Governance
Security, Privacy, Ownership, Quality Issues
Data Integration
Lack of cost-effective methods for combining data swiftly
Skill availability
Shortage of data scientists
Data Volume
Capability of capturing, storing, and processing the massive volume of data promptly
Solution Cost
ROI
Success factors of BDA
Clear Business Needs
Strong Committed Sponsorship
Alignment Between Business and IT Strategy
Fact-Based Decision-Making Culture
Strong Data Infrastructure
Right Analytics Tools
Personnel With Advanced Analytical Skills
Issues addressed by BDA
Process efficiency and cost reduction
Brand Management
Revenue Maximization, Cross/Up selling
Enhanced Customer Experience
Churn Identification, Customer Recruiting, etc.
High Performance computing for Big Data
In-database analytics
Storing analytic procedures close to where the data is kept
Grid computing and MPP (Massively Parallel Processing)
Using multiple machines and processors in parallel
In-memory analytics
Storing and processing the complete data set in RAM (contrasted with in-database analytics in the sketch after this list)
Appliances
Combining hardware, software, and storage in a single unit for performance and scalability
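To make the first two approaches concrete, here is a minimal Python sketch (not from the source) contrasting in-database and in-memory analytics, using SQLite as a stand-in for a warehouse; the table and column names are illustrative.

```python
import sqlite3

# Toy sales table in SQLite, standing in for a real warehouse.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
conn.executemany("INSERT INTO sales VALUES (?, ?)",
                 [("east", 100.0), ("west", 250.0), ("east", 75.0)])

# In-database analytics: push the computation to where the data lives;
# only the small aggregated result crosses the wire.
total_by_region = conn.execute(
    "SELECT region, SUM(amount) FROM sales GROUP BY region").fetchall()

# In-memory analytics: pull the data set into RAM first, then compute
# locally (fast, but bounded by available memory).
rows = conn.execute("SELECT region, amount FROM sales").fetchall()
in_memory_totals = {}
for region, amount in rows:
    in_memory_totals[region] = in_memory_totals.get(region, 0.0) + amount

print(total_by_region, in_memory_totals)
```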
Big data technologies
Hadoop
MapReduce
NoSQL
HIVE
PIG
Hadoop
Engineered to store and analyse large amounts of distributed, semi-structured, and unstructured data
Together with MapReduce = Big Data Core Technology
How does it work
Two Components
Hadoop Distributed File System (HDFS)
MapReduce
MapReduce
Aims to achieve high performance with simple computation
Remarkable at processing and analysing massive volumes of multi-structured data in a timely manner
How does it work
Distributes the processing of massive multi-structured data files across a large cluster of ordinary machines/processors
Used in web indexing for search, ML, text analysis, etc.
Semi-centralised: data is fragmented into blocks and loaded into the file system (cluster), spread across multiple nodes (systems)
Each block of data is duplicated several times as it is loaded into the file system, so replication allows fail-safe processing (a toy simulation of the resulting map-shuffle-reduce flow follows below)
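A toy pure-Python simulation of the map-shuffle-reduce flow described above, using the classic word-count example; the chunk contents and function names are illustrative, not from the source.

```python
from collections import defaultdict
from itertools import chain

def map_phase(chunk):
    # Each mapper independently turns its chunk into (key, value) pairs.
    return [(word, 1) for word in chunk.split()]

def shuffle(mapped_outputs):
    # Group values by key across every mapper's output.
    groups = defaultdict(list)
    for key, value in chain.from_iterable(mapped_outputs):
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    # Each reducer aggregates all values for one key.
    return {word: sum(counts) for word, counts in groups.items()}

# Three chunks, as if the file were split into blocks on three data nodes.
chunks = ["big data big", "data moves fast", "big clusters"]
mapped = [map_phase(c) for c in chunks]   # runs in parallel on a real cluster
print(reduce_phase(shuffle(mapped)))      # {'big': 3, 'data': 2, ...}
```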
Hadoop Cluster
Two node types
Slave: DataNode and TaskTracker
Master: NameNode and JobTracker
Jobs are distributed to the worker (slave) nodes; once a job is completed, the results are collated and aggregated using MapReduce (see the toy sketch below)
The cluster runs on inexpensive commodity hardware and scales out by adding machines
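A hedged sketch of this master/worker job distribution using Python's standard concurrent.futures, standing in for the JobTracker farming tasks out to TaskTrackers; the blocks and the task itself are hypothetical.

```python
from concurrent.futures import ProcessPoolExecutor

def process_block(block):
    # Stand-in for a task a TaskTracker runs on one data node:
    # count the records in one block of a file (illustrative only).
    return len(block.split("\n"))

blocks = ["a\nb\nc", "d\ne", "f\ng\nh\ni"]  # a file split into three blocks

if __name__ == "__main__":
    # JobTracker role: farm tasks out to workers, then collate the results.
    with ProcessPoolExecutor(max_workers=3) as pool:
        per_block = list(pool.map(process_block, blocks))
    print(sum(per_block))  # aggregated result: 9 records
```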
Technical Components
HDFS
NameNode (Primary Facilitator)
Secondary NameNode (Backup to the NameNode)
JobTracker
Slave Nodes
Co-existence of Hadoop and DW
Use Hadoop for storing and archiving multi-structured data
Use Hadoop for filtering, transforming and/or consolidating multi-structured data
Use Hadoop to analyse large volumes of multi-structured data and publish the analytical results
Use an RDBMS that provides MapReduce capabilities as an investigative computing platform
Use a front-end query tool to access and analyse data
Raw data streams (e.g. sensor data, blogs, images) -> file copy -> Hadoop (extract, transform) -> dev environments <=> integrated data warehouse (operational systems such as POS, CRM, SCM feed the DW via ETL) -> BI tools
NoSQL
Process large volumes of multi-structured data
Often works in conjunction with Hadoop
Serves discrete data, stored among large volumes of multi-structured data, to end users and big data applications (a minimal example follows below)
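As one concrete illustration (not from the source), a minimal sketch against a document-oriented NoSQL store using MongoDB's official Python driver, pymongo; a local MongoDB server is assumed, and the database, collection, and field names are hypothetical.

```python
from pymongo import MongoClient

# Assumes a MongoDB instance on localhost; all names are hypothetical.
client = MongoClient("mongodb://localhost:27017/")
events = client["webshop"]["click_events"]

# Schema-less inserts: each document can carry different fields,
# which suits multi-structured data.
events.insert_one({"user_id": 42, "page": "/cart", "device": "mobile"})
events.insert_one({"user_id": 42, "page": "/checkout"})

# Discrete lookup served back to an end user or application.
for doc in events.find({"user_id": 42}):
    print(doc)
```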
Due to data velocity and the need to extract information from continuous flows of data for analytical purposes
Stream Analytics
-> Store-everything approach becomes infeasible as the number of data sources increases
-> Critical event processing (Complex pattern variations that need to be detected and acted on as soon as they occur)
Example stream analytics applications. E-commerce: click-stream data is used to make recommendations and bundles (see the sketch after these examples)
Law enforcement and cyber security: CCTV and facial recognition provide real-time awareness for crime prevention and enforcement
Financial services: transactional data is used to detect fraud
etc.
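A minimal pure-Python sketch of windowed stream processing in the click-stream spirit of the e-commerce example above: events are acted on as they arrive instead of being stored wholesale. The event fields, window size, and threshold are illustrative assumptions.

```python
import time
from collections import deque, Counter

WINDOW_SECONDS = 60  # sliding window length (illustrative)

window = deque()     # (timestamp, product_id) events inside the window

def on_click(timestamp, product_id):
    """Process one click event as it arrives, without storing everything."""
    window.append((timestamp, product_id))
    # Evict events that have slid out of the window.
    while window and window[0][0] < timestamp - WINDOW_SECONDS:
        window.popleft()
    # Act on the pattern immediately, e.g. surface a trending product.
    counts = Counter(pid for _, pid in window)
    product, hits = counts.most_common(1)[0]
    if hits >= 3:  # illustrative threshold
        print(f"trending now: {product} ({hits} clicks in the last minute)")

# Simulated incoming stream: (seconds offset, product id)
now = time.time()
for offset, pid in [(0, "A"), (5, "B"), (10, "A"), (20, "A"), (70, "B")]:
    on_click(now + offset, pid)
```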