Week 7: Big Data Concepts and Tools
Definition of Big Data
Describes the exponential growth, availability, and use of information, both structured and unstructured, e.g. from social media, the web, RFID, GPS, text, and sensors.
Where does Big Data come from?
YouTube
Google
Facebook
Healthcare, government, military, education, media
Web data, e-commerce
Grocery and department store purchases, etc.
Characteristics of Big Data (the 3 Vs)
Variety (Complexity)
Velocity (Speed)
Volume (Scale)
Other Vs of Big Data
Variability: Data flows can be inconsistent, with periodic peaks
Value: Provides business value
Veracity: Accuracy, quality, truthfulness, trustworthiness
Challenges of Big Data Analytics
Processing capabilities
Data governance
Data integration
Skill availability
Data volume
Solution cost
Key Success Factors for Big Data Analytics
Alignment between the business & IT strategy
A fact-based decision-making culture
Strong, committed sponsorship
A strong data infrastructure
A clear business need
The right analytics tools
Personnel with advanced analytical skills
Popular Big Data Technologies
NoSQL (Not only SQL)
A new style of database that processes large volumes of multi-structured data
Often works in conjunction with Hadoop
Examples: Cassandra, MongoDB, CouchDB, HBase, etc.
Serves discrete data stored among large volumes of multi-structured data to end-users and Big Data applications
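The idea of serving discrete records out of multi-structured data can be sketched roughly as a key-based lookup where each record may have a different shape. This is only an illustration: TinyDocumentStore, its put/get methods, and the sample records are hypothetical and made up for this note, not the API of Cassandra, MongoDB, CouchDB, or HBase.

```python
# Hypothetical in-memory stand-in for a document store.
# It is NOT a real NoSQL client; names and records are illustrative only.
class TinyDocumentStore:
    def __init__(self):
        # Maps a key to a schemaless document (a plain dict).
        self._docs = {}

    def put(self, key, document):
        """Store one document under a key; no fixed schema is enforced."""
        self._docs[key] = document

    def get(self, key):
        """Return the single document for a key, or None if it is absent."""
        return self._docs.get(key)


store = TinyDocumentStore()
# The two records deliberately have different fields (multi-structured data).
store.put("user:1", {"name": "Ana", "tags": ["gps", "social"]})
store.put("user:2", {"name": "Ben", "last_purchase": {"item": "milk", "price": 1.2}})

# Serve one discrete record to an application by key.
record = store.get("user:2")
missing = store.get("user:9")
```

A real NoSQL system would spread these documents across many machines; the sketch only shows the access pattern.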
HIVE
Hadoop-based data warehousing-like framework developed by Facebook
Allows users to write queries in an SQL-like language called HiveQL, which are then converted to MapReduce jobs
MapReduce
Goal: Achieving high performance with "simple" computers
Good at processing and analyzing large volumes of multi-structured data in a timely manner
Developed and popularized by Google
Distributes the processing of very large multi-structured data files across a large cluster of ordinary machines
Used in indexing the Web for search, graph analysis, text analysis, machine learning
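The map, shuffle, and reduce steps behind MapReduce can be sketched on a single machine with a word count, the usual teaching example. This is a minimal sketch under assumptions: the function names map_phase, shuffle, reduce_phase, and word_count are made up for illustration, and nothing here is distributed; on a real cluster each input split would be mapped on a different machine.

```python
from collections import defaultdict


def map_phase(document):
    """Map: emit a (word, 1) pair for every word in one document."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle(pairs):
    """Shuffle: group all emitted values by their key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reduce: combine the values for one key into a single count."""
    return key, sum(values)


def word_count(documents):
    # Each document stands in for one input split; here they are
    # simply processed in a loop on one machine.
    pairs = [pair for doc in documents for pair in map_phase(doc)]
    grouped = shuffle(pairs)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


counts = word_count(["big data tools", "big data concepts"])
```

Because each map call looks at only one document and each reduce call at only one key, both phases can in principle run in parallel, which is what lets the work spread across many ordinary machines.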
PIG
Hadoop-based query language developed by Yahoo!
Relatively easy to learn and adept at very deep, very long data pipelines (a limitation of SQL)
Hadoop
An open source framework for storing and analyzing massive amounts of distributed, semi-structured and unstructured data
Open source - hundreds of contributors continuously improve the core technology
Runs on inexpensive commodity hardware, so projects can scale out inexpensively
MapReduce + Hadoop = Big Data core technology