MapReduce - Coggle Diagram
MapReduce
Main components of MapReduce
Shuffle and sort
Reducer
Mapper
Combiners
Input data
Distributed cache
Driver
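To show how the components listed above fit together, here is a minimal in-memory sketch of a word count. It is an illustration only, not a real cluster job: the function names, the `stop_words` parameter (standing in for data shared through the distributed cache), and the sample input are all assumptions made for this example.

```python
from collections import defaultdict
from itertools import groupby
from operator import itemgetter


def mapper(line):
    """Emit a (word, 1) pair for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def combiner(pairs):
    """Pre-aggregate one mapper's output locally to cut shuffle traffic."""
    partial = defaultdict(int)
    for word, count in pairs:
        partial[word] += count
    return list(partial.items())


def shuffle_and_sort(pairs):
    """Sort all pairs by key and group the values that share a key."""
    ordered = sorted(pairs, key=itemgetter(0))
    for key, group in groupby(ordered, key=itemgetter(0)):
        yield key, [value for _, value in group]


def reducer(key, values):
    """Combine all values for one key into a single result."""
    return key, sum(values)


def driver(input_splits, stop_words=frozenset()):
    """Wire the stages together; stop_words plays the distributed-cache role."""
    mapped = []
    for split in input_splits:  # each split would go to a separate mapper task
        pairs = (p for line in split for p in mapper(line) if p[0] not in stop_words)
        mapped.extend(combiner(pairs))
    return dict(reducer(k, vs) for k, vs in shuffle_and_sort(mapped))


# Hypothetical input data: two splits of text lines.
splits = [["the cat sat", "the dog sat"], ["the cat ran"]]
word_counts = driver(splits, stop_words={"the"})
print(word_counts)
```

The combiner runs once per split, so the shuffle step moves one partial count per word per split rather than one pair per occurrence.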
Definition
MapReduce is a parallel programming model, devised at Google, for writing distributed applications that process large data sets.
Responsibilities of MapReduce Framework
Provides overall coordination of execution.
Selects nodes for running mappers.
Starts and monitors the mappers' execution.
Sorts and shuffles output of mappers.
Chooses locations for the reducers' execution.
Delivers the output of each mapper to the appropriate reducer node.
Starts and monitors the reducers' execution.
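Two of the responsibilities above, monitoring tasks and delivering mapper output to reducers, can be sketched in a few lines. This is a simplified single-process illustration, not how any particular framework implements them: `partition`, `run_with_retries`, `deliver`, and the simulated failure are hypothetical names and behaviour chosen for the example, and the CRC32 hash is just one stable hash that could be used.

```python
import zlib


def partition(key, num_reducers):
    """Pick a reducer index for a key with a stable hash (CRC32 here)."""
    return zlib.crc32(key.encode("utf-8")) % num_reducers


def run_with_retries(task, max_attempts=3):
    """Run a task, re-running it if it raises, as a stand-in for monitoring."""
    last_error = None
    for _ in range(max_attempts):
        try:
            return task()
        except Exception as err:  # a real framework would also detect dead nodes
            last_error = err
    raise RuntimeError(f"task failed after {max_attempts} attempts") from last_error


def deliver(mapper_outputs, num_reducers):
    """Route every (key, value) pair to the bucket of its reducer."""
    buckets = [dict() for _ in range(num_reducers)]
    for output in mapper_outputs:
        for key, value in output:
            buckets[partition(key, num_reducers)].setdefault(key, []).append(value)
    # Each reducer sees its keys in sorted order.
    return [dict(sorted(bucket.items())) for bucket in buckets]


attempts = {"count": 0}


def flaky_mapper():
    """Hypothetical mapper task that fails on its first attempt."""
    attempts["count"] += 1
    if attempts["count"] == 1:
        raise IOError("simulated node failure")
    return [("a", 1), ("b", 1)]


mapper_outputs = [run_with_retries(flaky_mapper), [("a", 1), ("c", 1)]]
buckets = deliver(mapper_outputs, num_reducers=2)
print(buckets)
```

Because the partition function depends only on the key, every value for a given key ends up at the same reducer, whichever mapper produced it.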
Algorithms Using MapReduce
Matrix-Vector Multiplication by MapReduce
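A common way to express matrix-vector multiplication in this model is to map each matrix entry m_ij to the pair (i, m_ij * v_j) and let the reducer sum the products for each row i. The sketch below follows that scheme in memory; it assumes the vector `v` is small enough to be available to every mapper, and the function names and sample data are made up for illustration.

```python
from collections import defaultdict


def map_element(i, j, m_ij, v):
    """For matrix entry m_ij, emit (row index, m_ij * v_j)."""
    yield i, m_ij * v[j]


def reduce_row(i, products):
    """Sum the partial products of row i to get x_i."""
    return i, sum(products)


def matrix_vector_multiply(entries, v):
    """Run the map and reduce steps over (row, column, value) triples."""
    grouped = defaultdict(list)  # stands in for shuffle and sort
    for i, j, m_ij in entries:
        for key, value in map_element(i, j, m_ij, v):
            grouped[key].append(value)
    return dict(reduce_row(i, products) for i, products in sorted(grouped.items()))


# Sparse representation of M = [[1, 2], [0, 3]] as (row, column, value) triples.
entries = [(0, 0, 1), (0, 1, 2), (1, 1, 3)]
v = [10, 20]
x = matrix_vector_multiply(entries, v)
print(x)
```

Zero entries are simply left out of `entries`, which is why this formulation suits sparse matrices.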
Relational-Algebra Operations
• Selection
• Projection
• Union, intersection and difference
• Natural join
• Grouping and aggregation
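As an illustration of the set operations and of grouping and aggregation from the list above, the following sketch uses a tiny in-memory stand-in for the framework. The helper `run_mapreduce`, the relation tags "R" and "S", and the sample tuples are assumptions for this example only.

```python
from collections import defaultdict


def run_mapreduce(records, map_fn, reduce_fn):
    """Tiny in-memory driver: map, group by key, then reduce each group."""
    groups = defaultdict(list)
    for record in records:
        for key, value in map_fn(record):
            groups[key].append(value)
    output = []
    for key in sorted(groups):
        output.extend(reduce_fn(key, groups[key]))
    return output


def tag(relation, name):
    """Label each tuple with the relation it came from."""
    return [(name, t) for t in relation]


def map_tagged(record):
    name, t = record
    yield t, name  # the tuple itself is the key


def reduce_union(t, names):
    yield t  # emit once, however many relations contain it


def reduce_intersection(t, names):
    if {"R", "S"} <= set(names):  # present in both relations
        yield t


def reduce_difference(t, names):
    if "S" not in names:  # keep tuples that appear only in R
        yield t


def map_group(record):
    a, b = record
    yield a, b  # group by A, aggregate B


def reduce_sum(a, bs):
    yield a, sum(bs)


R = [(1, 2), (3, 4)]
S = [(3, 4), (5, 6)]
both = tag(R, "R") + tag(S, "S")

union = run_mapreduce(both, map_tagged, reduce_union)
intersection = run_mapreduce(both, map_tagged, reduce_intersection)
difference = run_mapreduce(both, map_tagged, reduce_difference)
totals = run_mapreduce([("x", 1), ("y", 5), ("x", 2)], map_group, reduce_sum)
```

All three set operations share the same map function; only the reducer's test on the list of relation names differs.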
Relational Join
Computing Projections
Computing Selections
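The join, projection, and selection items above can be sketched the same way. This is a minimal in-memory illustration under stated assumptions: relations R(A, B) and S(B, C) are lists of tuples, the selection condition (second component greater than 2) is arbitrary, and `run_mapreduce` is a hypothetical helper that stands in for the framework.

```python
from collections import defaultdict


def run_mapreduce(records, map_fn, reduce_fn):
    """Tiny in-memory driver: map, group by key, then reduce each group."""
    groups = defaultdict(list)
    for record in records:
        for key, value in map_fn(record):
            groups[key].append(value)
    output = []
    for key in sorted(groups):
        output.extend(reduce_fn(key, groups[key]))
    return output


def map_select(t):
    if t[1] > 2:  # example condition: second component greater than 2
        yield t, t


def reduce_identity(key, values):
    yield from values


def map_project(t):
    yield (t[0],), None  # keep only the first attribute


def reduce_dedup(key, values):
    yield key  # duplicates collapse because they share a key


def map_join(record):
    name, t = record
    if name == "R":
        a, b = t
        yield b, ("R", a)  # key on the shared attribute B
    else:
        b, c = t
        yield b, ("S", c)


def reduce_join(b, tagged):
    left = [x for name, x in tagged if name == "R"]
    right = [x for name, x in tagged if name == "S"]
    for a in left:
        for c in right:
            yield a, b, c  # one output tuple per matching pair


R = [(1, 2), (1, 3), (4, 3)]
S = [(3, 7), (3, 8), (9, 9)]

selected = run_mapreduce(R, map_select, reduce_identity)
projected = run_mapreduce(R, map_project, reduce_dedup)
joined = run_mapreduce(
    [("R", t) for t in R] + [("S", t) for t in S], map_join, reduce_join
)
```

Selection needs no real work in the reducer, while projection relies on the reducer to remove the duplicates that dropping attributes can create.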
Need for MapReduce
Scalability
Fault Tolerance
Parallel Processing
Flexibility
Data Variety
Cost-Effectiveness
Batch Processing
Simplified Abstractions
Compatibility with Distributed File Systems