9 - Scenarios for Using Hadoop & Hadoop Live Use Cases
Scenarios for Using Hadoop
Distributed Indexing:
Problem: Scanning millions of items for every query is impractical.
Hadoop Solution: Hadoop builds a distributed index, allowing items to be matched and ranked efficiently.
Advantage: Improves performance in search and retrieval tasks.
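The distributed-indexing idea can be sketched as a miniature map/reduce pipeline in plain Python (an illustrative sketch, not Hadoop's actual API; the function names and sample documents are made up):

```python
from collections import defaultdict

def map_phase(doc_id, text):
    # Map: emit (term, doc_id) pairs for every word in the document.
    for term in text.lower().split():
        yield term, doc_id

def reduce_phase(pairs):
    # Reduce: group postings by term to build the inverted index.
    index = defaultdict(set)
    for term, doc_id in pairs:
        index[term].add(doc_id)
    return index

docs = {1: "hadoop stores big data", 2: "hadoop indexes data fast"}
pairs = [p for doc_id, text in docs.items() for p in map_phase(doc_id, text)]
index = reduce_phase(pairs)
print(sorted(index["data"]))  # both documents mention "data"
```

In a real cluster, the map phase runs in parallel on many servers and the framework shuffles the pairs to reducers by term, so the index is built without any single machine scanning the whole collection.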
Scalable Cluster of Servers:
Problem: Data grows, and more computing power is needed.
Hadoop Solution: Hadoop runs on a cluster of commodity servers (low-cost, standard servers).
Advantage: Easily scalable by adding or removing servers (clusters can grow to 2,000+ nodes).
Self-Healing: If any server fails, Hadoop automatically detects and compensates, ensuring the system remains operational.
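The self-healing behavior rests on block replication: HDFS keeps multiple copies of each data block, so losing one server loses no data. The replication factor is set in `hdfs-site.xml` (3 is the default; shown here as a minimal config fragment):

```xml
<!-- hdfs-site.xml: store each block on 3 servers, so the failure
     of any single server does not lose data. -->
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>3</value>
  </property>
</configuration>
```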
3 Scenarios
Hadoop as an ETL and Filtering Platform:
Challenge: Extracting valuable insights from large, raw data.
Hadoop Solution: Load raw data into Hadoop, filter and process it using MapReduce, then output a refined dataset.
Usage: This processed data can then be analyzed further using tools like SAS or integrated into other analytics systems.
Why Hadoop: It efficiently extracts the most valuable data, which is often only a small percentage of the huge raw volume.
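The ETL-and-filter pattern above can be sketched as a toy MapReduce job in plain Python (the log format, field names, and filter rule are made-up examples, not from the source):

```python
# Toy ETL: filter raw log lines, then aggregate the refined subset.
raw_logs = [
    "2024-01-01 ERROR payment timeout",
    "2024-01-01 INFO page viewed",
    "2024-01-02 ERROR payment declined",
    "2024-01-02 INFO page viewed",
]

def map_filter(line):
    # Map: keep only the records worth analyzing (here, errors).
    date, level, *message = line.split()
    if level == "ERROR":
        yield date, " ".join(message)

def reduce_count(pairs):
    # Reduce: count refined records per key (date).
    counts = {}
    for date, _ in pairs:
        counts[date] = counts.get(date, 0) + 1
    return counts

refined = [p for line in raw_logs for p in map_filter(line)]
print(reduce_count(refined))  # {'2024-01-01': 1, '2024-01-02': 1}
```

The refined output, a small fraction of the raw input, is what would then be handed to SAS or another analytics tool.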
Hadoop as an Exploration Engine:
Challenge: Analyzing data that’s constantly growing.
Hadoop Solution: Once data is in the Hadoop cluster, new data can be added without the need to reprocess or re-index everything.
Why Hadoop: It allows continuous analysis without disruption, and new data is incorporated into existing summaries.
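The "add new data without reprocessing everything" idea can be illustrated with an incremental summary (a simplified sketch of the principle, not Hadoop's actual mechanism; the sample data is made up):

```python
from collections import Counter

def summarize(batch):
    # Produce a partial summary (word counts) for one batch of data.
    return Counter(word for line in batch for word in line.split())

# Existing summary built from data already in the cluster.
summary = summarize(["user searched flights", "user booked hotel"])

# New data arrives: summarize only the new batch and merge it in,
# instead of re-running the job over the full dataset.
summary += summarize(["user searched hotels"])
print(summary["user"])  # 3
```

Because the summaries are mergeable, each new batch costs only its own processing time, which is what makes continuous exploration of a growing dataset practical.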
Hadoop as an Archive:
Challenge: Storing and accessing historical data is expensive and slow with traditional systems.
Hadoop Solution: Hadoop stores historical data cheaply in a distributed system, making it readily available for analysis.
Why Hadoop: It reduces the cost of maintaining large datasets and allows for continuous analysis on archived data.
Orbitz: a major online travel-booking service
Challenge: Huge log data from millions of searches/transactions daily.
Solution: Hadoop stored non-transactional data (web logs) and enabled better search optimization, user recommendations, web tracking, and personalized marketing.
Major National Bank
Challenge: Large data volume (2.5B records/month), with growing need for non-relational data processing (e.g., web clicks, voice data).
Solution: Hadoop used for fraud detection, credit risk management, and capital optimization.
Leading North American Retailer
Challenge: 400TB of data across 4,000 locations.
Solution: Hadoop enabled loyalty analytics, fraud detection, supply chain optimization, and marketing/promotion analysis.
Netflix
Challenge: Nightly log processing took over 24 hours.
Solution: Hadoop enabled hourly log processing, making data available faster for analysis and business intelligence.