Designing Data-Intensive Applications by Martin Kleppmann - Coggle Diagram
Designing Data-Intensive Applications
by Martin Kleppmann
Chapter 1: Reliable, Scalable, and Maintainable Applications
Classify my projects
Project1: Fancy VR Mall (Importing to China)
Data-intensive: because I don't need super fast CPU/ML computing for most features, but the huge and complex VR data and user-usage data are the bottlenecks.
: because here users will contribute data
Compute-intensive: because computing speed and GPU/CPU cycles are crucial for ML decision making and online data mining.
Project3: E-commerce + Intl. Business (Offshore)
Data-intensive: mostly needs to retrieve new data and get information from the outside world. Fast response is not necessary.
Strategies to increase the fault-tolerance (fault-resilience) of systems
Increase the rate of faults by triggering them deliberately
E.g. the Netflix Chaos Monkey randomly terminates virtual machine instances and containers running inside your production environment. Exposing engineers to failures more frequently incentivizes them to build resilient services.
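The idea can be sketched in a few lines. This is a toy illustration only, not Netflix's actual tool: the instance pool and the `terminate` callback are hypothetical stand-ins for a cloud provider's API.

```python
import random

class ChaosMonkey:
    """Toy sketch of deliberate fault injection: randomly pick a running
    instance from a pool and terminate it, so the surviving system must cope."""

    def __init__(self, instances, terminate):
        self.instances = instances  # list of instance IDs (hypothetical)
        self.terminate = terminate  # callback that kills one instance

    def unleash(self):
        victim = random.choice(self.instances)
        self.terminate(victim)
        return victim

# Here `terminate` just records the kill; in a real deployment it would
# call your cloud provider's API to stop the VM or container.
killed = []
monkey = ChaosMonkey(["web-1", "web-2", "worker-1"], killed.append)
victim = monkey.unleash()
```

The point is not the mechanism but the schedule: faults are triggered continuously and deliberately, so resilience is exercised rather than assumed.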
Some faults simply cannot be cured, e.g. a data leak. So the only solution is prevention.
Types of Faults
RAM becomes faulty
The power grid has a blackout
Someone unplugs the wrong network cable
Configuration errors by operators were the leading cause of outages, whereas hardware faults (servers or network) played a role in only 10-25% of outages.
Provide fully featured non-production environments where people can explore and experiment safely, using real data, without affecting real users.
Well-designed abstractions, APIs, and interfaces
Test thoroughly at all levels
Whole-system integration tests
Minimize the impact in the case of a failure
Make it fast to roll back configuration changes (backups)
Roll out new code gradually (small amount at a time)
Provide tools to recompute data (in case it turns out that the old computation was incorrect) [panic button: don't make users think about how to solve it, just give them the answer]
Set up detailed and clear monitoring, such as performance metrics and error rates. In other engineering disciplines this is referred to as telemetry. (Once a rocket has left the ground, telemetry is essential for tracking what is happening, and for understanding failures.)
Management & Training
Implement good management practices and training
Hard disks crash
Add redundancy to individual hardware components.
Tolerating the loss of entire machines, by using software fault-tolerance techniques in preference or in addition to hardware redundancy.
:red_flag: Hard disks have a Mean Time to Failure (MTTF) of about 10 to 50 years
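A 10-50 year MTTF sounds reassuring, but at cluster scale it means failures are routine. A back-of-the-envelope sketch, assuming a constant average failure rate across a hypothetical 10,000-disk cluster:

```python
# Expected disk failures per day ~= num_disks / (MTTF_years * 365),
# assuming failures arrive at a constant average rate.
def expected_failures_per_day(num_disks, mttf_years):
    return num_disks / (mttf_years * 365)

low  = expected_failures_per_day(10_000, 50)   # optimistic MTTF
high = expected_failures_per_day(10_000, 10)   # pessimistic MTTF
print(f"{low:.2f} to {high:.2f} disk failures per day")  # roughly 0.55 to 2.74
```

So a large storage cluster should expect on the order of one disk death per day, which is why redundancy (next items) is non-optional.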
E.g. Linux June 30, 2012 Leap Second Fault
:red_cross: There are cases where we may choose to sacrifice reliability to reduce development or operational costs, but be very conscious of when you are cutting corners.
E.g. when developing a prototype product for an unproven market
E.g. for a service with a very narrow profit margin
BACKUP is always necessary
Multiple hard disks (e.g. RAID)
Chapter 2: Data Models and Query Languages
Common Data Models
Using schema on-read
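Schema-on-read means the structure of the data is interpreted when it is read, not enforced when it is written. A minimal sketch with hypothetical documents (the `read_user` helper and its defaults are illustrative, not any real database's API):

```python
import json

# Schema-on-write (relational) enforces structure up front; schema-on-read
# (document databases) stores whatever JSON arrives and interprets it on read.
raw_docs = [
    '{"name": "Ada",  "phone": "555-0100"}',
    '{"name": "Brian"}',                      # older document, no phone field
]

def read_user(doc_json):
    doc = json.loads(doc_json)
    # The "schema" lives here, in the reading code: missing fields get defaults.
    return {"name": doc["name"], "phone": doc.get("phone", "unknown")}

users = [read_user(d) for d in raw_docs]
```

Old and new document shapes coexist in storage; only the reading code has to handle both.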
Use declarative query languages like SQL
Row order in a relation is not guaranteed
Direct index access
MySQL can be a poor choice for complex and large databases with evolving schemas
Because ALTER TABLE copies the entire table, schema changes can cause downtime from minutes to hours.
Need to traverse from one record to another
RDF (Resource Description Framework) model
Can directly access unique ID of vertices, or just use an index to find the vertices with a particular value
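Both access patterns can be sketched with plain dictionaries. This is a toy property-graph structure for illustration, not a real graph database's API; the vertex IDs, properties, and `neighbours` helper are all made up:

```python
# Vertices are addressed directly by unique ID; an index maps a property
# value back to the vertex IDs that carry it.
vertices = {
    1: {"type": "Person", "name": "Lucy"},
    2: {"type": "Location", "name": "Idaho"},
    3: {"type": "Location", "name": "USA"},
}
edges = [(1, "born_in", 2), (2, "within", 3)]  # (tail, label, head)

# Direct access by unique ID:
lucy = vertices[1]["name"]

# Index on a property value -> list of vertex IDs:
index = {}
for vid, props in vertices.items():
    index.setdefault(props["name"], []).append(vid)

# Traverse outgoing edges with a given label from a vertex:
def neighbours(vid, label):
    return [head for tail, lbl, head in edges if tail == vid and lbl == label]
```

Either entry point (ID lookup or property index) gives you a starting vertex, and queries proceed by following edges from there.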
Not Covered in This Book
Full Text Search
SPARQL (query language for triple-stores using the RDF model)
Datalog (Cascalog is a Datalog implementation for querying large datasets in Hadoop)
Aggregation (MongoDB Map-reduce)
Faster to implement and less error-prone
Query optimizer: written once, benefits all queries
Vendor updates and optimizations won't break your code
Imperative (programming language style)
Definition: you need to write down the detailed steps to accomplish the goal
Detailed, personalized control at a lower level
Hard to optimize
Maintenance overhead: if there is any syntax or API change, you will need to update every call site
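The declarative/imperative contrast above can be shown with the same query written both ways. A small runnable sketch using Python's built-in `sqlite3` and a made-up `animals` table:

```python
import sqlite3

# Same question, two styles: declarative SQL says *what* we want and lets
# the query optimizer choose how; imperative code spells out every step.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE animals (name TEXT, family TEXT)")
conn.executemany(
    "INSERT INTO animals VALUES (?, ?)",
    [("shark", "Sharks"), ("lion", "Felidae"), ("tiger", "Felidae")],
)

# Declarative: the engine is free to use any index or row order.
declarative = [row[0] for row in
               conn.execute("SELECT name FROM animals WHERE family = 'Sharks'")]

# Imperative: we fix the iteration order and the filtering logic by hand.
imperative = []
for name, family in conn.execute("SELECT name, family FROM animals"):
    if family == "Sharks":
        imperative.append(name)
```

Both produce the same result, but only the declarative form leaves the database free to change its execution strategy without touching application code.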
:red_flag: Data Model and Query Languages selecting tips
Just pick the declarative, more abstract language for fewer update bugs and better optimized performance in the long term.
Based on the level of relationships in the application
Not too many relationships, and objects are mostly self-contained (one-to-many)
Document data model (e.g. MongoDB)
Very strong and complex relationships between all kinds of vertices (many-to-many)
Graph data models are perfect
Relationships are mostly fixed and predictable, without overly complicated connection logic (many-to-many)
Relational is perfect
When I design an application, what language should I choose in different situations?
Why is Microservices more efficient? What are the pros and cons?
How to guarantee the performance of Microservices?
How to migrate to Microservices?
What are the costs and benefits of distributed databases?
How do I actually implement operability, maintainability, and evolvability in my applications?
How to solve the problem of delay, stateless, and local testing in serverless application architecture?
How to pick an efficient model
What are the pros and cons of different data models
NoSQL (Not Only SQL)
Open source vs. commercial SQL
More dynamic and expressive data models
Easily handling very large datasets and high write throughput
Specialized query operations
: Boilerplate code is required to translate rows and columns into objects for OOP (the impedance mismatch)
You want to learn how to make data systems scalable, for example, to support web or mobile apps with millions of users.
You need to make applications highly available (minimizing downtime) and operationally robust.
You are looking for ways of making systems easier to maintain in the long run, even as they grow and as requirements and technologies change.
You have a natural curiosity for the way things work and want to know what goes on inside major websites and online services. This book breaks down the internals of various databases and data processing systems, and it’s great fun to explore the bright thinking that went into their design.
Data is the primary challenge: the quantity of data, the complexity of data, or the speed at which it is changing.
CPU cycles are the bottleneck.
Tolerating hardware & software faults (Human error)
Measuring load & performance (Latency percentiles, throughput)
Operability, simplicity & evolvability
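Latency percentiles from the "measuring load & performance" topic can be computed with a few lines. A sketch using the nearest-rank method and made-up latency numbers (real monitoring systems use streaming approximations instead):

```python
# Percentiles describe load better than averages: p50 is the typical
# request, while p95/p99 show what the slowest users experience (tail latency).
def percentile(sorted_values, p):
    # Nearest-rank method: the smallest value covering >= p percent of the data.
    k = max(0, -(-p * len(sorted_values) // 100) - 1)  # ceil(p*n/100) - 1
    return sorted_values[int(k)]

latencies_ms = sorted([12, 15, 14, 250, 13, 16, 11, 900, 14, 15])
p50 = percentile(latencies_ms, 50)
p99 = percentile(latencies_ms, 99)
print(p50, p99)  # -> 14 900
```

Note how the median (14 ms) hides the 900 ms outlier that dominates the p99; this is why tail latencies, not means, drive user-facing performance targets.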
A fault is not equal to a failure
A fault is usually defined as one component of the system deviating from its spec, whereas a failure is when the system as a whole stops providing the required service to the user.
Fault-tolerant or fault-resilient
Systems that anticipate faults and can cope with them (not all faults can be tolerated)