Fault Tolerance
Types of Failure
Crash Failure
Omission Failure
Timing Failure
Response Failure
Arbitrary Failure
Masking by Redundancy
Information redundancy
Time redundancy
Physical redundancy
Triple Modular Redundancy
Flat group vs Hierarchical group for redundancy
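Triple Modular Redundancy can be illustrated with a majority voter over three replica outputs. The sketch below is a minimal illustration, not a definitive implementation: the function name `tmr_vote` is hypothetical, and it assumes replica outputs can be compared for equality and that at most one replica is faulty.

```python
from collections import Counter


def tmr_vote(a, b, c):
    """Return the value produced by at least two of the three replicas.

    Assumption: outputs are hashable and comparable for equality.
    Raises ValueError when all three replicas disagree, since no
    majority exists and the fault cannot be masked.
    """
    value, count = Counter([a, b, c]).most_common(1)[0]
    if count < 2:
        raise ValueError("no majority: all three replicas disagree")
    return value


# One faulty replica is masked by the other two.
result = tmr_vote(42, 42, 7)
```

With three replicas the voter masks a single faulty output; two simultaneous faults that disagree with each other and with the correct replica leave no majority.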
Dependability of a system
Availability
Reliability
Safety
Maintainability
Types of faults
Transient
Permanent
Intermittent
Distributed Commit and Checkpoints
2-phase commit
General Idea
Assumption
State & state transitions
Danger of blocking if the coordinator fails
Recovery from failure
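The general idea of 2-phase commit can be sketched as a single-process simulation. This is a simplified illustration under stated assumptions: the class and function names (`Participant`, `two_phase_commit`) and the state labels `INIT`, `READY`, `COMMIT`, `ABORT` are chosen for this example, messages are modelled as direct method calls, and no failures, timeouts, or logging to stable storage are handled.

```python
class Participant:
    """Hypothetical participant that votes in phase 1 and obeys the decision in phase 2."""

    def __init__(self, name, can_commit=True):
        self.name = name
        self.can_commit = can_commit  # assumption: local readiness is known up front
        self.state = "INIT"

    def prepare(self):
        # Phase 1: answer the coordinator's vote request.
        if self.can_commit:
            self.state = "READY"
            return True   # vote to commit
        self.state = "ABORT"
        return False      # vote to abort

    def decide(self, decision):
        # Phase 2: apply the coordinator's global decision.
        self.state = decision


def two_phase_commit(participants):
    """Coordinator logic: commit only if every participant votes to commit."""
    votes = [p.prepare() for p in participants]      # phase 1: collect votes
    decision = "COMMIT" if all(votes) else "ABORT"
    for p in participants:                           # phase 2: broadcast decision
        p.decide(decision)
    return decision


outcome = two_phase_commit([Participant("A"), Participant("B", can_commit=False)])
```

In this sketch a participant in `READY` cannot decide on its own; it waits for `decide`. If the coordinator crashed between the two phases, such participants would stay blocked in `READY`, which is the failure scenario listed above.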
Distributed Checkpoints
Problem with checkpointing in distributed systems
Distributed Snapshot
Coordinated checkpointing algorithm
Consensus problem
Factors
Asynchronous vs synchronous
Communication delay
Message Ordering
Unicast vs multicast