Distributed Systems
Basics
-
-
-
-
A protocol
is a formal description of message formats and the rules that two processes must follow in order to exchange those messages.
A network
is the infrastructure that links computers, workstations, terminals, servers, etc. It consists of routers which are connected by communication links.
A component
can be a process or any piece of hardware required to run a process, support communications between processes, store data, etc.
A distributed system
is an application that executes a collection of protocols to coordinate the actions of multiple processes on a network, such that all components cooperate to perform a single task or a small set of related tasks.
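To make the protocol definition above concrete, here is a minimal sketch of a message format plus one exchange rule. The wire layout, the PING and PONG type codes, and the function names are all assumptions made up for illustration; they do not come from any real protocol.

```python
import struct
from typing import Tuple

# Hypothetical wire format (an assumption for illustration):
#   1 byte message type | 4 bytes payload length (big-endian) | payload
HEADER = struct.Struct("!BI")
PING, PONG = 1, 2  # hypothetical message type codes


def encode(msg_type: int, payload: bytes = b"") -> bytes:
    """Serialize one message according to the format above."""
    return HEADER.pack(msg_type, len(payload)) + payload


def decode(data: bytes) -> Tuple[int, bytes]:
    """Parse one message; raise ValueError if it violates the format."""
    if len(data) < HEADER.size:
        raise ValueError("truncated header")
    msg_type, length = HEADER.unpack_from(data)
    payload = data[HEADER.size:HEADER.size + length]
    if len(payload) != length:
        raise ValueError("truncated payload")
    return msg_type, payload


def respond(request: bytes) -> bytes:
    """Exchange rule: every PING is answered by a PONG echoing its payload."""
    msg_type, payload = decode(request)
    if msg_type != PING:
        raise ValueError("unexpected message type")
    return encode(PONG, payload)
```

The format (`encode`, `decode`) and the rule (`respond`) are kept separate on purpose: both processes must agree on each for the exchange to work.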
-
Bandwidth
A measure of the capacity of a communications channel. The higher a channel's bandwidth, the more information it can carry.
Topology
The different configurations that can be adopted in building networks, such as ring, bus, star, or mesh.
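As a rough illustration of how topologies differ, the sketch below builds three of the configurations named above as adjacency maps and counts their point-to-point links. The function names are hypothetical, the mesh is assumed to be a full mesh, and the bus is left out because it is a single shared medium rather than a set of pairwise links.

```python
def ring(n):
    """Each node links to its two neighbours (assumes n >= 3)."""
    return {i: {(i - 1) % n, (i + 1) % n} for i in range(n)}


def star(n):
    """Node 0 is the hub; every other node links only to the hub."""
    adj = {i: {0} for i in range(1, n)}
    adj[0] = set(range(1, n))
    return adj


def full_mesh(n):
    """Every node links directly to every other node."""
    return {i: set(range(n)) - {i} for i in range(n)}


def link_count(adj):
    """Each link appears in two adjacency sets, so halve the total."""
    return sum(len(peers) for peers in adj.values()) // 2
```

For n nodes this gives n links for a ring, n - 1 for a star, and n(n - 1)/2 for a full mesh, which is one reason full meshes become expensive as networks grow.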
-
-
Failure
-
-
you have to say, "Failure happens all the time."
-
-
two categories
hardware
-
Today, problems are most often associated with connections and mechanical devices, i.e., network failures and drive failures.
software
Even with rigorous testing, software bugs account for a substantial fraction of unplanned downtime (estimated at 25-35%).
-
types
Halting failures
A component simply stops.
There is no way to detect the failure except by timeout: the component either stops sending "I'm alive" (heartbeat) messages or fails to respond to requests.
-
-
-
-
-
-
Byzantine failures
This category covers several types of faulty behavior, including data corruption or loss and failures caused by malicious programs.
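The timeout-based detection of halting failures described above can be sketched as a small heartbeat monitor. This is a minimal illustration, not a production failure detector: the class name, its methods, and the injectable `clock` parameter are assumptions introduced here.

```python
import time


class HeartbeatMonitor:
    """Suspects a component once no heartbeat has arrived within timeout_s."""

    def __init__(self, timeout_s, clock=time.monotonic):
        self.timeout_s = timeout_s
        self.clock = clock      # injectable so the logic can be tested
        self.last_seen = {}     # component name -> time of last heartbeat

    def heartbeat(self, component):
        """Record an "I'm alive" message from a component."""
        self.last_seen[component] = self.clock()

    def suspected(self):
        """Return the components whose last heartbeat is older than the timeout."""
        now = self.clock()
        return sorted(
            name for name, seen in self.last_seen.items()
            if now - seen > self.timeout_s
        )
```

A timeout can only suspect a failure. A component that is merely slow, or cut off by the network, looks the same to the monitor as one that has stopped.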
why
-
A distributed system can be much larger and more powerful, given the combined capabilities of its distributed components, than a combination of stand-alone systems.
requirements
needs to be reliable
characteristics
-
Highly Available
The system can restore operations and resume providing services even when some components have failed.
Recoverable
Failed components can restart themselves and rejoin the system after the cause of failure has been repaired.
Consistent
The system can coordinate actions by multiple components, often in the presence of concurrency and failure.
-
-
-
-