Time and Global State

Time

Time is an important and interesting issue.

Time is a quantity often want to measure the happening of a certain event accurately. E.g. e-commerce transaction time at merchant and bank’s computers.

Algorithms depend upon clock synchronization. E.g. use of timestamps to serialize transactions to maintain data consistency. Order of events required.

Synchronize local clock with an authoritative, external source of time. Atomic oscillator clock is the most accurate physical clock. International Atomic Time and Coordinated Universal Time.

Skew between computer clocks in a distributed system

Each node maintain a physical clock. However, they tend to drift even after an accurate initial setting.

Skew: the difference between the readings of any two clocks

Clock drift: the crystal-based clock count time at different rates. Oscillator has different frequency. Drift rate is usually used to measure the change in the offset per unit of time. Ordinary quartz crystal clock, 1second per 11.6 days.

Synchronizing Physical Clocks

External synchronization: Ci is synchronized to a common standard.

  • |S(t) –Ci(t)| <D, for i = 1,2,…N and for all real time t, namely clock Ci are accurate to within the bound D. S is standard time.

Internal synchronization: Ci is synchronized with one another to a known degree of accuracy.

  • |Ci(t) – Cj(t)| < D for i,j=1,2,…N, and for all real time t, namely, clocks Ci agree with each other within the bound D.

Simplest Case of Internal Synchronization

In a synchronous system, bounds exist for clock drift rate, transmission delay and time for computing of each step

One process sends the time t on it local clock to the other in a message m. The receiver should set its clock to t+Ttrans. It doesn’t matter whether t is accurate or not

Synchronous system: Ttrans could range from min to max. The uncertainty is u=(max-min). If receiver set clock to be t+min or t+max, the skew is as much as u. If receiver set the clock to be t+(min+max)/2, the skew is at most u/2.

Asynchronous system: no upper bound max. only lower bound.

Clock synchronization using a time server

Cristian’s method: Time server, connected to a device receiving signals from UTC. Upon request, the server S supplies the time t according to its clock.

The algorithm is probabilistic and can achieve synchronization only if the observed round trip time are short compared with required accuracy.

Cristian’s algorithm

Suffers from the problem associated with single server that single time server may fail.

Cristian suggested to use a group of synchronized time servers. A client multicast is request to all servers and use only the first reply

A faulty time server that replies with spurious time values or an imposter time server with incorrect times.

Berkeley Algorithm

Internal synchronization when developed for collections of computers running Berkeley UNIX.

A coordinator is chosen to act as the master. It periodically polls the other computers whose clocks are to be synchronized, called slave. The salves send back their clock values to it. The master estimate their local clock times by observing the round-trip time similar to Cristian’s method. It averages the values obtained including its own.

Instead of sending the updated current time back to other computers, which further introduce uncertainty of message transmission, the master sends the amount by which each individual slave’s clock should adjust.

The master takes a fault-tolerant average, namely a subset of clocks is chosen that do not differ from one another by more than a specified bound

The algorithm eliminates readings from faulty clocks. Such clocks could have a adverse effect if an ordinary average was taken.

The Network Time Protocol

Cristian’s method and Berkeley algorithm are primarily for Intranets. The Network Time Protocol(NTP) defines a time service to distribute time information over the Internet.

Clients across the Internet to be synchronized accurately to UTC. Statistical techniques

Reliable service that can survive lengthy losses of connectivity. Redundant servers and redundant paths between servers.

Clients resynchronized sufficiently frequently to offset the rates of drift

Protection against interference with time services. Authentication technique from claimed trusted sources.

Hierarchical structure called synchronization subnet

• Primary server: connected directly to a time source.
•Secondary servers are synchronized with primary server.
• Third servers are synchronized with secondary servers. Such subnet can reconfigure as servers become unreachable or failures occur.

The Network Time Protocol Server

NTP servers synchronize in one of three modes:

  1. Multicast mode: for high-speed LAN. One or more servers periodically multicasts the time to servers connected by LAN, which set their times assuming small delay. Achieve low accuracy.
  1. Procedure call: similar to Cristian’s algorithm. One server receives request, replying with its timestamp. Higher accuracy than multicast or multicast is not supported.
  1. Symmetric mode: used by servers that supply time in LAN and by higher level of synchronization subnet. Highest accuracy. A pair of servers operating in symmetric mode exchange messages bearing timing information.

Logical Time and Logical Clocks

In single process, events are ordered by local physical time. Since we cannot synchronize physical clocks perfectly across a distributed system, we cannot use physical time to find out the order of any arbitrary pair of events.

We will use logical time to order events happened at different nodes. Two simple points:

• If two events occurred at the same process, then they occurred in the order in which pi observes themb • Whenever a message is sent between processes, the event of sending the message occurred before the event of receiving the message.

Logical Clocks

Lamport invented a logical clock Li, which is a monotonically increasing software counter, whose value need bear no particular relationship to any physical clock. Each process pi keeps its own logical clock.

LC1: Li is incremented before each event is issued at process pi: Li = Li +1
LC2:
a. Pi sends a message m, it piggybacks on m the value t = Li
b. On receiving (m,t), a process pj computes Lj=max(Lj,t) and then applies LC1 before timestamping the event receive(m).

Vector Clock

Lamport’s clock: L(e)<L(e’) we cannot conclude that e->e’.

Vector clock to overcome the above problem.

N processes is an array of N integers. Each process keeps its own vector clock Vi, which it uses to timestamp local events.

VC1: initially, Vi[j] = 0, for i,j = 1,2…N
VC2: just before pi timestamps an event, it sets Vi[i] = vi[i]+1 VC3: pi includes the value t= Vi in every message it sends
VC4: when pi receives a timestamp t in a message, it sets Vi[j]=max(Vi[j], t[j])for j =1,2…,N. Merge operation.

Detecting global properties

We want to find out whether a particular property is true of a distributed system as it executes.

We will see three examples:

Distributed garbage collection: if there are no longer any reference to objects anywhere in the distributed system, the memory taken up by the objects should be reclaimed.

Distributed deadlock detection: when each of a collection of processes waits for another process to send it a message, and where there is a cycle in the graph of this “wait-for” relationship.

Distributed termination detection: detect if a distributed algorithm has terminated. It seems that we only need to test whether each process has halted. However, it is not true.

Global States and consistent cuts

It is possible to observe the succession of states of an individual process, but the question of how to ascertain a global state of the system – the state of the collection of processes is much harder.

The essential problem is the absence of global time. If we had perfectly synchronized clocks at which processes would record its state, we can assemble the global state of the system from local states of all processes at the same time.

The question is: can we assemble the global state of the system from local states recorded at different real times? The answer is “YES”.

Cuts

Inconsistent cut: since P2 contains receiving of m1, but at P1 it does not include sending of that message. This cut shows the an effect without a cause. We will never reach a global state that corresponds to process state at the frontier by actual execution under this cut.

Consistent cut: it includes both the sending and receipt of m1. It includes the sending but not the receipt of m2. It is still consistent with actual execution

A cut C is consistent if, for each event it contains, it also contains all the events that happened-before that event.

• A consistent global state is one that corresponds to a consistent cut.
• A run is a total ordering of all the events in a global history that is consistent with each local history’s ordering.
• A linearization or consistent run is an ordering of the events in a global history that is consistent with this happened-before relation. ,

Global state predicate

function that maps from the set of global states of processes n the system to true or false.

Stable characteristics associated with object being garbage, deadlocked or terminated: once the system enters a state in which the predicate is True. It remains True in all future states reachable from that state.

• Safety (evaluates to deadlocked false for all states reachable from S0)
• Liveness ( evaluate to reaching termination true for some of the states reachable from S0)

Chandy and Lamport’s ‘snapshot’ algorithm

Chandy and Lamport(1985) describe a “snapshot” algorithm for determining global states of distributed system.

Record a set of process and channel states for a set of processes Pi such that even though the combination of recorded states may never have occurred at the same time, the recorded global state is consistent.

The algorithm records state locally at processes without giving a method for gathering the global state.

Assumption of Snapshot Algorithm

  1. Neither channels nor processes fail; communication is reliable so that every message sent is eventually received intact, exactly once;
  2. Channel are unidirectional either incoming or outgoing and provide FIFO order message delivery;
  3. The graph of processes and channels is strongly connected (there is a path between any two processes).
  4. Any process may initiate a global snapshot at any time.
  5. The processes may continue their normal execution and send and receive normal massages while the snapshot takes place.

Snapshots Ideas

Each process records its own state and also for each incoming channel a set of messages sent to it.

Allow us to record process states at different times but to account for the differential between process states in terms of message transmitted but not yet received

If process pi has sent a message m to process pj, but pj has not received it, then we account for m as belong to the state of the channel between them.