Assignment 5: Distributed Snapshot

Read the Chandy and Lamport article, which presents an algorithm for constructing a consistent global state of a distributed system (a distributed snapshot). Implement the global state detection algorithm to determine the state of a collection of remote objects and of the channels to and from their stubs. You may restrict the set of relevant remote objects to the arithmetic object that you implemented for the previous assignment. Record the state of an arithmetic object as its most recently computed value. The state of a stub is its most recently received value.
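
As a starting point, here is a minimal sketch of the marker rule from the article, assuming each process tracks its incoming channels by name. The class name SnapshotRecorder and its methods are hypothetical and not part of the assignment. The sketch only does the bookkeeping for one snapshot: it does not send markers or touch the network.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical per-process bookkeeping for a single snapshot. */
class SnapshotRecorder
{	private final Set<String> incoming;                        // names of all incoming channels
	private final Set<String> recording = new HashSet<> ();    // channels still being recorded
	private final Map<String, List<String>> channelState = new HashMap<> ();
	private String recordedState = null;                       // null until the first marker arrives

	SnapshotRecorder (Set<String> incomingChannels)
	{	incoming = new HashSet<> (incomingChannels);
	}

	/** Marker rule. Pass from == null when this process initiates the snapshot itself. */
	void onMarker (String from, String currentState)
	{	if (recordedState == null)
		{	// First marker: record our own state, treat the channel the marker
			// arrived on as empty, and start recording every other incoming channel.
			recordedState = currentState;
			for (String c : incoming)
			{	channelState.put (c, new ArrayList<> ());
				if (!c.equals (from)) recording.add (c);
			}
			// A full implementation would now send a marker on each outgoing channel.
		} else
		{	// Later marker: the state of that channel is whatever was recorded so far.
			recording.remove (from);
		}
	}

	/** Called for every ordinary (non-snapshot) message received on channel from. */
	void onMessage (String from, String message)
	{	if (recording.contains (from)) channelState.get (from).add (message);
	}

	boolean isComplete () { return recordedState != null && recording.isEmpty (); }
	String processState () { return recordedState; }
	List<String> channelState (String from) { return channelState.get (from); }
} // class SnapshotRecorder
```

Messages that arrive before the first marker are not recorded, because they are already reflected in the process state that gets recorded.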

Since we need to contact the stubs, they must register their names with the registry, too. Update the stub so that it listens to the network for snapshot commands. We will discuss a method shortly to reuse the exact same code to listen for network traffic, handle snapshot commands, and forward other traffic to the remote object or stub.

Interpreter Pattern

We will construct interpreters that filter snapshot commands out of the network traffic for remote objects and their stubs, passing the remaining traffic on to the remote object or its stub. The idea is to have a thread listen on a socket, read an entire line from the socket, and feed that line to a snapshot interpreter. The interpreter executes any commands needed to handle the snapshot and reformulates the string to be passed to its client.

Here is a usage example:

import java.io.IOException;
import java.net.Socket;
import java.util.Scanner;

public class CommandHandlingSlave implements Runnable
{	private SnapshotInterpreter snapInterp;
	private Socket socket;

	public CommandHandlingSlave (Socket _sock, SnapshotInterpreter _snapInterp) {
		socket = _sock;
		snapInterp = _snapInterp;
	} // constructor

	public void run ()
	{	try
		{	/* --> look here <-- -- -- -- -- -- -- -- */
			// Wrap the socket input, then let the interpreter consume any
			// snapshot command and hand back a scanner over what is left.
			Scanner scanner = new Scanner (socket.getInputStream ());
			scanner = snapInterp.interpret (scanner);
			/* --> end look here <-- -- -- -- -- -- */

			// handle your commands as normal reading input from a scanner...
		} catch (IOException e)
		{	e.printStackTrace (System.err);
			System.err.println ("Error handling network request: " + e);
		}
	} // run
} // class CommandHandlingSlave
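
One possible shape for the interpreter used above is sketched below. This is an assumption about how interpret might behave, not a required design: it reads one line, handles it if the first word is a snapshot command, and otherwise returns a new Scanner over the untouched line. The lastCommand field and the handleSnapshotCommand hook are hypothetical placeholders for your own snapshot logic.

```java
import java.util.Scanner;
import java.util.Set;

/** Hypothetical sketch of a line-filtering snapshot interpreter. */
class SnapshotInterpreter
{	private static final Set<String> SNAPSHOT_COMMANDS =
		Set.of ("initSnap", "endRecordSnap", "resultSnap");

	String lastCommand = null;   // illustrative only: remembers the last intercepted command

	/** Reads one line; returns a scanner over whatever the client should still see. */
	Scanner interpret (Scanner in)
	{	if (!in.hasNextLine ()) return new Scanner ("");
		String line = in.nextLine ();
		String[] words = line.trim ().split ("\\s+");
		if (SNAPSHOT_COMMANDS.contains (words[0]))
		{	handleSnapshotCommand (words);
			return new Scanner ("");   // snapshot traffic is not forwarded to the client
		}
		return new Scanner (line);     // ordinary traffic passes through unchanged
	}

	/** Placeholder: a real interpreter would record state, send markers, and so on. */
	protected void handleSnapshotCommand (String[] words)
	{	lastCommand = words[0];
	}
} // class SnapshotInterpreter
```

Because the returned Scanner has no lines when a snapshot command was intercepted, the caller can check hasNextLine () before running its normal command handling.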

A Snapshot Interpreter

Now we're ready to handle the snapshots. Your interpreter should intercept any messages of the following forms:
  • initSnap <snapId> <registered-process-name-0> <registered-process-name-1> <registered-process-name-2> ....
    The initial marker message requesting a snapshot of the specified processes and all channels connecting them.
  • endRecordSnap <snapId> <registered-process-name>
    Collect recording information on the channel from <registered-process-name> and send it to the snapshot initiator.
  • resultSnap <snapId> processState <registered-process-name> <process-state>
    The value to return to the initiator describing the process' state.
  • resultSnap <snapId> channelState <src-registered-process-name> <dest-registered-process-name> <message-0> # <message-1> # <message-2> ....
    The value to return to the initiator describing the state of the channel connecting <src-registered-process-name> and <dest-registered-process-name>.
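
The message forms above can be built and parsed with a few string operations. The sketch below is one way to do it, assuming fields are separated by whitespace and that recorded messages never contain the # character themselves. The class and method names (SnapMessages, formatChannelState, and so on) are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical helpers for the snapshot message formats. */
class SnapMessages
{	static final String MESSAGE_SEPARATOR = "#";   // assumed never to occur inside a message

	/** Builds "resultSnap <snapId> channelState <src> <dest> <m0> # <m1> ...". */
	static String formatChannelState (String snapId, String src, String dest, List<String> messages)
	{	String body = String.join (" " + MESSAGE_SEPARATOR + " ", messages);
		return ("resultSnap " + snapId + " channelState " + src + " " + dest + " " + body).trim ();
	}

	/** Extracts the recorded messages from a channelState result line. */
	static List<String> parseChannelMessages (String line)
	{	// Split into at most 6 parts: command, snapId, kind, src, dest, rest of line.
		String[] parts = line.trim ().split ("\\s+", 6);
		if (parts.length < 5 || !parts[0].equals ("resultSnap") || !parts[2].equals ("channelState"))
			throw new IllegalArgumentException ("not a channelState result: " + line);
		List<String> messages = new ArrayList<> ();
		if (parts.length == 6)
			for (String m : parts[5].split (MESSAGE_SEPARATOR))
				if (!m.trim ().isEmpty ()) messages.add (m.trim ());
		return messages;
	}

	/** Extracts the process names from "initSnap <snapId> <name-0> <name-1> ...". */
	static List<String> parseInitSnapProcesses (String line)
	{	String[] words = line.trim ().split ("\\s+");
		if (words.length < 2 || !words[0].equals ("initSnap"))
			throw new IllegalArgumentException ("not an initSnap command: " + line);
		return new ArrayList<> (Arrays.asList (words).subList (2, words.length));
	}
} // class SnapMessages
```

Keeping the command words and the separator in one place also makes it easier to satisfy the class-member constants requirement in the grading section.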

Now all that's left is to create a class that initiates a distributed snapshot. You can specify the names of the "processes" from which to collect the global state on the command line. The snapshot initiator just sends out the initSnap command and listens for the appropriate number of responses.
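
The initiator's bookkeeping might look like the sketch below. It assumes one processState result per participating process and one channelState result per directed channel, so the caller must supply the channel count for its own topology. The class SnapshotInitiatorSketch and its methods are hypothetical, and the network code for sending and listening is left out.

```java
import java.util.List;

/** Hypothetical helper for the snapshot initiator (no networking). */
class SnapshotInitiatorSketch
{	/** Builds "initSnap <snapId> <name-0> <name-1> ...". */
	static String buildInitSnap (String snapId, List<String> processNames)
	{	return ("initSnap " + snapId + " " + String.join (" ", processNames)).trim ();
	}

	/** Assumed response count: one per process plus one per directed channel. */
	static int expectedResponses (int processCount, int channelCount)
	{	return processCount + channelCount;
	}

	/** True if the line is a resultSnap message belonging to the given snapshot. */
	static boolean isResultFor (String line, String snapId)
	{	String[] words = line.trim ().split ("\\s+");
		return words.length >= 2 && words[0].equals ("resultSnap") && words[1].equals (snapId);
	}
} // class SnapshotInitiatorSketch
```

For example, with one arithmetic object and one stub joined by a channel in each direction, the initiator would wait for expectedResponses (2, 2) results whose snapshot id matches its own.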

Grading

  • D: Your code must compile.
  • C: All of the above, plus your code must be able to collect the current states of the participating objects.
  • B: All of the above, plus
    • You must be able to collect channel information.
    • Your code must be documented and tested. It is your responsibility to create a test suite and submit its output.
  • A: All of the above, plus
    • Your code must be cleanly written, using class-member constants and class methods where appropriate.
    • You must present an experiment to determine the time necessary to collect a snapshot.
  • extra credit:
    • Demonstrate the ability to collect concurrent snapshots.
    • Eliminate the need for the snapshot initiator to know all the distributed objects and stubs in the network.