Distributed Memory Programming

8.3 Communication

Communication in a distributed computing environment is very much a team effort, and its implications go beyond simply getting information from processor A to processor B.

  1. On multicomputers, all interprocessor communication, including synchronization, is implemented by passing messages (copying data) between processors

    Making sure that everyone is using the right value of variable x is, without question, a very important aspect of distributed computing; but so is making sure that no one tries to use that value before the rest of the pieces are in place, a matter of synchronization. Given that the only point of connection among all of the processing elements in a distributed environment lies in the messages that are exchanged, synchronization, then, must also be a matter of message-passing.

    In fact, synchronization is very often seen as a separable subset of all communication traffic, more a matter of control information than data, and it usually differs from ordinary data traffic in two ways. The first is size: syncs are typically just a few bytes, enough to identify the particular process being controlled and the particular state being imposed on it (plus, of course, the addressing information that is a natural part of any message packet). The second is immediacy: syncs need to be applied right now, while normal data can usually afford to wait a while before it's used. Various schemes for giving synchronization traffic special handling have been tried, from totally separate networks to message priorities, with received messages queued by priority.

    The bottom-line, though, is to realize that:

    • the various pieces of your distributed application will require some form of synchronization,
    • all synchronization (except the trivial begin and end stages related to cloning and termination of processes) requires user-controlled message traffic,
    • all message traffic, regardless of how important it is, adds latency (read "is a bad thing"), so...
    • keep your synchronization requirements to the absolute minimum, and code them to be lean-and-mean so that as little time is taken up in synchronization (and consequently away from meaningful computation) as possible.
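
    To make the cost concrete, here is a minimal sketch of a barrier built from nothing but messages. It is an illustration under assumptions, not a real multicomputer: Python threads stand in for processors, queue.Queue objects stand in for the network, and the names run_barrier, ARRIVED and RELEASE are made up for this example.

```python
import queue
import threading


def run_barrier(num_workers):
    """Run one barrier built only from messages; return (log, messages_sent)."""
    to_coordinator = queue.Queue()                            # workers -> coordinator
    to_worker = [queue.Queue() for _ in range(num_workers)]   # coordinator -> worker
    log = []     # shared event log (list.append is atomic in CPython)
    sent = []    # one entry per message put on the simulated network

    def worker(rank):
        log.append(("before", rank))            # work done ahead of the barrier
        to_coordinator.put(("ARRIVED", rank))   # small control message
        sent.append("ARRIVED")
        to_worker[rank].get()                   # wait here for RELEASE
        log.append(("after", rank))             # work done past the barrier

    def coordinator():
        for _ in range(num_workers):            # collect one ARRIVED per worker
            to_coordinator.get()
        for mailbox in to_worker:               # then release everybody
            mailbox.put("RELEASE")
            sent.append("RELEASE")

    threads = [threading.Thread(target=worker, args=(r,)) for r in range(num_workers)]
    threads.append(threading.Thread(target=coordinator))
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return log, len(sent)


if __name__ == "__main__":
    log, messages = run_barrier(4)
    print(messages)   # two control messages per worker
    print(log)        # every "before" entry precedes every "after" entry
```

    Even this simplest scheme costs two messages per worker for every barrier, none of which carries any data, which is the reason for keeping synchronization points few and cheap.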

  2. All messages must be explicitly received (sends and receives must be paired)

    Just like the junk mail that piles up in your mailbox and obscures the really important stuff (like your tax return, or the latest edition of TV-Guide), messages that are sent but never explicitly received are a drain on network resources. Much worse, however, is that the communication support structure is very seldom capable of dealing with un-received messages ... after all, "if it got sent, then it stands to reason that somebody must be expecting to get it, so we'd better hold on to it until it's asked for". So the messages just hang around, taking up buffer space, clogging queues and slowing down all other message traffic, until the whole structure finally grinds to a halt.

    When you're designing your system and you put a "send" in, make sure, right away, that the module it's directed to has a "receive" paired up with it. Tag them, too, so that you can tell which ones are which: most message-passing environments include a tag field in every message for just this purpose, and the communication subsystem uses it, while the program is running, to pick out which message a given "receive" is supposed to get.
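
    The following is a minimal sketch of tag matching, and of what happens to a send that never gets its receive. The Mailbox class, its method names and the two tag values are hypothetical, invented for this illustration; a real message-passing library does this bookkeeping inside its communication subsystem.

```python
import threading

TAG_RESULT = 1   # hypothetical tag values, chosen for this sketch
TAG_STATUS = 2


class Mailbox:
    """Buffer for one receiver; every message carries a tag."""

    def __init__(self):
        self._messages = []                    # (tag, payload) in arrival order
        self._changed = threading.Condition()

    def send(self, tag, payload):
        with self._changed:
            self._messages.append((tag, payload))
            self._changed.notify_all()

    def receive(self, tag):
        """Block until a message with this tag is available, then remove and return it."""
        with self._changed:
            while True:
                for index, (t, payload) in enumerate(self._messages):
                    if t == tag:
                        del self._messages[index]
                        return payload
                self._changed.wait()           # nothing matching yet; wait for a send

    def pending(self):
        """Tags of messages that were sent but have not been received."""
        with self._changed:
            return [t for t, _ in self._messages]


if __name__ == "__main__":
    box = Mailbox()
    box.send(TAG_RESULT, 42)
    box.send(TAG_STATUS, "ok")
    box.send(TAG_RESULT, 43)
    print(box.receive(TAG_STATUS))   # picks the status message out from behind a result
    print(box.receive(TAG_RESULT))   # oldest matching result comes first
    print(box.pending())             # the unmatched send is still occupying the buffer
```

    The third send has no matching receive, so it stays in the buffer for as long as the mailbox exists. That is the junk-mail problem in miniature.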

  3. Correctness of program is completely dependent on user control of synchronization

    Setting aside any computational mistakes that might cause your program to produce incorrect results, all that's left that could make things fail is the control of who does what when; that's all a matter of synchronization, and that's all in the hands of you-the-user.

    • explicit: insertion of separate calls

      Explicit synchronization is control imposed directly and unambiguously, like saying "Don't move until you hear from me". Constructs like barriers, where all processes hit and wait until explicitly released, are forms of explicit synchronization.

    • implicit: existence of send/receive pairs and selection of blocking modes

      We'll be getting to blocking modes shortly; for the time being, let's just leave it at this: blocking means that you wait until the other side does something, while non-blocking means that you do your thing and move on without regard to what the other side does.

      Now, with that in mind, if you issue a non-blocking send, then you just toss the message out onto the network and forget it, but if you do a blocking send, you wait around until you're told that the other side has it. In the same way, if you do a non-blocking receive, you just stop long enough to see if there's a message for you and, if there is, you get it, but if there isn't, you just keep right on going (that's actually over-simplified, and will be dealt with in more detail later); but if you issue a blocking receive, you wait right there until the message is delivered.

      So, if you want to make sure that you-the-sender don't start your next task until you-the-receiver has the message (i.e., an implicit synchronization), you both issue the blocking forms of the appropriate communication call, and you'll have achieved about as fine a scale of inter-process synchronization as you can get without special purpose gear.
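
    The simplified picture of the blocking modes given above can be sketched as follows. This is an illustration under assumptions: a Python queue.Queue stands in for the channel between two processes, the helper names try_receive and blocking_send are invented for this example, and the "blocking send" here waits for an explicit acknowledgement from the receiver. Real message-passing libraries may define the blocking modes somewhat differently.

```python
import queue
import threading


def try_receive(channel):
    """Non-blocking receive: return (True, message) or (False, None) at once."""
    try:
        return True, channel.get_nowait()
    except queue.Empty:
        return False, None


def blocking_send(channel, message):
    """Send, then wait until the receiver has acknowledged the message."""
    channel.put(message)
    channel.join()    # returns once task_done() has been called for every put()


def demo():
    channel = queue.Queue()
    events = []

    def receiver():
        message = channel.get()                 # blocking receive: wait for delivery
        events.append("receiver has " + message)
        channel.task_done()                     # acknowledgement that unblocks the sender

    events.append(("poll", try_receive(channel)))   # nothing there yet, so keep going
    thread = threading.Thread(target=receiver)
    thread.start()
    blocking_send(channel, "x = 3")
    events.append("sender moves on")            # only reached after the acknowledgement
    thread.join()
    return events


if __name__ == "__main__":
    for event in demo():
        print(event)
```

    Because both sides use the blocking forms, "sender moves on" can never be logged ahead of "receiver has x = 3". No barrier call appears anywhere; the ordering falls out of the send/receive pair itself.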

  4. Irregular communication patterns are complicated to code because processors may not know how many receive operations must be performed

    Regular communication means that there is a specific pairing of sends and receives; irregularity, then, means that there's some form of imbalance. There are situations in which you can't assume that things will be regular (anything dealing with random events, for example), and there imposing a regular structure may be overly restrictive and inefficient. However restrictive and inefficient regularity may be, though, it has the virtue of being well understood at any particular point in the process: you know what messages are outstanding and who should be expecting them, and who is ready to receive what from whom.

    You can design a working and efficient irregular communications scheme; the point, here, is that this is a much more error-prone alternative, and typically results in much more complex and potentially buggy code. For those of you familiar with UNIX (tm) systems administration, consider inetd as an example of what you'd have to go through to properly support truly irregular communications.
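
    One common way to cope with not knowing how many receives to post is to have every sender finish with an explicit end-of-stream message, so that the receiver counts finished senders rather than messages. The sketch below illustrates that idea only; it is not the one right way to do it. Threads and a queue.Queue stand in for processes and the network, and the names collect and DONE are invented for this example.

```python
import queue
import threading

DONE = object()   # hypothetical end-of-stream marker, sent last by every sender


def collect(counts):
    """counts[rank] is how many data messages sender `rank` emits.

    The receiving loop never looks at counts; it only knows how many
    senders exist and keeps receiving until each has said DONE.
    """
    inbox = queue.Queue()

    def sender(rank, how_many):
        for item in range(how_many):
            inbox.put((rank, item))      # ordinary data message
        inbox.put((rank, DONE))          # "that was my last one"

    threads = [threading.Thread(target=sender, args=(rank, n))
               for rank, n in enumerate(counts)]
    for t in threads:
        t.start()

    received = []
    active = len(counts)                 # senders that have not finished yet
    while active:
        rank, item = inbox.get()         # blocking receive from anyone
        if item is DONE:
            active -= 1
        else:
            received.append((rank, item))

    for t in threads:
        t.join()
    return received


if __name__ == "__main__":
    print(len(collect([3, 0, 5])))       # eight data messages in total
```

    Note the price: one extra control message per sender, plus a receive loop that has to tell control messages apart from data. That is a small taste of the added complexity the text warns about.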