Distributed Memory Programming

9.4 Computation

We've mentioned the trade-offs between communication and computation previously; we're now going to go a bit more in-depth into the relationship between these two necessary components of distributed computation.

  1. Defined as the ratio of elapsed times for computing and communication between well defined synchronization events

    Ideally, you'd have a pair of synchronization points identified, with well-defined messages of known lengths, encompassing a series of computations involving a known number of double-precision operations. You start counting cycles as soon as the first synchronization event starts (i.e., as soon as one side issues the blocking call), and count the number of cycles until the call completes -- divide this count into the number of bytes involved in the communication, and you have the communication portion of the ratio. Now count the number of cycles spent doing the known number of computation operations, right up until you hit the start of the second synchronization point, and then divide the cycles into the number of operations, and you have the computation side of the ratio. Now put the computation over the communication, and divide through by the communication so that you get a "1" in the denominator: this tells you how many double-precision computation-operations can be accomplished in the same number of cycles that it takes a single byte of data to be transferred, and this figure is known as the computation-to-communication ratio.

  2. Larger ratios typically lead to smaller total elapsed time

    Because network communications are so much slower than on-board computation, the more computation you can do before you have to communicate, and/or the less often you have to communicate, then, obviously, the less time your application will have spent doing the thing that it does most slowly ... all else being equal (which, of course, never happens; still...), your application should take less wall-clock time to complete.

  3. Larger granularity is often associated with higher ratios

    Granularity is a colloquialism for the computation/communication ratio -- large-granularity meaning "lots of computation between successive communication events", small-granularity referring to applications requiring many communications, usually small in size.

  4. Overlap of computing and communication increases ratio

    Any time you can arrange for both computation and communication to be going on at once, you are basically causing your node to do double-duty, and automatically bettering the computational efficiency of your application. This is truly the case when you have separate processors for the two functions, as is becoming more increasingly the fashion; in the situation where a node is using the same processor for both duties, then it's likely that the two sets of functions will still be able to be done concurrently, and you're still going to come out a winner.

  5. Most familiar algorithms for serial and vector machines must be reorganized to increase ratio to achieve good performance on multicomputers

    It shouldn't come as a surprise to find that algorithms designed without regard to the peculiarities of multicomputers might need a bit of tweaking in order to run well on them. In particular, serial machines never really needed to be concerned with computation/communication ratios, because the same machine was doing it all, and any communicating (i.e., interprocess) was still being done all within the same memory space; while vector machines, typically programmed in the very coarse-grained SPMD model, only needed communications with the host (and NOT with other vector units) in order to get initial data, or deliver final results.