|
Synchronization comes in a fairly wide array of shapes and sizes, and can largely be tailored to fit your needs, as long as you keep a clear understanding of the built-in limitations that are simply a part of working in a distributed, NUMA-type environment.
Two of the most common situations that require synchronization are data dependencies and strict ordering:
- Enforcement of data dependencies: pairwise weak ordering
A data dependency is essentially just "multiple use of the same storage location", a topic that also comes up in the context of serial-to-parallel code conversion. In the context of synchronization, you can make sure the correct value of a particular data element is used by the judicious use of pairwise communication, either in "standalone" fashion (i.e., bare send/receive pairs) or collectively. In the former case, the blocking mode that you choose also affects both how much work you get accomplished while communicating and how closely synchronized your processes are when they finish the exchange.
- Ordering and placement of send/receive pairs
The trivial example above is simply meant to demonstrate what is probably already well-understood: the calculations you perform will yield results corresponding to the current values of the data you used at the time of the computation. Which data elements you update, at what point in relation to the computations, and in what order compared to other communications, will all play a pivotal role in determining the correctness of your program.
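To make the point about ordering concrete, here is a minimal sketch that uses Python threads and `queue.Queue` objects as a stand-in for two communicating processes. It is not tied to any particular message-passing library, and the names `worker` and `exchange` are made up for illustration. Each worker updates its value, sends it, receives its neighbour's value, and only then computes with it.

```python
import queue
import threading


def worker(rank, value, send_q, recv_q, results):
    # Step 1: update the local value BEFORE sending it, so the
    # neighbour sees the current value rather than a stale one.
    value = value + 1
    # Step 2: send. put() on an unbounded Queue returns immediately,
    # which behaves roughly like a buffered send.
    send_q.put(value)
    # Step 3: blocking receive. We cannot go on until the
    # neighbour's value has actually arrived.
    neighbour = recv_q.get()
    # Step 4: only now compute with the received value.
    results[rank] = value + neighbour


def exchange(initial):
    """Run two workers that swap one value each; return both results."""
    q01, q10 = queue.Queue(), queue.Queue()  # channels 0 -> 1 and 1 -> 0
    results = [None, None]
    threads = [
        threading.Thread(target=worker, args=(0, initial[0], q01, q10, results)),
        threading.Thread(target=worker, args=(1, initial[1], q10, q01, results)),
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results


print(exchange([10, 20]))  # prints [32, 32]
```

If the update in step 1 were moved after the send, each worker would still run to completion, but the neighbour would compute with the old value. Nothing crashes; the answer is simply wrong.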
- Wait for completion of asynchronous communication operations
Remember that non-blocking communication allows you to continue your computations while the transfer proceeds ... and that it is assumed you won't use the input buffer values until after the operation has completed! In fact, there's nothing to stop you from using the contents of the input buffer at any time, but the results you'll get will not be reliable, nor repeatable ... nor, probably, correct. Always wait for (or test for) completion before reading the buffer.
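The pattern can be sketched as follows, again with a Python thread standing in for the communication system. The `Request` class and the `irecv` function are hypothetical helpers written for this example, loosely modelled on the "start the operation, get a handle, wait on the handle" style; they are not the API of any real library.

```python
import threading
import time


class Request:
    """Hypothetical handle for a receive that is still in flight."""

    def __init__(self, thread):
        self._thread = thread

    def wait(self):
        # Block until the background transfer has finished.
        self._thread.join()


def irecv(buffer, incoming, delay=0.05):
    """Start filling `buffer` from `incoming` in the background; return at once."""

    def transfer():
        for i, item in enumerate(incoming):
            time.sleep(delay / len(incoming))  # simulated transfer time
            buffer[i] = item

    thread = threading.Thread(target=transfer)
    thread.start()
    return Request(thread)


def overlapped_sum(incoming):
    buffer = [0] * len(incoming)
    request = irecv(buffer, incoming)        # returns immediately
    local = sum(i * i for i in range(1000))  # useful work that ignores `buffer`
    request.wait()                           # buffer is unreliable before this line
    return local, sum(buffer)                # safe to read the buffer now


print(overlapped_sum([1, 2, 3, 4])[1])  # prints 10
```

Reading `sum(buffer)` before `request.wait()` would be perfectly legal Python, but it could return any partial total from 0 to 10 depending on timing, which is exactly the unreliable, unrepeatable behaviour described above.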
- Location of collective communications
This is simply an expansion of the points made earlier in the context of single send-receive pairs, and the same warnings apply. Naturally, as the amount of communication is increased, so is the potential for disaster if care isn't taken that the operations are correctly positioned within the surrounding computations.
- Strict temporal ordering
When you absolutely, positively have to guarantee that operations happen in exactly a particular sequence, and in particular that communications are properly interspersed with the computations they cooperate with, you can exercise the maximum degree of execution control by using mechanisms that impose strict temporal ordering on the operations they bound. This ensures that steps are taken in exactly the sequence defined, and/or with the greatest possible degree of synchronization.
- Barriers (sandwich "critical section" between two barriers)
As discussed previously, barriers ensure that none of the participating processes can proceed farther than the check-in point before having to wait for all other participants to get there, too. Pairing up barriers gives all participants the opportunity to, e.g., calculate and communicate their own subsets of initial values, then wait until all others have done the same before moving together into a critical section of code (one requiring, e.g., a common data-view and roughly identical starting time), and then wait again so that everyone moves into a post-processing phase synchronously.
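The barrier sandwich can be sketched with Python's `threading.Barrier`, with threads standing in for the participating processes. The function name `run_phases` and the particular values are made up for illustration.

```python
import threading


def run_phases(n_workers):
    """Each worker: set up its own data, barrier, critical section, barrier, clean up."""
    barrier = threading.Barrier(n_workers)
    shared = [0] * n_workers  # one slot per worker, visible to all
    totals = [0] * n_workers

    def worker(rank):
        # Phase 1: every worker prepares only its own slot.
        shared[rank] = (rank + 1) * 10
        # First barrier: nobody enters the critical section
        # until every slot has been filled.
        barrier.wait()
        # Critical section: all workers now have the same data-view.
        totals[rank] = sum(shared)
        # Second barrier: nobody starts post-processing
        # until everyone has finished reading.
        barrier.wait()
        # Post-processing: overwriting the slot is safe now.
        shared[rank] = -1

    threads = [threading.Thread(target=worker, args=(r,)) for r in range(n_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return totals


print(run_phases(4))  # prints [100, 100, 100, 100]
```

Without the first barrier, a fast worker could sum slots that are still zero; without the second, a fast worker could overwrite its slot with -1 while a slower one is still summing. Either way the totals would no longer be guaranteed to agree.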
- Synchronous send/receive of zero length messages
This is simply a barrier between two processes, rather than "more than two". Exactly the same comments apply, although the actual implementation is simpler. Rather than having a barrier-boss, and requiring all processes to check-out to start and then check-in to finish, here you simply arrange to send/receive a minimum-length packet in synchronous (blocking) mode. This guarantees that the first process to hit the interchange is forced to wait until the other is ready to perform its half of the exchange. The selection of "which process does what" is totally arbitrary, and because no actual data movement has to be supported, the delay imposed beyond the wait itself will be minimal.
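A rough sketch of this two-party rendezvous, using Python threads and queues in place of real processes and messages. The `Rendezvous` class and its `ssend`/`recv` methods are hypothetical names chosen for this example; the acknowledgement queue is one assumed way to imitate a send that cannot complete until the matching receive has started.

```python
import queue
import threading
import time


class Rendezvous:
    """Hypothetical two-party channel imitating a synchronous zero-length send/receive."""

    def __init__(self):
        self._msg = queue.Queue()
        self._ack = queue.Queue()

    def ssend(self):
        self._msg.put(b"")  # empty payload: nothing but the handshake
        self._ack.get()     # block until the receiver has matched the send

    def recv(self):
        self._msg.get()     # block until the sender has arrived
        self._ack.put(b"")  # release the sender


def demo():
    log = []
    lock = threading.Lock()
    link = Rendezvous()

    def note(event):
        with lock:
            log.append(event)

    def sender():
        note("sender arrives")
        link.ssend()
        note("sender leaves")

    def receiver():
        time.sleep(0.05)  # deliberately late, so the sender has to wait
        note("receiver arrives")
        link.recv()
        note("receiver leaves")

    threads = [threading.Thread(target=sender), threading.Thread(target=receiver)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return log


for line in demo():
    print(line)
```

Whatever the timing, both "arrives" entries appear in the log before either "leaves" entry: the early party is held at the interchange until the late one shows up. Swapping which thread calls `ssend` and which calls `recv` changes nothing about that guarantee.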
|